Qin Jin

ORCID: 0000-0001-6486-6020

According to our database, Qin Jin authored at least 227 papers between 1998 and 2024.

Bibliography

2024
Robust Speaker Recognition
PhD thesis, 2024

Temporally Language Grounding With Multi-Modal Multi-Prompt Tuning.
IEEE Trans. Multim., 2024

Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models.
CoRR, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech.
CoRR, 2024

Unveiling Visual Biases in Audio-Visual Localization Benchmarks.
CoRR, 2024

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding.
CoRR, 2024

What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation.
CoRR, 2024

QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds.
CoRR, 2024

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction.
CoRR, 2024

SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models.
CoRR, 2024

TokSing: Singing Voice Synthesis based on Discrete Tokens.
CoRR, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units.
CoRR, 2024

EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?
CoRR, 2024

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning.
CoRR, 2024

Think-Program-reCtify: 3D Situated Reasoning with Large Language Models.
CoRR, 2024

Movie101v2: Improved Movie Narration Benchmark.
CoRR, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.
CoRR, 2024

SPAFormer: Sequential 3D Part Assembly with Transformers.
CoRR, 2024

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2.
CoRR, 2024

Edit As You Wish: Video Caption Editing with Multi-grained User Control.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Revealing Personality Traits: A New Benchmark Dataset for Explainable Personality Recognition on Dialogues.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

ESCoT: Towards Interpretable Emotional Support Dialogue Systems.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Respond in my Language: Mitigating Language Inconsistency in Response Generation based on Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Multimodal Pretraining from Monolingual to Multilingual.
Mach. Intell. Res., April, 2023

Global Sea Surface Height Measurement From CYGNSS Based on Machine Learning.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.
CoRR, 2023

A Systematic Exploration of Joint-training for Singing Voice Synthesis.
CoRR, 2023

No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection.
CoRR, 2023

Edit As You Wish: Video Description Editing with Multi-grained Commands.
CoRR, 2023

TikTalk: A Multi-Modal Dialogue Dataset for Real-World Chitchat.
CoRR, 2023

CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge.
Proceedings of the ACM Web Conference 2023, 2023

Rethinking Benchmarks for Cross-modal Image-text Retrieval.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Two-Stage Adaptation for Cross-Corpus Multimodal Emotion Recognition.
Proceedings of the Natural Language Processing and Chinese Computing, 2023

Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Emotionally Situated Text-to-Speech Synthesis in User-Agent Conversation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Knowledge Enhanced Model for Live Video Comment Generation.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Explore and Tell: Embodied Visual Captioning in 3D Environments.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Phoneix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation With Phoneme Distribution Predictor.
Proceedings of the IEEE International Conference on Acoustics, 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Open-Category Human-Object Interaction Pre-training via Language Modeling Framework.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Movie101: A New Movie Understanding Benchmark.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Attractive Storyteller: Stylized Visual Storytelling with Unpaired Text.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

UniLG: A Unified Structure-aware Framework for Lyrics Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

MPMQA: Multimodal Question Answering on Product Manuals.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Multi-Modal Knowledge Hypergraph for Diverse Image Retrieval.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Accommodating Audio Modality in CLIP for Multimodal Processing.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Survey: Transformer based video-language pre-training.
AI Open, January, 2022

Enhancing Neural Machine Translation With Dual-Side Multimodal Awareness.
IEEE Trans. Multim., 2022

Exploring Anchor-based Detection for Ego4D Natural Language Query.
CoRR, 2022

Generalizing Multimodal Pre-training into Multilingual via Language Acquisition.
CoRR, 2022

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis.
CoRR, 2022

Multi-modal Emotion Estimation for in-the-wild Videos.
CoRR, 2022

Progressive Learning for Image Retrieval with Hybrid-Modality Queries.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

M4MM '22: 1st International Workshop on Methodologies for Multimedia.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

PIC'22: 4th Person in Context Workshop.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Memobert: Pre-Training Model with Prompt-Based Learning for Multimodal Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Training Strategies for Automatic Song Writing: A Unified Framework Perspective.
Proceedings of the IEEE International Conference on Acoustics, 2022

Leveraging Trust Relations to Improve Academic Patent Recommendation.
Proceedings of the 55th Hawaii International Conference on System Sciences, 2022

MovieUN: A Dataset for Movie Understanding and Narrating.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Multi-Task Learning Framework for Emotion Recognition In-the-Wild.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Unifying Event Detection and Captioning as Sequence Generation via Pre-training.
Proceedings of the Computer Vision - ECCV 2022, 2022

TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval.
Proceedings of the Computer Vision - ECCV 2022, 2022

VRDFormer: End-to-End Video Visual Relation Detection with Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Valence and Arousal Estimation based on Multimodal Temporal-Aware Features for Videos in the Wild.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

DialogueEIN: Emotion Interaction Network for Dialogue Affective Analysis.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Image Difference Captioning with Pre-training and Contrastive Learning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Pre-Trained Models: Past, Present and Future.
CoRR, 2021

Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization.
CoRR, 2021

WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training.
CoRR, 2021

Pre-trained models: Past, present and future.
AI Open, 2021

Efficient Proposal Generation with U-shaped Network for Temporal Sentence Grounding.
Proceedings of the MMAsia '21: ACM Multimedia Asia, Gold Coast, Australia, December 1, 2021

Multimodal Fusion Strategies for Physiological-emotion Analysis.
Proceedings of the MuSe '21: Proceedings of the 2nd on Multimodal Sentiment Analysis Challenge, 2021

Question-controlled Text-aware Image Captioning.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

MMPT'21: International Joint Workshop on Multi-Modal Pre-Training for Multimedia Understanding.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Speech Emotion Recognition via Multi-Level Cross-Modal Distillation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss.
Proceedings of the IEEE International Conference on Acoustics, 2021

Language Resource Efficient Learning for Captioning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Towards Diverse Paragraph Captioning for Untrimmed Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Missing Modality Imagination Network for Emotion Recognition with Uncertain Missing Modalities.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020).
CoRR, 2020

Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning.
CoRR, 2020

YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos.
CoRR, 2020

RUC_AIM3 at TRECVID 2020: Ad-hoc Video Search & Video to Text Description.
Proceedings of the 2020 TREC Video Retrieval Evaluation, 2020

VideoIC: A Video Interactive Comments Dataset and Multimodal Multitask Learning for Comments Generation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multi-modal Fusion for Video Sentiment Analysis.
Proceedings of the MuSe'20: Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop, 2020

ICECAP: Information Concentrated Entity-aware Image Captioning.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Skeleton-Based Interactive Graph Network For Human Object Interaction Detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2020

Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Better Captioning With Sequence-Level Exploration.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Generating Video Descriptions With Latent Topic Guidance.
IEEE Trans. Multim., 2019

Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019.
CoRR, 2019

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos.
CoRR, 2019

RUC_AIM3 at TRECVID 2019: Video to Text.
Proceedings of the 2019 TREC Video Retrieval Evaluation, 2019

Visual Relation Detection with Multi-Level Attention.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Relation Understanding in Videos.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Adversarial Domain Adaption for Multi-Cultural Dimensional Emotion Recognition in Dyadic Interactions.
Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop, 2019

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

RUC at MediaEval 2019: Video Memorability Prediction Based on Visual Textual and Concept Related Features.
Proceedings of the Working Notes Proceedings of the MediaEval 2019 Workshop, 2019

Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Cross-culture Multimodal Emotion Recognition with Adversarial Learning.
Proceedings of the IEEE International Conference on Acoustics, 2019

YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Semi-supervised Multimodal Emotion Recognition with Improved Wasserstein GANs.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Unsupervised Bilingual Lexicon Induction from Mono-Lingual Multimodal Data.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
RUC+CMU: System Report for Dense Captioning Events in Videos.
CoRR, 2018

Informedia @ TRECVID 2018: Ad-hoc Video Search, Video to Text Description, Activities in Extended video.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Multimodal Dimensional and Continuous Emotion Recognition in Dyadic Video Interactions.
Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

iMakeup: Makeup Instructional Video Dataset for Fine-Grained Dense Video Captioning.
Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

Multi-modal Multi-cultural Dimensional Continues Emotion Recognition in Dyadic Interactions.
Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, 2018

Session details: Deep-2 (Recognition).
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Class-aware Self-Attention for Audio Event Recognition.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

RUC at MediaEval 2018: Visual and Textual Features Exploration for Predicting Media Memorability.
Proceedings of the Working Notes Proceedings of the MediaEval 2018 Workshop, 2018

2017
Group division based on common weights in cross efficiency evaluation.
Int. J. Inf. Decis. Sci., 2017

Informedia @ TRECVID 2017.
Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Knowing Yourself: Improving Video Caption via In-depth Recap.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Multimodal Multi-task Learning for Dimensional and Continuous Emotion Recognition.
Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA, October 23, 2017

Video Captioning with Guidance of Multimodal Latent Topics.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Generating Video Descriptions with Topic Guidance.
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 2017

RUC at MediaEval 2017: Predicting Media Interestingness Task.
Proceedings of the Working Notes Proceedings of the MediaEval 2017 Workshop co-located with the Conference and Labs of the Evaluation Forum (CLEF 2017), 2017

Emotion recognition with multimodal features and temporal models.
Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017

Facial Action Units Detection with Multi-Features and -AUs Fusion.
Proceedings of the 12th IEEE International Conference on Automatic Face & Gesture Recognition, 2017

2016
Boosting Recommendation in Unexplored Categories by User Price Preference.
ACM Trans. Inf. Syst., 2016

The Study of the Entrepreneurial Leadership Style of Real Estate Industry in China: Based on the Content Analysis of Microblog.
Int. J. Knowl. Based Organ., 2016

Coordinate the Express Delivery Supply Chain with Option Contracts.
Int. J. Inf. Syst. Supply Chain Manag., 2016

A hybrid approach based on stochastic competitive Hopfield neural network and efficient genetic algorithm for frequency assignment problem.
Appl. Soft Comput., 2016

Informedia @ TRECVID 2016.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Improving Image Captioning by Concept-Based Sentence Reranking.
Proceedings of the Advances in Multimedia Information Processing - PCM 2016, 2016

History Rhyme: Searching Historic Events by Multimedia Knowledge.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Detecting Violence in Video using Subclasses.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Describing Videos using Multi-modal Fusion.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Semantic Image Profiling for Historic Events: Linking Images to Phrases.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Video Description Generation using Audio and Visual Cues.
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

RUC at MediaEval 2016 Emotional Impact of Movies Task: Fusion of Multimodal Features.
Proceedings of the Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

RUC at MediaEval 2016: Predicting Media Interestingness Task.
Proceedings of the Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

Generating Natural Video Descriptions via Multimodal Processing.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Video emotion recognition in the wild based on fusion of multimodal features.
Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016

Violent Scene Detection Using Convolutional Neural Networks and Deep Audio Features.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

Emotion Recognition in Videos via Fusing Multimodal Features.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

2015
Exploitation and Exploration Balanced Hierarchical Summary for Landmark Images.
IEEE Trans. Multim., 2015

Persistent B+-Trees in Non-Volatile Main Memory.
Proc. VLDB Endow., 2015

Speech Emotion Recognition Based on Acoustic Features.
Computer Science (计算机科学), 2015

Lead curve detection in drawings with complex cross-points.
Neurocomputing, 2015

Image Profiling for History Events on the Fly.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Multi-modal Dimensional Emotion Recognition using Recurrent Neural Networks.
Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, 2015

Semantic Concept Annotation For User Generated Videos Using Soundtracks.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

RUCMM at MediaEval 2015 Affective Impact of Movies Task: Fusion of Audio and Visual Cues.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Detecting semantic concepts in consumer videos using audio.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Speech emotion recognition with acoustic and lexical features.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

RUC-Tencent at ImageCLEF 2015: Concept Detection, Localization and Sentence Generation.
Proceedings of the Working Notes of CLEF 2015, 2015

Improving emotion classification on Chinese microblog texts with auxiliary cross-domain data.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014
Special Issue on "Hybrid intelligence for growing internet and its applications".
Future Gener. Comput. Syst., 2014

A guided Hopfield evolutionary algorithm with local search for maximum clique problem.
Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics, 2014

Does product recommendation meet its waterloo in unexplored categories?: no, price comes to help.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Semantic Concept Annotation of Consumer Videos at Frame-Level Using Audio.
Proceedings of the Advances in Multimedia Information Processing - PCM 2014, 2014

Adaptive Tag Selection for Image Annotation.
Proceedings of the Advances in Multimedia Information Processing - PCM 2014, 2014

Emotion Classification of Chinese Microblog Text via Fusion of BoW and eVector Feature Representations.
Proceedings of the Natural Language Processing and Chinese Computing, 2014

Speech emotion classification using acoustic features.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Structure Perturbation Optimization for Hopfield-Type Neural Networks.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2014, 2014

Renmin University of China at ImageCLEF 2014 Scalable Concept Image Annotation.
Proceedings of the Working Notes for CLEF 2014 Conference, 2014

An overview of robustness related issues in speaker recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
Tell me what happened here in history.
Proceedings of the ACM Multimedia Conference, 2013

Renmin University of China at ImageCLEF 2013 Scalable Concept Image Annotation.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

2012
Event-based Video Retrieval Using Audio.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011
Informedia@TRECVID 2011: Surveillance Event Detection.
Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011

Investigation of Cross-Show Speaker Diarization.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Analysis of Dialectal Influence in Pan-Arabic ASR.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Harmonic Structure Transform for Speaker Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010
Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

The 2010 CMU GALE speech-to-text system.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Speaker identification with distant microphone speech.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Speaker identification using warped MVDR cepstral features.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Improving speaker segmentation via speaker identification and text segmentation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009


Modeling instantaneous intonation for speaker identification using the fundamental frequency variation spectrum.
Proceedings of the IEEE International Conference on Acoustics, 2009

Voice convergin: Speaker de-identification by voice transformation.
Proceedings of the IEEE International Conference on Acoustics, 2009

Detecting bandlimited audio in broadcast television shows.
Proceedings of the IEEE International Conference on Acoustics, 2009

Speaker de-identification via voice transformation.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
Robust far-field speaker identification under mismatched conditions.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

The CMU-interACT 2008 Mandarin transcription system.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Is voice transformation a threat to speaker identification?
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Far-Field Speaker Recognition.
IEEE Trans. Speech Audio Process., 2007

Whispering Speaker Identification.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Multi-modal Person Identification in a Smart Environment.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

ISL Person Identification Systems in the CLEAR 2007 Evaluations.
Proceedings of the Multimodal Technologies for Perception of Humans, 2007

2006
Far-Field Speaker Recognition.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

ISL Person Identification Systems in the CLEAR Evaluations.
Proceedings of the Multimodal Technologies for Perception of Humans, 2006

2005
CMU Informedia's TRECVID 2005 Skirmishes.
Proceedings of the 2005 TREC Video Retrieval Evaluation, 2005

2004
Issues in meeting transcription - the ISL meeting transcription system.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Crosscorrelation-based multispeaker speech activity detection.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Speaker segmentation and clustering in meetings.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

The 2003 ISL rich transcription system for conversational telephony speech.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
The SuperSID project: exploiting high-level information for high-accuracy speaker recognition.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Phonetic speaker recognition using maximum-likelihood binary-decision tree models.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Combining cross-stream and time dimensions in phonetic speaker recognition.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Phonetic speaker identification.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Speaker identification using multilingual phone strings.
Proceedings of the IEEE International Conference on Acoustics, 2002

Improvements in Non-Verbal Cue Identification Using Multilingual Phone Strings.
Proceedings of the Workshop on Speech-to-Speech Translation: Algorithms and Systems@ACL 2002, 2002

2000
A naïve de-lambing method for speaker identification.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Application of LDA to speaker recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1998
A high-performance text-independent speaker identification system based on BCDM.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

