Qian Chen

Orcid: 0000-0001-6939-7438

Affiliations:
  • Alibaba Group, DAMO Academy, Speech Lab, China
  • University of Science and Technology of China, National Engineering Laboratory of Speech and Language Information Processing, Hefei, China


According to our database, Qian Chen authored at least 79 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Tuning Large Language Model for Speech Recognition With Mixed-Scale Re-Tokenization.
IEEE Signal Process. Lett., 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization.
CoRR, 2024

Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
CoRR, 2024

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation.
CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.
CoRR, 2024

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World.
CoRR, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers.
CoRR, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.
CoRR, 2024

CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification.
CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.
CoRR, 2024

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models.
CoRR, 2024

Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Loss Masking Is Not Needed In Decoder-Only Transformer For Discrete-Token-Based ASR.
Proceedings of the IEEE International Conference on Acoustics, 2024

Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.
CoRR, 2023

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation.
CoRR, 2023

Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision.
CoRR, 2023

Improving BERT with Hybrid Pooling Network and Drop Mask.
CoRR, 2023

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement.
CoRR, 2023

Exploiting Correlations Between Contexts and Definitions with Multiple Definition Modeling.
CoRR, 2023

Enhancing Generation through Summarization Duality and Explicit Outline Control.
CoRR, 2023

Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

DePA: Improving Non-autoregressive Translation with Dependency-Aware Decoder.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CASA-ASR: Context-Aware Speaker-Attributed ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MUG: A General Meeting Understanding and Generation Benchmark.
Proceedings of the IEEE International Conference on Acoustics, 2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
Proceedings of the IEEE International Conference on Acoustics, 2023

Weighted Sampling for Masked Language Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2023

Auxiliary Pooling Layer For Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2023

Meeting Action Item Detection with Regularized Context Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2023

Pushing the Limits of Self-Supervised Speaker Verification using Regularized Distillation Framework.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation.
CoRR, 2022

Non-autoregressive Translation with Dependency-Aware Decoder.
CoRR, 2022

PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Prosospeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2022

MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
BeamTransformer: Microphone Array-based Overlapping Speech Detection.
CoRR, 2021

TRS: Transferability Reduced Ensemble via Encouraging Gradient Diversity and Model Smoothness.
CoRR, 2021

TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Pre-Training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Discriminative Self-Training for Punctuation Prediction.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Sequential neural networks for noetic end-to-end response selection.
Comput. Speech Lang., 2020

Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2019
Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models.
CoRR, 2019

Several Experiments on Investigating Pretraining and Knowledge-Enhanced Models for Natural Language Inference.
CoRR, 2019

BERT for Joint Intent Classification and Slot Filling.
CoRR, 2019

Sequential Attention-based Network for Noetic End-to-End Response Selection.
CoRR, 2019

Sequential Matching Model for End-to-end Multi-turn Response Selection.
Proceedings of the IEEE International Conference on Acoustics, 2019

Transfer Learning for Context-Aware Spoken Language Understanding.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
A Sequential Neural Encoder With Latent Structured Description for Modeling Sentences.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

VisDrone-DET2018: The Vision Meets Drone Object Detection in Image Challenge Results.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Enhancing Sentence Embedding with Generalized Pooling.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

Neural Natural Language Inference Models Enhanced with External Knowledge.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Natural Language Inference with External Knowledge.
CoRR, 2017

Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering.
CoRR, 2017

Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference.
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP, 2017

Enhanced LSTM for Natural Language Inference.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Distraction-Based Neural Networks for Document Summarization.
CoRR, 2016

Enhancing and Combining Sequential and Tree LSTM for Natural Language Inference.
CoRR, 2016

Distraction-Based Neural Networks for Modeling Document.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

2015
Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Revisiting Word Embedding for Contrasting Meaning.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

