2025
Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization.
CoRR, May, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction.
CoRR, January, 2025

Exploring Text-Queried Sound Event Detection with Audio Source Separation.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Strong Consistency of Spectral Clustering for the Sparse Degree-Corrected Hypergraph Stochastic Block Model.
IEEE Trans. Inf. Theory, 2024

Lightweight Detection Methods for Insulator Self-Explosion Defects.
Sensors, 2024

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.
CoRR, 2024

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation.
CoRR, 2024

Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
CoRR, 2024

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation.
CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers.
CoRR, 2024

Loss Masking Is Not Needed In Decoder-Only Transformer For Discrete-Token-Based ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Sliding Mode Control Model of Two-phase Hybrid Stepping Motor Based on Improved Harris Hawks Optimization Algorithm.
Proceedings of the International Conference on Advanced Robotics and Mechatronics, 2024

2023
Improving BERT with Hybrid Pooling Network and Drop Mask.
CoRR, 2023

Hyperlink prediction via local random walks and Jensen-Shannon divergence.
CoRR, 2023

MUG: A General Meeting Understanding and Generation Benchmark.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Weighted Sampling for Masked Language Modeling.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Meeting Action Item Detection with Regularized Context Modeling.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2018
LCQMC: A Large-scale Chinese Question Matching Corpus.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

2017
Optical flow-based face tracking in The Mummy.
Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2017