Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization.
CoRR, May, 2025
Exploring Text-Queried Sound Event Detection with Audio Source Separation.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Strong Consistency of Spectral Clustering for the Sparse Degree-Corrected Hypergraph Stochastic Block Model.
IEEE Trans. Inf. Theory, 2024
Lightweight Detection Methods for Insulator Self-Explosion Defects.
Sensors, 2024
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.
CoRR, 2024
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation.
CoRR, 2024
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
CoRR, 2024
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation.
CoRR, 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024
Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers.
CoRR, 2024
Loss Masking Is Not Needed In Decoder-Only Transformer For Discrete-Token-Based ASR.
Proceedings of the IEEE International Conference on Acoustics, 2024
Sliding Mode Control Model of Two-phase Hybrid Stepping Motor Based on Improved Harris Hawks Optimization Algorithm.
Proceedings of the International Conference on Advanced Robotics and Mechatronics, 2024
Improving BERT with Hybrid Pooling Network and Drop Mask.
CoRR, 2023
Hyperlink prediction via local random walks and Jensen-Shannon divergence.
CoRR, 2023
MUG: A General Meeting Understanding and Generation Benchmark.
Proceedings of the IEEE International Conference on Acoustics, 2023
Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
Proceedings of the IEEE International Conference on Acoustics, 2023
Weighted Sampling for Masked Language Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2023
Meeting Action Item Detection with Regularized Context Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2023
Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022
LCQMC: A Large-scale Chinese Question Matching Corpus.
Proceedings of the 27th International Conference on Computational Linguistics, 2018
Optical flow-based face tracking in The Mummy.
Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2017