PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning.
CoRR, January, 2025
Enhancing Real-World Active Speaker Detection With Multi-Modal Extraction Pre-Training.
IEEE Trans. Multim., 2025
k-path-connectivity of the complete balanced tripartite graph K_{n,n,n} for n+1 ≤ k ≤ 2n-4.
Discret. Appl. Math., 2025
MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks.
IEEE Signal Process. Lett., 2024
Transferable Adversarial Attacks Against ASR.
IEEE Signal Process. Lett., 2024
VoiceBench: Benchmarking LLM-Based Voice Assistants.
CoRR, 2024
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization.
CoRR, 2024
Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
The single- and dual-brain mechanisms underlying the adviser's confidence expression strategy switching during influence management.
NeuroImage, April, 2023
PoLyScriber: Integrated Fine-Tuning of Extractor and Lyrics Transcriber for Polyphonic Music.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Token2vec: A Joint Self-Supervised Pre-Training Framework Using Unpaired Speech and Text.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Self-Transcriber: Few-Shot Lyrics Transcription With Self-Training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Automatic Lyrics Transcription of Polyphonic Music With Lyrics-Chord Multi-Task Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Note on Path-Connectivity of Complete Bipartite Graphs.
J. Interconnect. Networks, 2022
PoLyScribers: Joint Training of Vocal Extractor and Lyrics Transcriber for Polyphonic Music.
CoRR, 2022
k-Path-Connectivity of Completely Balanced Tripartite Graphs.
Axioms, 2022
Genre-Conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
NHSS: A speech and singing parallel database.
Speech Commun., 2021
The mutuality of social emotions: How the victim's reactive attitude influences the transgressor's emotional responses.
NeuroImage, 2021
Affective evaluation of others' altruistic decisions under risk and ambiguity.
NeuroImage, 2020
Personalized Singing Voice Generation Using WaveRNN.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Speaker-independent Spectral Mapping for Speech-to-Singing Conversion.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Behaviour Pattern When Designers Have Difficulties.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2018
Analysis of Speech and Singing Signals for Temporal Alignment.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018