Qihui Jin

Discret. Appl. Math., 2025

MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues.

Kuluhan Binici

Abhinav Ramesh Kashyap

Viktor Schlegel

Andy T. Liu

Vijay Prakash Dwivedi

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks.

IEEE Signal Process. Lett., 2024

Transferable Adversarial Attacks Against ASR.

IEEE Signal Process. Lett., 2024

VoiceBench: Benchmarking LLM-Based Voice Assistants.

CoRR, 2024

Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization.

CoRR, 2024

Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models.

Nancy F. Chen

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

2023

The single- and dual-brain mechanisms underlying the adviser's confidence expression strategy switching during influence management.

NeuroImage, April, 2023

PoLyScriber: Integrated Fine-Tuning of Extractor and Lyrics Transcriber for Polyphonic Music.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Token2vec: A Joint Self-Supervised Pre-Training Framework Using Unpaired Speech and Text.

Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Transcriber: Few-Shot Lyrics Transcription With Self-Training.

Xianghu Yue

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Automatic Lyrics Transcription of Polyphonic Music With Lyrics-Chord Multi-Task Learning.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Note on Path-Connectivity of Complete Bipartite Graphs.

Shasha Li

Yan Zhao

J. Interconnect. Networks, 2022

PoLyScribers: Joint Training of Vocal Extractor and Lyrics Transcriber for Polyphonic Music.

CoRR, 2022

k-Path-Connectivity of Completely Balanced Tripartite Graphs.

Pi Wang

Shasha Li

Axioms, 2022

Genre-Conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

NHSS: A speech and singing parallel database.

Speech Commun., 2021

The mutuality of social emotions: How the victim's reactive attitude influences the transgressor's emotional responses.

NeuroImage, 2021

2020

Affective evaluation of others' altruistic decisions under risk and ambiguity.

NeuroImage, 2020

Personalized Singing Voice Generation Using WaveRNN.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

2019

NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker-independent Spectral Mapping for Speech-to-Singing Conversion.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Behaviour Pattern When Designers Have Difficulties.

Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2018

Analysis of Speech and Singing Signals for Temporal Alignment.

Karthika Vijayan