Yingming Gao
ORCID: 0000-0001-5881-3723
According to our database, Yingming Gao authored at least 39 papers between 2015 and 2024.
Bibliography
2024
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition.
CoRR, 2024
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining.
CoRR, 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model.
CoRR, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Concss: Contrastive-based Context Comprehension for Dialogue-Appropriate Prosody in Conversational Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
CoRR, 2023
CoRR, 2023
Mining High-quality Samples from Raw Data and Majority Voting Method for Multimodal Emotion Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
CMCU-CSS: Enhancing Naturalness via Commonsense-based Multi-modal Context Understanding in Conversational Speech Synthesis.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Dual Audio Encoders Based Mandarin Prosodic Boundary Prediction by Using Multi-Granularity Prosodic Representations.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Exploring the interpretability in speech-based adolescent depression detection by SHAP.
Proceedings of the 9th International Conference on Communication and Information Processing, 2023
GaitParse: Gait Parsing Algorithm with Self-Supervised Fine-Tuning for Gait Recognition.
Proceedings of the 9th International Conference on Communication and Information Processing, 2023
M²-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023
2022
PhD thesis, 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis.
Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022
An Entropy-based Study on the Acquisition of Mandarin Initial Consonants by Korean Learners.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
The Disyllabic Tone Production and Tone Context Effect in Mandarin-speaking Children with Cochlear Implants.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
The Contribution of Phonological and Fluency Factors to Chinese L2 Comprehensibility Ratings: A Case Study of Urdu-speaking Learners.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
A study of production error analysis for Mandarin-speaking Children with Hearing Impairment.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
The Importance of Lexical Tone for Sentence Understanding: Utilizing Functional Load Principle to Simulate Comprehension Process.
Proceedings of the International Conference on Asian Language Processing, 2022
2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
2020
J. Signal Process. Syst., 2020
An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the Green Energy and Networking - 6th EAI International Conference, 2019
2018
Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks.
J. Signal Process. Syst., 2018
Speaking Rate Changes Affect Phone Durations Differently for Neutral and Emotional Speech.
Proceedings of the 26th European Signal Processing Conference, 2018
2017
Improving pronunciation erroneous tendency detection with convolutional long short-term memory.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017
2016
Improving Mandarin tone recognition based on DNN by combining acoustic and articulatory features.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
2015
A study on robust detection of pronunciation erroneous tendency based on deep neural network.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015