Yingming Gao

Orcid: 0000-0001-5881-3723

According to our database, Yingming Gao authored at least 39 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition.
CoRR, 2024

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024.
CoRR, 2024

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion.
CoRR, 2024

Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining.
CoRR, 2024

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model.
CoRR, 2024

Frame-Level Emotional State Alignment Method for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Concss: Contrastive-based Context Comprehension for Dialogue-Appropriate Prosody in Conversational Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Spoken Language Intelligence of Large Language Models for Language Learning.
CoRR, 2023

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis.
CoRR, 2023

Mining High-quality Samples from Raw Data and Majority Voting Method for Multimodal Emotion Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

CMCU-CSS: Enhancing Naturalness via Commonsense-based Multi-modal Context Understanding in Conversational Speech Synthesis.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

FTA-net: A Frequency and Time Attention Network for Speech Depression Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Dual Audio Encoders Based Mandarin Prosodic Boundary Prediction by Using Multi-Granularity Prosodic Representations.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploring the interpretability in speech-based adolescent depression detection by SHAP.
Proceedings of the 9th International Conference on Communication and Information Processing, 2023

GaitParse: Gait Parsing Algorithm with Self-Supervised Fine-Tuning for Gait Recognition.
Proceedings of the 9th International Conference on Communication and Information Processing, 2023

M²-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab.
PhD thesis, 2022

Articulatory Synthesis of Vocalized /r/ Allophones in German.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Text-Aware End-to-end Mispronunciation Detection and Diagnosis.
CoRR, 2022

A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis.
Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022

An Entropy-based Study on the Acquisition of Mandarin Initial Consonants by Korean Learners.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Disyllabic Tone Production and Tone Context Effect in Mandarin-speaking Children with Cochlear Implants.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Contribution of Phonological and Fluency Factors to Chinese L2 Comprehensibility Ratings: A Case Study of Urdu-speaking Learners.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

A study of production error analysis for Mandarin-speaking Children with Hearing Impairment.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

The Importance of Lexical Tone for Sentence Understanding: Utilizing Functional Load Principle to Simulate Comprehension Process.
Proceedings of the International Conference on Asian Language Processing, 2022

2021
A Practical Way to Improve Automatic Phonetic Segmentation Performance.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

2020
Improving Pronunciation Erroneous Tendency Detection with Multi-Model Soft Targets.
J. Signal Process. Syst., 2020

An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
Articulatory Copy Synthesis Based on a Genetic Algorithm.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Research on Illumination Estimation Based on Data Fitting.
Proceedings of the Green Energy and Networking - 6th EAI International Conference, 2019

2018
Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks.
J. Signal Process. Syst., 2018

Speaking Rate Changes Affect Phone Durations Differently for Neutral and Emotional Speech.
Proceedings of the 26th European Signal Processing Conference, 2018

2017
Improving pronunciation erroneous tendency detection with convolutional long short-term memory.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

2016
Improving Mandarin tone recognition based on DNN by combining acoustic and articulatory features.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

DNN based detection of pronunciation erroneous tendency in data sparse condition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
A study on robust detection of pronunciation erroneous tendency based on deep neural network.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

