Dongchao Yang

Orcid: 0000-0003-0879-4047

According to our database, Dongchao Yang authored at least 52 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models.
CoRR, 2024

AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions.
CoRR, 2024

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.
CoRR, 2024

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis.
CoRR, 2024

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.
CoRR, 2024

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner.
CoRR, 2024

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction.
CoRR, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder.
CoRR, 2024

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.
CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.
CoRR, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
CoRR, 2024

VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructSpeech: Following Speech Editing Instructions via Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

PromptTTS 2: Describing and Generating Voices with Text Prompt.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Consistent and Relevant: Rethink the Query Embedding in General Sound Separation.
Proceedings of the IEEE International Conference on Acoustics, 2024

DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.
CoRR, 2023

PromptTTS 2: Describing and Generating Voices with Text Prompt.
CoRR, 2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation.
CoRR, 2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation.
CoRR, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec.
CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
CoRR, 2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.
CoRR, 2023

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Background-aware Modeling for Weakly Supervised Sound Event Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.
Proceedings of the International Conference on Machine Learning, 2023

Improving Text-Audio Retrieval by Text-Aware Attention Pooling and Prior Matrix Revised Loss.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Weakly Supervised Sound Event Detection with Causal Intervention.
Proceedings of the IEEE International Conference on Acoustics, 2023

NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
A Two-student Learning Framework for Mixed Supervised Target Sound Detection.
CoRR, 2022

A Mobile Robot Design for Efficient and Large-Scale Solar Panel Cleaning.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2022

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Target Sound Extraction with Timestamp Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Mutual Learning Framework for Few-Shot Sound Event Detection.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Mixed Supervised Learning Framework For Target Sound Detection.
Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

Detect What You Want: Target Sound Detection.
Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

Omnidirectional Motion Control Method of Quadruped Robot Based on 3D-CPG Oscillator Group.
Proceedings of the Robotics in Natural Settings, 2022

2021
Detect what you want: Target Sound Detection.
CoRR, 2021

Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

YOLOv3 with Asymmetric Intersection over Union Based Loss Function for Human Detection.
Proceedings of the ICMLSC '21: 2021 The 5th International Conference on Machine Learning and Soft Computing, 2021

Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information.
Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

2020
Towards Data Distillation for End-to-end Spoken Conversational Question Answering.
CoRR, 2020

A petal-array capacitive tactile sensor with micro-pin for robotic fingertip sensing.
Proceedings of the 3rd IEEE International Conference on Soft Robotics, 2020

