Chenpeng Du

Orcid: 0000-0001-5329-0847

According to our database, Chenpeng Du authored at least 35 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2024
E$^{3}$TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective.
CoRR, 2024

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec.
CoRR, 2024

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders.
CoRR, 2024

Language Model Can Listen While Speaking.
CoRR, 2024

Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech.
CoRR, 2024

GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting.
CoRR, 2024

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.
CoRR, 2024

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.
CoRR, 2024

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Generation-Based Target Speech Extraction with Speech Discretization and Vocoder.
Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
Proceedings of the IEEE International Conference on Acoustics, 2024

Acoustic BPE for Speech Generation with Discrete Tokens.
Proceedings of the IEEE International Conference on Acoustics, 2024

DiffDub: Person-Generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-Encoder.
Proceedings of the IEEE International Conference on Acoustics, 2024

VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching.
Proceedings of the IEEE International Conference on Acoustics, 2024

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation.
CoRR, 2023

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Neural Fusion for Voice Cloning.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody Modelling.
CoRR, 2021

Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis.
CoRR, 2021

Data Augmentation for end-to-end Code-Switching Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Towards Data Selection on TTS Data for Children's Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Speaker Augmentation for Low Resource Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
SJTU Entry in Blizzard Challenge 2019.
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019

