Chao Weng

ORCID: 0009-0009-8712-9176

According to our database, Chao Weng authored at least 96 papers between 2011 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Gull: A Generative Multifunctional Audio Codec.
CoRR, 2024

Consistent and Relevant: Rethink the Query Embedding in General Sound Separation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Opine: Leveraging a Optimization-Inspired Deep Unfolding Method for Multi-Channel Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer Based on Source-Filter Model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Complexity Scaling for Speech Denoising.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Integrating Lattice-Free MMI Into End-to-End Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation.
CoRR, 2023

DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis.
CoRR, 2023

Rep2wav: Noise Robust text-to-speech Using self-supervised representations.
CoRR, 2023

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation.
CoRR, 2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation.
CoRR, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec.
CoRR, 2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.
CoRR, 2023

High Fidelity Speech Enhancement with Band-split RNN.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

TSpeech-AI System Description to the 5th Deep Noise Suppression (DNS) Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022
Improving Mandarin End-to-End Speech Recognition With Word N-Gram Language Model.
IEEE Signal Process. Lett., 2022

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition.
Comput. Speech Lang., 2022

An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer.
Comput. Speech Lang., 2022

The DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022.
CoRR, 2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.
CoRR, 2022

Integrate Lattice-Free MMI into End-to-End Speech Recognition.
CoRR, 2022

Improving Target Sound Extraction with Timestamp Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
Detect what you want: Target Sound Detection.
CoRR, 2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio.
CoRR, 2021

Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis.
CoRR, 2021

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention.
CoRR, 2021

Neural Mask based Multi-channel Convolutional Beamforming for Joint Dereverberation, Echo Cancellation and Denoising.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Towards Robust Speaker Verification with Target Speaker Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Replay and Synthetic Speech Detection with Res2Net Architecture.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020
DurIAN-SC: Duration Informed Attention Network Based Singing Voice Conversion System.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

DurIAN: Duration Informed Attention Network for Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Spatio-Temporal Beamformer for Target Speech Separation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Peking Opera Synthesis via Duration Informed Attention Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

The Tencent speech synthesis system for Blizzard Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019
Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2019

Synthesising Expressiveness in Peking Opera via Duration Informed Attention Network.
CoRR, 2019

Learning Singing From Speech.
CoRR, 2019

DurIAN: Duration Informed Attention Network For Multimodal Synthesis.
CoRR, 2019

Large Margin Training for Attention Based End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Joint Training of Complex Ratio Mask Based Beamformer and Acoustic Model for Noise Robust ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-trained Neural Network Acoustic Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Token-wise Training for Attention Based End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Investigating End-to-end Speech Recognition for Mandarin-English Code-switching.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Parametric Cepstral Mean Normalization for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018
Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2018

An Exploration of Directly Using Word as ACOUSTIC Modeling Unit for Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Improving Attention-Based End-to-End ASR Systems with Sequence-Based Loss Functions.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

A Multistage Training Framework for Acoustic-to-Word Model.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2016
Primary and Secondary Webpage Information Hiding Algorithm Based on Invisible Characters (基于不可见字符的主副式网页信息隐藏算法).
Computer Science (计算机科学), 2016

2015
Towards robust conversational speech recognition and understanding.
PhD thesis, 2015

Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Discriminative Training Using Non-Uniform Criteria for Keyword Spotting on Spontaneous Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

2014
Latent semantic rational kernels for topic spotting on conversational speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Feature space maximum a posteriori linear regression for adaptation of deep neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Recurrent deep neural networks for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Single-channel mixed speech recognition using deep neural networks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Deep learning vector quantization for acoustic information retrieval.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013
Latent semantic rational kernels for topic spotting on spontaneous conversational speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Adaptive boosted non-uniform MCE for keyword spotting on spontaneous speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012
Discriminative Training Using Non-uniform Criteria for Keyword Spotting on Spontaneous Speech.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

A comparative study of discriminative training using non-uniform criteria for cross-layer acoustic modeling.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011
Recent development of discriminative training using non-uniform criteria for cross-level acoustic modeling.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

