Xin Wang

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model.

CoRR, 2019

The ASVspoof 2019 database.

CoRR, 2019

Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments.

Yusuke Yasuda

CoRR, 2019

Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments.

Yusuke Yasuda

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences.

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Speaker Anonymization Using X-vector and Neural Waveform Models.

Jean-François Bonastre

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis.

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Training Multi-Speaker Neural Text-to-Speech Systems Using Speaker-Imbalanced Speech Corpora.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language.

Proceedings of the IEEE International Conference on Acoustics, 2019

Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2019

STFT Spectral Loss for Training a Neural Speech Waveform Model.

Proceedings of the IEEE International Conference on Acoustics, 2019

Audiovisual Speaker Conversion: Jointly and Simultaneously Transforming Facial Expression and Acoustic Characteristics.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Fundamental Frequency Modeling for Neural-Network-Based Statistical Parametric Speech Synthesis.

PhD thesis, 2018

Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigating very deep highway networks for parametric speech synthesis.

Speech Commun., 2018

Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis.

Gustav Eje Henter

CoRR, 2018

Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Progressive Deep Learning Approach to Child Speech Separation.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Investigating Accuracy of Pitch-accent Annotations in Neural Network-based Speech Synthesis and Denoising Effects.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Principles for Learning Controllable TTS from Annotated and Latent Variation.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An autoregressive recurrent mixture density network for parametric speech synthesis.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A maximum likelihood approach to deep neural network based speech dereverberation.

Jun Du

Yannan Wang

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis.

IEICE Trans. Inf. Syst., 2016

Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering.

Li-Rong Dai

Comput. Speech Lang., 2016

A Comparative Study of the Performance of HMM, DNN, and RNN based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.

Cassia Valentini-Botinhao

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech.

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.

Cassia Valentini-Botinhao

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A full training framework of cross-stream dependence modelling for HMM-based singing voice synthesis.

Minghui Dong

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

The NII speech synthesis entry for Blizzard Challenge 2016.

Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016

2014

Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis.

Li-Rong Dai

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

An anisotropic diffusion filter based on multidirectional separability.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

2012

Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis.
