Zhizheng Wu

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

ADVSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

An Initial Investigation of Neural Replay Simulator for Over-The-Air Adversarial Perturbations to Automatic Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

TTS-Guided Training for Accent Conversion Without Parallel Data.

IEEE Signal Process. Lett., 2023

Towards Zero-Shot Multi-Speaker Multi-Accent Text-to-Speech Synthesis.

IEEE Signal Process. Lett., 2023

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit.

CoRR, 2023

Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion.

CoRR, 2023

Audio compression-assisted feature extraction for voice replay attack detection.

CoRR, 2023

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.

Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023

PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Zero-shot multi-speaker accent TTS with limited accent data.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

Audio Splicing Localization: Can We Accurately Locate the Splicing Tampering?

Zhiping Zeng

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

2021

Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2019

Building a Mixed-Lingual Neural TTS System with Only Monolingual Data.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The Blizzard Challenge 2019.

Zhihang Xie

Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019

2017

An Exemplar-Based Approach to Frequency Warping for Voice Conversion.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Deep Feature Engineering for Noise Robust Spoofing Detection.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge.

IEEE J. Sel. Top. Signal Process., 2017

Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Improving Trajectory Modelling for DNN-Based Speech Synthesis by Using Stacked Bottleneck Features and Minimum Generation Error Training.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Synthetic speech detection using phase information.

Speech Commun., 2016

On the study of replay and voice conversion attacks to text-dependent speaker verification.

Multim. Tools Appl., 2016

Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Trajectory Error Training.

CoRR, 2016

Investigating gated recurrent neural networks for speech synthesis.

CoRR, 2016

Spoofing detection under noisy conditions: a preliminary investigation and an initial database.

CoRR, 2016

Merlin: An Open Source Neural Network Speech Synthesis System.

Oliver Watts

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Multidimensional scaling of systems in the Voice Conversion Challenge 2016.

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

A Demonstration of the Merlin Open Source Neural Network Speech Synthesis System.

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

On the impact of phoneme alignment in DNN-based speech synthesis.

Mei Li

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Analysis of the Voice Conversion Challenge 2016 Evaluation Results.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The Voice Conversion Challenge 2016.

Tomoki Toda

Ling-Hui Chen

Daisuke Saito

Fernando Villavicencio

Cassia Valentini-Botinhao

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis.

Felipe Espic

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

GlottDNN - A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Investigating gated recurrent networks for speech synthesis.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

From HMMs to DNNs: Where do the improvements come from?

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Spoofing detection from a feature representation perspective.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Deep neural network-guided unit selection synthesis.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Robust TTS duration modelling using DNNs.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

The CSTR entry to the Blizzard Challenge 2016.

Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016

On the training of DNN-based average voice model for speech synthesis.

Shan Yang

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

On the use of I-vectors and average voice model for voice conversion without parallel data.

Jie Wu

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Predicting articulatory movement from text using deep architecture with stacked bottleneck features.

Zhen Wei

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015

Anti-spoofing, Voice Conversion.

Encyclopedia of Biometrics, Second Edition, 2015

Anti-spoofing, Voice Databases.

Encyclopedia of Biometrics, Second Edition, 2015

Spectral mapping for voice conversion.

PhD thesis, 2015

Joint Speaker Verification and Antispoofing in the i-Vector Space.

IEEE Trans. Inf. Forensics Secur., 2015

Spoofing and countermeasures for speaker verification: A survey.

Speech Commun., 2015

Exemplar-based voice conversion using joint nonnegative matrix factorization.

Engsiong Chng

Multim. Tools Appl., 2015

A study of speaker adaptation for DNN-based speech synthesis.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): introductory talk by the organizers.

Tomi Kinnunen

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Human vs machine spoofing detection on wideband and narrowband data.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Sentence-level control vectors for deep neural network speech synthesis.

Oliver Watts

Cassia Valentini-Botinhao

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Towards minimum perceptual error training for DNN-based speech synthesis.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

System fusion for high-performance voice conversion.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Deep neural network context embeddings for model selection in rich-context HMM synthesis.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis.

Cassia Valentini-Botinhao

Oliver Watts

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

SAS: A speaker verification spoofing database containing diverse attacks.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Sparse representation for frequency warping based voice conversion.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014

Speaker Recognition Anti-spoofing.

Handbook of Biometric Anti-Spoofing, 2014

Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion.

IEEE ACM Trans. Audio Speech Lang. Process., 2014

Correlation-based frequency warping for voice conversion.

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Joint nonnegative matrix factorization for exemplar-based voice conversion.

Chng Eng Siong

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A comparative study of spectral transformation techniques for singing voice synthesis.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Introducing i-vectors for joint anti-spoofing and speaker verification.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A study on replay attack and anti-spoofing for text-dependent speaker verification.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013

Exemplar-based voice conversion using non-negative spectrogram deconvolution.

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Exemplar-based unit selection for voice conversion utilizing temporal information.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Synthetic speech detection using temporal modulation feature.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Conditional restricted Boltzmann machine for voice conversion.

Engsiong Chng

Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Voice conversion and spoofing attack on speaker verification systems.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Local partial least square regression for spectral mapping in voice conversion.

Xiaohai Tian

Engsiong Chng

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012

Mixture of Factor Analyzers Using Priors From Non-Parallel Speech for Voice Conversion.

IEEE Signal Process. Lett., 2012

Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition.

Chng Eng Siong

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Vulnerability of speaker verification systems against voice conversion spoofing attacks: The case of telephone speech.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case.

Eliathamby Ambikairajah

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011

Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units.

IEEE Trans. Speech Audio Process., 2011

2010

Automatic prosody prediction and detection with Conditional Random Field (CRF) models.

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Text-independent F0 transformation with non-parallel data for voice conversion.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009

A minimum v/u error approach to F0 generation in HMM-based TTS.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Improved prosody generation by maximizing joint likelihood of state and longer units.

Yao Qian