Heiga Zen

Michelle Tadmor Ramanovich

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Twenty-Five Years of Evolution in Speech and Language Processing.

IEEE Signal Process. Mag., July, 2023

Extracting representative subset from extensive text data for training pre-trained language models.

Jun Suzuki

Michelle Tadmor Ramanovich

Hideto Kazawa

Inf. Process. Manag., May, 2023

Guest Editorial: Special Issue on Affective Speech and Language Synthesis, Generation, and Conversion.

IEEE Trans. Affect. Comput., 2023

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

FiPPiE: A Computationally Efficient Differentiable method for Estimating Fundamental Frequency From Spectrograms.

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech.

Mark Hasegawa-Johnson

Philipp Olbrich

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

SayTap: Language to Quadrupedal Locomotion.

Proceedings of the Conference on Robot Learning, 2023

2022

Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation.

CoRR, 2022

WaveFit: An Iterative and Non-Autoregressive Neural Vocoder Based on Fixed-Point Iteration.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation.

Ye Jia

Quan Wang

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MAESTRO: Matched Speech Text Representations through Modality Matching.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

WaveGrad: Estimating Gradients for Waveform Generation.

Proceedings of the 9th International Conference on Learning Representations, 2021

Parallel Tacotron: Non-Autoregressive and Controllable TTS.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling.

CoRR, 2020

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior.

CoRR, 2020

Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques.

IEEE Signal Process. Mag., 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.

CoRR, 2019

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Hierarchical Generative Modeling for Controllable Speech Synthesis.

Proceedings of the 7th International Conference on Learning Representations, 2019

Sample Efficient Adaptive Text-to-Speech.

Proceedings of the 7th International Conference on Learning Representations, 2019

2018

Sequence-to-sequence Neural Network Model with 2D Attention for Learning Japanese Pitch Accents.

Antoine Bruguier

Arkady Arkhangorodsky

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Parallel WaveNet: Fast High-Fidelity Speech Synthesis.

Proceedings of the 35th International Conference on Machine Learning, 2018

[Invited] Generative Model-Based Text-to-Speech Synthesis.

Proceedings of the IEEE 7th Global Conference on Consumer Electronics, 2018

2017

Speech Research at Google to Enable Universal Speech Interfaces.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

WaveNet: A Generative Model for Raw Audio.

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis.

Hideki Kawahara

Yannis Agiomyrgiannakis

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices.

Yannis Agiomyrgiannakis

Niels Egberts

Fergus Henderson

Przemyslaw Szczepaniak

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis.

Bo Li

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Directly modeling voiced and unvoiced components in speech waveforms by neural networks.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends.

IEEE Signal Process. Mag., 2015

Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis.

Hasim Sak

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014

Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis.

Andrew W. Senior

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013

Autoregressive Models for Statistical Parametric Speech Synthesis.

Matt Shannon

William Byrne

IEEE Trans. Speech Audio Process., 2013

Speech Synthesis Based on Hidden Markov Models.

Proc. IEEE, 2013

Deep learning in speech synthesis.

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Statistical parametric speech synthesis using deep neural networks.

Andrew W. Senior

Mike Schuster

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012

Product of Experts for Statistical Parametric Speech Synthesis.

IEEE Trans. Speech Audio Process., 2012

Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization.

Cassia Valentini-Botinhao

Norbert Braunschweiler

IEEE Trans. Speech Audio Process., 2012

Combining multiple high quality corpora for improving HMM-TTS.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Cepstral analysis based on the glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011

Continuous Stochastic Feature Mapping Based on Trajectory HMMs.

IEEE Trans. Speech Audio Process., 2011

Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis.

Speech Commun., 2011

Bayesian Context Clustering Using Cross Validation for Speech Recognition.

IEICE Trans. Inf. Syst., 2011

The Effect of Using Normalized Models in Statistical Speech Synthesis.

Matt Shannon

William J. Byrne

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Gaussian Process Experts for Voice Conversion.

Nicholas Pilkington

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Multipulse Sequences for Residual Signal Modeling.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Decision tree-based context clustering based on cross validation and hierarchical priors.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

2010

A Covariance-Tying Technique for HMM-Based Speech Synthesis.

IEICE Trans. Inf. Syst., 2010

HMM-based polyglot speech synthesis by speaker and language adaptive training.

Norbert Braunschweiler

Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters.

Ranniery Maia

Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Speaker and language adaptive training for HMM-based polyglot speech synthesis.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Context adaptive training with factorized decision trees for HMM-based speech synthesis.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

An implementation of decision tree-based context clustering on graphics processing units.

Nicholas Pilkington

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Training a parametric-based logF0 model with the minimum generation error criterion.

Javier Latorre

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Statistical parametric speech synthesis based on product of experts.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

2009

Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis.

IEEE Trans. Speech Audio Process., 2009

Statistical parametric speech synthesis.

Alan W. Black

Speech Commun., 2009

Context-dependent additive log F0 model for HMM-based speech synthesis.

Norbert Braunschweiler

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Stereo-based stochastic noise compensation based on trajectory GMMs.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

A Bayesian approach to HMM-based speech synthesis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

2008

The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006.

Tomoki Toda

IEICE Trans. Inf. Syst., 2008

A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System.

IEICE Trans. Inf. Syst., 2008

Probabilistic feature mapping based on trajectory HMMs.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Acoustic modeling based on model structure annealing for speech recognition.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Unsupervised adaptation for HMM-based speech synthesis.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS 2007" for the Blizzard Challenge 2007.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Acoustic modeling with contextual additive structure for HMM-based speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

The HTS-2008 System: Yet Another Evaluation of the Speaker-Adaptive HMM-based Speech Synthesis System in The 2008 Blizzard Challenge.

Proceedings of the Blizzard Challenge 2008, 2008

2007

Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005.

IEICE Trans. Inf. Syst., 2007

A Hidden Semi-Markov Model-Based Speech Synthesis System.

IEICE Trans. Inf. Syst., 2007

State Duration Modeling for HMM-Based Speech Synthesis.

IEICE Trans. Inf. Syst., 2007

Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences.

Comput. Speech Lang., 2007

The HMM-based speech synthesis system (HTS) version 2.0.

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV.

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

An excitation model for HMM-based speech synthesis based on residual modeling.

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Model-space MLLR for trajectory HMMs.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

A trainable excitation model for HMM-based speech synthesis.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007.

Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007

2006

Speaker adaptation of trajectory HMMs using feature-space MLLR.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

An HMM-based singing voice synthesis system.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Estimating Trajectory HMM Parameters Using Monte Carlo EM with Gibbs Sampler.

Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006

Hidden Semi-Markov Model Based Speech Recognition System using Weighted Finite-State Transducer.

Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006

2005

Simultaneous clustering of phonetic context, dimension, and state position for acoustic modeling using decision trees.

Syst. Comput. Jpn., 2005

Continuous Speech Recognition Based on General Factor Dependent Acoustic Models.

IEICE Trans. Inf. Syst., 2005

Applying Sparse KPCA for Feature Extraction in Speech Recognition.

IEICE Trans. Inf. Syst., 2005

Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition.

IEICE Trans. Inf. Syst., 2005

An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005.

Tomoki Toda

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

On building a concatenative speech synthesis system from the Blizzard Challenge speech databases.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Sparse KPCA for Feature Extraction in Speech Recognition.

Proceedings of the 2005 IEEE International Conference on Acoustics, Speech and Signal Processing, 2005

2004

On the Use of Kernel PCA for Feature Extraction in Speech Recognition.

IEICE Trans. Inf. Syst., 2004

An introduction of trajectory model into HMM-based speech synthesis.

Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

Hidden semi-Markov model based speech synthesis.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Constructing emotional speech synthesizers with limited speech database.

Murtaza Bulut

Shrikanth S. Narayanan

Ryosuke Tsuzuki

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Deterministic annealing EM algorithm in parameter estimation for acoustic model.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

A Viterbi algorithm for a trajectory model derived from HMM with explicit relationship between static and dynamic features.

Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

2003

Decision tree-based simultaneous clustering of phonetic contexts, dimensions, and state positions for acoustic modeling.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Trajectory modeling based on HMMs with the explicit relationship between static and dynamic features.

Fernando Gil Vianna Resende Jr.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Towards the development of a Brazilian Portuguese text-to-speech system based on HMM.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Speech recognition using voice-characteristic-dependent acoustic models.

Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, 2003

Improving the performance of HMM-based very low bit rate speech coding.

Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, 2003

2002

Decision tree distribution tying based on a dimensional split technique.
