Shinji Takaki

ORCID: 0000-0001-7294-7699

According to our database, Shinji Takaki authored at least 52 papers between 2010 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.








Embedding a Differentiable Mel-Cepstral Synthesis Filter to a Neural Speech Synthesis System.
Proceedings of the IEEE International Conference on Acoustics, 2023

PeriodNet: A Non-Autoregressive Raw Waveform Generative Model With a Structure Separating Periodic and Aperiodic Components.
IEEE Access, 2021

Periodnet: A Non-Autoregressive Waveform Generation Model with a Structure Separating Periodic and Aperiodic Components.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences.
IEEE Access, 2020

Fast and High-Quality Singing Voice Synthesis System Based on Convolutional Neural Networks.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Semi-Supervised Learning Based on Hierarchical Generative Models for End-to-End Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Complex-Valued Restricted Boltzmann Machine for Speaker-Dependent Speech Parameterization From Complex Spectra.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model.
CoRR, 2019

Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform.
CoRR, 2019

Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language.
Proceedings of the IEEE International Conference on Acoustics, 2019

Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

STFT Spectral Loss for Training a Neural Speech Waveform Model.
Proceedings of the IEEE International Conference on Acoustics, 2019

Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigating very deep highway networks for parametric speech synthesis.
Speech Commun., 2018

Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis.
Speech Commun., 2018

Complex-Valued Restricted Boltzmann Machine for Direct Speech Parameterization from Complex Spectra.
CoRR, 2018

Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder.
IEEE Access, 2018

Bidirectional Voice Conversion Based on Joint Training Using Gaussian-Gaussian Deep Relational Model.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Unsupervised Speaker Adaptation for DNN-based Speech Synthesis using Input Codes.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Complex-Valued Restricted Boltzmann Machine for Direct Learning of Frequency Spectra.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Generative Adversarial Network-Based Postfilter for STFT Spectrograms.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An autoregressive recurrent mixture density network for parametric speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Adapting and controlling DNN-based speech synthesis using input codes.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Constructing a Deep Neural Network Based Spectral Model for Statistical Speech Synthesis.
Proceedings of the Recent Advances in Nonlinear Speech Processing, 2016

Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis.
IEICE Trans. Inf. Syst., 2016

A Comparative Study of the Performance of HMM, DNN, and RNN based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

The NII speech synthesis entry for Blizzard Challenge 2016.
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016

Deep Denoising Auto-encoder for Statistical Speech Synthesis.
CoRR, 2015

Characteristics of Polyphonic Music Style and Markov Model of Pitch-Class Intervals.
Proceedings of the Mathematics and Computation in Music - 5th International Conference, 2015

Multiple feed-forward deep neural networks for statistical parametric speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Contextual Additive Structure for HMM-Based Speech Synthesis.
IEEE J. Sel. Top. Signal Process., 2014

Overview of NITECH HMM-based text-to-speech system for Blizzard Challenge 2014.
Proceedings of the Blizzard Challenge 2014, Singapore, Singapore, September 19, 2014, 2014

Contextual partial additive structure for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2013

Separable lattice 2-D HMMs introducing state duration control for recognition of images with various variations.
Proceedings of the IEEE International Conference on Acoustics, 2013

Overview of NITECH HMM-based speech synthesis system for Blizzard Challenge 2013.
Proceedings of the Blizzard Challenge 2013, 2013

Overview of NIT HMM-based speech synthesis system for Blizzard Challenge 2012.
Proceedings of the Blizzard Challenge 2012, Portland, OR, USA, September 14, 2012, 2012

An optimization algorithm of independent mean and variance parameter tying structures for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2011

Overview of NIT HMM-based speech synthesis system for Blizzard Challenge 2011.
Proceedings of the Blizzard Challenge 2011, Turin, Italy, September 2, 2011, 2011

Spectral modeling with contextual additive structure for HMM-based speech synthesis.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010
