Shinsuke Sakai

According to our database, Shinsuke Sakai authored at least 55 papers between 1990 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer.
CoRR, 2024

Distilling the Knowledge of BERT for CTC-based ASR.
CoRR, 2022

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Data Augmentation for ASR Using TTS Via a Discrete Representation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

ASR Rescoring and Confidence Estimation with Electra.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

An End-To-End Model from Speech to Clean Transcript for Parliamentary Meetings.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Generative Adversarial Training Data Adaptation for Very Low-Resource Automatic Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multi-speaker Sequence-to-sequence Speech Synthesis for Data Augmentation in Acoustic-to-word Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Leveraging Sequence-to-Sequence Speech Synthesis for Enhancing Acoustic-to-Word Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Forward-Backward Attention Decoder.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Semi-supervised ensemble DNN acoustic model training.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Cross-domain speech recognition using nonparallel corpora with cycle-consistent adversarial networks.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Joint Optimization of Denoising Autoencoder and DNN Acoustic Model Based on Multi-Target Learning for Noisy Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature.
EURASIP J. Adv. Signal Process., 2015

Speech dereverberation using long short-term memory.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Deep autoencoders augmented with phone-class feature for reverberant speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Exploring deep neural networks and deep autoencoders in reverberant speech recognition.
Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Admissible Stopping in Viterbi Beam Search for Unit Selection Speech Synthesis.
IEICE Trans. Inf. Syst., 2013

A-STAR: Toward translating Asian spoken languages.
Comput. Speech Lang., 2013

Probabilistic Concatenation Modeling for Corpus-Based Speech Synthesis.
IEICE Trans. Inf. Syst., 2011

A sampling-based environment population projection approach for rapid acoustic model adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Improved training of excitation for HMM-based parametric speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

NICT Blizzard Challenge 2010 Entry.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

Hyperbolic structure of fundamental frequency contour.
Proceedings of the 3rd International Universal Communication Symposium, 2009

A close look into the probabilistic concatenation model for corpus-based speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Optimal learning of P-Layer additive F0 models with cross-validation.
Proceedings of the IEEE International Conference on Acoustics, 2009

CART-based modeling of Chinese tonal patterns with a functional model tracing the fundamental frequency trajectories.
Proceedings of the IEEE International Conference on Acoustics, 2009

The NICT Entry for the Blizzard Challenge 2009: an Enhanced HMM-based Speech Synthesis System with Trajectory Training considering Global Variance and State-Dependent Mixed Excitation.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009

Prosody Modeling from Tone to Intonation in Chinese using a Functional F0 Model.
Proceedings of the ISUC 2008, 2008

Simultaneous Acoustic, Prosodic, and Phrasing Model Training for TTS Conversion Systems.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Frequency Modulation Technique for Prosodic Modification.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Development of Indonesian Large Vocabulary Continuous Speech Recognition System within A-STAR Project.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

The NICT/ATR speech synthesis system for the Blizzard Challenge 2008.
Proceedings of the Blizzard Challenge 2008, 2008

Communicative speech synthesis with XIMERA: a first step.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

ATRECSS - ATR English speech corpus for speech synthesis.
Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007

Decision tree-based training of probabilistic concatenation models for corpus-based speech synthesis.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Fundamental Frequency Modeling for Speech Synthesis Based on a Statistical Learning Technique.
IEICE Trans. Inf. Syst., 2005

A probabilistic approach to unit selection for corpus-based speech synthesis.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Additive Modeling of English F0 Contour for Speech Synthesis.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

F0 modeling with multi-layer additive modeling based on a statistical learning technique.
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

An automatic interpretation system for travel conversation.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Continuous speech recognition with parse filtering.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Multilingual spoken-language understanding in the MIT Voyager system.
Speech Commun., 1995

An automatic voice dialing system developed on PC speech i/o platform.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

A Bilingual VOYAGER System.
Proceedings of the Human Language Technology: Proceedings of a Workshop Held at Plainsboro, 1993

J-SUMMIT: Japanese spontaneous speech recognition.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

A bilingual Voyager system.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

J-SUMMIT: a Japanese segment-based speech recognition system.
Proceedings of the Second International Conference on Spoken Language Processing, 1992

From interlingua to speech: generating prosodic information from conceptual representation.
Proceedings of the 1990 International Conference on Acoustics, 1990
