Gakuto Kurata

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Global RNN Transducer Models For Multi-dialect Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Knowledge Distillation Leveraging Alternative Soft Targets from Non-Parallel Qualified Speech Data.

Tohru Nagano

CoRR, 2021

Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Generalized Knowledge Distillation from an Ensemble of Specialized Teachers Leveraging Unsupervised Neural Clustering.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

RNN Transducer Models for Spoken Language Understanding.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition.

George Saon

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Spoken Language Understanding Without Full Transcripts.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

New Advances in Speaker Diarization.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speaker Embeddings Incorporating Acoustic Conditions for Diarization.

Yosuke Higuchi

Masayuki Suzuki

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Converting Written Language to Spoken Language with Neural Machine Translation for Language Modeling.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition.

Kartik Audhkhasi

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation.

Kartik Audhkhasi

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Direct Neuron-Wise Fusion of Cognate Neural Networks.

Masayuki Suzuki

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

English Broadcast News Speech Recognition by Humans and Machines.

Alice Kaiser-Schatzlein

Bern Samko

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Improvements to N-gram Language Model Using Text Generated from Neural Language Model.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Data Augmentation Based on Vowel Stretch for Improving Children's Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Improved Knowledge Distillation from Bi-Directional to Uni-Directional LSTM CTC for End-to-End Speech Recognition.

Kartik Audhkhasi

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Data Augmentation Improves Recognition of Foreign Accented Speech.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Symbol Sequence Search from Telephone Conversation.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

English Conversational Telephone Speech Recognition by Humans and Machines.

Dimitrios Dimitriadis

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Empirical Exploration of Novel Architectures and Objectives for Language Models.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Factorial Modeling for Effective Suppression of Directional Noise.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Ensembles of Multi-Scale VGG Acoustic Models.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Efficient Knowledge Distillation from an Ensemble of Teachers.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Harmonic feature fusion for robust neural network-based acoustic modeling.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Effective joint training of denoising feature space transforms and Neural Network based acoustic models.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Language modeling with highway LSTM.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

Leveraging Sentence-level Information with Encoder LSTM for Natural Language Understanding.

CoRR, 2016

Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence.

Bing Xiang

Bowen Zhou

Proceedings of the NAACL HLT 2016, 2016

Labeled Data Generation with Encoder-Decoder LSTM for Semantic Slot Filling.

Bing Xiang

Bowen Zhou

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling.

Brian Kingsbury

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Speech recognition robust against speech overlapping in monaural recordings of telephone conversations.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling.

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

2015

Discriminative re-ranking for automatic speech recognition by leveraging invariant structures.

Speech Commun., 2015

Deep neural network training emphasizing central frames.

Daniel Willett

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A metric for evaluating speech recognizer output based on human-perception model.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Leveraging phonetic context dependent invariant structure for continuous speech recognition.

Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

2012

Acoustically discriminative language model training with pseudo-hypothesis.

Speech Commun., 2012

Leveraging word confusion networks for named entity modeling and detection from conversational telephone speech.

Speech Commun., 2012

Discriminative Reranking for LVCSR Leveraging Invariant Structure.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

Continuous Digits Recognition Leveraging Invariant Structure.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Acoustic Model Training with Detecting Transcription Errors in the Training Data.

Nobuyasu Itoh

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Named entity recognition from Conversational Telephone Speech leveraging Word Confusion Networks for training and recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Training of error-corrective model for ASR without using audio data.

Nobuyasu Itoh

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

2009

Acoustically discriminative training for language models.

Nobuyasu Itoh

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

2007

Automatic Prosody Labeling Using Multiple Models for Japanese.

IEICE Trans. Inf. Syst., 2007

Preliminary experiments toward automatic generation of new TTS voices from recorded speech alone.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Unsupervised Lexicon Acquisition from Speech and Text.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

2006

Unsupervised Adaptation of a Stochastic Language Model Using a Japanese Raw Corpus.

Shinsuke Mori

Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006

Phoneme-to-Text Transcription System with an Infinite Vocabulary.

Shinsuke Mori

Daisuke Takuma

Proceedings of the ACL 2006, 2006

2005

Class-based variable memory length Markov model.

Shinsuke Mori

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004

GDQA: Graph Driven Question Answering System - NTCIR-4 QAC2 Experiments.

Naoaki Okazaki

Mitsuru Ishizuka

Proceedings of the Fourth NTCIR Workshop on Research in Information Access Technologies Information Retrieval, 2004

2002

Corpus-based analysis of English spoken by Japanese students in view of the entire phonemic system of English.

Nobuaki Minematsu

Keikichi Hirose

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP 2002, 2002

Integration of MLLR adaptation with pronunciation proficiency adaptation for non-native speech recognition.

Nobuaki Minematsu