Atsunori Ogawa

ORCID: 0000-0002-2888-101X

According to our database, Atsunori Ogawa authored at least 120 papers between 1998 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Recognition of target domain Japanese speech using language model replacement.

,

,

Ryota Nishimura

,

,

Norihide Kitaoka

EURASIP J. Audio Speech Music. Process., December, 2024

Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation.

,

Takanori Ashihara

,

Takafumi Moriya

,

,

,

,

CoRR, 2024

Applying LLMs for Rescoring N-best ASR Hypotheses of Casual Conversations: Effects of Domain Adaptation and Context Carry-over.

,

,

,

Takanori Ashihara

,

Takafumi Moriya

,

,

,

CoRR, 2024

NTT Speaker Diarization System for CHiME-7: Multi-Domain, Multi-Microphone End-to-End and Vector Clustering Diarization.

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing.

,

,

,

,

Shinji Watanabe

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.

Takafumi Moriya

,

,

,

,

Takanori Ashihara

,

,

Tomohiro Tanaka

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization.

,

Takanori Ashihara

,

Takafumi Moriya

,

Tomohiro Tanaka

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

What are differences? Comparing DNN and Human by Their Performance and Characteristics in Speaker Age Estimation.

,

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization.

,

,

,

Federico Landini

,

,

,

Tomohiro Nakatani

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine.

,

,

,

,

,

Tomohiro Nakatani

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Iterative Shallow Fusion of Backward Language Model for End-To-End Speech Recognition.

,

Takafumi Moriya

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Large Text Corpora For End-To-End Speech Summarization.

,

Takanori Ashihara

,

Takafumi Moriya

,

Tomohiro Tanaka

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders.

,

,

,

Roshan S. Sharma

,

,

Shinji Watanabe

Proceedings of the IEEE International Conference on Acoustics, 2023

Espnet-Summ: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems.

Roshan S. Sharma

,

,

,

,

,

Shinji Watanabe

,

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation.

,

,

,

,

Takanori Ashihara

,

,

Shinji Watanabe

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Streaming End-to-End ASR Using CTC Decoder and DRA for Linguistic Information Substitution.

Tatsunari Takagi

,

,

Norihide Kitaoka

,

Yukoh Wakabayashi

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Coarse-Age Loss: A New Training Method Using Coarse-Age Labeled Data for Speaker Age Estimation.

,

Hosana Kamiyama

,

,

,

Noboru Miyazaki

,

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Language modeling for spontaneous speech recognition based on disfluency labeling and generation of disfluent text.

,

,

Ryota Nishimura

,

,

Norihide Kitaoka

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Combining multiple end-to-end speech recognition models based on density ratio approach.

,

,

Yukoh Wakabayashi

,

,

,

Norihide Kitaoka

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models.

,

,

,

Hiroto Ashikawa

,

Tetsunori Kobayashi

,

IEICE Trans. Inf. Syst., 2022

Subjective intelligibility of speech sounds enhanced by ideal ratio mask via crowdsourced remote experiments with effective data screening.

,

,

,

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

CoRR, 2022

End-to-End Spontaneous Speech Recognition Using Disfluency Labeling.

,

,

,

Ryota Nishimura

,

,

Norihide Kitaoka

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models.

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.

,

,

,

Shinji Watanabe

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility.

,

,

,

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Age-VOX-Celeb: Multi-Modal Corpus for Facial and Speech Estimation.

,

,

,

Hosana Kamiyama

Proceedings of the IEEE International Conference on Acoustics, 2021

BLSTM-Based Confidence Estimation for End-to-End Speech Recognition.

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Robust Speech-Age Estimation Using Local Maximum Mean Discrepancy Under Mismatched Recording Conditions.

,

,

,

Hosana Kamiyama

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Attention-Based Multi-Hypothesis Fusion for Speech Summarization.

,

,

,

Shinji Watanabe

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Advanced language model fusion method for encoder-decoder model in Japanese speech recognition.

,

,

Ryota Nishimura

,

,

Norihide Kitaoka

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

End-to-End Spontaneous Speech Recognition Using Hesitation Labeling.

,

,

,

Ryota Nishimura

,

,

Norihide Kitaoka

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Language Model Data Augmentation Based on Text Domain Transfer.

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System.

,

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Frame-Level Phoneme-Invariant Speaker Embedding for Text-Independent Speaker Recognition on Extremely Short Utterances.

,

,

,

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Improving Speaker-Attribute Estimation by Voting Based on Speaker Cluster Information.

,

Hosana Kamiyama

,

Satoshi Kobashikawa

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Noise-robust Attention Learning for End-to-End Speech Recognition.

,

,

,

,

Tetsunori Kobayashi

,

Proceedings of the 28th European Signal Processing Conference, 2020

2019

Feature Based Domain Adaptation for Neural Network Language Models with Factorised Hidden Layers.

Michael Hentschel

,

,

,

,

Tomohiro Nakatani

IEICE Trans. Inf. Syst., 2019

Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders.

,

,

,

Tomohiro Nakatani

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues.

,

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration.

,

Nelson Enrique Yalta Soplin

,

Shinji Watanabe

,

,

,

Tomohiro Nakatani

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition.

,

Shinji Watanabe

,

,

Keisuke Kinoshita

,

,

,

Tomohiro Nakatani

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-Based ASR System.

,

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

Katsuhiko Yamamoto

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

ILP-based Compressive Speech Summarization with Content Word Coverage Maximization and Its Oracle Performance Analysis.

,

,

Tomohiro Nakatani

,

Proceedings of the IEEE International Conference on Acoustics, 2019

A Unified Framework for Neural Speech Separation and Extraction.

,

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

Proceedings of the IEEE International Conference on Acoustics, 2019

Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders.

,

Shinji Watanabe

,

,

,

,

Tomohiro Nakatani

Proceedings of the IEEE International Conference on Acoustics, 2019

A Unified Framework for Feature-based Domain Adaptation of Neural Network Language Models.

Michael Hentschel

,

,

,

,

Tomohiro Nakatani

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Context Adaptive Neural Network Based Acoustic Models for Rapid Adaptation.

,

Keisuke Kinoshita

,

,

Christian Huemmer

,

Tomohiro Nakatani

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Semi-Supervised End-to-End Speech Recognition.

,

Shinji Watanabe

,

,

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Auxiliary Feature Based Adaptation of End-to-end ASR Systems.

,

Shinji Watanabe

,

,

,

Tomohiro Nakatani

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Rescoring N-Best Speech Recognition List Based on One-on-One Hypothesis Comparison Using Encoder-Classifier Model.

,

,

,

Tomohiro Nakatani

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Language Model Domain Adaptation Via Recurrent Neural Networks with Domain-Shared and Domain-Specific Representations.

Tsuyoshi Morioka

,

,

,

,

,

Tetsunori Kobayashi

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequence Training of Encoder-Decoder Model Using Policy Gradient for End-to-End Speech Recognition.

,

,

,

Tomohiro Nakatani

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Single Channel Target Speaker Extraction and Recognition with Speaker Beam.

,

Katerina Zmolíková

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Feature-Based Learning Hidden Unit Contributions for Domain Adaptation of RNN-LMs.

Michael Hentschel

,

,

,

Tomohiro Nakatani

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Factorised Hidden Layer Based Domain Adaptation for Recurrent Neural Network Language Models.

Michael Hentschel

,

,

,

,

Tomohiro Nakatani

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

Error detection and accuracy estimation in automatic speech recognition using deep bidirectional recurrent neural networks.

,

Speech Commun., 2017

Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures.

Katerina Zmolíková

,

,

Keisuke Kinoshita

,

,

,

Tomohiro Nakatani

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling.

,

,

,

Tomohiro Nakatani

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling.

,

,

,

Michael Hentschel

,

,

Tomohiro Nakatani

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improved Example-Based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example Search.

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Forward-Backward Convolutional LSTM for Acoustic Modeling.

,

,

,

Tomohiro Nakatani

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Feedback connection for deep neural network-based acoustic modeling.

,

,

,

Christian Huemmer

,

Tomohiro Nakatani

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Cumulative moving averaged bottleneck speaker vectors for online speaker adaptation of CNN-based acoustic models.

,

,

Keisuke Kinoshita

,

,

,

Shigeru Katagiri

,

Tomohiro Nakatani

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep mixture density network for statistical model-based feature enhancement.

Keisuke Kinoshita

,

,

,

,

Tomohiro Nakatani

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Online environmental adaptation of CNN-based acoustic models using spatial diffuseness features.

Christian Huemmer

,

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

Walter Kellermann

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Online meeting recognition in noisy environments with time-frequency mask based MVDR beamforming.

,

,

,

,

Keisuke Kinoshita

,

,

Takuya Yoshioka

,

,

,

Tomohiro Nakatani

Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

Learning speaker representation for neural network based multichannel speaker extraction.

Katerina Zmolíková

,

,

Keisuke Kinoshita

,

,

,

Tomohiro Nakatani

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Exploiting imbalanced textual and acoustic data for training prosodically-enhanced RNNLMs.

Michael Hentschel

,

,

,

Tomohiro Nakatani

,

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Exploiting end of sentences and speaker alternations in language modeling for multiparty conversations.

Hiroto Ashikawa

,

,

,

,

Tetsunori Kobayashi

,

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Multichannel Speech Enhancement Approaches to DNN-Based Far-Field Speech Recognition.

,

Takuya Yoshioka

,

,

,

Keisuke Kinoshita

,

Masakiyo Fujimoto

,

,

,

Tomohiro Nakatani

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

Estimating Speech Recognition Accuracy Based on Error Type Classification.

,

,

Atsushi Nakamura

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Differenced maximum mutual information criterion for robust unsupervised acoustic model adaptation.

,

,

,

Tomohiro Nakatani

,

Atsushi Nakamura

Comput. Speech Lang., 2016

Factorized Linear Input Network for Acoustic Model Adaptation in Noisy Conditions.

,

,

,

Tomohiro Nakatani

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Robust Example Search Using Bottleneck Features for Example-Based Speech Enhancement.

,

,

Keisuke Kinoshita

,

,

Takuya Yoshioka

,

Tomohiro Nakatani

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models.

,

Keisuke Kinoshita

,

,

Takuya Yoshioka

,

,

Tomohiro Nakatani

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Context adaptive deep neural networks for fast acoustic model adaptation in noisy conditions.

,

Keisuke Kinoshita

,

,

,

Takuya Yoshioka

,

Tomohiro Nakatani

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Spatial correlation model based observation vector clustering and MVDR beamforming for meeting recognition.

,

,

,

,

Tomohiro Nakatani

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Strategies for distant speech recognition in reverberant environments.

,

Takuya Yoshioka

,

,

,

Masakiyo Fujimoto

,

,

Keisuke Kinoshita

,

,

,

,

Tomohiro Nakatani

EURASIP J. Adv. Signal Process., 2015

Robust i-vector extraction for neural network adaptation in noisy environment.

,

,

,

Takuya Yoshioka

,

Tomohiro Nakatani

,

John H. L. Hansen

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Text-informed speech enhancement with deep neural networks.

Keisuke Kinoshita

,

,

,

Tomohiro Nakatani

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

ASR error detection and recognition rate estimation using deep bidirectional recurrent neural networks.

,

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Double-layer neighborhood graph based similarity search for fast query-by-example spoken term detection.

,

,

Takashi Hattori

,

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices.

Takuya Yoshioka

,

,

,

,

Keisuke Kinoshita

,

Masakiyo Fujimoto

,

,

Wojciech J. Fabian

,

,

,

,

Tomohiro Nakatani

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Automatic Vocabulary Adaptation Based on Semantic and Acoustic Similarities.

,

Yoshikazu Yamaguchi

,

,

Hirokazu Masataki

,

,

Satoshi Takahashi

IEICE Trans. Inf. Syst., 2014

Fast segment search for corpus-based speech enhancement based on speech recognition technology.

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

,

Atsushi Nakamura

Proceedings of the IEEE International Conference on Acoustics, 2014

Zero-resource spoken term detection using hierarchical graph-based similarity search.

,

,

Takashi Hattori

,

,

Atsushi Nakamura

Proceedings of the IEEE International Conference on Acoustics, 2014

Defeating reverberation: Advanced dereverberation and recognition techniques for hands-free speech recognition.

,

Takuya Yoshioka

,

,

,

Masakiyo Fujimoto

,

,

Keisuke Kinoshita

,

,

,

,

Tomohiro Nakatani

Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

2013

Prior-shared feature and model space speaker adaptation by consistently employing map estimation.

,

Shinji Watanabe

,

,

Masakiyo Fujimoto

,

,

Atsushi Nakamura

Speech Commun., 2013

Fast unsupervised adaptation based on efficient statistics accumulation using frame independent confidence within monophone states.

Satoshi Kobashikawa

,

,

,

Yoshikazu Yamaguchi

,

Hirokazu Masataki

,

Satoshi Takahashi

Comput. Speech Lang., 2013

Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds.

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

,

,

,

Shinji Watanabe

,

Masakiyo Fujimoto

,

Takuya Yoshioka

,

,

,

,

,

Atsushi Nakamura

Comput. Speech Lang., 2013

Unsupervised discriminative language modeling using error rate estimator.

,

,

,

Hirokazu Masataki

,

Atsushi Nakamura

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Discriminative recognition rate estimation for N-best list and its application to N-best rescoring.

,

,

Atsushi Nakamura

Proceedings of the IEEE International Conference on Acoustics, 2013

Coupling beamforming with spatial and spectral feature based spectral enhancement and its application to meeting recognition.

Tomohiro Nakatani

,

,

,

Takuya Yoshioka

,

,

Proceedings of the IEEE International Conference on Acoustics, 2013

Feature space variational Bayesian linear regression and its combination with model space VBLR.

,

,

,

Masakiyo Fujimoto

,

,

Atsushi Nakamura

Proceedings of the IEEE International Conference on Acoustics, 2013

Unsupervised discriminative adaptation using differenced maximum mutual information based linear regression.

,

,

,

Tomohiro Nakatani

,

Atsushi Nakamura

Proceedings of the IEEE International Conference on Acoustics, 2013

Graph index based query-by-example search on a large speech data set.

,

,

Takashi Hattori

,

,

Atsushi Nakamura

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera.

,

,

Takuya Yoshioka

,

Masakiyo Fujimoto

,

Shinji Watanabe

,

,

,

Kazuhiro Otsuka

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

Atsushi Nakamura

,

IEEE Trans. Speech Audio Process., 2012

Joint estimation of confidence and error causes in speech recognition.

,

Atsushi Nakamura

Speech Commun., 2012

Recognition rate estimation based on word alignment network and discriminative error type classification.

,

,

Atsushi Nakamura

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Dynamic variance adaptation using differenced maximum mutual information.

,

,

Tomohiro Nakatani

,

Atsushi Nakamura

Proceedings of the 2012 Symposium on Machine Learning in Speech and Language Processing, 2012

Automatic Vocabulary Adaptation Based on Semantic Similarity and Speech Recognition Confidence Measure.

,

Yoshikazu Yamaguchi

,

,

Hirokazu Masataki

,

,

Satoshi Takahashi

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Speaker Adaptation Using Variational Bayesian Linear Regression in Normalized Feature Space.

,

,

Masakiyo Fujimoto

,

,

Atsushi Nakamura

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Error type classification and word accuracy estimation using alignment features from word confusion network.

,

,

Atsushi Nakamura

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Discriminative feature transforms using differenced maximum mutual information.

,

,

Shinji Watanabe

,

Tomohiro Nakatani

,

Atsushi Nakamura

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Efficient Combination of Likelihood Recycling and Batch Calculation for Fast Acoustic Likelihood Calculation.

,

Satoshi Takahashi

,

Atsushi Nakamura

IEICE Trans. Inf. Syst., 2011

Machine and acoustical condition dependency analyses for fast acoustic likelihood calculation techniques.

,

Satoshi Takahashi

,

Atsushi Nakamura

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Real-time meeting recognition and understanding using distant microphones and omni-directional camera.

,

,

Takuya Yoshioka

,

Masakiyo Fujimoto

,

Shinji Watanabe

,

,

,

Kazuhiro Otsuka

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

Atsushi Nakamura

,

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

A novel confidence measure based on marginalization of jointly estimated error cause probabilities.

,

Atsushi Nakamura

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Discriminative confidence and error cause estimation for extended speech recognition function.

,

Atsushi Nakamura

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Simultaneous estimation of confidence and error cause in speech recognition using discriminative model.

,

Atsushi Nakamura

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Rapid unsupervised adaptation using frame independent output probabilities of gender and context independent phoneme models.

Satoshi Kobashikawa

,

,

Yoshikazu Yamaguchi

,

Satoshi Takahashi

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Efficient combination of likelihood recycling and batch calculation based on conditional fast processing and acoustic back-off.

,

Satoshi Takahashi

,

Atsushi Nakamura

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

Weighted distance measures for efficient reduction of Gaussian mixture components in HMM-based acoustic model.

,

Satoshi Takahashi

Proceedings of the IEEE International Conference on Acoustics, 2008

2005

Children's speech recognition using elementary-school-student speech database.

,

Yoshikazu Yamaguchi

,

Shoichi Matsunaga

Syst. Comput. Jpn., 2005

Rapid response and robust speech recognition by preliminary model adaptation for additive and convolutional noise.

Satoshi Kobashikawa

,

Satoshi Takahashi

,

Yoshikazu Yamaguchi

,

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2003

Speaker adaptation for non-native speakers using bilingual English lexicon and acoustic models.

Shoichi Matsunaga

,

,

Yoshikazu Yamaguchi

,

Akihiro Imamura

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Non-native English speech recognition using bilingual English lexicon and acoustic models.

Shoichi Matsunaga

,

,

Yoshikazu Yamaguchi

,

Akihiro Imamura

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2000

Novel two-pass search strategy using time-asynchronous shortest-first second-pass beam search.

,

,

Shoichi Matsunaga

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1998

Estimating entropy of a language from optimal word insertion penalty.

,

,

Fumitada Itakura

Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Balancing acoustic and linguistic probabilities.

,

,

Fumitada Itakura

Proceedings of the 1998 IEEE International Conference on Acoustics, 1998
