Takaaki Hori

ORCID: 0000-0003-4560-8039

According to our database, Takaaki Hori authored at least 143 papers between 2001 and 2024.

Bibliography

2024
End-to-End Speech Recognition: A Survey.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

2023
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022
Momentum Pseudo-Labeling: Semi-Supervised ASR With Continuously Improving Pseudo-Labels.
IEEE J. Sel. Top. Signal Process., 2022

Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Sequence Transduction with Graph-Based Supervision.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers.
CoRR, 2021

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Advanced Long-Context End-to-End Speech Recognition Using Context-Expanded Transformers.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Capturing Multi-Resolution Context by Dilated Self-Attention.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020
Multi-Stream End-to-End Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.
CoRR, 2020

Multi-Pass Transformer for Machine Translation.
CoRR, 2020

All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transformer-Based Long-Context End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Streaming Automatic Speech Recognition with the Transformer Model.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Adversarial training and decoding strategies for end-to-end neural conversation models.
Comput. Speech Lang., 2019

Overview of the sixth dialog system technology challenge: DSTC6.
Comput. Speech Lang., 2019

Self-supervised Sequence-to-sequence ASR using Unpaired Speech and Text.
CoRR, 2019

End-to-End Multilingual Multi-Speaker Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Vectorized Beam Search for CTC-Attention-Based Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Semi-Supervised Sequence-to-Sequence ASR Using Unpaired Speech and Text.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Stream Attention-based Multi-array End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Triggered Attention for End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Cycle-consistency Training for End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Promising Accurate Prefix Boosting for Sequence-to-sequence ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

CNN-based Multichannel End-to-End Speech Recognition for Everyday Home Environments.
Proceedings of the 27th European Signal Processing Conference, 2019

Streaming End-to-End Speech Recognition with Joint CTC-Attention Based Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Comparative Study on Transformer vs RNN in Speech Applications.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Multi-encoder multi-resolution framework for end-to-end speech recognition.
CoRR, 2018

Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition.
CoRR, 2018

CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments.
CoRR, 2018

End-to-end Speech Recognition With Word-Based RNN Language Models.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Back-Translation-Style Data Augmentation for end-to-end ASR.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

ESPnet: End-to-End Speech Processing Toolkit.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

End-to-End Multi-Speaker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Speaker Adaptation for Multichannel End-to-End Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Multimodal Attention for Fusion of Audio and Spatiotemporal Features for Video Description.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

A Purely End-to-End System for Multi-speaker Speech Recognition.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Duration-Controlled LSTM for Polyphonic Sound Event Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Error detection and accuracy estimation in automatic speech recognition using deep bidirectional recurrent neural networks.
Speech Commun., 2017

Hybrid CTC/Attention Architecture for End-to-End Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2017

Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming.
IEEE J. Sel. Top. Signal Process., 2017

Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend.
Comput. Speech Lang., 2017

Attention-Based Multimodal Fusion for Video Description.
CoRR, 2017

End-to-end Conversation Modeling Track in DSTC6.
CoRR, 2017

Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multichannel End-to-end Speech Recognition.
Proceedings of the 34th International Conference on Machine Learning, 2017

Attention-Based Multimodal Fusion for Video Description.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Student-teacher network learning with enhanced features.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Joint CTC-attention based end-to-end speech recognition using multi-task learning.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Language independent end-to-end architecture for joint language identification and speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Early and late integration of audio features for automatic video description.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Joint CTC/attention decoding for end-to-end speech recognition.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Toolkits for Robust Speech Processing.
New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
Estimating Speech Recognition Accuracy Based on Error Type Classification.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Automated structure discovery and parameter tuning of neural network language model based on evolution strategy.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Dialog state tracking with attention-based sequence-to-sequence learning.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Driver confusion status detection using recurrent neural networks.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Minimum word error training of long short-term memory recurrent neural network language models for speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

2015
Strategies for distant speech recognition in reverberant environments.
EURASIP J. Adv. Signal Process., 2015

Multiscale recurrent neural network based language model.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

ASR error detection and recognition rate estimation using deep bidirectional recurrent neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

WFST-based structural classification integrating DNN acoustic features and RNN language features for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Context adaptive deep neural networks for fast acoustic model adaptation.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Double-layer neighborhood graph based similarity search for fast query-by-example spoken term detection.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Restructuring output layers of deep neural networks using minimum risk parameter clustering.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Fast segment search for corpus-based speech enhancement based on speech recognition technology.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Real-time one-pass decoding with recurrent neural network language model for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Zero-resource spoken term detection using hierarchical graph-based similarity search.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Defeating reverberation: Advanced dereverberation and recognition techniques for hands-free speech recognition.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

2013
Speech Recognition Algorithms Based on Weighted Finite-State Transducers.
Synthesis Lectures on Speech and Audio Processing, Morgan & Claypool Publishers, ISBN: 978-3-031-02562-4, 2013

Prior-shared feature and model space speaker adaptation by consistently employing map estimation.
Speech Commun., 2013

Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds.
Comput. Speech Lang., 2013

Unsupervised discriminative language modeling using error rate estimator.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A method for structure estimation of weighted finite-state transducers and its application to grapheme-to-phoneme conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Discriminative recognition rate estimation for N-best list and its application to N-best rescoring.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Coupling beamforming with spatial and spectral feature based spectral enhancement and its application to meeting recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Large vocabulary continuous speech recognition based on WFST structured classifiers and deep bottleneck features.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Feature space variational Bayesian linear regression and its combination with model space VBLR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Graph index based query-by-example search on a large speech data set.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012
Round-Robin Duel Discriminative Language Models.
IEEE Trans. Speech Audio Process., 2012

Structural Classification Methods Based on Weighted Finite-State Transducers for Automatic Speech Recognition.
IEEE Trans. Speech Audio Process., 2012

Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera.
IEEE Trans. Speech Audio Process., 2012

Efficient training of discriminative language models by sample selection.
Speech Commun., 2012

Model Shrinkage for Discriminative Language Models.
IEICE Trans. Inf. Syst., 2012

Efficient prior and incremental beam width control to suppress excessive speech recognition time based on score range estimation.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Recognition rate estimation based on word alignment network and discriminative error type classification.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Integrating Deep Neural Networks into Structural Classification Approach based on Weighted Finite-State Transducers.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Efficient Beam Width Control to Suppress Excessive Speech Recognition Computation Time Based on Prior Score Range Normalization.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Speaker Adaptation Using Variational Bayesian Linear Regression in Normalized Feature Space.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Bag Of ARCS: New representation of speech segment features based on finite state machines.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Error type classification and word accuracy estimation using alignment features from word confusion network.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Spoken document retrieval by discriminative modeling in a high dimensional feature space.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Handling uncertain observations in unsupervised topic-mixture language model adaptation.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011
Topic tracking language model for speech recognition.
Comput. Speech Lang., 2011

Gibbs sampling based Multi-scale Mixture Model for speaker clustering.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Round-robin duel discriminative language models in one-pass decoding with on-the-fly error correction.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

2010
Improved Sequential Dependency Analysis Integrating Labeling-Based Sentence Boundary Detection.
IEICE Trans. Inf. Syst., 2010

Application of topic tracking model to language model adaptation and meeting analysis.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Real-time meeting recognition and understanding using distant microphones and omni-directional camera.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Round-robin discrimination model for reranking ASR hypotheses.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Improvements of search error risk minimization in Viterbi beam search for speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A discriminative model for continuous speech recognition based on Weighted Finite State Transducers.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

A comparative study on methods of Weighted language model training for reranking LVCSR N-best hypotheses.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Search error risk minimization in Viterbi beam search for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

2008
Sequential dependency analysis for online spontaneous speech processing.
Speech Commun., 2008

2007
Efficient WFST-Based One-Pass Decoding With On-The-Fly Hypothesis Rescoring in Extremely Large Vocabulary Continuous Speech Recognition.
IEEE Trans. Speech Audio Process., 2007

An approach to efficient generation of high-accuracy and compact error-corrective models for speech recognition.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Open-Vocabulary Spoken Utterance Retrieval using Confusion Networks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

2006
Advanced computational models and learning theories for spoken language processing.
IEEE Comput. Intell. Mag., 2006

Sentence boundary detection using sequential dependency analysis combined with CRF-based chunking.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

An Extremely Large Vocabulary Approach to Named Entity Extraction from Speech.
Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006

2005
Experiments with probabilistic principal component analysis in LVCSR.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Generalized fast on-the-fly composition algorithm for WFST-based speech recognition.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Efficient Generation of high-order context-dependent Weighted Finite State Transducers for Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, Speech and Signal Processing, 2005

2004
Fast on-the-fly composition for weighted finite-state transducers in 1.8 million-word vocabulary continuous speech recognition.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

2003
Speech summarization using weighted finite-state transducers.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Evaluation method for automatic speech summarization.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Language model adaptation using WFST-based speaking-style translation.
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, 2003

Deriving disambiguous queries in a spoken interactive ODQA system.
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, 2003

Spoken Interactive ODQA System: SPIQA.
Proceedings of the ACL 2003, 2003

2001
Improved phoneme-history-dependent search for large-vocabulary continuous-speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001