Naoyuki Kanda

Takuya Yoshioka

Yang Liu

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Speech Separation with Large-Scale Self-Supervised Learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

i-Code: An Integrative and Composable Multimodal Learning Framework.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.

IEEE J. Sel. Top. Signal Process., 2022

A review of speaker diarization: Recent advances with deep learning.

Tae Jin Park

Sarangarajan Parthasarathy

Dimitrios Dimitriadis

Kyu Jeong Han

Shinji Watanabe

Shrikanth Narayanan

Comput. Speech Lang., 2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts.

CoRR, 2022

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization.

CoRR, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

All-Neural Beamformer for Continuous Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

VarArray: Array-Geometry-Agnostic Continuous Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Streaming End-to-End Multi-Talker Speech Recognition.

IEEE Signal Process. Lett., 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.

CoRR, 2021

Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.

Zhong Meng

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speaker-Attributed ASR with Transformer.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Speech-Language Pre-Training for End-to-End Spoken Language Understanding.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition.

Zhong Meng

Sarangarajan Parthasarathy

Yashesh Gaur

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Serialized Output Training for End-to-End Overlapped Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation.

Shota Horiguchi

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Neural Speaker Diarization with Permutation-Free Objectives.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

End-to-End Neural Speaker Diarization with Self-Attention.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Face-Voice Matching using Cross-modal Embeddings.

Shota Horiguchi

Proceedings of the 2018 ACM Multimedia Conference, 2018

Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models.

Yusuke Fujita

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Sequence Distillation for Purely Sequence Trained Acoustic Models.

Yusuke Fujita

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models.

Xugang Lu

Hisashi Kawai

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework.

Xugang Lu

Hisashi Kawai

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence.

Yusuke Fujita

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription.

Speech Commun., 2016

Maximum a posteriori Based Decoding for CTC Acoustic Models.

Xugang Lu

Hisashi Kawai

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Open-ended Spoken Language Technology: Studies on Spoken Dialogue Systems and Spoken Document Retrieval Systems.

PhD thesis, 2014

The NICT ASR system for IWSLT 2014.

Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2014, 2014

Boundary contraction training for acoustic models based on discrete deep neural networks.

Nobuo Nukaga

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Noise robust speaker verification with delta cepstrum normalization.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier.

Katsutoshi Itoyama

Hiroshi G. Okuno

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Elastic spectral distortion for low resource speech recognition with deep neural networks.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Using rhythmic features for Japanese spoken term detection.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Voice activity detection based on augmented statistical noise suppression.
