Hirofumi Inaguma

Orcid: 0000-0003-0610-1251

According to our database, Hirofumi Inaguma authored at least 52 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
SSR: Alignment-Aware Modality Connector for Speech Language Models.
CoRR, 2024

Investigating Decoder-only Large Language Models for Speech-to-text Translation.
CoRR, 2024

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model.
CoRR, 2024

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Alignment Knowledge Distillation for Online Streaming Attention-Based Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Seamless: Multilingual Expressive and Streaming Speech Translation.
CoRR, 2023

Efficient Monotonic Multihead Attention.
CoRR, 2023

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation.
CoRR, 2023

Exploration on HuBERT with Multiple Resolutions.
CoRR, 2023


Exploration on HuBERT with Multiple Resolution.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Enhancing Speech-To-Speech Translation with Multiple TTS Targets.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Named Entity Detection and Injection for Direct Speech Translation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

Simple and Effective Unsupervised Speech Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Speech-to-Speech Translation for a Real-world Unwritten Language.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Distilling the Knowledge of BERT for CTC-based ASR.
CoRR, 2022

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021
Fast and Low-Latency End-to-End Speech Recognition and Translation.
PhD thesis, 2021

Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring.
CoRR, 2021

Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System.
Proceedings of the 18th International Conference on Spoken Language Translation, 2021

VAD-Free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

ORTHROS: Non-autoregressive End-to-end Speech Translation with Dual-decoder.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improved Mask-CTC for Non-Autoregressive End-to-End ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Recent Developments on ESPnet Toolkit Boosted by Conformer.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

ASR Rescoring and Confidence Estimation with ELECTRA.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.
CoRR, 2020

Enhancing Monotonic Multihead Attention for Streaming ASR.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

CTC-Synchronous Training for Monotonic Attention Model.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Speech-to-Dialog-Act Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

ESPnet-ST: All-in-One Speech Translation Toolkit.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020

2019
ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper.
Proceedings of the 16th International Conference on Spoken Language Translation, 2019

Transfer Learning of Language-independent End-to-end ASR with Language Model Fusion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Comparative Study on Transformer vs RNN in Speech Applications.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Multilingual End-to-End Speech Translation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Leveraging Sequence-to-Sequence Speech Synthesis for Enhancing Acoustic-to-Word Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

The JHU/KyotoU Speech Translation System for IWSLT 2018.
Proceedings of the 15th International Conference on Spoken Language Translation, 2018

Acoustic-to-Word Attention-Based Model Complemented with Character-Level CTC-Based Model.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

An End-to-End Approach to Joint Social Signal Detection and Automatic Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017
Social Signal Detection in Spontaneous Dialogue Using Bidirectional LSTM-CTC.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016
Prediction of ice-breaking between participants using prosodic features in the first meeting dialogue.
Proceedings of the 2nd Workshop on Advancements in Social Signal Processing for Multimodal Interaction, 2016

