Ryo Masumura

ORCID: 0000-0002-2415-4149

According to our database, Ryo Masumura authored at least 131 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding.
CoRR, 2024

Alignment-Free Training for Transducer-based Multi-Talker ASR.
CoRR, 2024

Factor-Conditioned Speaking-Style Captioning.
CoRR, 2024

Talking Face Generation for Impression Conversion Considering Speech Semantics.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023
Multi-region CNN-Transformer for Micro-gesture Recognition in Face and Upper Body.
Proceedings of the ACM Multimedia Asia 2023, 2023

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

End-to-End Joint Target and Non-Target Speakers ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

What are differences? Comparing DNN and Human by Their Performance and Characteristics in Speaker Age Estimation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Transcribing Speech as Spoken and Written Dual Text Using an Autoregressive Model.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Retrieval, Masking, and Generation: Feedback Comment Generation using Masked Comment Examples.
Proceedings of the 16th International Natural Language Generation Conference, 2023

Open-Set Recognition for Facial-Expression Recognition.
Proceedings of the IEEE International Conference on Image Processing, 2023

OnDA-DETR: Online Domain Adaptation for Detection Transformers with Self-Training Framework.
Proceedings of the IEEE International Conference on Image Processing, 2023

Distilling Knowledge of Bidirectional Language Model for Scene Text Recognition.
Proceedings of the IEEE International Conference on Image Processing, 2023

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Leveraging Language Embeddings for Cross-Lingual Self-Supervised Speech Representation Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Scheduled Sampling for Neural Transducer-Based ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Next-Speaker Prediction Based on Non-Verbal Information in Multi-Party Video Conversation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Leveraging Large Text Corpora For End-To-End Speech Summarization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Modeling Lead-Lag Structure in Facial Expression Synchrony for Social-Psychological Outcome Prediction from Negotiation Interaction.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Text-to-Text Pre-Training with Paraphrasing for Improving Transformer-Based Image Captioning.
Proceedings of the 31st European Signal Processing Conference, 2023

2022
Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations.
CoRR, 2022

Knowledge Transferred Fine-Tuning: Convolutional Neural Network Is Born Again With Anti-Aliasing Even in Data-Limited Situations.
IEEE Access, 2022

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Multimodal Negotiation Corpus with Various Subjective Assessments for Social-Psychological Outcome Prediction from Non-Verbal Cues.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Dialogue Acts Aided Important Utterance Detection Based on Multiparty and Multimodal Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Fully Shareable Scene Text Recognition Modeling for Horizontal and Vertical Writing.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Customer Satisfaction Estimation Using Unsupervised Representation Learning with Multi-Format Prediction Loss.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Multi-Perspective Document Revision.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

2021
Hierarchical Latent Words Language Models for Automatic Speech Recognition.
J. Inf. Process., 2021

Neural candidate-aware language models for speech recognition.
Comput. Speech Lang., 2021

Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Large-Context Conversational Representation Learning: Self-Supervised Learning For Conversational Documents.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages.
Proceedings of the MMAsia '21: ACM Multimedia Asia, Gold Coast, Australia, December 1, 2021

End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Enrollment-Less Training for Personalized Voice Activity Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

MAPGN: Masked Pointer-Generator Network for Sequence-to-Sequence Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Speech Emotion Recognition Based on Listener Adaptive Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Generating Responses that Reflect Meta Information in User-Generated Question Answer Pairs.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Parallel Corpus for Japanese Spoken-to-Written Style Conversion.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Distillation for Improving CTC-Transformer-Based ASR Systems.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Transformer-Based Audio Captioning Model with Keyword Estimation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model.
Proceedings of the 13th International Conference on Natural Language Generation, 2020

Distilling Attention Weights for CTC-Based ASR Systems.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Sequence-Level Consistency Training for Semi-Supervised End-to-End Automatic Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Large-Context Pointer-Generator Networks for Spoken-to-Written Style Conversion.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Sequence-To-One Neural Networks for Japanese Dialect Speech Classification.
Proceedings of the 9th IEEE Global Conference on Consumer Electronics, 2020

Unsupervised Domain Adversarial Training in Angular Space for Facial Expression Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

End-to-End Automatic Speech Recognition with Deep Mutual Learning.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Viterbi Approximation of Latent Words Language Models for Automatic Speech Recognition.
J. Inf. Process., 2019

Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition.
IEICE Trans. Inf. Syst., 2019

Recurrent out-of-vocabulary word detection based on distribution of features.
Comput. Speech Lang., 2019

Does Speaking Training Application with Speech Recognition Motivate Junior High School Students in Actual Classroom? - A Case Study.
Proceedings of the 8th ISCA International Workshop on Speech and Language Technology in Education, 2019

A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speech Emotion Recognition Based on Multi-Label Emotion Existence Model.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Large Context End-to-end Automatic Speech Recognition via Extension of Hierarchical Recurrent Encoder-decoder Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Context-Aware Neural Voice Activity Detection Using Auxiliary Networks for Phoneme Recognition, Speech Enhancement and Acoustic Scene Classification.
Proceedings of the 27th European Signal Processing Conference, 2019

Generalized Large-Context Language Models Based on Forward-Backward Hierarchical Recurrent Encoder-Decoder Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Improving Speech-Based End-of-Turn Detection Via Cross-Modal Representation Learning with Punctuated Text Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Disfluency Detection Based on Speech-Aware Token-by-Token Sequence Labeling with BLSTM-CRFs and Attention Mechanisms.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Revisiting Dynamic Adjustment of Language Model Scaling Factor for Automatic Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Can We Simulate Generative Process of Acoustic Modeling Data? Towards Data Restoration for Acoustic Modeling.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Urgent Voicemail Detection Focused on Long-term Temporal Variation.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Likability Estimation of Call-center Agents by Suppressing Annotator Variability.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Domain Adaptation Based on Mixture of Latent Words Language Models for Automatic Speech Recognition.
IEICE Trans. Inf. Syst., 2018

Neural Dialogue Context Online End-of-Turn Detection.
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018

Neural Error Corrective Language Models for Automatic Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Neural Confnet Classification: Fully Neural Network Based Spoken Utterance Classification Using Word Confusion Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Soft-Target Training with Ambiguous Emotional Utterances for DNN-Based Speech Emotion Classification.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Adversarial Training for Multi-task and Multi-lingual Joint Modeling of Utterance Intent Classification.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

Neural Speech-to-Text Language Models for Rescoring Hypotheses of DNN-HMM Hybrid Automatic Speech Recognition Systems.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Progressive Neural Network-based Knowledge Transfer in Acoustic Models.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Online Call Scene Segmentation of Contact Center Dialogues based on Role Aware Hierarchical LSTM-RNNs.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Relevant Phonetic-aware Neural Acoustic Models using Native English and Japanese Speech for Japanese-English Automatic Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document Classification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential Networks.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Hyperspherical Query Likelihood Models with Word Embeddings.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Parallel phonetically aware DNNs and LSTM-RNNs for frame-by-frame discriminative modeling of spoken language identification.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Domain adaptation of DNN acoustic models using knowledge distillation.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Joint unsupervised adaptation of n-gram and RNN language models via LDA-based hybrid mixture modeling.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
N-gram Approximation of Latent Words Language Models for Domain Robust Automatic Speech Recognition.
IEICE Trans. Inf. Syst., 2016

Investigation of Combining Various Major Language Model Technologies including Data Expansion and Adaptation.
IEICE Trans. Inf. Syst., 2016

Mechanism and Control of Whole-Body Electro-Hydrostatic Actuator Driven Humanoid Robot Hydra.
Proceedings of the International Symposium on Experimental Robotics, 2016

Language Identification Based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNs.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Enhancement of mechanical strength, computational power, and heat management for fieldwork humanoid robots.
Proceedings of the 16th IEEE-RAS International Conference on Humanoid Robots, 2016

2015
Discourse Relation Recognition by Comparing Various Units of Sentence Expression with Recursive Neural Network.
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, 2015

Latent words recurrent neural network language models.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Combinations of various language model technologies including data expansion and adaptation in spontaneous speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Training data selection for acoustic modeling via submodular optimization of joint Kullback-Leibler divergence.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

2014
Mixture of latent words language models for domain adaptation.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Read and spontaneous speech classification based on variance of GMM supervectors.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Role play dialogue topic model for language model adaptation in multi-party conversation speech recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013
Viterbi decoding for latent words language models using Gibbs sampling.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Use of latent words language models in ASR: A sampling-based implementation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2011
Language Model Expansion Using Webdata for Spoken Document Retrieval.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Training a Language Model Using Webdata for Large Vocabulary Japanese Spontaneous Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010
Document expansion using relevant web documents for spoken document retrieval.
Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, 2010

