Atsushi Ando

Orcid: 0000-0002-3971-0654

According to our database, Atsushi Ando authored at least 43 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis.
IEICE Trans. Inf. Syst., January, 2024

Guided Speaker Embedding.
CoRR, 2024

Mamba-based Segmentation Model for Speaker Diarization.
CoRR, 2024

Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings.
CoRR, 2024

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling.
CoRR, 2024

Factor-Conditioned Speaking-Style Captioning.
CoRR, 2024

NTT Speaker Diarization System for Chime-7: Multi-Domain, Multi-Microphone end-to-end and Vector Clustering Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Multi-region CNN-Transformer for Micro-gesture Recognition in Face and Upper Body.
Proceedings of the ACM Multimedia Asia 2023, 2023

End-to-End Joint Target and Non-Target Speakers ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

OnDA-DETR: Online Domain Adaptation for Detection Transformers with Self-Training Framework.
Proceedings of the IEEE International Conference on Image Processing, 2023

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
Knowledge Transferred Fine-Tuning: Convolutional Neural Network Is Born Again With Anti-Aliasing Even in Data-Limited Situations.
IEEE Access, 2022

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
Proceedings of the IEEE International Conference on Acoustics, 2022

Customer Satisfaction Estimation Using Unsupervised Representation Learning with Multi-Format Prediction Loss.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Speech Emotion Recognition in Real Environments using Characteristics of Emotional Expression and Perception.
PhD thesis, 2021

Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Phoneme Duration Modeling Using Speech Rhythm-Based Speaker Embeddings for Multi-Speaker Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Emotion Recognition Based on Listener Adaptive Models.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Sequence-Level Consistency Training for Semi-Supervised End-to-End Automatic Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Age Estimation Using Age-Dependent Insensitive Loss.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speech Emotion Recognition Based on Multi-Label Emotion Existence Model.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Speech-Based End-of-Turn Detection Via Cross-Modal Representation Learning with Punctuated Text Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Urgent Voicemail Detection Focused on Long-term Temporal Variation.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Likability Estimation of Call-center Agents by Suppressing Annotator Variability.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Neural Dialogue Context Online End-of-Turn Detection.
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018

Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Soft-Target Training with Ambiguous Emotional Utterances for DNN-Based Speech Emotion Classification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Online Call Scene Segmentation of Contact Center Dialogues based on Role Aware Hierarchical LSTM-RNNs.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
Interaction and Transition Model for Speech Emotion Recognition in Dialogue.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Robust children and adults speech identification and confidence measure based on DNN posteriorgram.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Speaker recognition in duration-mismatched condition using bootstrapped i-vectors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Agreement and disagreement utterance detection in conversational speech by extracting and integrating local features.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

