Zakaria Aldeneh

Shinji Watanabe

Tatiana Likhomanenko

Barry-John Theobald

CoRR, 2024

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models.

Li-Wei Chen

Takuya Higuchi

He Bai

CoRR, 2024

Towards Automatic Assessment of Self-Supervised Speech Models using Rank.

CoRR, 2024

dMel: Speech Tokenization made Simple.

CoRR, 2024

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Shinji Watanabe

Barry-John Theobald

CoRR, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

Shinji Watanabe

CoRR, 2024

2023

You're Not You When You're Angry: Robust Emotion Features Emerge by Recognizing Speakers.

IEEE Trans. Affect. Comput., 2023

Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Naturalistic Head Motion Generation from Speech.

Proceedings of the IEEE International Conference on Acoustics, 2023

On the Role of Lip Articulation in Visual Speech Perception.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Towards a Perceptual Model for Estimating the Quality of Visual Speech.

CoRR, 2022

2021

Learning Paralinguistic Features from Audiobooks through Style Voice Conversion.

Matthew Perez

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

On The Role of Visual Cues in Audiovisual Speech Enhancement.

Anushree Prasanna Kumar

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Robust Methods for the Automatic Quantification and Prediction of Affect in Spoken Interactions.

PhD thesis, 2020

Self-supervised Learning of Visual Speech Features with Audiovisual Speech Enhancement.

Anushree Prasanna Kumar

CoRR, 2020

Aphasic Speech Recognition Using a Mixture of Speech Intelligibility Experts.

Matthew Perez

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

Identifying Mood Episodes Using Dialogue Features from Clinical Interviews.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Controlling for Confounders in Multimodal Emotion Classification via Adversarial Learning.

Mimansa Jaiswal

Proceedings of the International Conference on Multimodal Interaction, 2019

Muse-ing on the Impact of Utterance Ordering on Crowdsourced Emotion Annotations.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Improving End-of-Turn Detection in Spoken Dialogues by Detecting Speaker Intentions as a Secondary Task.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network.

Duc Le

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Capturing Long-Term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition.

Soheil Khorram

Melvin G. McInnis

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Progressive Neural Networks for Transfer Learning in Emotion Recognition.

John Gideon

Soheil Khorram

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Pooling acoustic and lexical features for the prediction of valence.

Soheil Khorram

Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017

Using regional saliency for speech emotion recognition.
