CoRR, 2024

M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses.

[DOI]

CoRR, 2024

Faster Speech-LLaMA Inference with Multi-token Prediction.

[DOI]

CoRR, 2024

Token-Weighted RNN-T For Learning From Flawed Data.

[DOI]

Wei Zhou

Ozlem Kalinli

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

DOC-RAG: ASR Language Model Personalization with Domain-Distributed Co-occurrence Retrieval Augmentation.

[DOI]

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023

Towards Selection of Text-to-speech Data to Augment ASR Training.

[DOI]

CoRR, 2023

Text Generation with Speech Synthesis for ASR Data Augmentation.

[DOI]

Ethan Campbell-Taylor

Jessie Salas

Irina-Elena Veliche

Xi Chen

CoRR, 2023

Improving fast-slow Encoder based Transducer with Streaming Deliberation.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

PersonaLM: Language Model Personalization via Domain-distributed Span Aggregated K-Nearest N-gram Retrieval Augmentation.

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

A Token-Wise Beam Search Algorithm for RNN-T.

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Scaling ASR Improves Zero and Few Shot Learning.

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

N-HANS: A neural network-based toolkit for in-the-wild audio enhancement.

[DOI]

Shuo Liu

Emilia Parada-Cabaleiro

Multim. Tools Appl., 2021

Alignment Restricted Streaming Recurrent Neural Network Transducer.

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Deep Shallow Fusion for RNN-T Personalization.

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

A Two-Stage Approach to Speech Bandwidth Extension.

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion.

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Time-Domain Convolutional Recurrent Network for Packet Loss Concealment.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Neural Network Supervision: Notes on Loss Functions, Labels and Confidence Estimation.

[DOI]

PhD thesis, 2020

Analysis of loss functions for fast single-class classification.

[DOI]

Knowl. Inf. Syst., 2020

Contextual RNN-T For Open Domain ASR.

[DOI]

CoRR, 2020

Contextual RNN-T for Open Domain ASR.

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

N-HANS: Introducing the Augsburg Neuro-Holistic Audio-eNhancement System.

[DOI]

Shuo Liu

CoRR, 2019

Single-Channel Speech Separation with Auxiliary Speaker Embeddings.

[DOI]

Shuo Liu

Andreas Triantafyllopoulos

CoRR, 2019

Towards Robust Speech Emotion Recognition Using Deep Residual Networks for Speech Enhancement.

[DOI]

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Walkthrough for the Principle of Logit Separation.

[DOI]

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

2018

Scaling Speech Enhancement in Unseen Environments with Noise Embeddings.

[DOI]

Jing Han

CoRR, 2018

Weakly Supervised One-Shot Detection with Attention Siamese Networks.

[DOI]

CoRR, 2018

Calibrated Prediction Intervals for Neural Network Regressors.

[DOI]

Nicholas Cummins

IEEE Access, 2018

Fast Single-Class Classification and the Principle of Logit Separation.

[DOI]

Proceedings of the IEEE International Conference on Data Mining, 2018

Deep learning for multisensorial and multimodal interaction.

[DOI]

Proceedings of the Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations, 2018

2017

On Definable Skolem Functions in Weakly O-Minimal nonvaluational Structures.

[DOI]

Pantelis E. Eleftheriou

Assaf Hasson

J. Symb. Log., 2017

End-to-end learning for dimensional emotion recognition from physiological signals.

[DOI]

Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

CAST a database: Rapid targeted large-scale big data acquisition via small-world modelling of social media platforms.

[DOI]

Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017

Tunable Sensitivity to Large Errors in Neural Network Training.

[DOI]

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

Convolutional Neural Networks with Data Augmentation for Classifying Speakers' Native Language.

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Convolutional RNN: An enhanced model for extracting features from sequential data.

[DOI]