2024
Efficient Streaming LLM for Speech Recognition.
CoRR, 2024
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses.
CoRR, 2024
Faster Speech-LLaMA Inference with Multi-token Prediction.
CoRR, 2024
Token-Weighted RNN-T For Learning From Flawed Data.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
DOC-RAG: ASR Language Model Personalization with Domain-Distributed Co-occurrence Retrieval Augmentation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024
2023
Towards Selection of Text-to-speech Data to Augment ASR Training.
CoRR, 2023
Text Generation with Speech Synthesis for ASR Data Augmentation.
CoRR, 2023
Improving fast-slow Encoder based Transducer with Streaming Deliberation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
PersonaLM: Language Model Personalization via Domain-distributed Span Aggregated K-Nearest N-gram Retrieval Augmentation.
Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
A Token-Wise Beam Search Algorithm for RNN-T.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
2022
Scaling ASR Improves Zero and Few Shot Learning.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
2021
N-HANS: A neural network-based toolkit for in-the-wild audio enhancement.
Multim. Tools Appl., 2021
Alignment Restricted Streaming Recurrent Neural Network Transducer.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Deep Shallow Fusion for RNN-T Personalization.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
A Two-Stage Approach to Speech Bandwidth Extension.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Time-Domain Convolutional Recurrent Network for Packet Loss Concealment.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
2020
Neural Network Supervision: Notes on Loss Functions, Labels and Confidence Estimation.
PhD thesis, 2020
Analysis of loss functions for fast single-class classification.
Knowl. Inf. Syst., 2020
Contextual RNN-T For Open Domain ASR.
CoRR, 2020
Contextual RNN-T for Open Domain ASR.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
2019
N-HANS: Introducing the Augsburg Neuro-Holistic Audio-eNhancement System.
CoRR, 2019
Single-Channel Speech Separation with Auxiliary Speaker Embeddings.
CoRR, 2019
Towards Robust Speech Emotion Recognition Using Deep Residual Networks for Speech Enhancement.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
A Walkthrough for the Principle of Logit Separation.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
2018
Scaling Speech Enhancement in Unseen Environments with Noise Embeddings.
CoRR, 2018
Weakly Supervised One-Shot Detection with Attention Siamese Networks.
CoRR, 2018
Calibrated Prediction Intervals for Neural Network Regressors.
IEEE Access, 2018
Fast Single-Class Classification and the Principle of Logit Separation.
Proceedings of the IEEE International Conference on Data Mining, 2018
Deep learning for multisensorial and multimodal interaction.
The Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations, 2018
2017
On Definable Skolem Functions in Weakly O-Minimal nonvaluational Structures.
J. Symb. Log., 2017
End-to-end learning for dimensional emotion recognition from physiological signals.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017
CAST a database: Rapid targeted large-scale big data acquisition via small-world modelling of social media platforms.
Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017
Tunable Sensitivity to Large Errors in Neural Network Training.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
2016
Convolutional Neural Networks with Data Augmentation for Classifying Speakers' Native Language.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Convolutional RNN: An enhanced model for extracting features from sequential data.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016