2025
VisualSpeech: Enhance Prosody with Visual Context in TTS.
CoRR, January, 2025
2024
What happens to diffusion model likelihood when your model is conditional?
CoRR, 2024
Foundation Models for Music: A Survey.
CoRR, 2024
Learning from memory-based models.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Self-Train Before You Transcribe.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
How Much Context Does My Attention-Based ASR System Need?
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Energy-Based Models for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024
Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users Using Intermediate ASR Features and Human Memory Models.
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training.
CoRR, 2023
MARBLE: Music Audio Representation Benchmark for Universal Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
On the Effectiveness of Speech Self-Supervised Learning for Music.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023
Speak & Improve: L2 English Speaking Practice Tool.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Leveraging Cross-Utterance Context For ASR Decoding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Adapting Pretrained Models for Adult to Child Voice Conversion.
Proceedings of the 31st European Signal Processing Conference, 2023
2022
Increasing Context for Estimating Confidence Scores in Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning.
CoRR, 2022
HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, 2022
2021
Approximate Fixed-Points in Recurrent Neural Networks.
CoRR, 2021
Continuous representations of intents for dialogue systems.
CoRR, 2021
2020
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Bi-directional Lattice Recurrent Neural Networks for Confidence Estimation.
Proceedings of the IEEE International Conference on Acoustics, 2019
Surprise Languages: Rapid-Response Cross-Language IR.
Proceedings of the 9th International Workshop on Evaluating Information Access co-located with the 14th NTCIR Conference on the Evaluation of Information Access Technologies (NTCIR 2019), 2019
Experiments with Cross-Language Speech Retrieval for Lower-Resource Languages.
Proceedings of the Information Retrieval Technology, 2019
2018
Improving Interpretability and Regularization in Deep Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2018
Confidence Estimation and Deletion Prediction Using Bidirectional Recurrent Neural Networks.
CoRR, 2018
Sequence Teacher-Student Training of Acoustic Models for Automatic Free Speaking Language Assessment.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Confidence Estimation and Deletion Prediction Using Bidirectional Recurrent Neural Networks.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Automatic Speech Recognition System Development in the "Wild".
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Impact of ASR Performance on Free Speaking Language Assessment.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Active Memory Networks for Language Modeling.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
Future Word Contexts in Neural Network Language Models.
CoRR, 2017
Low-Resource Speech Recognition and Keyword-Spotting.
Proceedings of the Speech and Computer - 19th International Conference, 2017
An attention based model for off-topic spontaneous spoken response detection: An Initial Study.
Proceedings of the 7th ISCA International Workshop on Speech and Language Technology in Education, 2017
Use of Graphemic Lexicons for Spoken Language Assessment.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Stimulated training for automatic speech recognition and keyword search in limited resource conditions.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Morph-to-word transduction for accurate and efficient automatic speech recognition and keyword search.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Recurrent neural network language models for keyword search.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Future word contexts in neural network language models.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Incorporating Uncertainty into Deep Learning for Spoken Language Assessment.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017
2016
Log-Linear System Combination Using Structured Support Vector Machines.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Multi-Language Neural Network Language Models.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
System combination with log-linear models.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
2015
Joint decoding of tandem and hybrid systems for improved keyword spotting on low resource languages.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Improving speech recognition and keyword search for low resource languages using web data.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
A language space representation for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Unicode-based graphemic systems for limited resource languages.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Structured discriminative models using deep neural-network features.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
Multilingual representations for low resource speech recognition and keyword search.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
2014
Speech recognition and keyword spotting for low-resource languages: Babel project research at CUED.
Proceedings of the 4th Workshop on Spoken Language Technologies for Under-resourced Languages, 2014
Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Data augmentation for low resource languages.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Language independent and unsupervised acoustic models for speech recognition and keyword spotting.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Investigation of unsupervised adaptation of DNN acoustic models with filter bank input.
Proceedings of the IEEE International Conference on Acoustics, 2014
2013
Efficient decoding with generative score-spaces using the expectation semiring.
Proceedings of the IEEE International Conference on Acoustics, 2013
2012
Structured discriminative models for speech recognition.
Proceedings of the 2012 Symposium on Machine Learning in Speech and Language Processing, 2012
Rapid Nonlinear Speaker Adaptation for Large-Vocabulary Continuous Speech Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Inference algorithms for generative score-spaces.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
2011
Structured discriminative models for noise robust continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2011
Derivative kernels for noise robust ASR.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011
2010
Structured Log Linear Models for Noise Robust Speech Recognition.
IEEE Signal Process. Lett., 2010
2009
Support vector machines for noise robust ASR.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009
2007
Initial Experiments with Estonian Speech Recognition.
Proceedings of the 16th Nordic Conference of Computational Linguistics, 2007