Kartik Audhkhasi

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

O-1: Self-training with Oracle and 1-best Hypothesis.

Murali Karthick Baskar

Andrew Rosenberg

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Robust Knowledge Distillation from RNN-T Models with Noisy Training Labels Using Full-Sum Loss.

Mohammad Zeineldeen

Murali Karthick Baskar

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Large-Scale Language Model Rescoring on Long-Form Data.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Modular Conformer Training for Flexible End-to-End ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Modular Hybrid Autoregressive Transducer.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Federated Learning for Affective Computing Tasks.

Proceedings of the 10th International Conference on Affective Computing and Intelligent Interaction, 2022

2021

Regularizing Word Segmentation by Creating Misspellings.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.

Rogério Schmidt Feris

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Convolutional Dropout and Wordpiece Augmentation for End-to-End Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Noise can speed backpropagation learning and deep bidirectional pretraining.

Neural Networks, 2020

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.

CoRR, 2020

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300.

CoRR, 2020

Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Spoken Language Understanding Without Full Transcripts.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings.

Samuel Thomas

Brian Kingsbury

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Alignment-Length Synchronous Decoding for RNN Transducer.

George Saon

Zoltán Tüske

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Advancing Sequence-to-Sequence Based Speech Recognition.

Zoltán Tüske

George Saon

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Detection and Recovery of OOVs for Improved English Broadcast News Captioning.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Challenging the Boundaries of Speech Recognition: The MALACH Corpus.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition.

Gakuto Kurata

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation.

Gakuto Kurata

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustically Grounded Word Embeddings for Improved Acoustics-to-word Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Sequence Noise Injected Training for End-to-end Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Grounding Spoken Words in Unlabeled Video.

Rogério Schmidt Feris

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Simplified LSTMs for Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Modeling Multiple Time Series Annotations as Noisy Distortions of the Ground Truth: An Expectation-Maximization Approach.

IEEE Trans. Affect. Comput., 2018

Improved Knowledge Distillation from Bi-Directional to Uni-Directional LSTM CTC for End-to-End Speech Recognition.

Gakuto Kurata

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition.

Mark Hasegawa-Johnson

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Whole Sentence Neural Language Models.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

End-to-End ASR-Free Keyword Search From Speech.

IEEE J. Sel. Top. Signal Process., 2017

Recent progress in deep end-to-end models for spoken language processing.

IBM J. Res. Dev., 2017

English Conversational Telephone Speech Recognition by Humans and Machines.

Dimitrios Dimitriadis

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Direct Acoustics-to-Word Models for English Conversational Speech Recognition.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

End-to-end speech recognition and keyword search on low-resource languages.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Knowledge distillation across ensembles of multilingual models for low-resource languages.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016

Noise-enhanced convolutional neural networks.

Neural Networks, 2016

Detecting paralinguistic events in audio stream using context in features and probabilistic decisions.

Sungbok Lee

Comput. Speech Lang., 2016

Invariant Representations for Noisy Speech Recognition.

CoRR, 2016

Multilingual Data Selection for Low Resource Speech Recognition.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Efficient one-vs-one kernel ridge regression for speech recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Semantic word embedding neural network language models for automatic speech recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

Diverse Embedding Neural Network Language Models.

Proceedings of the 3rd International Conference on Learning Representations, 2015

A mixture of experts approach towards intelligibility classification of pathological speech.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Multilingual representations for low resource speech recognition and keyword search.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems.

Andreas M. Zavou

IEEE ACM Trans. Audio Speech Lang. Process., 2014

Training ensemble of diverse classifiers on feature subsets.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Semi-supervised term-weighted value rescoring for keyword search.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Fusion of diverse denoising systems for robust automatic speech recognition.

Naveen Kumar

Maarten Van Segbroeck

Peter Drotár

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013

A Globally-Variant Locally-Constant Model for Fusion of Labels from Multiple Diverse Experts without Using Reference Labels.

IEEE Trans. Pattern Anal. Mach. Intell., 2013

Generalized Ambiguity Decomposition for Understanding Ensemble Diversity.

CoRR, 2013

Which ASR should I choose for my dialogue system?

Anton Leuski

David R. Traum

Proceedings of the SIGDIAL 2013 Conference, 2013

Paralinguistic event detection from speech using probabilistic time-series smoothing and masking.

Sungbok Lee

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Classifying language-related developmental disorders from speech cues: the promise and the potential confounds.

Maarten Van Segbroeck

Ming Li

Sungbok Lee

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Empirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems.

Andreas M. Zavou

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Noisy hidden Markov models for speech recognition.

Proceedings of the 2013 International Joint Conference on Neural Networks, 2013

Noise benefits in backpropagation and deep bidirectional pre-training.

Proceedings of the 2013 International Joint Conference on Neural Networks, 2013

Joint training of interpolated exponential n-gram models.

Paul Vozila

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

A reranking approach for recognition and classification of speech input in conversational dialogue systems.

Fabrizio Morbini

Ron Artstein

Maarten Van Segbroeck

Kenji Sagae

David R. Traum

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network.

Angeliki Metallinou

Ming Li

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Creating ensemble of diverse maximum entropy models.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Analyzing quality of crowd-sourced speech transcriptions of noisy audio for acoustic model adaptation.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011

Reliability-Weighted Acoustic Model Adaptation Using Crowd Sourced Transcriptions.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Emotion classification from speech using evaluator reliability-weighted combination of ranked lists.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Accurate transcription of broadcast news speech using multiple noisy transcribers and unsupervised reliability metrics.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

2010

Automatic speech recognition system channel modeling.

Qun Feng Tan

Emil Ettelaie

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Data-dependent evaluator modeling and its application to emotional valence classification from speech.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009

Automatic evaluation of spoken English fluency.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

Formant-based technique for automatic filled-pause detection in spontaneous spoken English.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

Lattice-based lexical cues for word fragment detection in conversational speech.

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2007

Keyword Search using Modified Minimum Edit Distance Measure.
