Kartik Audhkhasi

ORCID: 0000-0002-2340-1144

  • University of Southern California, Los Angeles, USA

According to our database, Kartik Audhkhasi authored at least 77 papers between 2007 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







STAB: Speech Tokenizer Assessment Benchmark.
CoRR, 2024

Task Vector Algebra for ASR Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

O-1: Self-training with Oracle and 1-best Hypothesis.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Robust Knowledge Distillation from RNN-T Models with Noisy Training Labels Using Full-Sum Loss.
Proceedings of the IEEE International Conference on Acoustics, 2023

Large-Scale Language Model Rescoring on Long-Form Data.
Proceedings of the IEEE International Conference on Acoustics, 2023

Modular Conformer Training for Flexible End-to-End ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Modular Hybrid Autoregressive Transducer.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Federated Learning for Affective Computing Tasks.
Proceedings of the 10th International Conference on Affective Computing and Intelligent Interaction, 2022

Regularizing Word Segmentation by Creating Misspellings.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Convolutional Dropout and Wordpiece Augmentation for End-to-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Noise can speed backpropagation learning and deep bidirectional pretraining.
Neural Networks, 2020

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
CoRR, 2020

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300.
CoRR, 2020

Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Spoken Language Understanding Without Full Transcripts.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Alignment-Length Synchronous Decoding for RNN Transducer.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Advancing Sequence-to-Sequence Based Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Detection and Recovery of OOVs for Improved English Broadcast News Captioning.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Challenging the Boundaries of Speech Recognition: The MALACH Corpus.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustically Grounded Word Embeddings for Improved Acoustics-to-word Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Sequence Noise Injected Training for End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Grounding Spoken Words in Unlabeled Video.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Simplified LSTMs for Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Modeling Multiple Time Series Annotations as Noisy Distortions of the Ground Truth: An Expectation-Maximization Approach.
IEEE Trans. Affect. Comput., 2018

Improved Knowledge Distillation from Bi-Directional to Uni-Directional LSTM CTC for End-to-End Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Whole Sentence Neural Language Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

End-to-End ASR-Free Keyword Search From Speech.
IEEE J. Sel. Top. Signal Process., 2017

Recent progress in deep end-to-end models for spoken language processing.
IBM J. Res. Dev., 2017

English Conversational Telephone Speech Recognition by Humans and Machines.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Direct Acoustics-to-Word Models for English Conversational Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

End-to-end speech recognition and keyword search on low-resource languages.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Knowledge distillation across ensembles of multilingual models for low-resource languages.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Noise-enhanced convolutional neural networks.
Neural Networks, 2016

Detecting paralinguistic events in audio stream using context in features and probabilistic decisions.
Comput. Speech Lang., 2016

Invariant Representations for Noisy Speech Recognition.
CoRR, 2016

Multilingual Data Selection for Low Resource Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Efficient one-vs-one kernel ridge regression for speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Semantic word embedding neural network language models for automatic speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Diverse Embedding Neural Network Language Models.
Proceedings of the 3rd International Conference on Learning Representations, 2015

A mixture of experts approach towards intelligibility classification of pathological speech.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Multilingual representations for low resource speech recognition and keyword search.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Training ensemble of diverse classifiers on feature subsets.
Proceedings of the IEEE International Conference on Acoustics, 2014

Semi-supervised term-weighted value rescoring for keyword search.
Proceedings of the IEEE International Conference on Acoustics, 2014

Fusion of diverse denoising systems for robust automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

A Globally-Variant Locally-Constant Model for Fusion of Labels from Multiple Diverse Experts without Using Reference Labels.
IEEE Trans. Pattern Anal. Mach. Intell., 2013

Generalized Ambiguity Decomposition for Understanding Ensemble Diversity.
CoRR, 2013

Which ASR should I choose for my dialogue system?
Proceedings of the SIGDIAL 2013 Conference, 2013

Paralinguistic event detection from speech using probabilistic time-series smoothing and masking.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Classifying language-related developmental disorders from speech cues: the promise and the potential confounds.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Empirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Noisy hidden Markov models for speech recognition.
Proceedings of the 2013 International Joint Conference on Neural Networks, 2013

Noise benefits in backpropagation and deep bidirectional pre-training.
Proceedings of the 2013 International Joint Conference on Neural Networks, 2013

Joint training of interpolated exponential n-gram models.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

A reranking approach for recognition and classification of speech input in conversational dialogue systems.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Creating ensemble of diverse maximum entropy models.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Analyzing quality of crowd-sourced speech transcriptions of noisy audio for acoustic model adaptation.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Reliability-Weighted Acoustic Model Adaptation Using Crowd Sourced Transcriptions.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Emotion classification from speech using evaluator reliability-weighted combination of ranked lists.
Proceedings of the IEEE International Conference on Acoustics, 2011

Accurate transcription of broadcast news speech using multiple noisy transcribers and unsupervised reliability metrics.
Proceedings of the IEEE International Conference on Acoustics, 2011

Automatic speech recognition system channel modeling.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Data-dependent evaluator modeling and its application to emotional valence classification from speech.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Automatic evaluation of spoken English fluency.
Proceedings of the IEEE International Conference on Acoustics, 2009

Formant-based technique for automatic filled-pause detection in spontaneous spoken English.
Proceedings of the IEEE International Conference on Acoustics, 2009

Lattice-based lexical cues for word fragment detection in conversational speech.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Keyword Search using Modified Minimum Edit Distance Measure.
Proceedings of the IEEE International Conference on Acoustics, 2007
