Brian Kingsbury

ORCID: 0000-0002-1343-6837

According to our database, Brian Kingsbury authored at least 153 papers between 1992 and 2024.

Exploring the limits of decoder-only models trained on public speech recognition corpora.
CoRR, 2024

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization.
Proceedings of the IEEE International Conference on Acoustics, 2024

Semi-Autoregressive Streaming ASR with Label Context.
Proceedings of the IEEE International Conference on Acoustics, 2024

Soft Random Sampling: A Theoretical and Empirical Analysis.
CoRR, 2023

High-Dimensional Smoothed Entropy Estimation via Dimensionality Reduction.
Proceedings of the IEEE International Symposium on Information Theory, 2023

ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2023

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Stochastic Linearized Augmented Lagrangian Method for Decentralized Bilevel Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Global RNN Transducer Models For Multi-dialect Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.
Proceedings of the IEEE International Conference on Acoustics, 2022

Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.
Proceedings of the IEEE International Conference on Acoustics, 2022

Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2022

Decentralized Bilevel Optimization for Personalized Client Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving End-to-end Models for Set Prediction in Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2022

A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets.
Proceedings of the IEEE International Conference on Acoustics, 2022

Everything at Once - Multi-modal Fusion Transformer for Video Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Asynchronous Decentralized Distributed Training of Acoustic Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent.
CoRR, 2021

On the Limit of English Conversational Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cascaded Multilingual Audio-Visual Learning from Videos.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Integrating Dialog History into End-to-End Spoken Language Understanding Systems.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

4-Bit Quantization of LSTM-Based Speech Recognition Models.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Reducing Exposure Bias in Training Recurrent Neural Network Transducers.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Advancing RNN Transducer Technology for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

End-to-End Spoken Language Understanding Using Transformer Networks and Self-Supervised Pre-Trained Features.
Proceedings of the IEEE International Conference on Acoustics, 2021

Federated Acoustic Modeling for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

RNN Transducer Models for Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2021

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
CoRR, 2020

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300.
CoRR, 2020

Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Spoken Language Understanding Without Full Transcripts.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Efficiency in Large-Scale Decentralized Distributed Training.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Fast Training of Deep Neural Networks for Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Kernel Approximation Methods for Speech Recognition.
J. Mach. Learn. Res., 2019

A Highly Efficient Distributed Deep Learning System for Automatic Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Challenging the Boundaries of Speech Recognition: The MALACH Corpus.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Estimating Information Flow in Deep Neural Networks.
Proceedings of the 36th International Conference on Machine Learning, 2019

Beyond Backprop: Online Alternating Minimization with Auxiliary Variables.
Proceedings of the 36th International Conference on Machine Learning, 2019

Distributed Deep Learning Strategies for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

English Broadcast News Speech Recognition by Humans and Machines.
Proceedings of the IEEE International Conference on Acoustics, 2019

Sequence Noise Injected Training for End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Simplified LSTMS for Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Understanding Unequal Gender Classification Accuracy from Face Images.
CoRR, 2018

Estimating Information Flow in Neural Networks.
CoRR, 2018

Beyond Backprop: Alternating Minimization with co-Activation Memory.
CoRR, 2018

Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Parallel Deep Neural Network Training for Big Data on Blue Gene/Q.
IEEE Trans. Parallel Distributed Syst., 2017

Introduction to the Special Issue on End-to-End Speech and Language Processing.
IEEE J. Sel. Top. Signal Process., 2017

End-to-End ASR-Free Keyword Search From Speech.
IEEE J. Sel. Top. Signal Process., 2017

Accelerating deep neural network learning for speech recognition on a cluster of GPUs.
Proceedings of the Machine Learning on HPC Environments, 2017

Network architectures for multilingual speech representation learning.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Knowledge distillation across ensembles of multilingual models for low-resource languages.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multilingual Data Selection for Low Resource Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Very deep multilingual convolutional neural networks for LVCSR.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Compact kernel models for acoustic modeling via random feature selection.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A comparison between deep neural nets and kernel acoustic models for speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Efficient one-vs-one kernel ridge regression for speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Data Augmentation for Deep Neural Network Acoustic Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Deep Convolutional Neural Networks for Large-scale Speech Tasks.
Neural Networks, 2015

A multi-region deep neural network model in speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Order-free spoken term detection.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Data augmentation for deep convolutional neural network acoustic modeling.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Multilingual representations for low resource speech recognition and keyword search.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Automatic Speech Recognition.
Proceedings of the Natural Language Processing of Semitic Languages, 2014

How to Scale Up Kernel Methods to Be As Good As Deep Neural Nets.
CoRR, 2014

Deep scattering spectra with deep neural networks for LVCSR tasks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Parallel deep neural network training for LVCSR tasks using blue gene/Q.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Recent improvements in neural network acoustic modeling for LVCSR in low resource languages.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Improving deep neural network acoustic modeling for audio corpus indexing under the IARPA babel program.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Improvements to filterbank and delta learning within a deep neural network framework.
Proceedings of the IEEE International Conference on Acoustics, 2014

Efficient spoken term detection using confusion networks.
Proceedings of the IEEE International Conference on Acoustics, 2014

Automatic keyword selection for keyword search development and tuning.
Proceedings of the IEEE International Conference on Acoustics, 2014

Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks.
IEEE Trans. Speech Audio Process., 2013

Improving training time of Hessian-free optimization for deep neural networks using preconditioning and sampling.
CoRR, 2013

The IBM speech activity detection system for the DARPA RATS program.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Mixtures of Bayesian joint factor analyzers for noise robust automatic speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Deep convolutional neural networks for LVCSR.
Proceedings of the IEEE International Conference on Acoustics, 2013

Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets.
Proceedings of the IEEE International Conference on Acoustics, 2013

Exploiting diversity for spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2013

System combination and score normalization for spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2013

A high-performance Cantonese keyword search system.
Proceedings of the IEEE International Conference on Acoustics, 2013

Audio-visual deep learning for noise robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

New types of deep neural network learning for speech recognition and related applications: an overview.
Proceedings of the IEEE International Conference on Acoustics, 2013

Developing speech recognition systems for corpus indexing under the IARPA Babel program.
Proceedings of the IEEE International Conference on Acoustics, 2013

An empirical study of confusion modeling in keyword search for low resource languages.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Learning filter banks within a deep neural network framework.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Improvements to Deep Convolutional Neural Networks for LVCSR.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Deep Neural Network Language Models.
Proceedings of the Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, 2012

Discriminative feature-space transforms using deep neural networks.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Auto-encoder bottleneck features using deep belief networks.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Trends and advances in speech recognition.
IBM J. Res. Dev., 2011

Artificial intelligence research at IBM.
IBM J. Res. Dev., 2011

The IBM 2009 GALE Arabic speech transcription system.
Proceedings of the IEEE International Conference on Acoustics, 2011

Arccosine kernels: Acoustic modeling with infinite neural networks.
Proceedings of the IEEE International Conference on Acoustics, 2011

Making Deep Belief Networks effective for large vocabulary continuous speech recognition.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

The IBM 2011 GALE Arabic speech transcription system.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

The IBM Attila speech recognition toolkit.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Rapid and inexpensive development of speech action classifiers for natural language call routing systems.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

The IBM 2008 GALE Arabic speech transcription system.
Proceedings of the IEEE International Conference on Acoustics, 2010

Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program.
IEEE Trans. Speech Audio Process., 2009

Tied-Mixture Language Modeling in Continuous Space.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009

Fast decoding for open vocabulary spoken term detection.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009

Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling.
Proceedings of the IEEE International Conference on Acoustics, 2009

Machine translation in continuous space.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Monte Carlo model-space noise adaptation for speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Discriminative graph training for ultra-fast low-footprint speech indexing.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Boosted MMI for model and feature-space discriminative training.
Proceedings of the IEEE International Conference on Acoustics, 2008

The IBM 2006 Gale Arabic ASR System.
Proceedings of the IEEE International Conference on Acoustics, 2007

Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training.
Proceedings of the IEEE International Conference on Acoustics, 2007

Discriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

Pseudo pitch synchronous analysis of speech with applications to speaker recognition.
IEEE Trans. Speech Audio Process., 2006

Advances in speech transcription at IBM under the DARPA EARS program.
IEEE Trans. Speech Audio Process., 2006

Automated Quality Monitoring for Call Centers using Speech and NLP Technologies.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006

Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

The IBM 2004 Conversational Telephony System for Rich Transcription.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Constructing Ensembles of ASR Systems Using Randomized Decision Trees.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

fMPE: Discriminatively Trained Features for Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

An evaluation of a nonlinear feature transformation for conversational speech recognition.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

An architecture for rapid decoding of large vocabulary conversational speech.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Toward domain-independent conversational speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Large vocabulary conversational speech recognition with a subspace constraint on inverse covariance matrices.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Automatic speech recognition performance on a voicemail transcription task.
IEEE Trans. Speech Audio Process., 2002

A hybrid HMM/traps model for robust voice activity detection.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Distributed speech recognition using noise-robust MFCC and traps-estimated manner features.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Large vocabulary conversational speech recognition with the extended maximum likelihood linear transformation (EMLLT) model.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Robust speech recognition in Noisy Environments: The 2001 IBM spine evaluation system.
Proceedings of the IEEE International Conference on Acoustics, 2002

Recent improvements in speech recognition performance on large vocabulary conversational speech (voicemail and switchboard).
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Robust speech recognition using the modulation spectrogram.
Speech Commun., 1998

Performance improvements through combining phone- and syllable-scale information in automatic speech recognition.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Incorporating information from syllable-length time scales into automatic speech recognition.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Recognizing reverberant speech with RASTA-PLP.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

The modulation spectrogram: in pursuit of an invariant representation of speech.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

Spert-II: A Vector Microprocessor System.
Computer, 1996

SPERT-II: A Vector Microprocessor System and its Application to Large Problems in Backpropagation Training.
Proceedings of the Advances in Neural Information Processing Systems 8, 1995

SPERT: a VLIW/SIMD microprocessor for artificial neural network computations.
Proceedings of the Application Specific Array Processors, 1992
