Tara N. Sainath

ORCID: 0000-0002-4126-6556

Affiliations:
  • Google Inc., New York, NY, USA
  • IBM T. J. Watson Research Center, Yorktown Heights, NY, USA


According to our database, Tara N. Sainath authored at least 210 papers between 2006 and 2024.

Bibliography

2024
End-to-End Speech Recognition: A Survey.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Text Injection for Neural Contextual Biasing.
CoRR, 2024

Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models.
CoRR, 2024

Massive End-to-end Speech Recognition Models with Time Reduction.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Augmenting Conformers With Structured State-Space Sequence Models For Online Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Improving Speech Recognition for African American English with Audio Classification.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm.
CoRR, 2023

Massive End-to-end Models for Short Search Queries.
CoRR, 2023

Augmenting conformers with structured state space models for online speech recognition.
CoRR, 2023

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models.
CoRR, 2023

AudioPaLM: A Large Language Model That Can Speak and Listen.
CoRR, 2023

Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR.
CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.
CoRR, 2023

Improving Joint Speech-Text Representations Without Alignment.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Modular Domain Adaptation for Conformer-Based Streaming ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Re-investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Mixture-of-Expert Conformer for Streaming Multilingual ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

How to Estimate Model Transferability of Pre-Trained Speech Models?
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

UML: A Universal Monolingual Output Layer For Multilingual ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Multi-Output RNN-T Joint Networks for Multi-Task Learning of ASR and Auxiliary Tasks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Contextual Biasing with Text Injection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Efficient Domain Adaptation for Speech Foundation Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Massively Multilingual Shallow Fusion with Large Language Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Context-Aware end-to-end ASR Using Self-Attentive Embedding and Tensor Fusion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Efficient Cascaded Streaming ASR System Via Frame Rate Reduction.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Improved Long-Form Speech Recognition By Jointly Modeling The Primary And Non-Primary Speakers.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2022

Self-Supervised Speech Representation Learning: A Review.
IEEE J. Sel. Top. Signal Process., 2022

Editorial Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing.
IEEE J. Sel. Top. Signal Process., 2022

JOIST: A Joint Speech and Text Streaming Model for ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Dual Learning for Large Vocabulary On-Device ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

NAM+: Towards Scalable End-to-End Contextual Biasing for Adaptive ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

A Truly Multilingual First Pass and Monolingual Second Pass Streaming on-Device ASR System.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Scaling Up Deliberation For Multilingual ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Align-Refine for Non-autoregressive Deliberation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Rare Word Recognition with LM-aware MWER Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Disentangled Speech Representations.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Language Agnostic Multilingual Streaming On-Device ASR System.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Deliberation by Text-Only and Semi-Supervised Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Intended Query Detection using E2E Modeling for Continued Conversation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Turn-Taking Prediction for Natural Conversational Speech.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving the Fusion of Acoustic and Text Representations in RNN-T.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Deliberation of Streaming RNN-Transducer by Non-Autoregressive Decoding.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Massively Multilingual ASR: A Lifelong Learning Solution.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Transducer-Based Streaming Deliberation for Cascaded Encoders.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Joint Unsupervised and Supervised Training for Multilingual ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
Scaling End-to-End Models for Large-Scale Multilingual ASR.
CoRR, 2021

Transformer Based Deliberation for Two-Pass Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Multitask Training with Text Data for End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Deliberation-Based Joint Acoustic and Text Decoder.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Lookup-Table Recurrent Language Models for Long Tail Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Tied & Reduced RNN-T Decoder.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.
Proceedings of the 9th International Conference on Learning Representations, 2021

FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Echo State Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Learning Word-Level Confidence for Subword End-To-End ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Less is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Cascaded Encoders for Unifying Streaming and Non-Streaming ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

A Better and Faster end-to-end Model for Streaming ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Scaling End-to-End Models for Large-Scale Multilingual ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling.
CoRR, 2020

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency.
CoRR, 2020

Emitting Word Timings with End-to-End Models.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Low Latency Speech Recognition Using End-to-End Prefetching.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multistate Encoding with End-To-End Speech RNN Transducer Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

An Attention-Based Joint Acoustic and Text on-Device End-To-End Model.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Improving Proper Noun Recognition in End-To-End ASR by Customization of the MWER Loss Criterion.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Towards Fast and Accurate Streaming End-To-End ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Deliberation Model Based Two-Pass End-To-End Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Deep Learning for Audio Signal Processing.
IEEE J. Sel. Top. Signal Process., 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.
CoRR, 2019

Shallow-Fusion End-to-End Contextual Biasing.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Two-Pass End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Performance of End-to-End ASR on Numeric Sequences.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Spelling Correction Model for End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Joint Endpointing and Decoding with End-to-end Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Phoebe: Pronunciation-aware Contextualization for End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Semi-supervised Training for End-to-end Models via Weak Distillation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Contextual Speech Recognition with Difficult Negative Training Examples.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Recognizing Long-Form Speech Using Streaming End-to-End Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Comparison of End-to-End Models for Long-Form Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Deep Context: End-to-end Contextual Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Compression of End-to-End Models.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multilingual Speech Recognition with a Single End-to-End Model.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Improving the Performance of Online Neural Transducer Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Spectral Distortion Model for Training Phase-Sensitive Deep-Neural Networks for Far-Field Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Performance of Mask Based Statistical Beamforming in a Smart Home Scenario.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Temporal Modeling Using Dilated Convolution and Gating for Voice-Activity-Detection.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017
Parallel Deep Neural Network Training for Big Data on Blue Gene/Q.
IEEE Trans. Parallel Distributed Syst., 2017

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model.
CoRR, 2017

Annealed f-Smoothing as a Mechanism to Speed up Neural Network Training.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Highway-LSTM and Recurrent Highway Networks for Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An Analysis of "Attention" in Sequence-to-Sequence Models.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A Comparison of Sequence-to-Sequence Models for Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Acoustic Modeling for Google Home.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Reducing the Computational Complexity of Two-Dimensional LSTMs.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Raw Multichannel Processing Using Deep Neural Networks.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Lower Frame Rate Neural Network Acoustic Models.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Factored spatial and spectral multichannel raw waveform CLDNNs.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Learning compact recurrent neural networks.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015
Deep Convolutional Neural Networks for Large-scale Speech Tasks.
Neural Networks, 2015

Structured Transforms for Small-Footprint Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Learning the speech front-end with raw waveform CLDNNs.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Convolutional neural networks for small-footprint keyword spotting.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Large vocabulary automatic speech recognition for children.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Locally-connected and convolutional neural networks for small footprint speaker recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Automatic gain control and multi-style training for robust small-footprint keyword spotting with deep neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Query-by-example keyword spotting using long short-term memory networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Acoustic modelling with CD-CTC-sMBR LSTM RNNs.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Deep scattering spectra with deep neural networks for LVCSR tasks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Parallel deep neural network training for LVCSR tasks using Blue Gene/Q.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Joint training of convolutional and non-convolutional neural networks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Improvements to filterbank and delta learning within a deep neural network framework.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Deep Scattering Spectrum with deep neural networks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Kernel methods match Deep Neural Networks on TIMIT.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013
Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks.
IEEE Trans. Speech Audio Process., 2013

Improving training time of Hessian-free optimization for deep neural networks using preconditioning and sampling.
CoRR, 2013

Deep convolutional neural networks for LVCSR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

An evaluation of posterior modeling techniques for phonetic recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Improving deep neural networks for LVCSR using rectified linear units and dropout.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Developing speech recognition systems for corpus indexing under the IARPA Babel program.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Learning filter banks within a deep neural network framework.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Improvements to Deep Convolutional Neural Networks for LVCSR.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
Exemplar-Based Processing for Speech Recognition: An Overview.
IEEE Signal Process. Mag., 2012

Deep Neural Network Language Models.
Proceedings of the Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, 2012

Enhancing Exemplar-Based Posteriors for Speech Recognition Tasks.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Auto-encoder bottleneck features using deep belief networks.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Improved pre-training of Deep Belief Networks using Sparse Encoding Symmetric Machines.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

N-best entropy based data selection for acoustic modeling.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011
Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

Reducing Computational Complexities of Exemplar-Based Sparse Representations with Applications to Large Vocabulary Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Convergence of Line Search A-Function Methods.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Application specific loss minimization using gradient boosting.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Exemplar-based Sparse Representation phone identification features.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Deep Belief Networks using discriminative features for phone recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

A-Functions: A generalization of Extended Baum-Welch transformations to convex optimization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

A convex hull approach to sparse representations for exemplar-based speech recognition.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Making Deep Belief Networks effective for large vocabulary continuous speech recognition.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
Data selection for language modeling using sparse representations.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Sparse representation features for speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Sparse representations for text categorization.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

An analysis of sparseness and regularization in exemplar-based methods for speech classification.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Incorporating sparse representation phone identification features in automatic speech recognition using exponential families.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A voice-commandable robotic forklift working alongside humans in minimally-prepared outdoor environments.
Proceedings of the IEEE International Conference on Robotics and Automation, 2010

Bayesian compressive sensing for phonetic classification.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

The Use of isometric transformations and Bayesian estimation in compressive sensing for fMRI classification.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Kalman filtering for compressed sensing.
Proceedings of the 13th Conference on Information Fusion, 2010

2009
Applications of broad class knowledge for noise robust speech recognition.
PhD thesis, 2009

A generalized family of parameter estimation techniques.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

An exploration of large vocabulary tools for small vocabulary phonetic recognition.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Island-driven search using broad phonetic classes.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
A comparison of broad phonetic and acoustic units for noise robust segment-based phonetic recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Generalization of extended Baum-Welch parameter estimation for discriminative training and decoding.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Gradient steepness metrics using extended Baum-Welch transformations for universal pattern recognition tasks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

2007
Audio classification using extended Baum-Welch transformations.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Unsupervised Audio Segmentation using Extended Baum-Welch Transformations.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Broad phonetic class recognition in a Hidden Markov model framework using extended Baum-Welch transformations.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
A Sinusoidal Model Approach to Acoustic Landmark Detection and Segmentation for Robust Segment-Based Speech Recognition.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006