We stand with Ukraine

Tara N. Sainath

ORCID: 0000-0002-4126-6556

Affiliations:

Google Inc., New York, NY, USA
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

According to our database¹, Tara N. Sainath authored at least 210 papers between 2006 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2024

End-to-End Speech Recognition: A Survey.

Rohit Prabhavalkar, …, Tara N. Sainath, …, Shinji Watanabe

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Text Injection for Neural Contextual Biasing.

…, …, Rohit Prabhavalkar, …, …, …, Tara N. Sainath, Bhuvana Ramabhadran

CoRR, 2024

Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models.

Tsendsuren Munkhdalai, …, …, …, Tara N. Sainath, Pedro Moreno Mengibar

CoRR, 2024

Massive End-to-end Speech Recognition Models with Time Reduction.

…, Rohit Prabhavalkar, …, …, Dongseong Hwang, …, …, …, …, …, …, Chengjian Zheng, …, Tara N. Sainath, Pedro Moreno Mengibar

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models.

…, …, Tsendsuren Munkhdalai, Nikhil Siddhartha, …, …, …, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Augmenting Conformers With Structured State-Space Sequence Models For Online Speech Recognition.

…, …, …, …, Krzysztof Choromanski, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.

Rohit Prabhavalkar, …, …, …, …, …, …, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study.

…, …, …, …, …, …, …, …, Shuo-Yiin Chang, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Improving Speech Recognition for African American English with Audio Classification.

…, …, …, …, …, Alëna Aksënova, Tsendsuren Munkhdalai, …, …, …, Dongseong Hwang, Tara N. Sainath, Françoise Beaufays, Pedro Moreno Mengibar

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models.

…, …, …, …, …, …, Rohit Prabhavalkar, …, Tara N. Sainath, …, …, Amir Yazdanbakhsh, Shivani Agrawal

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR.

…, …, …, Tara N. Sainath, Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation.

…, …, …, Chung-Cheng Chiu, …, …, Tara N. Sainath, Philip C. Woodland

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm.

…, …, Diamantino Caseiro, Tsendsuren Munkhdalai, …, …, …, …, Rohit Prabhavalkar, …, …, Tara N. Sainath, Pedro Moreno Mengibar

CoRR, 2023

Massive End-to-end Models for Short Search Queries.

…, Rohit Prabhavalkar, Dongseong Hwang, …, …, …, …, …, …, …, …, …, Tara N. Sainath, Pedro Moreno Mengibar

CoRR, 2023

Augmenting conformers with structured state space models for online speech recognition.

…, …, …, …, Krzysztof Choromanski, Tara N. Sainath

CoRR, 2023

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models.

…, Shuo-Yiin Chang, …, …, …, Tara N. Sainath

CoRR, 2023

AudioPaLM: A Large Language Model That Can Speak and Listen.

CoRR, 2023

Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR.

…, …, Tara N. Sainath, Krzysztof Choromanski, …, Trevor Strohman, …, …

CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.

CoRR, 2023

Improving Joint Speech-Text Representations Without Alignment.

…, …, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, …, …

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Modular Domain Adaptation for Conformer-Based Streaming ASR.

…, …, Dongseong Hwang, Tara N. Sainath, Pedro Moreno Mengibar

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Re-investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods.

…, …, Dongseong Hwang, Tsendsuren Munkhdalai, Tara N. Sainath, Pedro Moreno Mengibar

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR.

…, …, …, Shuo-Yiin Chang, Tara N. Sainath

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Mixture-of-Expert Conformer for Streaming Multilingual ASR.

…, …, Tara N. Sainath, …, Françoise Beaufays

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

How to Estimate Model Transferability of Pre-Trained Speech Models?

…, Chao-Han Huck Yang, …, …, …, Shuo-Yiin Chang, Rohit Prabhavalkar, …, Tara N. Sainath

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

UML: A Universal Monolingual Output Layer For Multilingual ASR.

…, …, Tara N. Sainath, Trevor Strohman, Shuo-Yiin Chang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.

Chao-Han Huck Yang, …, …, …, Tara N. Sainath, Sabato Marco Siniscalchi, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.

Chao-Han Huck Yang, …, …, …, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Multi-Output RNN-T Joint Networks for Multi-Task Learning of ASR and Auxiliary Tasks.

…, …, …, …, Shuo-Yiin Chang, …, Tara N. Sainath, …, …, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Contextual Biasing with Text Injection.

Tara N. Sainath, Rohit Prabhavalkar, Diamantino Caseiro, …, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale.

…, Michael Picheny, …, Rohit Prabhavalkar, …, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition.

…, …, Rohit Prabhavalkar, Tara N. Sainath, …, …, …, …, Andrew Rosenberg, Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Efficient Domain Adaptation for Speech Foundation Models.

…, Dongseong Hwang, …, …, …, Tara N. Sainath, …, …, …, Trevor Strohman, Françoise Beaufays

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion.

…, …, …, Dongseong Hwang, Tara N. Sainath, Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model.

…, Shuo-Yiin Chang, Tara N. Sainath, …, …, …, Rohit Prabhavalkar, …, …, Trevor D. Strohman

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Massively Multilingual Shallow Fusion with Large Language Models.

…, Tara N. Sainath, …, …, …, …, …, Rodrigo Cabrera, …, Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models.

Steven M. Hernandez, …, …, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, …, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Context-Aware end-to-end ASR Using Self-Attentive Embedding and Tensor Fusion.

Shuo-Yiin Chang, …, Tara N. Sainath, …, Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR.

…, Rohit Prabhavalkar, Johan Schalkwyk, …, Tara N. Sainath, Françoise Beaufays

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text.

…, Tara N. Sainath, …, …, …, …, …, …

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Efficient Cascaded Streaming ASR System Via Frame Rate Reduction.

…, …, …, Dongseong Hwang, …, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, …

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Improved Long-Form Speech Recognition By Jointly Modeling The Primary And Non-Primary Speakers.

Guru Prakash Arumugam, Shuo-Yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, …, …

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition.

IEEE J. Sel. Top. Signal Process., 2022

Self-Supervised Speech Representation Learning: A Review.

Abdelrahman Mohamed, …, …, Jakob D. Havtorn, …, …, Katrin Kirchhoff, …, …, …, Tara N. Sainath, Shinji Watanabe

IEEE J. Sel. Top. Signal Process., 2022

Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing.

…, Shinji Watanabe, …, Abdelrahman Mohamed, Tara N. Sainath

IEEE J. Sel. Top. Signal Process., 2022

JOIST: A Joint Speech and Text Streaming Model for ASR.

Tara N. Sainath, Rohit Prabhavalkar, …, …, …, …, …, …, Trevor Strohman

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Dual Learning for Large Vocabulary On-Device ASR.

…, …, Tara N. Sainath, Rohit Prabhavalkar, Michael Picheny, …

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

NAM+: Towards Scalable End-to-End Contextual Biasing for Adaptive ASR.

Tsendsuren Munkhdalai, …, …, …, …, …, Tara N. Sainath

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

A Truly Multilingual First Pass and Monolingual Second Pass Streaming on-Device ASR System.

Sepand Mavandadi, …, …, …, Tara N. Sainath, Trevor Strohman

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Scaling Up Deliberation For Multilingual ASR.

…, …, Tara N. Sainath

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems.

…, Shuo-Yiin Chang, …, Tara N. Sainath, …, …

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification.

…, …, Tara N. Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Align-Refine for Non-autoregressive Deliberation.

…, …, Tara N. Sainath

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Rare Word Recognition with LM-aware MWER Training.

…, …, Tara N. Sainath, …, Rohit Prabhavalkar, …, Bhuvana Ramabhadran, …, Sepand Mavandadi, …, Trevor Strohman, …, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Disentangled Speech Representations.

…, …, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Language Agnostic Multilingual Streaming On-Device ASR System.

…, Tara N. Sainath, …, Shuo-Yiin Chang, …, Trevor Strohman, …, …, …, …, …, Sameer Bidichandani

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition.

…, …, Tara N. Sainath, …, Trevor D. Strohman, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR.

…, Shuo-Yiin Chang, …, Tara N. Sainath, Rohit Prabhavalkar, …, …, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Deliberation by Text-Only and Semi-Supervised Training.

…, Tara N. Sainath, …, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.

…, …, …, Tara N. Sainath, …, …, …, …, …, …, Dongseong Hwang, …, Rohit Prabhavalkar, Trevor Strohman

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Intended Query Detection using E2E Modeling for Continued Conversation.

Shuo-Yiin Chang, …, …, Tara N. Sainath, …, …, …, …, …, Trevor Strohman

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Turn-Taking Prediction for Natural Conversational Speech.

Shuo-Yiin Chang, …, Tara N. Sainath, …, Trevor Strohman, …, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving the Fusion of Acoustic and Text Representations in RNN-T.

…, …, …, Tara N. Sainath, Shuo-Yiin Chang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Deliberation of Streaming RNN-Transducer by Non-Autoregressive Decoding.

…, …, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Improving The Latency And Quality Of Cascaded Encoders.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Massively Multilingual ASR: A Lifelong Learning Solution.

…, …, …, Tara N. Sainath, Trevor Strohman, …, …, …, …, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Transducer-Based Streaming Deliberation for Cascaded Encoders.

…, Tara N. Sainath, …, …, Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Joint Unsupervised and Supervised Training for Multilingual ASR.

…, …, …, …, Nikhil Siddhartha, …, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Scaling End-to-End Models for Large-Scale Multilingual ASR.

…, …, Tara N. Sainath, …, …, …, …, …, …

CoRR, 2021

Transformer Based Deliberation for Two-Pass Speech Recognition.

…, …, Tara N. Sainath, Trevor Strohman

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions.

Chung-Cheng Chiu, …, …, Rohit Prabhavalkar, …, …, …, Tara N. Sainath, …, …, …

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Multitask Training with Text Data for End-to-End Speech Recognition.

…, Tara N. Sainath, …

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.

Tara N. Sainath, …, …, …, …, …, …, …, …, Quoc-Nam Le-The, Shuo-Yiin Chang, …, …, …, Chung-Cheng Chiu, Diamantino Caseiro, …, …, …

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Deliberation-Based Joint Acoustic and Text Decoder.

Sepand Mavandadi, Tara N. Sainath, …, …

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Lookup-Table Recurrent Language Models for Long Tail Speech Recognition.

…, Tara N. Sainath, …, …, …, Trevor Strohman

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Tied & Reduced RNN-T Decoder.

…, Tara N. Sainath, …, Emmanuel Guzman, …, …

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.

…, …, …, Chung-Cheng Chiu, …, Tara N. Sainath, …, …

Proceedings of the 9th International Conference on Learning Representations, 2021

FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.

…, Chung-Cheng Chiu, …, Shuo-Yiin Chang, Tara N. Sainath, …, …, …, …, …, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Echo State Speech Recognition.

Harsh Shrivastava, …, …, …, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Learning Word-Level Confidence for Subword End-To-End ASR.

…, …, …, …, …, …, Rohit Prabhavalkar, …, …, …, Tara N. Sainath, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Less is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging.

Rohit Prabhavalkar, …, …, …, …, Trevor Strohman, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Cascaded Encoders for Unifying Streaming and Non-Streaming ASR.

…, Tara N. Sainath, …, …, Chung-Cheng Chiu, Rohit Prabhavalkar, …, Trevor Strohman

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

A Better and Faster end-to-end Model for Streaming ASR.

…, …, …, Tara N. Sainath, Chung-Cheng Chiu, …, Shuo-Yiin Chang, …, …, …, …, …, …, Trevor Strohman, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Scaling End-to-End Models for Large-Scale Multilingual ASR.

…, …, Tara N. Sainath, …, …, …, …, …, …, …

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling.

…, …, …, Chung-Cheng Chiu, …, Tara N. Sainath, …, …

CoRR, 2020

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency.

CoRR, 2020

Emitting Word Timings with End-to-End Models.

Tara N. Sainath, …, …, …, Trevor Strohman

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus.

…, Sepand Mavandadi, Tara N. Sainath, …, …, …

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Low Latency Speech Recognition Using End-to-End Prefetching.

Shuo-Yiin Chang, …, …, …, …, Tara N. Sainath, Trevor Strohman

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multistate Encoding with End-To-End Speech RNN Transducer Network.

…, …, …, Petar S. Aleksic, Tara N. Sainath

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

An Attention-Based Joint Acoustic and Text on-Device End-To-End Model.

Tara N. Sainath, …, …, …, Chung-Cheng Chiu, Trevor Strohman

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Improving Proper Noun Recognition in End-To-End ASR by Customization of the MWER Loss Criterion.

…, Tara N. Sainath, …

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Towards Fast and Accurate Streaming End-To-End ASR.

…, Shuo-Yiin Chang, Tara N. Sainath, …, …, Trevor Strohman, …

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Deliberation Model Based Two-Pass End-To-End Speech Recognition.

…, Tara N. Sainath, …, Rohit Prabhavalkar

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Deep Learning for Audio Signal Processing.

Hendrik Purwins, …, Tuomas Virtanen, …, Shuo-Yiin Chang, Tara N. Sainath

IEEE J. Sel. Top. Signal Process., 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.

…, …, …, …, …, …, …, Tara N. Sainath, …, Chung-Cheng Chiu, …, …, …, Stella Laurenzo, …, …, Wolfgang Macherey, …, …, …, …, …, Rohit Prabhavalkar, …, …, …, …, …, Sébastien Jean, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, Kuan-Chieh Wang, Ekaterina Gonina, …, …, …, …, …, …, …, …, George F. Foster, John Richardson, …, Antoine Bruguier, …, …, …, …, …, …, Vijayaditya Peddinti, …, Michiel Bacchiani, Thomas B. Jablin, Robert Suderman, …, …, …, …, …, …, …, …, …, …, …, …, …, Dmitry Lepikhin, …, …, …, Shubham Toshniwal, …, Michael Nirschl, …

CoRR, 2019

Shallow-Fusion End-to-End Contextual Biasing.

…, Tara N. Sainath, …, …, …, …, …

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Two-Pass End-to-End Speech Recognition.

Tara N. Sainath, …, …, …, Rohit Prabhavalkar, …, Mirkó Visontai, …, Trevor Strohman, …, …, Chung-Cheng Chiu

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Performance of End-to-End ASR on Numeric Sequences.

…, …, Tara N. Sainath, …

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model.

…, Arindrima Datta, Tara N. Sainath, Eugene Weinstein, Bhuvana Ramabhadran, …, …, …, …

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models.

…, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, …

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes.

…, …, Tara N. Sainath, …, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Streaming End-to-end Speech Recognition for Mobile Devices.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Spelling Correction Model for End-to-end Speech Recognition.

…, Tara N. Sainath, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Joint Endpointing and Decoding with End-to-end Models.

Shuo-Yiin Chang, Rohit Prabhavalkar, …, Tara N. Sainath, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Phoebe: Pronunciation-aware Contextualization for End-to-end Speech Recognition.

Antoine Bruguier, Rohit Prabhavalkar, …, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Semi-supervised Training for End-to-end Models via Weak Distillation.

…, Tara N. Sainath, …, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Contextual Speech Recognition with Difficult Negative Training Examples.

…, …, Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Recognizing Long-Form Speech Using Streaming End-to-End Models.

…, Rohit Prabhavalkar, Chung-Cheng Chiu, …, Tara N. Sainath, Trevor Strohman

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Comparison of End-to-End Models for Long-Form Speech Recognition.

Chung-Cheng Chiu, …, Rohit Prabhavalkar, …, Tara N. Sainath, …, …, …, …, Sergey Kishchenko, …, …, …, …

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition.

Shubham Toshniwal, …, Chung-Cheng Chiu, …, Tara N. Sainath, …

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Deep Context: End-to-end Contextual Speech Recognition.

…, Tara N. Sainath, Rohit Prabhavalkar, …, …

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search.

…, …, Petar S. Aleksic, …, Tara N. Sainath

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition.

…, …, …, Anshuman Tripathi, …, Tara N. Sainath, …, …, Michiel Bacchiani

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Compression of End-to-End Models.

…, Tara N. Sainath, Rohit Prabhavalkar, …, …, …, Chung-Cheng Chiu

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multilingual Speech Recognition with a Single End-to-End Model.

Shubham Toshniwal, Tara N. Sainath, …, …, Pedro J. Moreno, Eugene Weinstein, …

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models.

Tara N. Sainath, Rohit Prabhavalkar, …, …, …, …, …, …, …, …, …, Chung-Cheng Chiu

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Improving the Performance of Online Neural Transducer Models.

Tara N. Sainath, Chung-Cheng Chiu, Rohit Prabhavalkar, …, …, …, …

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models.

Rohit Prabhavalkar, Tara N. Sainath, …, …, …, Chung-Cheng Chiu, …

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model.

…, Tara N. Sainath, …, Michiel Bacchiani, Eugene Weinstein, …, …, …, …

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Spectral Distortion Model for Training Phase-Sensitive Deep-Neural Networks for Far-Field Speech Recognition.

…, Tara N. Sainath, …, …, Rajeev C. Nongpiur, Michiel Bacchiani

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model.

…, …, …, Tara N. Sainath, …, Rohit Prabhavalkar

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Performance of Mask Based Statistical Beamforming in a Smart Home Scenario.

…, Michiel Bacchiani, Tara N. Sainath

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.

Chung-Cheng Chiu, Tara N. Sainath, …, Rohit Prabhavalkar, …, …, …, …, …, Ekaterina Gonina, …, …, …, Michiel Bacchiani

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Temporal Modeling Using Dilated Convolution and Gating for Voice-Activity-Detection.

Shuo-Yiin Chang, …, …, Tara N. Sainath, Anshuman Tripathi, Aäron van den Oord, …

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Parallel Deep Neural Network Training for Big Data on Blue Gene/Q.

…, Tara N. Sainath, Bhuvana Ramabhadran, Michael Picheny, John A. Gunnels, …, Upendra V. Chaudhari, Brian Kingsbury

IEEE Trans. Parallel Distributed Syst., 2017

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition.

Tara N. Sainath, …, Kevin W. Wilson, …, …, …, Michiel Bacchiani, …, Andrew W. Senior, …, …, …

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model.

…, Tara N. Sainath, …, Michiel Bacchiani, Eugene Weinstein, …, …, …, …

CoRR, 2017

Annealed f-Smoothing as a Mechanism to Speed up Neural Network Training.

Tara N. Sainath, Vijayaditya Peddinti, …, …

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Highway-LSTM and Recurrent Highway Networks for Speech Recognition.

…, Tara N. Sainath

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An Analysis of "Attention" in Sequence-to-Sequence Models.

Rohit Prabhavalkar, Tara N. Sainath, …, …, …

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A Comparison of Sequence-to-Sequence Models for Speech Recognition.

[BibT_eX]

[DOI]

Rohit Prabhavalkar

,

,

Tara N. Sainath

,

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Acoustic Modeling for Google Home.

[BibT_eX]

[DOI]

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Reducing the Computational Complexity of Two-Dimensional LSTMs.

[BibT_eX]

[DOI]

,

Tara N. Sainath

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home.

[BibT_eX]

[DOI]

,

,

,

,

,

Tara N. Sainath

,

Michiel Bacchiani

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition.

[BibT_eX]

[DOI]

Shuo-Yiin Chang

,

,

Tara N. Sainath

,

,

Carolina Parada

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow.

[BibT_eX]

[DOI]

,

,

,

Tara N. Sainath

,

Michiel Bacchiani

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Raw Multichannel Processing Using Deep Neural Networks.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Kevin W. Wilson

,

,

Michiel Bacchiani

,

,

,

,

Andrew W. Senior

,

,

,

New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection.

[BibT_eX]

[DOI]

,

Tara N. Sainath

,

,

Carolina Parada

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling.

[BibT_eX]

[DOI]

,

Tara N. Sainath

,

,

Michiel Bacchiani

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

,

,

Kevin W. Wilson

,

Michiel Bacchiani

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Lower Frame Rate Neural Network Acoustic Models.

[BibT_eX]

[DOI]

,

Tara N. Sainath

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition.

[BibT_eX]

[DOI]

,

Tara N. Sainath

,

,

Kevin W. Wilson

,

Michiel Bacchiani

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Factored spatial and spectral multichannel raw waveform CLDNNs.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Kevin W. Wilson

,

,

Michiel Bacchiani

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Learning compact recurrent neural networks.

[BibT_eX]

[DOI]

,

Vikas Sindhwani

,

Tara N. Sainath

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

Deep Convolutional Neural Networks for Large-scale Speech Tasks.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Brian Kingsbury

,

,

,

Abdel-rahman Mohamed

,

,

Bhuvana Ramabhadran

Neural Networks, 2015

Structured Transforms for Small-Footprint Deep Learning.

[BibT_eX]

[DOI]

Vikas Sindhwani

,

Tara N. Sainath

,

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Learning the speech front-end with raw waveform CLDNNs.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Andrew W. Senior

,

Kevin W. Wilson

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Convolutional neural networks for small-footprint keyword spotting.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Carolina Parada

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Large vocabulary automatic speech recognition for children.

[BibT_eX]

[DOI]

,

,

,

Melissa K. Carroll

,

,

,

Tara N. Sainath

,

Andrew W. Senior

,

Françoise Beaufays

,

Michiel Bacchiani

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Locally-connected and convolutional neural networks for small footprint speaker recognition.

[BibT_eX]

[DOI]

,

Ignacio López-Moreno

,

Tara N. Sainath

,

Mirkó Visontai

,

,

Carolina Parada

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Andrew W. Senior

,

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Automatic gain control and multi-style training for robust small-footprint keyword spotting with deep neural networks.

[BibT_eX]

[DOI]

Rohit Prabhavalkar

,

,

Carolina Parada

,

Preetum Nakkiran

,

Tara N. Sainath

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Query-by-example keyword spotting using long short-term memory networks.

[BibT_eX]

[DOI]

,

Carolina Parada

,

Tara N. Sainath

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Acoustic modelling with CD-CTC-SMBR LSTM RNNS.

[BibT_eX]

[DOI]

Andrew W. Senior

,

,

Felix de Chaumont Quitry

,

Tara N. Sainath

,

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Kevin W. Wilson

,

,

Michiel Bacchiani

,

Andrew W. Senior

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Deep scattering spectra with deep neural networks for LVCSR tasks.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Vijayaditya Peddinti

,

Brian Kingsbury

,

,

Bhuvana Ramabhadran

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Parallel deep neural network training for LVCSR tasks using blue gene/Q.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Bhuvana Ramabhadran

,

Michael Picheny

,

John A. Gunnels

,

Brian Kingsbury

,

,

,

Upendra V. Chaudhari

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Joint training of convolutional and non-convolutional neural networks.

[BibT_eX]

[DOI]

,

,

Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Improvements to filterbank and delta learning within a deep neural network framework.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Brian Kingsbury

,

Abdel-rahman Mohamed

,

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Deep Scattering Spectrum with deep neural networks.

[BibT_eX]

[DOI]

Vijayaditya Peddinti

,

Tara N. Sainath

,

,

Bhuvana Ramabhadran

,

,

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Kernel methods match Deep Neural Networks on TIMIT.

[BibT_eX]

[DOI]

,

,

Tara N. Sainath

,

Vikas Sindhwani

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013

Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Brian Kingsbury

,

,

Bhuvana Ramabhadran

IEEE Trans. Speech Audio Process., 2013

Improving training time of Hessian-free optimization for deep neural networks using preconditioning and sampling.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Brian Kingsbury

,

Aleksandr Y. Aravkin

,

Bhuvana Ramabhadran

CoRR, 2013

Deep convolutional neural networks for LVCSR.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Abdel-rahman Mohamed

,

Brian Kingsbury

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Brian Kingsbury

,

Vikas Sindhwani

,

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

An evaluation of posterior modeling techniques for phonetic recognition.

[BibT_eX]

[DOI]

Rohit Prabhavalkar

,

Tara N. Sainath

,

,

Bhuvana Ramabhadran

,

Dimitri Kanevsky

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Improving deep neural networks for LVCSR using rectified linear units and dropout.

[BibT_eX]

[DOI]

,

Tara N. Sainath

,

Geoffrey E. Hinton

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Developing speech recognition systems for corpus indexing under the IARPA Babel program.

[BibT_eX]

[DOI]

,

,

Bhuvana Ramabhadran

,

,

Brian Kingsbury

,

,

,

Michael Picheny

,

Tara N. Sainath

,

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Learning filter banks within a deep neural network framework.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Brian Kingsbury

,

Abdel-rahman Mohamed

,

Bhuvana Ramabhadran

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Improvements to Deep Convolutional Neural Networks for LVCSR.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Brian Kingsbury

,

Abdel-rahman Mohamed

,

,

,

,

,

Aleksandr Y. Aravkin

,

Bhuvana Ramabhadran

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Brian Kingsbury

,

Aleksandr Y. Aravkin

,

Bhuvana Ramabhadran

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Exemplar-Based Processing for Speech Recognition: An Overview.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Bhuvana Ramabhadran

,

,

Dimitri Kanevsky

,

Dirk Van Compernolle

,

,

Jort F. Gemmeke

,

Jerome R. Bellegarda

,

IEEE Signal Process. Mag., 2012

Deep Neural Network Language Models.

[BibT_eX]

[DOI]

,

Tara N. Sainath

,

Brian Kingsbury

,

Bhuvana Ramabhadran

Proceedings of the Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, 2012

Enhancing Exemplar-Based Posteriors for Speech Recognition Tasks.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Dimitri Kanevsky

,

Bhuvana Ramabhadran

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization.

[BibT_eX]

[DOI]

Brian Kingsbury

,

Tara N. Sainath

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Auto-encoder bottleneck features using deep belief networks.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Brian Kingsbury

,

Bhuvana Ramabhadran

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Improved pre-training of Deep Belief Networks using Sparse Encoding Symmetric Machines.

[BibT_eX]

[DOI]

Christian Plahl

,

Tara N. Sainath

,

Bhuvana Ramabhadran

,

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

N-best entropy based data selection for acoustic modeling.

[BibT_eX]

[DOI]

,

Tara N. Sainath

,

,

,

Bhuvana Ramabhadran

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011

Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Bhuvana Ramabhadran

,

Michael Picheny

,

,

Dimitri Kanevsky

IEEE ACM Trans. Audio Speech Lang. Process., 2011

Reducing Computational Complexities of Exemplar-Based Sparse Representations with Applications to Large Vocabulary Speech Recognition.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Bhuvana Ramabhadran

,

,

Dimitri Kanevsky

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Convergence of Line Search A-Function Methods.

[BibT_eX]

[DOI]

Dimitri Kanevsky

,

,

Tara N. Sainath

,

Bhuvana Ramabhadran

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Application specific loss minimization using gradient boosting.

[BibT_eX]

[DOI]

,

,

Tara N. Sainath

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Exemplar-based Sparse Representation phone identification features.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Bhuvana Ramabhadran

,

Dimitri Kanevsky

,

,

Parikshit M. Shah

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Deep Belief Networks using discriminative features for phone recognition.

[BibT_eX]

[DOI]

Abdel-rahman Mohamed

,

Tara N. Sainath

,

,

Bhuvana Ramabhadran

,

Geoffrey E. Hinton

,

Michael A. Picheny

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

A-Functions: A generalization of Extended Baum-Welch transformations to convex optimization.

[BibT_eX]

[DOI]

Dimitri Kanevsky

,

,

Tara N. Sainath

,

Bhuvana Ramabhadran

,

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

A convex hull approach to sparse representations for exemplar-based speech recognition.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Dimitri Kanevsky

,

Bhuvana Ramabhadran

,

Parikshit M. Shah

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Making Deep Belief Networks effective for large vocabulary continuous speech recognition.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Brian Kingsbury

,

Bhuvana Ramabhadran

,

,

,

Abdel-rahman Mohamed

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Data selection for language modeling using sparse representations.

[BibT_eX]

[DOI]

,

Tara N. Sainath

,

Bhuvana Ramabhadran

,

Dimitri Kanevsky

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Sparse representation features for speech recognition.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Bhuvana Ramabhadran

,

,

Dimitri Kanevsky

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Sparse representations for text categorization.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Dimitri Kanevsky

,

Bhuvana Ramabhadran

,

,

Julia Hirschberg

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

An analysis of sparseness and regularization in exemplar-based methods for speech classification.

[BibT_eX]

[DOI]

Dimitri Kanevsky

,

Tara N. Sainath

,

Bhuvana Ramabhadran

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Incorporating sparse representation phone identification features in automatic speech recognition using exponential families.

[BibT_eX]

[DOI]

,

Tara N. Sainath

,

Bhuvana Ramabhadran

,

,

,

Dimitri Kanevsky

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A voice-commandable robotic forklift working alongside humans in minimally-prepared outdoor environments.

[BibT_eX]

[DOI]

,

Matthew R. Walter

,

Matthew E. Antone

,

,

,

,

Emilio Frazzoli

,

,

Jonathan P. How

,

Albert S. Huang

,

Jeong hwan Jeon

,

,

,

,

Tara N. Sainath

Proceedings of the IEEE International Conference on Robotics and Automation, 2010

Bayesian compressive sensing for phonetic classification.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Dimitri Kanevsky

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

The Use of isometric transformations and bayesian estimation in compressive sensing for fMRI classification.

[BibT_eX]

[DOI]

,

Tara N. Sainath

,

,

Dimitri Kanevsky

,

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Kalman filtering for compressed sensing.

[BibT_eX]

[DOI]

Dimitri Kanevsky

,

,

,

,

Bhuvana Ramabhadran

,

Tara N. Sainath

Proceedings of the 13th Conference on Information Fusion, 2010

2009

Applications of broad class knowledge for noise robust speech recognition.

[BibT_eX]

[DOI]

Tara N. Sainath

PhD thesis, 2009

A generalized family of parameter estimation techniques.

[BibT_eX]

[DOI]

Dimitri Kanevsky

,

Tara N. Sainath

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

An exploration of large vocabulary tools for small vocabulary phonetic recognition.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Bhuvana Ramabhadran

,

Michael Picheny

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Island-driven search using broad phonetic classes.

[BibT_eX]

[DOI]

Tara N. Sainath

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

A comparison of broad phonetic and acoustic units for noise robust segment-based phonetic recognition.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Generalization of extended baum-welch parameter estimation for discriminative training and decoding.

[BibT_eX]

[DOI]

Dimitri Kanevsky

,

Tara N. Sainath

,

Bhuvana Ramabhadran

,

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Gradient steepness metrics using extended Baum-Welch transformations for universal pattern recognition tasks.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Dimitri Kanevsky

,

Bhuvana Ramabhadran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

2007

Audio classification using extended baum-welch transformations.

[BibT_eX]

[DOI]

Tara N. Sainath

,

,

Dimitri Kanevsky

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Unsupervised Audio Segmentation using Extended Baum-Welch Transformations.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Dimitri Kanevsky

,

Giridharan Iyengar

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Broad phonetic class recognition in a Hidden Markov model framework using extended Baum-Welch transformations.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Dimitri Kanevsky

,

Bhuvana Ramabhadran

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

A Sinusoidal Model Approach to Acoustic Landmark Detection and Segmentation for Robust Segment-Based Speech Recognition.

[BibT_eX]

[DOI]

Tara N. Sainath

,

Timothy J. Hazen

Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006