Pedro J. Moreno

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm.

Weiran Wang

Zelin Wu

Diamantino Caseiro

Tsendsuren Munkhdalai

CoRR, 2023

Massive End-to-end Models for Short Search Queries.

CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.

CoRR, 2023

Modular Domain Adaptation for Conformer-Based Streaming ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Re-investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods.

Zhouyuan Huo

Khe Chai Sim

Dongseong Hwang

Tsendsuren Munkhdalai

Tara N. Sainath

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Large-Scale Language Model Rescoring on Long-Form Data.

Proceedings of the IEEE International Conference on Acoustics, 2023

Modular Conformer Training for Flexible End-to-End ASR.

Proceedings of the IEEE International Conference on Acoustics, 2023

Audio-Adapterfusion: A Task-Id-Free Approach for Efficient and Non-Destructive Multi-Task Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Ask2Mask: Guided Data Selection for Masked Speech Modeling.

Murali Karthick Baskar

IEEE J. Sel. Top. Signal Process., 2022

G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Modular Hybrid Autoregressive Transducer.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Non-Parallel Voice Conversion for ASR Augmentation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MAESTRO: Matched Speech Text Representations through Modality Matching.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems.

Proceedings of the IEEE International Conference on Acoustics, 2022

Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Self-Adaptive Distillation for Multilingual Speech Recognition: Leveraging Student Independence.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Mixture of Informed Experts for Multilingual Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech.

Proceedings of the IEEE International Conference on Acoustics, 2021

Injecting Text in Self-Supervised Speech Pretraining.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Multilingual Speech Recognition with Self-Attention Structured Parameterization.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Neural Oracle Search on N-BEST Hypotheses.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Leveraging Language ID in Multilingual End-to-End Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Speech Recognition with Augmented Synthesized Speech.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Transliteration Based Approaches to Improve Code-Switched Speech Recognition Performance.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Semantic Lattice Processing in Contextual Automatic Speech Recognition for Google Assistant.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multilingual Speech Recognition with a Single End-to-End Model.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Hybrid LSTM-FSMN Networks for Acoustic Modeling.

Asa Oines

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Modeling Non-Linguistic Contextual Signals in LSTM Language Models Via Domain Adaptation.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Syllable-based acoustic modeling with CTC-SMBR-LSTM.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Speech Research at Google to Enable Universal Speech Interfaces.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

On the use of deep feedforward neural networks for automatic language identification.

Joaquin Gonzalez-Rodriguez

David Martinez

Oldrich Plchot

Comput. Speech Lang., 2016

High quality agreement-based semi-supervised training data for acoustic modeling.

Felix de Chaumont Quitry

Asa Oines

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Towards acoustic model unification across dialects.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Selection and combination of hypotheses for dialectal speech recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Frame-by-frame language identification in short utterances using deep neural networks.

Joaquín González-Rodríguez

Neural Networks, 2015

A Real-Time End-to-End Multilingual Speech Recognition Architecture.

IEEE J. Sel. Top. Signal Process., 2015

Bringing contextual information to Google speech recognition.

Petar S. Aleksic

Mohammadreza Ghodsi

Assaf Hurwitz Michaely

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multi-Dialectical Languages Effect on Speech Recognition: Too Much Choice Can Hurt.

Mohamed G. Elfeky

Victor Soto

Proceedings of the 1st International Conference on Natural Language and Speech Processing, 2015

Improved recognition of contact names in voice commands.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Asynchronous stochastic optimization for sequence training of deep neural networks: towards big data.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A big data approach to acoustic model training corpus selection.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Automatic language identification using long short-term memory recurrent neural networks.

Joaquin Gonzalez-Rodriguez

Hasim Sak

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Backoff inspired features for maximum entropy language models.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Automatic language identification using deep neural networks.

Joaquin Gonzalez-Rodriguez

Oldrich Plchot

David Martinez

Proceedings of the IEEE International Conference on Acoustics, 2014

2012

Google's cross-dialect Arabic voice search.

Fadi Biadsy

Martin Jansche

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Deploying Google Search by Voice in Cantonese.

Yun-Hsuan Sung

Martin Jansche

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010

Efficient and Robust Music Identification With Weighted Finite-State Transducers.

IEEE Trans. Speech Audio Process., 2010

Discriminative Topic Segmentation of Text and Speech.

Charl Johannes van Heerden

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Search by voice in Mandarin Chinese.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Building transcribed speech corpora quickly and cheaply for many languages.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Voice search for development.

Etienne Barnard

Johan Schalkwyk

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009

General suffix automaton construction algorithm and space bounds.

Theor. Comput. Sci., 2009

A new quality measure for topic segmentation of text and speech.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Audiovisual celebrity recognition in unconstrained web videos.

Mehmet Emre Sargin

Hrishikesh B. Aradhye

Ming Zhao

Proceedings of the IEEE International Conference on Acoustics, 2009

A factor automaton approach for the forced alignment of long speech recordings.

Christopher Alberti

Proceedings of the IEEE International Conference on Acoustics, 2009

An audio indexing system for election video material.

Proceedings of the IEEE International Conference on Acoustics, 2009

2007

Bridging the Gap: Query by Semantic Example.

Nikhil Rasiwasia

IEEE Trans. Multim., 2007

Supervised Learning of Semantic Classes for Image Annotation and Retrieval.

IEEE Trans. Pattern Anal. Mach. Intell., 2007

Factor Automata of Automata and Applications.

Proceedings of the Implementation and Application of Automata, 2007

Robust Music Identification, Detection, and Analysis.

Proceedings of the 8th International Conference on Music Information Retrieval, 2007

Music Identification with Weighted Finite-State Transducers.

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Query by Semantic Example.

Nikhil Rasiwasia

Proceedings of the Image and Video Retrieval, 5th International Conference, 2006

2005

Approaches to reduce the effects of OOV queries on indexed spoken audio.

IEEE Trans. Multim., 2005

2004

SVM kernel adaptation in speaker classification and verification.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

News Tuner: a simple interface for searching and browsing radio archives.

Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Semantic analysis of song lyrics.

A. Kositsky

Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition.

Proceedings of the Computer Vision, 2004

2003

A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications.

Advances in Neural Information Processing Systems 16, 2003

A new SVM approach to speaker identification and verification using probabilistic distance kernels.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002

Speechbot: an experimental speech-based search engine for multimedia content on the web.

IEEE Trans. Multim., 2002

From Multimedia Retrieval to Knowledge Management.

Gareth J. F. Jones

Computer, 2002

2001

Topic Segmentation with an Aspect Hidden Markov Model.

David M. Blei

SIGIR 2001: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001

A boosting approach for confidence scoring.

Bhiksha Raj

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000

SpeechBot: a Speech Recognition based Audio Indexing System for the Web.

Proceedings of the Computer-Assisted Information Retrieval (Recherche d'Information et ses Applications), 2000

An experimental study of an audio indexing system for the web.

Edward W. D. Whittaker

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Using the Fisher kernel method for Web audio classification.

Leonidas I. Kontothanassis

Ryan Rifkin

Proceedings of the IEEE International Conference on Acoustics, 2000

1999

Indexing Multimedia for the Internet.

David E. Kovalcin

Michael J. Swain

Proceedings of the Visual Information and Information Systems, 1999

On the use of support vector machines for phonetic classification.

Philip Clarkson

Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998

Data-driven environmental compensation for speech recognition: A unified approach.

Bhiksha Raj

Richard M. Stern

Speech Commun., 1998

A recursive algorithm for the forced alignment of very long audio segments.

Christopher F. Joerg

Oren Glickman

Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Factorial HMMs for acoustic modeling.

Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1997

A new algorithm for robust speech recognition: the delta vector Taylor series approach.
