Pedro J. Moreno

Affiliations:
  • Google, Inc., Mountain View, CA, USA


According to our database, Pedro J. Moreno authored at least 110 papers between 1991 and 2024.

Bibliography

2024
Can DeepFake Speech be Reliably Detected?
CoRR, 2024

TransformerFAM: Feedback attention is working memory.
CoRR, 2024

Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models.
CoRR, 2024

Massive End-to-end Speech Recognition Models with Time Reduction.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Speech Recognition for African American English with Audio Classification.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm.
CoRR, 2023

Massive End-to-end Models for Short Search Queries.
CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.
CoRR, 2023

Modular Domain Adaptation for Conformer-Based Streaming ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Re-investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Large-Scale Language Model Rescoring on Long-Form Data.
Proceedings of the IEEE International Conference on Acoustics, 2023

Modular Conformer Training for Flexible End-to-End ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Audio-Adapterfusion: A Task-Id-Free Approach for Efficient and Non-Destructive Multi-Task Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Ask2Mask: Guided Data Selection for Masked Speech Modeling.
IEEE J. Sel. Top. Signal Process., 2022

G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Modular Hybrid Autoregressive Transducer.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Non-Parallel Voice Conversion for ASR Augmentation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MAESTRO: Matched Speech Text Representations through Modality Matching.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems.
Proceedings of the IEEE International Conference on Acoustics, 2022

Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Self-Adaptive Distillation for Multilingual Speech Recognition: Leveraging Student Independence.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Mixture of Informed Experts for Multilingual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech.
Proceedings of the IEEE International Conference on Acoustics, 2021

Injecting Text in Self-Supervised Speech Pretraining.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Multilingual Speech Recognition with Self-Attention Structured Parameterization.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Neural Oracle Search on N-BEST Hypotheses.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Leveraging Language ID in Multilingual End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Speech Recognition with Augmented Synthesized Speech.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Transliteration Based Approaches to Improve Code-Switched Speech Recognition Performance.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Semantic Lattice Processing in Contextual Automatic Speech Recognition for Google Assistant.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multilingual Speech Recognition with a Single End-to-End Model.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Hybrid LSTM-FSMN Networks for Acoustic Modeling.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Modeling Non-Linguistic Contextual Signals in LSTM Language Models Via Domain Adaptation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Syllable-based acoustic modeling with CTC-SMBR-LSTM.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Speech Research at Google to Enable Universal Speech Interfaces.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
On the use of deep feedforward neural networks for automatic language identification.
Comput. Speech Lang., 2016

High quality agreement-based semi-supervised training data for acoustic modeling.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Towards acoustic model unification across dialects.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Selection and combination of hypotheses for dialectal speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Frame-by-frame language identification in short utterances using deep neural networks.
Neural Networks, 2015

A Real-Time End-to-End Multilingual Speech Recognition Architecture.
IEEE J. Sel. Top. Signal Process., 2015

Bringing contextual information to Google speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multi-Dialectical Languages Effect on Speech Recognition: Too Much Choice Can Hurt.
Proceedings of the 1st International Conference on Natural Language and Speech Processing, 2015

Improved recognition of contact names in voice commands.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Asynchronous stochastic optimization for sequence training of deep neural networks: towards big data.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A big data approach to acoustic model training corpus selection.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Automatic language identification using long short-term memory recurrent neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Backoff inspired features for maximum entropy language models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Automatic language identification using deep neural networks.
Proceedings of the IEEE International Conference on Acoustics, 2014

2012
Google's cross-dialect Arabic voice search.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Deploying Google Search by Voice in Cantonese.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010
Efficient and Robust Music Identification With Weighted Finite-State Transducers.
IEEE Trans. Speech Audio Process., 2010

Discriminative Topic Segmentation of Text and Speech.
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Search by voice in Mandarin Chinese.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Building transcribed speech corpora quickly and cheaply for many languages.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Voice search for development.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009
General suffix automaton construction algorithm and space bounds.
Theor. Comput. Sci., 2009

A new quality measure for topic segmentation of text and speech.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Audiovisual celebrity recognition in unconstrained web videos.
Proceedings of the IEEE International Conference on Acoustics, 2009

A factor automaton approach for the forced alignment of long speech recordings.
Proceedings of the IEEE International Conference on Acoustics, 2009

An audio indexing system for election video material.
Proceedings of the IEEE International Conference on Acoustics, 2009

2007
Bridging the Gap: Query by Semantic Example.
IEEE Trans. Multim., 2007

Supervised Learning of Semantic Classes for Image Annotation and Retrieval.
IEEE Trans. Pattern Anal. Mach. Intell., 2007

Factor Automata of Automata and Applications.
Proceedings of the Implementation and Application of Automata, 2007

Robust Music Identification, Detection, and Analysis.
Proceedings of the 8th International Conference on Music Information Retrieval, 2007

Music Identification with Weighted Finite-State Transducers.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Query by Semantic Example.
Proceedings of the Image and Video Retrieval, 5th International Conference, 2006

2005
Approaches to reduce the effects of OOV queries on indexed spoken audio.
IEEE Trans. Multim., 2005

2004
SVM kernel adaptation in speaker classification and verification.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

News Tuner: a simple interface for searching and browsing radio archives.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Semantic analysis of song lyrics.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition.
Proceedings of the Computer Vision, 2004

2003
A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications.
Proceedings of the Advances in Neural Information Processing Systems 16, 2003

A new SVM approach to speaker identification and verification using probabilistic distance kernels.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002
Speechbot: an experimental speech-based search engine for multimedia content on the web.
IEEE Trans. Multim., 2002

From Multimedia Retrieval to Knowledge Management.
Computer, 2002

2001
Topic Segmentation with an Aspect Hidden Markov Model.
Proceedings of the SIGIR 2001: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001

A boosting approach for confidence scoring.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000
SpeechBot: a Speech Recognition based Audio Indexing System for the Web.
Proceedings of the Computer-Assisted Information Retrieval (Recherche d'Information et ses Applications), 2000

An experimental study of an audio indexing system for the web.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Using the Fisher kernel method for Web audio classification.
Proceedings of the IEEE International Conference on Acoustics, 2000

1999
Indexing Multimedia for the Internet.
Proceedings of the Visual Information and Information Systems, 1999

On the use of support vector machines for phonetic classification.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998
Data-driven environmental compensation for speech recognition: A unified approach.
Speech Commun., 1998

A recursive algorithm for the forced alignment of very long audio segments.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Factorial HMMs for acoustic modeling.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1997
A new algorithm for robust speech recognition: the delta vector Taylor series approach.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Delta vector Taylor series environment compensation for speaker recognition.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

1996
Cepstral compensation by polynomial approximation for environment-independent speech recognition.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

A vector Taylor series approach for environment-independent speech recognition.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

1995
A unified approach for robust speech recognition.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

Multivariate-Gaussian-based cepstral normalization for robust speech recognition.
Proceedings of the 1995 International Conference on Acoustics, 1995

1994
Signal processing for robust speech recognition.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Sources of degradation of speech recognition in the telephone network.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

Environment normalization for robust speech recognition using direct cepstral comparison.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

1992
A spoken language translator for restricted-domain context-free languages.
Speech Commun., 1992

Efficient grammar processing for a spoken language translation system.
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992

1991
Toward a spoken language translator for restricted-domain context-free languages.
Proceedings of the Second European Conference on Speech Communication and Technology, 1991

