Frank Seide

Ernie Chang

CoRR, 2024

Speech ReaLLM - Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time.

CoRR, 2024

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Effective Internal Language Model Training and Fusion for Factorized Transducer Model.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Directional Source Separation for Robust Speech Recognition on Smart Glasses.

CoRR, 2023

DISGO: Automatic End-to-End Evaluation for Scene Text OCR.

CoRR, 2023

Directional Speech Recognition for Speaker Disambiguation and Cross-talk Suppression.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers.

Proceedings of the IEEE International Conference on Acoustics, 2023

Joint Federated Learning and Personalization for on-Device ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Federated Domain Adaptation for ASR with Full Self-Supervision.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2018

Achieving Human Parity on Automatic Chinese to English News Translation.

Marcin Junczys-Dowmunt

CoRR, 2018

Marian: Fast Neural Machine Translation in C++.

Marcin Junczys-Dowmunt

Proceedings of ACL 2018, Melbourne, Australia, July 15-20, 2018, System Demonstrations, 2018

2017

Toward Human Parity in Conversational Speech Recognition.

IEEE/ACM Trans. Audio Speech Lang. Process., 2017

The Microsoft 2016 conversational speech recognition system.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Achieving Human Parity in Conversational Speech Recognition.

CoRR, 2016

CNTK: Microsoft's Open-Source Deep-Learning Toolkit.

Amit Agarwal

Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

2015

Deep bi-directional recurrent networks over spectral windows.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

An introduction to computational networks and the computational network toolkit (invited talk).

Christopher J. Rossbach

Jon Currey

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

On parallelizability of stochastic gradient descent for speech DNNs.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

The Deep Tensor Neural Network With Applications to Large Vocabulary Speech Recognition.

Dong Yu

IEEE Trans. Speech Audio Process., 2013

Feature Learning in Deep Neural Networks - A Study on Speech Recognition Tasks.

Proceedings of the 1st International Conference on Learning Representations, 2013

MSR-FBK IWSLT 2013 SLT system description.

Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2013, 2013

A new language independent, photo-realistic talking head driven by voice only.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2013

Error back propagation for sequence training of Context-Dependent Deep Networks for conversational speech transcription.

Proceedings of the IEEE International Conference on Acoustics, 2013

Recent advances in deep learning for speech research at Microsoft.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Adaptation of context-dependent deep neural networks for automatic speech recognition.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Context-dependent Deep Neural Networks for audio indexing of real-life data.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks.

Dong Yu

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Voice Activity Detection Using Speech Recognizer Feedback.

Kit Thambiratnam

Weiwu Zhu

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

ClippyScript: A Programming Language for Multi-Domain Dialogue Systems.

Sean McDirmid

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Pipelined Back-Propagation for Context-Dependent Deep Neural Networks.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Exploiting sparseness in deep neural networks for large vocabulary speech recognition.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Conversational Speech Transcription Using Context-Dependent Deep Neural Networks.

Gang Li

Dong Yu

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Leveraging the Web for automatically generating indexable and browsable keywords for speech files.

Proceedings of the IEEE International Conference on Acoustics, 2011

Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription.

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Subword-based multi-span pronunciation adaptation for recognizing accented speech.

Timo Mertens

Kit Thambiratnam

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

On using missing-feature theory with cepstral features - approximations to the multivariate integral.

Pei Zhao

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Vocabulary and language model adaptation using just one speech file.

Proceedings of the IEEE International Conference on Acoustics, 2010

Music rhythm characterization with application to workout-mix generation.

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Multimedia retrieval through indexing speech: an enterprise perspective.

Proceedings of the third workshop on Searching spontaneous conversational speech, 2009

Unsupervised lattice-based acoustic model adaptation for speaker-dependent conversational telephone speech transcription.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Learning a music similarity measure on automatic annotations with application to playlist generation.

Proceedings of the IEEE International Conference on Acoustics, 2009

Unsupervised speaker adaptation for telephone call transcription.

R. Wallace

Proceedings of the IEEE International Conference on Acoustics, 2009

Automatic punctuation generation for speech.

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

Mobile Search With Multimodal Queries.

Proc. IEEE, 2008

Word-lattice based spoken-document indexing with standard text indexers.

Roger Peng Yu

Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Fragmented context-dependent syllable acoustic models.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

GPU-accelerated Gaussian clustering for fMPE discriminative training.

Yu Shi

Frank K. Soong

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Towards vocabulary-independent speech indexing for large-scale repositories.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Addressing the out-of-vocabulary problem for large-scale Chinese spoken term detection.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Approximate word-lattice indexing with text indexers: Time-Anchored Lattice Expansion.

Yu Shi

Proceedings of the IEEE International Conference on Acoustics, 2008

Fusing multiple systems into a compact lattice index for Chinese spoken term detection.

Proceedings of the IEEE International Conference on Acoustics, 2008

Mobile ringtone search through query by humming.

Lie Lu

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Learning spoken document similarity and recommendation using supervised probabilistic latent semantic analysis.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Online vocabulary adaptation using limited adaptation data.

C. E. Liu

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

A Hidden-State Maximum Entropy Model for Word Confidence Estimation.

Proceedings of the IEEE International Conference on Acoustics, 2007

Towards spoken-document retrieval for the enterprise: Approximate word-lattice indexing with text indexers.

Yu Shi

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

A study of lattice-based spoken term detection for Chinese spontaneous speech.

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

Discriminatively Trained Spoken Document Similarity Models and their Application to Probabilistic Latent Semantic Analysis.

Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006

Towards Spoken-Document Retrieval for the Internet: Lattice Indexing For Large-Scale Web-Search Architectures.

Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006

Maximum Entropy Based Normalization Of Word Posteriors For Phonetic And LVCSR Lattice Search.

Duo Zhang

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Vocabulary-Independent Indexing of Spontaneous Speech.

IEEE Trans. Speech Audio Process., 2005

The use of virtual hypothesis copies in decoding of large-vocabulary continuous speech.

IEEE Trans. Speech Audio Process., 2005

Searching the Audio Notebook: Keyword Search in Recorded Conversation.

Proceedings of the HLT/EMNLP 2005, 2005

Fast Two-Stage Vocabulary-Independent Search In Spontaneous Speech.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

A hybrid word / phoneme-based approach for improved vocabulary-independent search in spontaneous speech.

Frank Torsten Bernd Seide

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Vocabulary-independent search in spontaneous speech.

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

An improved model-based speaker segmentation system.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Coarticulation modeling by embedding a target-directed hidden trajectory model into HMM - model and training.

Jian-Lai Zhou

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Coarticulation modeling by embedding a target-directed hidden trajectory model into HMM - MAP decoding and evaluation.

Jian-Lai Zhou

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

A system for spoken query information retrieval on mobile devices.

IEEE Trans. Speech Audio Process., 2002

2001

Rapid speaker adaptation using a priori knowledge by eigenspace analysis of MLLR parameters.

Proceedings of the IEEE International Conference on Acoustics, 2001

2000

The thoughtful elephant: strategies for spoken dialog systems.

IEEE Trans. Speech Audio Process., 2000

MAT-2000 - design, collection, and validation of a Mandarin 2000-speaker telephone speech database.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Two-stream modeling of Mandarin tones.

Nick J.-C. Wang

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Improvements of the Philips 2000 Taiwan Mandarin benchmark system.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Pitch tracking and tone features for Mandarin speech recognition.

Hank Chang-Han Huang

Proceedings of the IEEE International Conference on Acoustics, 2000

1999

Development of the Philips 1999 Taiwan Mandarin benchmark system.

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1998

Phonetic Modelling In the Philips Chinese Continuous-Speech Recognition System.

Nick J.-C. Wang

Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing, 1998

1997

PADIS - An automatic telephone switchboard and directory information system.

Speech Commun., 1997

Towards an automated directory information system.

Andreas Kellner

Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

1996

A word graph based n-best search in continuous speech recognition.

Bach-Hiep Tran

Volker Steinbiss

Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Improving speech understanding by incorporating database constraints and dialogue history.

Bernhard Rueber

Andreas Kellner

Proceedings of the 4th International Conference on Spoken Language Processing, 1996

A comparison of time conditioned and word conditioned search techniques for large vocabulary speech recognition.

Proceedings of the 4th International Conference on Spoken Language Processing, 1996

1995

The Philips automatic train timetable information system.

Speech Commun., 1995

Fast likelihood computation for continuous-mixture densities using a tree-based nearest neighbor search.

Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

1994

Non-linear regression based feature extraction for connected-word recognition in noise.
