Michael L. Seltzer

ORCID: 0000-0003-3474-2451

According to our database, Michael L. Seltzer authored at least 103 papers between 2000 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.




On csauthors.net:


Towards measuring fairness in speech recognition: Fair-Speech dataset.
CoRR, 2024

End-to-End Speech Recognition Contextualization with Large Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

Augmenting text for spoken language understanding with Large Language Models.
CoRR, 2023

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving fast-slow Encoder based Transducer with Streaming Deliberation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers.
Proceedings of the IEEE International Conference on Acoustics, 2023

Streaming parallel transducer beam search with fast slow cascaded encoders.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Deliberation Model for On-Device Spoken Language Understanding.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Neural-FST Class Language Model for End-to-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.
CoRR, 2021

Streaming Attention-Based Models with Augmented Memory for End-To-End Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Alignment Restricted Streaming Recurrent Neural Network Transducer.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Deep Shallow Fusion for RNN-T Personalization.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Collaborative Training of Acoustic Encoders for Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Memory-Efficient Speech Recognition on Smart Devices.
Proceedings of the IEEE International Conference on Acoustics, 2021

Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition.
CoRR, 2020

Weak-Attention Suppression for Transformer Based Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transformer-Based Acoustic Modeling for Hybrid Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Aipnet: Generative Adversarial Pre-Training of Accent-Invariant Networks for End-To-End Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques.
IEEE Signal Process. Mag., 2019

Introduction to the Issue on Far-Field Speech Processing in the Era of Deep Learning: Speech Enhancement, Separation, and Recognition.
IEEE J. Sel. Top. Signal Process., 2019

RNN-T For Latency Controlled ASR With Improved Beam Search.
CoRR, 2019

Transformer-Transducer: End-to-End Speech Recognition with Self-Attention.
CoRR, 2019

Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-end Contextual Speech Recognition Using Class Language Models and a Token Passing Decoder.
Proceedings of the IEEE International Conference on Acoustics, 2019

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Improved Training for Online End-to-end Speech Recognition Systems.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Towards Language-Universal End-to-End Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Toward Human Parity in Conversational Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Large-Scale Domain Adaptation via Teacher-Student Learning.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A study on data augmentation of reverberant speech for robust speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

May I take your order? A Neural Model for Extracting Structured Information from Conversations.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Deep beamforming networks for multi-channel speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Linearly augmented deep neural network.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Exploring how deep neural networks form phonemic categories.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speech recognition with prediction-adaptation-correction recurrent neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

An introduction to computational networks and the computational network toolkit (invited talk).
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

The influence of pitch and noise on the discriminability of filterbank features.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Towards better performance with heterogeneous training data in acoustic modeling using deep neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Single-channel mixed speech recognition using deep neural networks.
Proceedings of the IEEE International Conference on Acoustics, 2014

Factored adaptation of speaker and environment using orthogonal subspace transforms.
Proceedings of the IEEE International Conference on Acoustics, 2014

Feature Learning in Deep Neural Networks - A Study on Speech Recognition Tasks.
Proceedings of the 1st International Conference on Learning Representations, 2013

Deep neural network features and semi-supervised training for low resource speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

An investigation of deep neural networks for noise robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Multi-task learning in deep neural networks for improved phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Recent advances in deep learning for speech research at Microsoft.
Proceedings of the IEEE International Conference on Acoustics, 2013

Factored adaptation using a combination of feature-space and model-space transforms.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Efficient VTS Adaptation Using Jacobian Approximation.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Improvements to VTS feature enhancement.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Acoustic Model Training for Robust Speech Recognition.
Proceedings of the Techniques for Noise Robustness in Automatic Speech Recognition, 2012

In-Car Media Search.
IEEE Signal Process. Mag., 2011

Improved Bottleneck Features Using Pretrained Deep Neural Networks.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Separating Speaker and Environmental Variability Using Factored Transforms.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

CROWDMOS: An approach for crowdsourcing mean opinion score studies.
Proceedings of the IEEE International Conference on Acoustics, 2011

Joint encoding of the waveform and speech recognition features using a transform codec.
Proceedings of the IEEE International Conference on Acoustics, 2011

Factored adaptation for separable compensation of speaker and environmental variability.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Noise Adaptive Training for Robust Automatic Speech Recognition.
IEEE Trans. Speech Audio Process., 2010

HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Binary coding of speech spectrograms using a deep auto-encoder.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Acoustic model adaptation via Linear Spline Interpolation for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Improving perceived accuracy for in-car media search.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Voice search of structured media data.
Proceedings of the IEEE International Conference on Acoustics, 2009

The data deluge: Challenges and opportunities of unlimited data in statistical signal processing.
Proceedings of the IEEE International Conference on Acoustics, 2009

Noise adaptive training using a vector taylor series approach for noise robust automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

Noise robust model adaptation using linear spline interpolation.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Maximum a posteriori ICA: Applying prior knowledge to the separation of acoustic sources.
Proceedings of the IEEE International Conference on Acoustics, 2008

Robust design of wideband loudspeaker arrays.
Proceedings of the IEEE International Conference on Acoustics, 2008

Training Wideband Acoustic Models Using Mixed-Bandwidth Training Data for Speech Recognition.
IEEE Trans. Speech Audio Process., 2007

Automatic Removal of Typed Keystrokes From Speech Signals.
IEEE Signal Process. Lett., 2007

Commute UX: Telephone Dialog System for Location-based Services.
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, 2007

Robust location understanding in spoken dialog systems using intersections.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Microphone Array Post-Filter using Incremental Bayes Learning to Track the Spatial Distributions of Speech and Noise.
Proceedings of the IEEE International Conference on Acoustics, 2007

Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments.
IEEE Trans. Speech Audio Process., 2006

Robust bandwidth extension of noise-corrupted narrowband speech.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Training Wideband Acoustic Models using Mixed-Bandwidth Training Data via Feature Bandwidth Extension.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Speech Recognizer Based Maximum Likelihood Beamforming.
Proceedings of the Speech Separation by Humans and Machines, 2005

Likelihood-maximizing beamforming for robust hands-free speech recognition.
IEEE Trans. Speech Audio Process., 2004

A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition.
Speech Commun., 2004

Reconstruction of missing features for robust speech recognition.
Speech Commun., 2004

Parameter sharing in subband likelihood-maximizing beamforming for speech recognition using microphone arrays.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Speech-recognizer-based filter optimization for microphone array processing.
IEEE Signal Process. Lett., 2003

A harmonic-model-based front end for robust speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Subband parameter optimization of microphone arrays for speech recognition in reverberant environments.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Speech recognizer-based microphone array processing for robust hands-free speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2002

Calibration of microphone arrays for improved speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Speech in Noisy Environments: robust automatic segmentation, feature extraction, and hypothesis combination.
Proceedings of the IEEE International Conference on Acoustics, 2001

Classifier-based mask estimation for missing feature methods of robust speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Reconstruction of damaged spectrographic features for robust speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
