John R. Hershey

Affiliations:
  • Mitsubishi Electric Research Laboratories (MERL), Cambridge, USA
  • IBM T. J. Watson Research Center, New York, USA
  • University of California San Diego, Department of Cognitive Science


According to our database, John R. Hershey authored at least 138 papers between 1999 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge.
Comput. Speech Lang., 2025

2024
Towards sub-millisecond latency real-time speech enhancement models on hearables.
CoRR, 2024

Unsupervised Multi-Channel Separation And Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2024

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement.
CoRR, 2023

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Audioslots: A Slot-Centric Generative Model For Audio Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Distance-Based Sound Separation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CycleGAN-based Unpaired Speech Dereverberation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Adapting Speech Separation to Real-World Meetings using Mixture Invariant Training.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Bird Classification with Unsupervised Sound Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention.
CoRR, 2021

Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

DF-Conformer: Integrated Architecture of Conv-Tasnet and Conformer Using Linear Complexity Self-Attention for Speech Enhancement.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Self-Supervised Learning from Automatically Separated Sound Scenes.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Continuous Speech Separation Using Speaker Inventory for Long Recording.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds.
Proceedings of the 9th International Conference on Learning Representations, 2021

What's all the Fuss about Free Universal Sound Separation Data?
Proceedings of the IEEE International Conference on Acoustics, 2021

Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes.
Proceedings of the IEEE International Conference on Acoustics, 2021

End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording.
CoRR, 2020

Unsupervised Sound Separation Using Mixtures of Mixtures.
CoRR, 2020

Unsupervised Sound Separation Using Mixture Invariant Training.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Improving Universal Sound Separation Using Sound Classification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Improving Sound Event Detection in Domestic Environments using Sound Separation.
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

2019
Phasebook and Friends: Leveraging Discrete Representations for Source Separation.
IEEE J. Sel. Top. Signal Process., 2019

Adversarial training and decoding strategies for end-to-end neural conversation models.
Comput. Speech Lang., 2019

Alternating Between Spectral and Spatial Estimation for Speech Separation and Enhancement.
CoRR, 2019

Universal Sound Separation.
Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Multilingual Multi-Speaker Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Differentiable Consistency Constraints for Improved Deep Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2019

The Phasebook: Building Complex Masks via Discrete Representations for Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019

SDR - Half-baked or Well Done?
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Exploring Tradeoffs in Models for Low-Latency Speech Enhancement.
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Alternative Objective Functions for Deep Clustering.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

End-to-End Multi-Speaker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Speaker Adaptation for Multichannel End-to-End Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Purely End-to-End System for Multi-speaker Speech Recognition.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Hybrid CTC/Attention Architecture for End-to-End Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2017

Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming.
IEEE J. Sel. Top. Signal Process., 2017

Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones.
J. Inf. Process., 2017

Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend.
Comput. Speech Lang., 2017

Attention-Based Multimodal Fusion for Video Description.
CoRR, 2017

Multichannel End-to-end Speech Recognition.
Proceedings of the 34th International Conference on Machine Learning, 2017

Attention-Based Multimodal Fusion for Video Description.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Student-teacher network learning with enhanced features.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep long short-term memory adaptive beamforming networks for multichannel robust speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep clustering and conventional networks for music separation: Stronger together.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Language independent end-to-end architecture for joint language identification and speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Early and late integration of audio features for automatic video description.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Joint CTC/attention decoding for end-to-end speech recognition.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Toolkits for Robust Speech Processing.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Preliminaries.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Novel Deep Architectures in Speech Processing.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Deep Recurrent Networks for Separation and Recognition of Single-Channel Speech in Nonstationary Background Audio.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
Global-Local Face Upsampling Network.
CoRR, 2016

Dialog state tracking with attention-based sequence-to-sequence learning.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Full-Capacity Unitary Recurrent Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Single-Channel Multi-Speaker Separation Using Deep Clustering.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Driver confusion status detection using recurrent neural networks.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Deep beamforming networks for multi-channel speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Deep unfolding for multichannel source separation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Minimum word error training of long short-term memory recurrent neural network language models for speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Deep clustering: Discriminative embeddings for segmentation and separation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Uncertainty propagation through deep neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Micbots: Collecting large realistic datasets for speech and audio research using mobile robots.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Deep NMF for speech separation.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR.
Proceedings of the Latent Variable Analysis and Signal Separation, 2015

The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures.
CoRR, 2014

Discriminative NMF and its application to single-channel source separation.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Cost-level integration of statistical and rule-based dialog managers.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Sequential maximum mutual information linear discriminant analysis for speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Log-linear dialog manager.
Proceedings of the IEEE International Conference on Acoustics, 2014

Non-negative source-filter dynamical system for speech enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2014

Discriminatively trained recurrent neural networks for single-channel speech separation.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

Sequence discriminative training for low-rank deep neural networks.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

2013
Hierarchical and coupled non-negative dynamical systems with application to audio modeling.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013

Ensemble learning for speech enhancement.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013

Statistical Dialogue Management using Intention Dependency Graph.
Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013

Stereo-based feature enhancement using dictionary learning.
Proceedings of the IEEE International Conference on Acoustics, 2013

Effectiveness of discriminative training and feature transformation for reverberated and noisy speech.
Proceedings of the IEEE International Conference on Acoustics, 2013

Source localization in reverberant environments using sparse optimization.
Proceedings of the IEEE International Conference on Acoustics, 2013

Non-negative dynamical system with application to speech and audio.
Proceedings of the IEEE International Conference on Acoustics, 2013

A generalized discriminative training framework for system combination.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages.
IEEE Trans. Speech Audio Process., 2012

Indirect model-based speech enhancement.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Factorial Models for Noise Robust Speech Recognition.
Proceedings of the Techniques for Noise Robustness in Automatic Speech Recognition, 2012

2011
Entropy-based motion selection for touch-based registration using Rao-Blackwellized particle filtering.
Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011

Acoustic Modeling with Bootstrap and Restructuring Based on Full Covariance.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Clustering of bootstrapped acoustic model with full covariance.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Single-Channel Multitalker Speech Recognition.
IEEE Signal Process. Mag., 2010

Tracking Motion, Deformation, and Texture Using Conditionally Gaussian Processes.
IEEE Trans. Pattern Anal. Mach. Intell., 2010

Super-human multi-talker speech recognition: A graphical modeling approach.
Comput. Speech Lang., 2010

Monaural speech separation and recognition challenge.
Comput. Speech Lang., 2010

Modeling posterior probabilities using the linear exponential family.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Signal interaction and the devil function.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Restructuring exponential family mixture models.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009
Variational loopy belief propagation for multi-talker speech recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Refactoring acoustic models using variational expectation-maximization.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Single-channel speech separation and recognition using loopy belief propagation.
Proceedings of the IEEE International Conference on Acoustics, 2009

Refactoring acoustic models using variational density approximation.
Proceedings of the IEEE International Conference on Acoustics, 2009

A fast, accurate approximation to log likelihood of Gaussian mixture models.
Proceedings of the IEEE International Conference on Acoustics, 2009

Hierarchical variational loopy belief propagation for multi-talker speech recognition.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
Efficient model-based speech separation and denoising using non-negative subspace analysis.
Proceedings of the IEEE International Conference on Acoustics, 2008

Optimizing speech recognition grammars using a measure of similarity between hidden Markov models.
Proceedings of the IEEE International Conference on Acoustics, 2008

Variational Bhattacharyya divergence for hidden Markov models.
Proceedings of the IEEE International Conference on Acoustics, 2008

Accelerated Monte Carlo for Kullback-Leibler divergence between Gaussian mixture models.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Bhattacharyya error and divergence using variational importance sampling.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Word confusability - measuring hidden Markov model similarity.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models.
Proceedings of the IEEE International Conference on Acoustics, 2007

Variational Kullback-Leibler divergence for Hidden Markov models.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
Single Channel Speech Separation Using Factorial Dynamics.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

The Iroquois model: using temporal dynamics to separate speakers.
Proceedings of the ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition, 2006

Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005
Perceptual inference in generative models.
PhD thesis, 2005

2004
Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Model-based fusion of bone and air sensors for speech enhancement and robust speech recognition.
Proceedings of the ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, 2004

Single microphone source separation using high resolution signal reconstruction.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Audio-visual graphical models for speech processing.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Stereo Based 3D Tracking and Scene Learning, Employing Particle Filtering within EM.
Proceedings of the Computer Vision, 2004

3D Tracking of Morphable Objects Using Conditionally Gaussian Nonlinear Filters.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2004

2001
Audio-Visual Sound Separation Via Hidden Markov Models.
Proceedings of the Advances in Neural Information Processing Systems 14, 2001

2000
A Low-Level Cortical Perception Model with Applications to Image Analysis.
Proceedings of the 2000 International Conference on Image Processing, 2000

1999
Audio Vision: Using Audio-Visual Synchrony to Locate Sounds.
Proceedings of the Advances in Neural Information Processing Systems 12, NIPS Conference, Denver, Colorado, USA, November 29, 1999

