Jonathan Le Roux

ORCID: 0000-0002-3451-171X

According to our database, Jonathan Le Roux authored at least 162 papers between 2002 and 2024.

Bibliography

2024
The Sound Demixing Challenge 2023 - Cinematic Demixing Track.
Trans. Int. Soc. Music. Inf. Retr., January, 2024

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Task-Aware Unified Source Separation.
CoRR, 2024

Leveraging Audio-Only Data for Text-Queried Target Sound Extraction.
CoRR, 2024

Enhanced Reverberation as Supervision for Unsupervised Speech Separation.
CoRR, 2024

Disentangled Acoustic Fields For Multimodal Physical Scene Understanding.
CoRR, 2024

Speech dereverberation constrained on room impulse response characteristics.
CoRR, 2024

Sound Event Bounding Boxes.
CoRR, 2024

SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers.
CoRR, 2024

TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement.
Proceedings of the 18th International Workshop on Acoustic Signal Enhancement, 2024

Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation.
Proceedings of the IEEE International Conference on Acoustics, 2024

Late Audio-Visual Fusion for in-the-Wild Speaker Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2024

NeuroHeed+: Improving Neuro-Steered Speaker Extraction with Joint Auditory Attention Detection.
Proceedings of the IEEE International Conference on Acoustics, 2024

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization.
Proceedings of the IEEE International Conference on Acoustics, 2024

GLA-GRAD: A Griffin-Lim Extended Waveform Generation Diffusion Model.
Proceedings of the IEEE International Conference on Acoustics, 2024

Why Does Music Source Separation Benefit from Cacophony?
Proceedings of the IEEE International Conference on Acoustics, 2024

WI-FI based Indoor Monitoring Enhanced by Multimodal Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2024

Generation or Replication: Auscultating Audio Latent Diffusion Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

The Sound Demixing Challenge 2023 - Cinematic Demixing Track.
CoRR, 2023

Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT.
CoRR, 2023

Location as Supervision for Weakly Supervised Multi-Channel Source Separation of Machine Sounds.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Hyperbolic Unsupervised Anomalous Sound Detection.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Cold Diffusion for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023

Optimal Condition Training for Target Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Hyperbolic Audio Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Paᗧ-HuBERT: Self-Supervised Music Source Separation Via Primitive Auditory Clustering And Hidden-Unit Bert.
Proceedings of the IEEE International Conference on Acoustics, 2023

Latent Iterative Refinement for Modular Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Reverberation as Supervision For Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Scenario-Aware Audio-Visual TF-Gridnet for Target Speech Extraction.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Momentum Pseudo-Labeling: Semi-Supervised ASR With Continuously Improving Pseudo-Labels.
IEEE J. Sel. Top. Signal Process., 2022

Towards End-to-end Speaker Diarization in the Wild.
CoRR, 2022

Heterogeneous Target Speech Separation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Locate This, Not that: Class-Conditioned Sound Event DOA Estimation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks.
Proceedings of the IEEE International Conference on Acoustics, 2022

Sequence Transduction with Graph-Based Supervision.
Proceedings of the IEEE International Conference on Acoustics, 2022

Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy.
Proceedings of the IEEE International Conference on Acoustics, 2022

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improved Domain Generalization via Disentangled Multi-Task Learning in Unsupervised Anomalous Sound Detection.
Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

On the Compensation Between Magnitude and Phase in Speech Separation.
IEEE Signal Process. Lett., 2021

Leveraging Low-Distortion Target Estimates for Improved Speech Enhancement.
CoRR, 2021

Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers.
CoRR, 2021

Anomalous Sound Detection Using Attentive Neural Processes.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Convolutive Prediction for Reverberant Speech Separation.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Advanced Long-Context End-to-End Speech Recognition Using Context-Expanded Transformers.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Visual Scene Graphs for Audio Source Separation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Capturing Multi-Resolution Context by Dilated Self-Attention.
Proceedings of the IEEE International Conference on Acoustics, 2021

Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training.
Proceedings of the IEEE International Conference on Acoustics, 2021

Transcription Is All You Need: Learning To Separate Musical Mixtures With Score As Supervision.
Proceedings of the IEEE International Conference on Acoustics, 2021

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Finding Strength in Weakness: Learning to Separate Sounds With Weak Supervision.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Multi-Pass Transformer for Machine Translation.
CoRR, 2020

Spatio-Temporal Scene Graphs for Video Dialog.
CoRR, 2020

Autoclip: Adaptive Gradient Clipping for Source Separation Networks.
Proceedings of the 30th IEEE International Workshop on Machine Learning for Signal Processing, 2020

Hierarchical Musical Instrument Separation.
Proceedings of the 21st International Society for Music Information Retrieval Conference, 2020

All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Detecting Audio Attacks on ASR Systems with Dropout Uncertainty.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transformer-Based Long-Context End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Learning to Separate Sounds from Weakly Labeled Scenes.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Streaming Automatic Speech Recognition with the Transformer Model.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

WHAMR!: Noisy and Reverberant Single-Channel Speech Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Multi-Speaker Speech Recognition With Transformer.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Phasebook and Friends: Leveraging Discrete Representations for Source Separation.
IEEE J. Sel. Top. Signal Process., 2019

Bootstrapping deep music separation from primitive auditory grouping principles.
CoRR, 2019

Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity.
Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

Universal Sound Separation.
Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

WHAM!: Extending Speech Separation to Noisy Environments.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Multilingual Multi-Speaker Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Vectorized Beam Search for CTC-Attention-Based Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Class-conditional Embeddings for Music Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Bootstrapping Single-channel Source Separation via Unsupervised Spatial Clustering on Stereo Mixtures.
Proceedings of the IEEE International Conference on Acoustics, 2019

The Phasebook: Building Complex Masks via Discrete Representations for Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019

SDR - Half-baked or Well Done?
Proceedings of the IEEE International Conference on Acoustics, 2019

Triggered Attention for End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Cycle-consistency Training for End-to-end Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Teacher-student Deep Clustering for Low-delay Single Channel Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Streaming End-to-End Speech Recognition with Joint CTC-Attention Based Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Phase Reconstruction with Learned Time-Frequency Representations for Single-Channel Speech Separation.
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Alternative Objective Functions for Deep Clustering.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

End-to-End Multi-Speaker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Purely End-to-End System for Multi-speaker Speech Recognition.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Duration-Controlled LSTM for Polyphonic Sound Event Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones.
J. Inf. Process., 2017

Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend.
Comput. Speech Lang., 2017

Consistent anisotropic Wiener filtering for audio source separation.
Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017

Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral Information.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Student-teacher network learning with enhanced features.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep clustering and conventional networks for music separation: Stronger together.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Novel Deep Architectures in Speech Processing.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Deep Recurrent Networks for Separation and Recognition of Single-Channel Speech in Nonstationary Background Audio.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
Dialog state tracking with attention-based sequence-to-sequence learning.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Full-Capacity Unitary Recurrent Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Single-Channel Multi-Speaker Separation Using Deep Clustering.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Deep unfolding for multichannel source separation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Deep clustering: Discriminative embeddings for segmentation and separation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

2015
Phase Processing for Single-Channel Speech Enhancement: History and recent advances.
IEEE Signal Process. Mag., 2015

Micbots: Collecting large realistic datasets for speech and audio research using mobile robots.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Deep NMF for speech separation.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR.
Proceedings of the Latent Variable Analysis and Signal Separation, 2015

The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures.
CoRR, 2014

Discriminative NMF and its application to single-channel source separation.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Sequential maximum mutual information linear discriminant analysis for speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Black box optimization for automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Non-negative source-filter dynamical system for speech enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2014

Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments.
Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Discriminatively trained recurrent neural networks for single-channel speech separation.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

Sequence discriminative training for low-rank deep neural networks.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

2013
Consistent Wiener Filtering for Audio Source Separation.
IEEE Signal Process. Lett., 2013

Block Coordinate Descent for Sparse NMF.
Proceedings of the 1st International Conference on Learning Representations, 2013

Hierarchical and coupled non-negative dynamical systems with application to audio modeling.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013

Ensemble learning for speech enhancement.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013

Statistical Dialogue Management using Intention Dependency Graph.
Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013

The second 'chime' speech separation and recognition challenge: Datasets, tasks and baselines.
Proceedings of the IEEE International Conference on Acoustics, 2013

Source localization in reverberant environments using sparse optimization.
Proceedings of the IEEE International Conference on Acoustics, 2013

Non-negative dynamical system with application to speech and audio.
Proceedings of the IEEE International Conference on Acoustics, 2013

The second 'CHiME' speech separation and recognition challenge: An overview of challenge systems and outcomes.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

A generalized discriminative training framework for system combination.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
Indirect model-based speech enhancement.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Factorial Models for Noise Robust Speech Recognition.
Proceedings of the Techniques for Noise Robustness in Automatic Speech Recognition, 2012

2011
Computational auditory induction as a missing-data model-fitting problem with Bregman divergence.
Speech Commun., 2011

Bayesian nonparametric spectrogram modeling based on infinite factorial infinite hidden Markov model.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011

Infinite-state spectrum model for music signal analysis.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Harmonic and Percussive Sound Separation and Its Application to MIR-Related Tasks.
Proceedings of the Advances in Music Information Retrieval, 2010

A statistical model of speech F0 contours.
Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2010

Consistent Wiener Filtering: Generalized Time-Frequency Masking Respecting Spectrogram Consistency.
Proceedings of the Latent Variable Analysis and Signal Separation, 2010

Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms.
Proceedings of the Latent Variable Analysis and Signal Separation, 2010

Statistical Model of Speech Signals Based on Composite Autoregressive System with Application to Blind Source Separation.
Proceedings of the Latent Variable Analysis and Signal Separation, 2010

2008
Adaptive Template Matching with Shift-Invariant Semi-NMF.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Explicit consistency constraints for STFT spectrograms and their application to phase reconstruction.
Proceedings of the ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition, 2008

Computational auditory induction by missing-data non-negative matrix factorization.
Proceedings of the ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition, 2008

Modulation analysis of speech through orthogonal FIR filterbank optimization.
Proceedings of the IEEE International Conference on Acoustics, 2008

Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram.
Proceedings of the 2008 16th European Signal Processing Conference, 2008

2007
Single and Multiple F0 Contour Estimation Through Parametric Spectrogram Modeling of Speech in Noisy Environments.
IEEE Trans. Speech Audio Process., 2007

Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error.
IEEE Trans. Speech Audio Process., 2007

Harmonic-Temporal Clustering of Speech for Single and Multiple F0 Contour Estimation in Noisy Environments.
Proceedings of the IEEE International Conference on Acoustics, 2007

MEG Signal Denoising Based on Time-Shift PCA.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Speech analyzer using a joint estimation model of spectral envelope and fine structure.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005
Optimization methods for discriminative training.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2002
Fast Algorithms of Plant Computation Based on Substructure Instances.
Proceedings of the 10-th International Conference in Central Europe on Computer Graphics, 2002

