Zhuo Chen

Aswin Shanmugam Subramanian

Yanmin Qian

IEEE ACM Trans. Audio Speech Lang. Process., 2022

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.

IEEE J. Sel. Top. Signal Process., 2022

BEATs: Audio Pre-Training with Acoustic Tokenizers.

CoRR, 2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts.

CoRR, 2022

The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022.

CoRR, 2022

Exploring WavLM on Speech Enhancement.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Channel-Wise Bit Allocation for Deep Visual Feature Quantization.

Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

All-Neural Beamformer for Continuous Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Speech Separation with Recurrent Selective Attention Network.

Proceedings of the IEEE International Conference on Acoustics, 2022

VarArray: Array-Geometry-Agnostic Continuous Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2022

One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.

Proceedings of the IEEE International Conference on Acoustics, 2022

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.

Proceedings of the IEEE International Conference on Acoustics, 2022

Personalized speech enhancement: new models and Comprehensive evaluation.

Proceedings of the IEEE International Conference on Acoustics, 2022

Unispeech-Sat: Universal Speech Representation Learning With Speaker Aware Pre-Training.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Speaker Separation Using Speaker Inventories and Estimated Speech.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

A New Image Codec Paradigm for Human and Machine Uses.

CoRR, 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.

CoRR, 2021

Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dual-Path RNN for Long Recording Speech Separation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.

Chenda Li

Jing Shi

Wangyou Zhang

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speaker-Attributed ASR with Transformer.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continuous Speech Separation Using Speaker Inventory for Long Recording.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Ultra Fast Speech Separation Model with Teacher Student Learning.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.

Proceedings of the IEEE International Conference on Acoustics, 2021

Rethinking The Separation Layers In Speech Separation Networks.

Proceedings of the IEEE International Conference on Acoustics, 2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings.

Proceedings of the IEEE International Conference on Acoustics, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Conformer.

Proceedings of the IEEE International Conference on Acoustics, 2021

Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.

Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Ad Hoc Microphone Arrays.

Proceedings of the 29th European Signal Processing Conference, 2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Toward Intelligent Sensing: Intermediate Deep Feature Compression.

IEEE Trans. Image Process., 2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording.

CoRR, 2020

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer.

CoRR, 2020

Continuous Speech Separation with Conformer.

CoRR, 2020

Continuous speech separation: dataset and analysis.

CoRR, 2020

An End-to-End Architecture of Online Multi-Channel Speech Separation.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Speech Separation Using Spatially Distributed Microphones.

Dongmei Wang

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Data Representation in Hybrid Coding Framework for Feature Maps Compression.

Proceedings of the IEEE International Conference on Image Processing, 2020

Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation.

Yi Luo

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Continuous Speech Separation: Dataset and Analysis.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch.

CoRR, 2019

Meeting Transcription Using Virtual Microphone Arrays.

CoRR, 2019

Lossy Intermediate Deep Learning Feature Compression and Evaluation.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Meeting Transcription Using Asynchronous Distant Microphones.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Beyond Ranking Loss: Deep Holographic Networks for Multi-Label Video Search.

Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Low-latency Speaker-independent Continuous Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2019

Single-channel Speech Extraction Using Speaker Inventory and Attention Network.

Jasha Droppo

Yifan Gong

Proceedings of the IEEE International Conference on Acoustics, 2019

Advances in Online Audio-Visual Meeting Transcription.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Speech Separation Using Speaker Inventory.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Speaker-Independent Speech Separation With Deep Attractor Network.

Yi Luo

Nima Mesgarani

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Intermediate Deep Feature Compression: the Next Battlefield of Intelligent Sensing.

CoRR, 2018

Speaker-Invariant Training via Adversarial Learning.

CoRR, 2018

Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Speaker-Invariant Training Via Adversarial Learning.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Developing Far-Field Speaker System Via Teacher-Student Learning.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Image Quality Assessment Based Label Smoothing in Deep Neural Network Learning.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Single Channel auditory source separation with neural network.

PhD thesis, 2017

Multimodal deep learning for solar radio burst classification.

Pattern Recognit., 2017

Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend.

Comput. Speech Lang., 2017

Image Quality Assessment Guided Deep Neural Networks Training.

CoRR, 2017

Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Forecasting of ionospheric vertical total electron content (TEC) using LSTM networks.

Proceedings of the 2017 International Conference on Machine Learning and Cybernetics, 2017

Solar radio spectrum classification with LSTM.

Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops, 2017

Convolutional neural network for classification of solar radio spectrum.

Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops, 2017

Deep clustering and conventional networks for music separation: Stronger together.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep attractor network for single-microphone speaker separation.

Yi Luo

Nima Mesgarani

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Neural decoding of attentional selection in multi-speaker environments without access to separated sources.

Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2017

Unsupervised adaptation with domain separation networks for robust speech recognition.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Cracking the cocktail party problem by multi-beam deep attractor network.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Facial action recognition using very deep networks for highly imbalanced class distribution.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Novel Deep Architectures in Speech Processing.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

Imaging and representation learning of solar radio spectrums for classification.

Multim. Tools Appl., 2016

End-to-End attention based text-dependent speaker verification.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Perceptual image quality enhancement for solar radio image.

Proceedings of the Eighth International Conference on Quality of Multimedia Experience, 2016

Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-Activations.

Tasha Nagamine

Nima Mesgarani

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Single-Channel Multi-Speaker Separation Using Deep Clustering.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Deep clustering: Discriminative embeddings for segmentation and separation.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Multimodal Learning for Classification of Solar Radio Spectrum.

Proceedings of the 2015 IEEE International Conference on Systems, 2015

Perceptual Quality Improvement for Synthesis Imaging of Chinese Spectral Radioheliograph.

Proceedings of the Advances in Multimedia Information Processing - PCM 2015, 2015

Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Solar Radio Astronomical Big Data Classification.

Long Xu

Ying Weng