Zhuo Chen

Orcid: 0000-0003-0563-1760

Affiliations:
  • Microsoft, Redmond, WA, USA
  • Columbia University, New York, NY, USA (PhD 2017)


According to our database, Zhuo Chen authored at least 128 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability.
Proceedings of the IEEE International Conference on Acoustics, 2024

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024

A Unified Image Compression Method for Human Perception and Multiple Vision Tasks.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning.
CoRR, 2023

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
CoRR, 2023

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation.
CoRR, 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling.
CoRR, 2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.
CoRR, 2023

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BEATs: Audio Pre-Training with Acoustic Tokenizers.
Proceedings of the International Conference on Machine Learning, 2023

Post-Training Quantization for Vision Transformer in Transformed Domain.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speaker Change Detection For Transformer Transducer ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2023

Target Sound Extraction with Variable Cross-Modality Clues.
Proceedings of the IEEE International Conference on Acoustics, 2023

Vararray Meets T-Sot: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Real-Time Speech Interruption Analysis: from Cloud to Client Deployment.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speech Separation with Large-Scale Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
IEEE J. Sel. Top. Signal Process., 2022

BEATs: Audio Pre-Training with Acoustic Tokenizers.
CoRR, 2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts.
CoRR, 2022

The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022.
CoRR, 2022

Exploring WavLM on Speech Enhancement.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Channel-Wise Bit Allocation for Deep Visual Feature Quantization.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

All-Neural Beamformer for Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Speech Separation with Recurrent Selective Attention Network.
Proceedings of the IEEE International Conference on Acoustics, 2022

VarArray: Array-Geometry-Agnostic Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.
Proceedings of the IEEE International Conference on Acoustics, 2022

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

Personalized speech enhancement: new models and Comprehensive evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Unispeech-Sat: Universal Speech Representation Learning With Speaker Aware Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Speaker Separation Using Speaker Inventories and Estimated Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

A New Image Codec Paradigm for Human and Machine Uses.
CoRR, 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
CoRR, 2021

Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dual-Path RNN for Long Recording Speech Separation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speaker-Attributed ASR with Transformer.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continuous Speech Separation Using Speaker Inventory for Long Recording.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Ultra Fast Speech Separation Model with Teacher Student Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021

Rethinking The Separation Layers In Speech Separation Networks.
Proceedings of the IEEE International Conference on Acoustics, 2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings.
Proceedings of the IEEE International Conference on Acoustics, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Conformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Ad Hoc Microphone Arrays.
Proceedings of the 29th European Signal Processing Conference, 2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Toward Intelligent Sensing: Intermediate Deep Feature Compression.
IEEE Trans. Image Process., 2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording.
CoRR, 2020

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer.
CoRR, 2020

Continuous Speech Separation with Conformer.
CoRR, 2020

Continuous speech separation: dataset and analysis.
CoRR, 2020

An End-to-End Architecture of Online Multi-Channel Speech Separation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Speech Separation Using Spatially Distributed Microphones.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Data Representation in Hybrid Coding Framework for Feature Maps Compression.
Proceedings of the IEEE International Conference on Image Processing, 2020

Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Continuous Speech Separation: Dataset and Analysis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch.
CoRR, 2019

Meeting Transcription Using Virtual Microphone Arrays.
CoRR, 2019

Lossy Intermediate Deep Learning Feature Compression and Evaluation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Meeting Transcription Using Asynchronous Distant Microphones.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Beyond Ranking Loss: Deep Holographic Networks for Multi-Label Video Search.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Low-latency Speaker-independent Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Single-channel Speech Extraction Using Speaker Inventory and Attention Network.
Proceedings of the IEEE International Conference on Acoustics, 2019


Speech Separation Using Speaker Inventory.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Speaker-Independent Speech Separation With Deep Attractor Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Intermediate Deep Feature Compression: the Next Battlefield of Intelligent Sensing.
CoRR, 2018

Speaker-Invariant Training via Adversarial Learning.
CoRR, 2018

Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Speaker-Invariant Training Via Adversarial Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Developing Far-Field Speaker System Via Teacher-Student Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Image Quality Assessment Based Label Smoothing in Deep Neural Network Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Single Channel auditory source separation with neural network.
PhD thesis, 2017

Multimodal deep learning for solar radio burst classification.
Pattern Recognit., 2017

Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend.
Comput. Speech Lang., 2017

Image Quality Assessment Guided Deep Neural Networks Training.
CoRR, 2017

Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Forecasting of ionospheric vertical total electron content (TEC) using LSTM networks.
Proceedings of the 2017 International Conference on Machine Learning and Cybernetics, 2017

Solar radio spectrum classification with LSTM.
Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops, 2017

Convolutional neural network for classification of solar radio spectrum.
Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops, 2017

Deep clustering and conventional networks for music separation: Stronger together.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep attractor network for single-microphone speaker separation.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Neural decoding of attentional selection in multi-speaker environments without access to separated sources.
Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2017

Unsupervised adaptation with domain separation networks for robust speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Cracking the cocktail party problem by multi-beam deep attractor network.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Facial action recognition using very deep networks for highly imbalanced class distribution.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Novel Deep Architectures in Speech Processing.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
Imaging and representation learning of solar radio spectrums for classification.
Multim. Tools Appl., 2016

End-to-End attention based text-dependent speaker verification.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Perceptual image quality enhancement for solar radio image.
Proceedings of the Eighth International Conference on Quality of Multimedia Experience, 2016

Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-Activations.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Single-Channel Multi-Speaker Separation Using Deep Clustering.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Deep clustering: Discriminative embeddings for segmentation and separation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Multimodal Learning for Classification of Solar Radio Spectrum.
Proceedings of the 2015 IEEE International Conference on Systems, 2015

Perceptual Quality Improvement for Synthesis Imaging of Chinese Spectral Radioheliograph.
Proceedings of the Advances in Multimedia Information Processing - PCM 2015, 2015

Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Solar Radio Astronomical Big Data Classification.
Proceedings of the High Performance Computing and Applications, 2015

Robust speech recognition in unknown reverberant and noisy conditions.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

