Wangyou Zhang

Orcid: 0000-0003-4500-3515

According to our database, Wangyou Zhang authored at least 46 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.
CoRR, 2024

Text-To-Speech Synthesis In The Wild.
CoRR, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement.
CoRR, 2024

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement.
CoRR, 2024

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition.
CoRR, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.
CoRR, 2024

Improving Design of Input Condition Invariant Speech Enhancement.
CoRR, 2024

Improving Design of Input Condition Invariant Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2024

Generation-Based Target Speech Extraction with Speech Discretization and Vocoder.
Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Robust Speech Representation Learning for Thousands of Languages.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.
J. Open Source Softw., November, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310).
Dataset, October, 2023

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Overlap Aware Continuous Speech Separation without Permutation Invariant Training.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Toward Universal Speech Enhancement For Diverse Input Conditions.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

End-to-End Multi-Speaker ASR with Independent Vector Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Exploring Effective Data Utilization for Low-Resource Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Text Adaptive Detection for Customizable Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2022

The SJTU System For Multimodal Information Based Speech Processing Challenge 2021.
Proceedings of the IEEE International Conference on Acoustics, 2022

Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend.
Proceedings of the IEEE International Conference on Acoustics, 2021

Recent Developments on ESPnet Toolkit Boosted By Conformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Improving End-to-End Single-Channel Multi-Talker Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.
CoRR, 2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation.
CoRR, 2020

End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-To-End Multi-Speaker Speech Recognition With Transformer.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Comparative Study on Transformer vs RNN in Speech Applications.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

