2024
Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis.
CoRR, 2024
2023
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions.
CoRR, 2023
KalmanNet: A Learnable Kalman Filter for Acoustic Echo Cancellation.
CoRR, 2023
Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Deep Neural Mel-Subband Beamformer for in-Car Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
NeuralEcho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network For Acoustic Echo Cancellation and Speech Enhancement.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
2022
Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition.
Comput. Speech Lang., 2022
An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer.
Comput. Speech Lang., 2022
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement.
CoRR, 2022
Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE.
CoRR, 2022
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Joint Neural AEC and Beamforming with Double-Talk Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Fast-RIR: Fast Neural Diffuse Room Impulse Response Generator.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Enhancing Zero-Shot Many to Many Voice Conversion via Self-Attention VAE with Structurally Regularized Layers.
Proceedings of the 5th International Conference on Artificial Intelligence for Industries, 2022
2021
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Joint AEC AND Beamforming with Double-Talk Detection using RNN-Transformer.
CoRR, 2021
Generalized RNN beamformer for target speech separation.
CoRR, 2021
WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Neural Mask based Multi-channel Convolutional Beamforming for Joint Dereverberation, Echo Cancellation and Denoising.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
Towards Robust Speaker Verification with Target Speaker Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
ADL-MVDR: All Deep Learning MVDR Beamformer for Target Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
3D Spatial Features for Multi-Channel Target Speech Separation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
2020
Audio-Visual Speech Separation and Dereverberation With a Two-Stage Multimodal Network.
IEEE J. Sel. Top. Signal Process., 2020
Audio-Visual Multi-Channel Recognition of Overlapped Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
DurIAN: Duration Informed Attention Network for Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Neural Spatio-Temporal Beamformer for Target Speech Separation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
End-to-End Multi-Look Keyword Spotting.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
Speaker-Aware Target Speaker Enhancement by Jointly Learning with Speaker Embedding Extraction.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
Integration of Multi-Look Beamformers for Multi-Channel Keyword Spotting.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
2019
A Unified Framework for Speech Separation.
CoRR, 2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis.
CoRR, 2019
End-to-End Multi-Channel Speech Separation.
CoRR, 2019
Improved Speaker-Dependent Separation for CHiME-5 Challenge.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Seq2Seq Attentional Siamese Neural Networks for Text-dependent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
Joint Training of Complex Ratio Mask Based Beamformer and Acoustic Model for Noise Robust ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
Boundary Discriminative Large Margin Cosine Loss for Text-independent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
Multi-band PIT and Model Integration for Improved Multi-channel Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
Improving Speech Enhancement with Phonetic Embedding Features.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Time Domain Audio Visual Speech Separation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Syllable-Dependent Discriminative Learning for Small Footprint Text-Dependent Speaker Verification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
2018
Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
2012
Multi-Channel ℓ1 Regularized Convex Speech Enhancement Model and Fast Computation by the Split Bregman Method.
IEEE Trans. Speech Audio Process., 2012
Exploring Off Time Nature for Speech Enhancement.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Constrained Multichannel Speech Dereverberation.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
A Triple-Microphone Real-Time Speech Enhancement Algorithm Based on Approximate Array Analytical Solutions.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
2011
Modeling Category Identification Using Sparse Instance Representation.
Proceedings of the 33rd Annual Meeting of the Cognitive Science Society, 2011
2010
Convexity and fast speech extraction by split Bregman method.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Reducing musical noise in blind source separation by time-domain sparse filters and split Bregman method.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
2009
A nonlocally weighted soft-constrained natural gradient algorithm for blind separation of reverberant speech.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009