Shuai Wang

ORCID: 0000-0003-1523-9631

Affiliations:
  • Chinese University of Hong Kong-Shenzhen (CUHK-SZ), Shenzhen Research Institute of Big Data, Shenzhen, China
  • Shanghai Jiao Tong University, Department of Computer Science and Engineering, China (PhD 2020)


According to our database, Shuai Wang authored at least 83 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Speech Separation With Pretrained Frontend to Minimize Domain Mismatch.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Advancing speaker embedding learning: Wespeaker toolkit for research and production.
Speech Commun., 2024

Hierarchical Control of Emotion Rendering in Speech Synthesis.
CoRR, 2024

MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues.
CoRR, 2024

The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings.
CoRR, 2024

Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification.
CoRR, 2024

Multi-Level Speaker Representation for Target Speaker Extraction.
CoRR, 2024

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction.
CoRR, 2024

M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions.
CoRR, 2024

On the effectiveness of enrollment speech augmentation for Target Speaker Extraction.
CoRR, 2024

MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion.
CoRR, 2024

E1 TTS: Simple and Fast Non-Autoregressive TTS.
CoRR, 2024

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion.
CoRR, 2024

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching.
CoRR, 2024

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders.
CoRR, 2024

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models.
CoRR, 2024

Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation.
CoRR, 2024

On the Effectiveness of Acoustic BPE in Decoder-Only TTS.
CoRR, 2024

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis.
CoRR, 2024

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis.
CoRR, 2024

Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech.
CoRR, 2024

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.
CoRR, 2024

Fine-Grained Quantitative Emotion Editing for Speech Generation.
CoRR, 2024

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.
CoRR, 2024

AutoPrep: An Automatic Preprocessing Framework for In-The-Wild Speech Data.
Proceedings of the IEEE International Conference on Acoustics, 2024

Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2024

Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech.
Proceedings of the IEEE International Conference on Acoustics, 2024

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

Robust Cross-Domain Speaker Verification with Multi-Level Domain Adapters.
Proceedings of the IEEE International Conference on Acoustics, 2024

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge.
CoRR, 2023

USED: Universal Speaker Extraction and Diarization.
CoRR, 2023

Wespeaker baselines for VoxSRC2023.
CoRR, 2023

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation-based Voice Conversion.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Wespeaker: A Research and Production Oriented Speaker Embedding Learning Toolkit.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Context-aware Multimodal Fusion for Emotion Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Self-Knowledge Distillation via Feature Enhancement for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022

On the Importance of Different Frequency Bins for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Audio-Visual Deep Neural Network for Robust Person Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Speaker Embedding Augmentation with Noise Distribution Matching.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Non-Parallel Any-to-Many Voice Conversion by Replacing Speaker Statistics.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

End-to-End Speaker-Dependent Voice Activity Detection.
CoRR, 2020


Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multi-Modality Matters: A Performance Leap on VoxCeleb.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

BUT System for the Second DIHARD Speech Diarization Challenge.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Optimizing Bayesian HMM Based X-Vector Clustering for the Second DIHARD Speech Diarization Challenge.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Investigation of SpecAugment for Deep Speaker Embedding Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2019

BUT System Description to VoxCeleb Speaker Recognition Challenge 2019.
CoRR, 2019

The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Bayesian HMM Based x-Vector Clustering for Speaker Diarization.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Knowledge Distillation for Small Foot-print Deep Speaker Embedding.
Proceedings of the IEEE International Conference on Acoustics, 2019

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2018

Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Covariance Based Deep Feature for Text-Dependent Speaker Verification.
Proceedings of the Intelligence Science and Big Data Engineering, 2018

Angular Softmax for Short-Duration Text-independent Speaker Verification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Focal KL-Divergence Based Dilated Convolutional Neural Networks for Co-Channel Speaker Identification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Joint I-Vector with End-to-End System for Short Duration Text-Independent Speaker Verification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
What Does the Speaker Embedding Encode?
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Integrating online i-vector into GMM-UBM for text-dependent speaker verification.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2012
A Deformable Surface Model for Real-Time Water Drop Animation.
IEEE Trans. Vis. Comput. Graph., 2012

