Xu Li

ORCID: 0000-0003-2954-3271

Affiliations:
  • ARC Lab, Tencent PCG
  • Chinese University of Hong Kong, Department of Systems Engineering and Engineering Management, Hong Kong


According to our database, Xu Li authored at least 29 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
EA-VTR: Event-Aware Video-Text Retrieval.
CoRR, 2024

Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Humtrans: A Novel Open-Source Dataset for Humming Melody Transcription and Beyond.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023
AFL-Net: Integrating Audio, Facial, and Lip Modalities with Cross-Attention for Robust Speaker Diarization in the Wild.
CoRR, 2023

Prosody Modeling with 3D Visual Information for Expressive Video Dubbing.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-Unsupervised Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning.
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022

Spoofing-Aware Speaker Verification by Multi-Level Fusion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Characterizing the Adversarial Vulnerability of Speech Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association (Interspeech 2021), Brno, Czechia, August 30 to September 3, 2021

Channel-Wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association (Interspeech 2021), Brno, Czechia, August 30 to September 3, 2021

Adversarial Defense for Automatic Speaker Verification by Cascaded Self-Supervised Learning Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Replay and Synthetic Speech Detection with Res2Net Architecture.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020
Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech.
CoRR, 2020

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speech Emotion Recognition Using Capsule Networks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

End-to-end Code-switched TTS with Mix of Monolingual Recordings.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018
Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks.
Speech Communication, 2018

Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Unsupervised Discovery of an Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2016
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016
