Dan Su
Orcid: 0000-0003-2590-7090Affiliations:
- Tencent AI Lab, Shenzhen, China
According to our database1,
Dan Su
authored at least 107 papers
between 2017 and 2024.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
Online presence:
-
on orcid.org
On csauthors.net:
Bibliography
2024
CoRR, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
2023
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis.
CoRR, 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis.
IEEE Signal Process. Lett., 2022
CoRR, 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs.
CoRR, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis.
Proceedings of the Tenth International Conference on Learning Representations, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022
Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI.
Proceedings of the IEEE International Conference on Acoustics, 2022
Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection.
Proceedings of the IEEE International Conference on Acoustics, 2022
DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming Dfsmn-San For Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning.
CoRR, 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10, 000 Hours of Transcribed Audio.
CoRR, 2021
Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis.
CoRR, 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention.
CoRR, 2021
Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Glow-WaveGAN: Learning Speech Representations from GAN-Based Variational Auto-Encoder for High Fidelity Flow-Based Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10, 000 Hours of Transcribed Audio.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021
Fastsvc: Fast Cross-Domain Singing Voice Conversion With Feature-Wise Linear Modulation.
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021
Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Neural Networks, 2020
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.
CoRR, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Mixup-breakdown: A Consistency Training Method for Improving Generalization of Speech Separation Models.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Speaker-Aware Target Speaker Enhancement by Jointly Learning with Speaker Embedding Extraction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
CoRR, 2019
Phrase-Level Class based Language Model for Mandarin Smart Speaker Query Recognition.
CoRR, 2019
Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Joint Training of Complex Ratio Mask Based Beamformer and Acoustic Model for Noise Robust Asr.
Proceedings of the IEEE International Conference on Acoustics, 2019
Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019
Learning Discriminative Features in Sequence Training without Requiring Framewise Labelled Data.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System.
Proceedings of the IEEE International Conference on Acoustics, 2019
Boundary Discriminative Large Margin Cosine Loss for Text-independent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Syllable-Dependent Discriminative Learning for Small Footprint Text-Dependent Speaker Verification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
2017