Xu Tan
Orcid: 0000-0001-5631-0639Affiliations:
- Microsoft Research Asia, Beijing, China
According to our database1,
Xu Tan
authored at least 173 papers
between 2012 and 2024.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
Online presence:
-
on orcid.org
On csauthors.net:
Bibliography
2024
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024
IEEE Trans. Pattern Anal. Mach. Intell., June, 2024
CoRR, 2024
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model.
CoRR, 2024
CoRR, 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner.
CoRR, 2024
Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement.
CoRR, 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers.
CoRR, 2024
CoRR, 2024
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models.
CoRR, 2024
CoRR, 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.
CoRR, 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
CoRR, 2024
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
UniStyle: Unified Style Modeling for Speaking Style Captioning and Stylistic Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
IEEE J. Sel. Top. Signal Process., November, 2023
Artificial Intelligence: Foundations, Theory, and Algorithms, Springer, ISBN: 978-981-99-0826-4, 2023
CoRR, 2023
CoRR, 2023
GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework.
CoRR, 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers.
CoRR, 2023
CoRR, 2023
ERA-Solver: Error-Robust Adams Solver for Fast Sampling of Diffusion Probabilistic Models.
CoRR, 2023
WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning.
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
CLaMP: Contrastive Language-Music Pre-Training for Cross-Modal Symbolic Music Information Retrieval.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
CoRR, 2022
CoRR, 2022
Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
ReLyMe: Improving Lyric-to-Melody Generation by Incorporating Lyric-Melody Relationships.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks.
Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022
PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription.
Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the International Conference on Machine Learning, 2022
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior.
Proceedings of the Tenth International Conference on Learning Representations, 2022
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Infergrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition.
CoRR, 2021
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Driven Adaptive Prior.
CoRR, 2021
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Cross-Domain Speech Recognition with Unsupervised Character-Level Distribution Matching.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021
Proceedings of the 9th International Conference on Learning Representations, 2021
Proceedings of the 9th International Conference on Learning Representations, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021
Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021, 2021
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
CoRR, 2020
VESR-Net: The Winning Solution to Youku Video Enhancement and Super-Resolution Challenge.
CoRR, 2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020
Espnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Beyond Error Propagation: Language Branching Also Affects the Accuracy of Sequence Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation.
CoRR, 2019
Proceedings of the Fourth Conference on Machine Translation, 2019
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Proceedings of the 36th International Conference on Machine Learning, 2019
Proceedings of the 36th International Conference on Machine Learning, 2019
Proceedings of the 7th International Conference on Learning Representations, 2019
Proceedings of the 7th International Conference on Learning Representations, 2019
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019
Knowledge Distillation from Bert in Pre-Training and Fine-Tuning for Polyphone Disambiguation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
2018
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018
Proceedings of the 35th International Conference on Machine Learning, 2018
Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018
Proceedings of the 27th International Conference on Computational Linguistics, 2018
2015
Structured Visual Feature Learning for Classification via Supervised Probabilistic Tensor Factorization.
IEEE Trans. Multim., 2015
2013
Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013
2012
Proceedings of the Intelligent Science and Intelligent Data Engineering, 2012
Proceedings of the Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2012