Xiang Yin

Orcid: 0000-0003-1324-4277

Affiliations:
  • ByteDance AI Lab, China


According to our database, Xiang Yin authored at least 47 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
MSGCN-ISTL: A multi-scaled self-attention-enhanced graph convolutional network with improved STL decomposition for probabilistic load forecasting.
Expert Syst. Appl., March, 2024

RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes.
CoRR, 2024

MulliVC: Multi-lingual Voice Conversion With Cycle Consistency.
CoRR, 2024

Generative Expressive Conversational Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Static-dynamic collaborative graph convolutional network with meta-learning for node-level traffic flow prediction.
Expert Syst. Appl., October, 2023

Spatiotemporal dynamic graph convolutional network for traffic speed forecasting.
Inf. Sci., September, 2023

EnchantDance: Unveiling the Potential of Music-Driven Dance Movement.
CoRR, 2023

C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model.
CoRR, 2023

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts.
CoRR, 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias.
CoRR, 2023

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis.
CoRR, 2023

Detector Guidance for Multi-Object Text-to-Image Generation.
CoRR, 2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation.
CoRR, 2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation.
CoRR, 2023

Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions and Prospects.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Emotionally Situated Text-to-Speech Synthesis in User-Agent Conversation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

S2CD: Self-heuristic Speaker Content Disentanglement for Any-to-Any Voice Conversion.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AudioQR: Deep Neural Audio Watermarks For QR Code.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.
Proceedings of the International Conference on Machine Learning, 2023

Virtual Try-On with Pose-Garment Keypoints Guided Inpainting.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

LiteG2P: A Fast, Light and High Accuracy Model for Grapheme-to-Phoneme Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

UniLG: A Unified Structure-aware Framework for Lyrics Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features.
CoRR, 2022

Unsupervised Video Domain Adaptation: A Disentanglement Perspective.
CoRR, 2022

A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation.
CoRR, 2022

Towards high-fidelity singing voice conversion with acoustic reference and contrastive predictive coding.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

An Automatic Soundtracking System for Text-to-Speech Audiobooks.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Using Clothes Style Transfer for Scenario-Aware Person Video Generation.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech.
CoRR, 2021

Towards Realistic Visual Dubbing with Heterogeneous Sources.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Fine-Grained Prosody Modeling in Neural Speech Synthesis Using ToBI Representation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Chapter-Wise Understanding System for Text-To-Speech in Chinese Novels.
Proceedings of the IEEE International Conference on Acoustics, 2021

PPG-Based Singing Voice Conversion with Adversarial Representation Learning.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech.
CoRR, 2020

A Hybrid Text Normalization System Using Multi-Head Self-Attention For Mandarin.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Unified Sequence-to-Sequence Front-End Model for Mandarin Text-to-Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Xiaomingbot: A Multilingual Robot News Reporter.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020

