Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps.

Huadai Liu

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructSpeech: Following Speech Editing Instructions via Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Robust Singing Voice Transcription Serves Synthesis.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.

CoRR, 2023

Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers.

CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.

CoRR, 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias.

CoRR, 2023

Detector Guidance for Multi-Object Text-to-Image Generation.

CoRR, 2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation.

CoRR, 2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation.

CoRR, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec.

CoRR, 2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation.

CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

CoRR, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.

CoRR, 2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.

CoRR, 2023

UniSinger: Unified End-to-End Singing Voice Synthesis With Cross-Modality Information Matching.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.

Proceedings of the International Conference on Machine Learning, 2023

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

VarietySound: Timbre-Controllable Video to Sound Generation Via Unsupervised Information Disentanglement.

Proceedings of the IEEE International Conference on Acoustics, 2023

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Contrastive Token-Wise Meta-Learning for Unseen Performer Visual Temporal-Aligned Translation.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

FastDiff 2: Revisiting and Incorporating GANs and Diffusion Models in High-Fidelity Speech Synthesis.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement.

CoRR, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis.

CoRR, 2022

Boundary element analysis of thin structures using a dual transformation method for weakly singular boundary integrals.

Comput. Math. Appl., 2022

M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

2021

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.

CoRR, 2021

Bilateral Denoising Diffusion Models.

CoRR, 2021

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2017

Research on Dynamic Safe Loading Techniques in Android Application Protection System.

Proceedings of the Smart Computing and Communication, 2017

Rongjie Huang
