Shengpeng Ji

Orcid: 0000-0003-0988-5266

According to our database1, Shengpeng Ji authored at least 22 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

2024
Generating Neural Networks for Diverse Networking Classification Tasks via Hardware-Aware Neural Architecture Search.
IEEE Trans. Computers, February, 2024

Speech Watermarking with Discrete Intermediate Representations.
CoRR, 2024

LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval.
CoRR, 2024

WavChat: A Survey of Spoken Dialogue Models.
CoRR, 2024

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup.
CoRR, 2024

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization.
CoRR, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling.
CoRR, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.
CoRR, 2024

Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment.
CoRR, 2024

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models.
CoRR, 2024

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech.
CoRR, 2024

SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

TextrolSpeech: A Text Style Control Speech Corpus with Codec Language Text-to-Speech Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

AudioVSR: Enhancing Video Speech Recognition with Audio Data.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models.
CoRR, 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias.
CoRR, 2023

2022
Coded Distributed Computing Schemes with Fewer Output Functions.
Proceedings of the 6th International Conference on Computer Science and Artificial Intelligence, 2022


  Loading...