Le Zhuo

ORCID: 0000-0001-7895-091X

According to our database, Le Zhuo authored at least 20 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models.

CoRR, January, 2025

2024

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation.

CoRR, 2024

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection.

CoRR, 2024

Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling.

CoRR, 2024

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow.

CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

CoRR, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation.

CoRR, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.

CoRR, 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training.

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions.

CoRR, 2023

GraphText: Graph Reasoning in Text Space.

CoRR, 2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

LyricWhiz: Robust Multilingual Zero-Shot Lyrics Transcription by Whispering to ChatGPT.

Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

Video Background Music Generation: Dataset, Method and Evaluation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

Video Background Music Generation: Dataset, Method and Evaluation.

CoRR, 2022
