2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo.

Qianli Ma

Yaowei Zheng

CoRR, August 2025

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs.

CoRR, 2024

WorldSimBench: Towards Video Generation Models as World Simulators.

CoRR, 2024

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents.

CoRR, 2024

Assessment of Multimodal Large Language Models in Alignment with Human Values.

CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.

CoRR, 2024

2023

ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models.

CoRR, 2023

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022