B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens.
CoRR, 2024
Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review.
CoRR, 2024
OASIS: Open Agent Social Interaction Simulations with One Million Agents.
CoRR, 2024
WorldSimBench: Towards Video Generation Models as World Simulators.
CoRR, 2024
Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation.
CoRR, 2024
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing.
CoRR, 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model.
CoRR, 2024
RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents.
CoRR, 2024
Assessment of Multimodal Large Language Models in Alignment with Human Values.
CoRR, 2024
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control.
CoRR, 2024
Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models.
CoRR, 2024
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024
3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Depicting Beyond Scores: Advancing Image Quality Assessment Through Multi-modal Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models.
CoRR, 2023
Octavius: Mitigating Task Interference in MLLMs via MoE.
CoRR, 2023
Latent Distribution Adjusting for Face Anti-Spoofing.
CoRR, 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
3D Point Cloud Pre-training with Knowledge Distillation from 2D Images.
CoRR, 2022
Robust Face Anti-Spoofing with Dual Probabilistic Modeling.
CoRR, 2022
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation.
CoRR, 2022
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy.
CoRR, 2022
Benchmarking Omni-Vision Representation Through the Lens of Visual Realms.
Proceedings of the Computer Vision - ECCV 2022, 2022
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation.
Proceedings of the Computer Vision - ECCV 2022, 2022
One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data.
CoRR, 2021
Few-Shot Domain Expansion for Face Anti-Spoofing.
CoRR, 2021
CelebA-Spoof Challenge 2020 on Face Anti-Spoofing: Methods and Results.
CoRR, 2021
CelebA-Spoof: Large-Scale Face Anti-spoofing Dataset with Rich Annotations.
Proceedings of the Computer Vision - ECCV 2020, 2020