ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos.
CoRR, March, 2025
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining.
CoRR, March, 2025
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data.
CoRR, January, 2025
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining.
CoRR, 2024
PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation.
CoRR, 2024
PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024
Physics-Aware Hand-Object Interaction Denoising.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Empowering biologists to decode omics data: the Genekitr R package and web server.
BMC Bioinform., December, 2023
Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting.
CoRR, 2023
NSM4D: Neural Scene Model Based Online 4D Point Cloud Sequence Understanding.
CoRR, 2023
LeaF: Learning Frames for 4D Point Cloud Sequence Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding.
Proceedings of Computer Vision - ECCV 2022, 2022
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Contrastive Multimodal Fusion with TupleInfoNCE.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding.
CoRR, 2020