2025
SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation.
CoRR, January, 2025

2024
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation.
CoRR, 2024

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models.
Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

2023
Read My Mind: A Multi-Modal Dataset for Human Belief Prediction.
CoRR, 2023

A Study of Comfortability between Interactive AI and Human.
CoRR, 2023

MVTrans: Multi-View Perception of Transparent Objects.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

NEWTON: Are Large Language Models Capable of Physical Reasoning?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

AR2-D2: Training a Robot Without a Robot.
Proceedings of the Conference on Robot Learning, 2023

2021
Predicting 3D shapes, masks, and properties of materials, liquids, and objects inside transparent containers, using the TransProteus CGI dataset.
CoRR, 2021

CONetV2: Efficient Auto-Channel Size Optimization for CNNs.
Proceedings of the 20th IEEE International Conference on Machine Learning and Applications, 2021

Seeing Glass: Joint Point-Cloud and Depth Completion for Transparent Objects.
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021