Yifei Huang

ORCID: 0000-0001-8067-6227

Affiliations:
  • University of Tokyo, Japan
  • Shanghai Jiao Tong University, China (until 2015)


According to our database, Yifei Huang authored at least 51 papers between 2017 and 2024.

Bibliography

2024
Matching Compound Prototypes for Few-Shot Action Recognition.
Int. J. Comput. Vis., September, 2024

Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild.
CoRR, 2024

ActionVOS: Actions as Prompts for Video Object Segmentation.
CoRR, 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding.
CoRR, 2024

FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation.
CoRR, 2024

Masked Video and Body-Worn IMU Autoencoder for Egocentric Action Recognition.
Proceedings of the Computer Vision - ECCV 2024, 2024

Retrieval-Augmented Egocentric Video Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding.
CoRR, 2023

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives.
CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.
CoRR, 2023

Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

3D Segmenter: 3D Transformer based Semantic Segmentation via 2D Panoramic Distillation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Memory-and-Anticipation Transformer for Online Action Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Weakly Supervised Temporal Sentence Grounding with Uncertainty-Guided Self-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

First Bite/Chew: distinguish different types of food by first biting/chewing and the corresponding hand movement.
Proceedings of the Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 2023

Proposal-based Temporal Action Localization with Point-level Supervision.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

First Bite/Chew: distinguish typical allergic food by two IMUs.
Proceedings of the Augmented Humans International Conference 2023, 2023

2022
Spatio-Temporal Perturbations for Video Attribution.
IEEE Trans. Circuits Syst. Video Technol., 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

Precise Affordance Annotation for Egocentric Action Video Datasets.
CoRR, 2022

Seeing our Blind Spots: Smart Glasses-based Simulation to Increase Design Students' Awareness of Visual Impairment.
Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 2022

Inner self drawing machine.
Proceedings of the SIGGRAPH Asia 2022 Art Gallery, 2022

GazeSync: Eye Movement Transfer Using an Optical Eye Tracker and Monochrome Liquid Crystal Displays.
Proceedings of the IUI 2022: 27th International Conference on Intelligent User Interfaces, Helsinki, Finland, March 22 - 25, 2022, 2022

Compound Prototype Matching for Few-Shot Action Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022

Interact before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022


2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video.
CoRR, 2021

Spatio-Temporal Perturbations for Video Attribution.
CoRR, 2021

EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report.
CoRR, 2021

Towards Visually Explaining Video Understanding Networks with Perturbation.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Precise Multi-Modal In-Hand Pose Estimation using Low-Precision Sensors for Robotic Assembly.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Goal-Oriented Gaze Estimation for Zero-Shot Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Mutual Context Network for Jointly Estimating Egocentric Gaze and Action.
IEEE Trans. Image Process., 2020

An Ego-Vision System for Discovering Human Joint Attention.
IEEE Trans. Hum. Mach. Syst., 2020

Learn to Extract Building Outline from Misaligned Annotation through Nearest Feature Selector.
Remote. Sens., 2020

A Comprehensive Study on Visual Explanations for Spatio-temporal Networks.
CoRR, 2020

Learn to Recover Visible Color for Video Surveillance in a Day.
Proceedings of the Computer Vision - ECCV 2020, 2020

Improving Action Segmentation via Graph-Based Temporal Reasoning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Mutual Context Network for Jointly Estimating Egocentric Gaze and Actions.
CoRR, 2019

Manipulation-Skill Assessment from Videos with Spatial Attention Network.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

2018
Predicting Gaze in Egocentric Video by Learning Task-Dependent Attention Transition.
Proceedings of the Computer Vision - ECCV 2018, 2018

Semantic Aware Attention Based Deep Object Co-segmentation.
Proceedings of the Computer Vision - ACCV 2018, 2018

2017
Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

