Difei Gao

ORCID: 0000-0001-8494-3492

According to our database, Difei Gao authored at least 41 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Event Graph Guided Compositional Spatial-Temporal Reasoning for Video Question Answering.
IEEE Trans. Image Process., 2024

Learning Video Context as Interleaved Multimodal Sequences.
CoRR, 2024

GUI Action Narrator: Where and When Did That Action Take Place?
CoRR, 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos.
CoRR, 2024

LOVA3: Learning to Visual Question Answering, Asking and Assessment.
CoRR, 2024

Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

VIT-LENS: Towards Omni-modal Representations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AssistGUI: Task-Oriented PC Graphical User Interface Automation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation.
CoRR, 2023

ViT-Lens-2: Gateway to Omni-modal Intelligence.
CoRR, 2023

CVPR 2023 Text Guided Video Editing Competition.
CoRR, 2023

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation.
CoRR, 2023

Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces.
CoRR, 2023

GroundNLQ @ Ego4D Natural Language Queries Challenge 2023.
CoRR, 2023

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn.
CoRR, 2023

Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection.
CoRR, 2023

DeepfakeMAE: Facial Part Consistency Aware Masked Autoencoder for Deepfake Video Detection.
CoRR, 2023

Learning to Learn: How to Continuously Teach Humans and Machines.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Affordance Grounding from Demonstration Video to Target Image.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022.
CoRR, 2022

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022.
CoRR, 2022

Egocentric Video-Language Pretraining.
CoRR, 2022

GEB+: A benchmark for generic event boundary captioning, grounding and text-based retrieval.
CoRR, 2022

Egocentric Video-Language Pretraining.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant.
Proceedings of the Computer Vision - ECCV 2022, 2022

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
AssistSR: Affordance-centric Question-driven Video Segment Retrieval.
CoRR, 2021

Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Learning to Recognize Visual Concepts for Visual Question Answering With Structural Label Space.
IEEE J. Sel. Top. Signal Process., 2020

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
From Two Graphs to N Questions: A VQA Dataset for Compositional Reasoning on Vision and Commonsense.
CoRR, 2019

2017
Visual Textbook Network: Watch Carefully before Answering Visual Questions.
Proceedings of the British Machine Vision Conference 2017, 2017

2015
Correlated warped Gaussian processes for gender-specific age estimation.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

