Qing Li

ORCID: 0000-0003-1185-5365

Affiliations:
  • Beijing Institute for General Artificial Intelligence (BIGAI), National Key Laboratory of General Artificial Intelligence, Beijing, China
  • University of California, Los Angeles, CA, USA (former)
  • University of Science and Technology of China, Hefei, China (former)


According to our database, Qing Li authored at least 32 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Timeline

[Chart: number of publications per year, 2016 to 2024, by publication type (book, in proceedings, article, PhD thesis, dataset, other).]

Bibliography

2024
Task-oriented Sequential Grounding in 3D Scenes.
CoRR, 2024

FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models.
CoRR, 2024

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents.
CoRR, 2024

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting.
CoRR, 2024

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding.
CoRR, 2024

Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey.
CoRR, 2024

An Embodied Generalist Agent in 3D World.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Neural-Symbolic Recursive Machine for Systematic Generalization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unifying 3D Vision-Language Understanding via Promptable Queries.
Proceedings of the Computer Vision - ECCV 2024, 2024

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

VideoAgent: A Memory-Augmented Multimodal Agent for Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Learning non-Markovian Decision-Making from State-only Sequences.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SQA3D: Situated Question Answering in 3D Scenes.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation.
CoRR, 2022

2021
A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics.
CoRR, 2021

VLGrammar: Grounded Grammar Induction of Vision and Language.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

YouRefIt: Embodied Reference Understanding with Language and Gesture.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

SMART: A Situation Model for Algebra Story Problems via Attributed Grammar.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Learning by Fixing: Solving Math Word Problems with Weak Supervision.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning.
Proceedings of the 37th International Conference on Machine Learning, 2020

A Competence-Aware Curriculum for Visual Concepts Learning via Question Answering.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Why Does a Visual Question Have Different Answers?
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

VizWiz-Priv: A Dataset for Recognizing the Presence and Purpose of Private Visual Information in Images Taken by Blind People.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions.
Proceedings of the Computer Vision - ECCV 2018, 2018

VizWiz Grand Challenge: Answering Visual Questions From Blind People.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Learning hierarchical video representation for action recognition.
Int. J. Multim. Inf. Retr., 2017

2016
Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation.
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

