Qing Li

ORCID: 0000-0003-1185-5365

Affiliations:

Beijing Institute for General Artificial Intelligence (BIGAI), National Key Laboratory of General Artificial Intelligence, Beijing, China
University of California, Los Angeles, CA, USA (former)
University of Science and Technology of China, Hefei, China (former)

According to our database, Qing Li authored at least 33 papers between 2016 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage.

CoRR, 2024

Task-oriented Sequential Grounding in 3D Scenes.

CoRR, 2024

FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models.

CoRR, 2024

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents.

CoRR, 2024

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting.

CoRR, 2024

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding.

CoRR, 2024

Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey.

CoRR, 2024

An Embodied Generalist Agent in 3D World.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Neural-Symbolic Recursive Machine for Systematic Generalization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unifying 3D Vision-Language Understanding via Promptable Queries.

Proceedings of the European Conference on Computer Vision (ECCV 2024), 2024

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding.

Proceedings of the European Conference on Computer Vision (ECCV 2024), 2024

VideoAgent: A Memory-Augmented Multimodal Agent for Video Understanding.

Proceedings of the European Conference on Computer Vision (ECCV 2024), 2024

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Learning non-Markovian Decision-Making from State-only Sequences.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SQA3D: Situated Question Answering in 3D Scenes.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation.

CoRR, 2022

2021

A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics.

CoRR, 2021

VLGrammar: Grounded Grammar Induction of Vision and Language.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

YouRefIt: Embodied Reference Understanding with Language and Gesture.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

SMART: A Situation Model for Algebra Story Problems via Attributed Grammar.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Learning by Fixing: Solving Math Word Problems with Weak Supervision.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning.

Proceedings of the 37th International Conference on Machine Learning, 2020

A Competence-Aware Curriculum for Visual Concepts Learning via Question Answering.

Proceedings of the European Conference on Computer Vision (ECCV 2020), 2020

2019

Why Does a Visual Question Have Different Answers?

Nilavra Bhattacharya, Qing Li, Danna Gurari

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

VizWiz-Priv: A Dataset for Recognizing the Presence and Purpose of Private Visual Information in Images Taken by Blind People.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions.

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions.

Proceedings of the European Conference on Computer Vision (ECCV 2018), 2018

VizWiz Grand Challenge: Answering Visual Questions From Blind People.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Learning hierarchical video representation for action recognition.

International Journal of Multimedia Information Retrieval, 2017

2016

Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation.

Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016
