Jiaqi Wang

Orcid: 0000-0001-6877-5353

Affiliations:

Shanghai Artificial Intelligence Laboratory, China

According to our database, Jiaqi Wang authored at least 78 papers between 2018 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations.

CoRR, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.

CoRR, 2024

FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models.

CoRR, 2024

SimC3D: A Simple Contrastive 3D Pretraining Framework Using RGB Images.

CoRR, 2024

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models.

CoRR, 2024

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.

CoRR, 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.

CoRR, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.

CoRR, 2024

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.

CoRR, 2024

Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning.

CoRR, 2024

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way.

CoRR, 2024

Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images.

CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

CoRR, 2024

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations.

CoRR, 2024

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs.

CoRR, 2024

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.

CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.

CoRR, 2024

MotionClone: Training-Free Motion Cloning for Controllable Video Generation.

CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.

CoRR, 2024

Bootstrap3D: Improving 3D Content Creation with Synthetic Data.

CoRR, 2024

Streaming Long Video Understanding with Large Language Models.

CoRR, 2024

ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing.

CoRR, 2024

Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials.

CoRR, 2024

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites.

CoRR, 2024

Unified Scene Representation and Reconstruction for 3D Large Language Models.

CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.

CoRR, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

CoRR, 2024

InternLM2 Technical Report.

et al.

CoRR, 2024

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition.

CoRR, 2024

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation.

CoRR, 2024

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.

CoRR, 2024

SepRep-Net: Multi-source Free Domain Adaptation via Model Separation And Reparameterization.

Ying Jin, Jiaqi Wang, Dahua Lin

CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.

CoRR, 2024

VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Long-CLIP: Unlocking the Long-Text Capability of CLIP.

Proceedings of the Computer Vision - ECCV 2024, 2024

Adversarial Prompt Tuning for Vision-Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

MMBench: Is Your Multi-modal Model an All-Around Player?

Proceedings of the Computer Vision - ECCV 2024, 2024

ShareGPT4V: Improving Large Multi-modal Models with Better Captions.

Proceedings of the Computer Vision - ECCV 2024, 2024

GPT4Point: A Unified Framework for Point-Language Understanding and Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OneLLM: One Framework to Align All Modalities with Language.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Alpha-CLIP: A CLIP Model Focusing on Wherever you Want.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VIGC: Visual Instruction Generation and Correction.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection.

Proceedings of the International Conference on 3D Vision, 2024

2023

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases.

CoRR, 2023

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization.

CoRR, 2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.

CoRR, 2023

MLLM-DataEngine: An Iterative Refinement Approach for MLLM.

CoRR, 2023

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models.

CoRR, 2023

OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation.

CoRR, 2023

HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image.

Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

Proceedings of the International Conference on Machine Learning, 2023

Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

V3Det: Vast Vocabulary Visual Detection Dataset.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Dense Distinct Query for End-to-End Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Multi-Level Logit Distillation.

Ying Jin, Jiaqi Wang, Dahua Lin

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Self-Supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

CARAFE++: Unified Content-Aware ReAssembly of FEatures.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition.

CoRR, 2022

What Are Expected Queries in End-to-End Object Detection?

CoRR, 2022

MINI: Mining Implicit Novel Instances for Few-Shot Object Detection.

CoRR, 2022

Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant.

Ying Jin, Jiaqi Wang, Dahua Lin

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

PYSKL: Towards Good Practices for Skeleton Action Recognition.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Texture Memory-Augmented Deep Patch-Based Image Inpainting.

IEEE Trans. Image Process., 2021

Few-Shot Object Detection via Association and DIscrimination.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

MMFashion: An Open-Source Toolbox for Visual Fashion Analysis.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Seesaw Loss for Long-Tailed Instance Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Side-Aware Boundary Localization for More Precise Object Detection.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

MMDetection: Open MMLab Detection Toolbox and Benchmark.

CoRR, 2019

CARAFE: Content-Aware ReAssembly of FEatures.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Region Proposal by Guided Anchoring.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Hybrid Task Cascade for Instance Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Optimizing Video Object Detection via a Scale-Time Lattice.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
