Kai Chen

ORCID: 0000-0002-6820-2325

Affiliations:
  • Shanghai AI Laboratory, Guangzhou, China
  • SenseTime Research, Hong Kong
  • Chinese University of Hong Kong, MMLab, Hong Kong (PhD 2019)


According to our database, Kai Chen authored at least 125 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Transformer-Based Visual Segmentation: A Survey.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling.
Trans. Mach. Learn. Res., 2024

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution.
CoRR, 2024

InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems.
CoRR, 2024

Training Language Models to Critique With Multi-agent Feedback.
CoRR, 2024

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models.
CoRR, 2024

What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices.
CoRR, 2024

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher.
CoRR, 2024

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation.
CoRR, 2024

LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover.
CoRR, 2024

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
CoRR, 2024

CIBench: Evaluating Your LLMs with a Code Interpreter Plugin.
CoRR, 2024

GTA: A Benchmark for General Tool Agents.
CoRR, 2024

Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models.
CoRR, 2024

ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

StyleShot: A Snapshot on Any Style.
CoRR, 2024

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language.
CoRR, 2024

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning.
CoRR, 2024

MotionBooth: Motion-Aware Customized Text-to-Video Generation.
CoRR, 2024

InternLM-Law: An Open Source Chinese Legal Large Language Model.
CoRR, 2024

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs.
CoRR, 2024

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding.
CoRR, 2024

Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior.
CoRR, 2024

Lean Workbook: A large-scale Lean problem set formalized from natural language math problems.
CoRR, 2024

AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data.
CoRR, 2024

Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving.
CoRR, 2024

An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models.
CoRR, 2024

Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization.
CoRR, 2024

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition.
CoRR, 2024

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving.
CoRR, 2024

Adapting LLaMA Decoder to Vision Transformer.
CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
CoRR, 2024

InternLM2 Technical Report.
CoRR, 2024

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding.
CoRR, 2024

DevBench: A Comprehensive Benchmark for Software Development.
CoRR, 2024

CriticBench: Evaluating Large Language Models as Critic.
CoRR, 2024

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning.
CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024

OMG-Seg: Is One Model Good Enough For All Segmentation?
CoRR, 2024

RAP-SAM: Towards Real-Time All-Purpose Segment Anything.
CoRR, 2024

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance.
CoRR, 2024

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Can AI Assistants Know What They Don't Know?
Proceedings of the Forty-first International Conference on Machine Learning, 2024

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LawBench: Benchmarking Legal Knowledge of Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

A Task Is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting.
Proceedings of the Computer Vision - ECCV 2024, 2024

ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities.
Proceedings of the Computer Vision - ECCV 2024, 2024

Open-Vocabulary SAM: Segment and Recognize Twenty-Thousand Classes Interactively.
Proceedings of the Computer Vision - ECCV 2024, 2024

4D Contrastive Superflows are Dense 3D Representation Learners.
Proceedings of the Computer Vision - ECCV 2024, 2024

AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

MMBench: Is Your Multi-modal Model an All-Around Player?
Proceedings of the Computer Vision - ECCV 2024, 2024

Towards Language-Driven Video Inpainting via Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OMG-Seg: Is One Model Good Enough for all Segmentation?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024


MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

ANAH: Analytical Annotation of Hallucinations in Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
T-Eval: Evaluating the Tool Utilization Capability Step by Step.
CoRR, 2023

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.
CoRR, 2023

Mixed Pseudo Labels for Semi-Supervised Object Detection.
CoRR, 2023

Evaluating Hallucinations in Chinese Large Language Models.
CoRR, 2023

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection.
CoRR, 2023

LawBench: Benchmarking Legal Knowledge of Large Language Models.
CoRR, 2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection.
CoRR, 2023

Learning Referring Video Object Segmentation from Weak Annotation.
CoRR, 2023

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest.
CoRR, 2023

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans.
CoRR, 2023

RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions.
CoRR, 2023

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer.
CoRR, 2023

RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose.
CoRR, 2023

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

TG-VQA: Ternary Game of Video Question Answering.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Improving Pixel-based MIM by Reducing Wasted Modeling Capability.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Dense Distinct Query for End-to-End Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
CARAFE++: Unified Content-Aware ReAssembly of FEatures.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

RTMDet: An Empirical Study of Designing Real-Time Object Detectors.
CoRR, 2022

DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition.
CoRR, 2022

What Are Expected Queries in End-to-End Object Detection?
CoRR, 2022

Dense Siamese Network.
CoRR, 2022

Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MMRotate: A Rotated Object Detection Benchmark using PyTorch.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

PYSKL: Towards Good Practices for Skeleton Action Recognition.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Dense Siamese Network for Dense Unsupervised Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Group R-CNN for Weakly Semi-supervised Object Detection with Points.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

OCSampler: Compressing Videos to One Clip with Single-step Sampling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Revisiting Skeleton-based Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Towards Balanced Learning for Instance Recognition.
Int. J. Comput. Vis., 2021

STransGAN: An Empirical Study on Transformer in GANs.
CoRR, 2021

WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection.
CoRR, 2021

Revisiting Skeleton-based Action Recognition.
CoRR, 2021

K-Net: Towards Unified Image Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Few-Shot Object Detection via Association and DIscrimination.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Positional Encoding As Spatial Inductive Bias in GANs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Seesaw Loss for Long-Tailed Instance Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Temporal ROI Align for Video Object Recognition.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Feature Pyramid Grids.
CoRR, 2020

Side-Aware Boundary Localization for More Precise Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

Prime Sample Attention in Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
MMDetection: Open MMLab Detection Toolbox and Benchmark.
CoRR, 2019

CARAFE: Content-Aware ReAssembly of FEatures.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Region Proposal by Guided Anchoring.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Libra R-CNN: Towards Balanced Learning for Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Hybrid Task Cascade for Instance Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Optimizing Video Object Detection via a Scale-Time Lattice.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Video Object Segmentation with Re-identification.
CoRR, 2017

Discover and Learn New Objects from Documentaries.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

