Kai Chen

ORCID: 0000-0002-6820-2325

Affiliations:
  • SenseTime Research, Hong Kong
  • Shanghai AI Laboratory, Guangzhou, China
  • Chinese University of Hong Kong, SenseTime Joint Lab, Hong Kong (PhD 2019)


According to our database, Kai Chen authored at least 86 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
InternLM-Law: An Open Source Chinese Legal Large Language Model.
CoRR, 2024

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs.
CoRR, 2024

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding.
CoRR, 2024

MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results.
CoRR, 2024

ANAH: Analytical Annotation of Hallucinations in Large Language Models.
CoRR, 2024

AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data.
CoRR, 2024

An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models.
CoRR, 2024

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark.
CoRR, 2024

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition.
CoRR, 2024

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving.
CoRR, 2024

Adapting LLaMA Decoder to Vision Transformer.
CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
CoRR, 2024

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks.
CoRR, 2024

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models.
CoRR, 2024

InternLM2 Technical Report.
CoRR, 2024

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding.
CoRR, 2024

Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text.
CoRR, 2024

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models.
CoRR, 2024

CriticBench: Evaluating Large Language Models as Critic.
CoRR, 2024

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning.
CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024

Can AI Assistants Know What They Don't Know?
CoRR, 2024

OMG-Seg: Is One Model Good Enough For All Segmentation?
CoRR, 2024

Towards Language-Driven Video Inpainting via Multimodal Large Language Models.
CoRR, 2024

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance.
CoRR, 2024

2023
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI.
CoRR, 2023

T-Eval: Evaluating the Tool Utilization Capability Step by Step.
CoRR, 2023

Mixed Pseudo Labels for Semi-Supervised Object Detection.
CoRR, 2023

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues.
CoRR, 2023

Evaluating Hallucinations in Chinese Large Language Models.
CoRR, 2023

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection.
CoRR, 2023

LawBench: Benchmarking Legal Knowledge of Large Language Models.
CoRR, 2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection.
CoRR, 2023

Learning Referring Video Object Segmentation from Weak Annotation.
CoRR, 2023

MMBench: Is Your Multi-modal Model an All-around Player?
CoRR, 2023

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest.
CoRR, 2023

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans.
CoRR, 2023

Transformer-Based Visual Segmentation: A Survey.
CoRR, 2023

RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions.
CoRR, 2023

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer.
CoRR, 2023

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions.
CoRR, 2023

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling.
CoRR, 2023

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

TG-VQA: Ternary Game of Video Question Answering.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Improving Pixel-based MIM by Reducing Wasted Modeling Capability.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Dense Distinct Query for End-to-End Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
CARAFE++: Unified Content-Aware ReAssembly of FEatures.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

RTMDet: An Empirical Study of Designing Real-Time Object Detectors.
CoRR, 2022

DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition.
CoRR, 2022

What Are Expected Queries in End-to-End Object Detection?
CoRR, 2022

Dense Siamese Network.
CoRR, 2022

MMRotate: A Rotated Object Detection Benchmark using PyTorch.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

PYSKL: Towards Good Practices for Skeleton Action Recognition.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Dense Siamese Network for Dense Unsupervised Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Group R-CNN for Weakly Semi-supervised Object Detection with Points.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

OCSampler: Compressing Videos to One Clip with Single-step Sampling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Revisiting Skeleton-based Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Towards Balanced Learning for Instance Recognition.
Int. J. Comput. Vis., 2021

STransGAN: An Empirical Study on Transformer in GANs.
CoRR, 2021

WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection.
CoRR, 2021

Revisiting Skeleton-based Action Recognition.
CoRR, 2021

K-Net: Towards Unified Image Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Few-Shot Object Detection via Association and DIscrimination.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Positional Encoding As Spatial Inductive Bias in GANs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Seesaw Loss for Long-Tailed Instance Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Temporal ROI Align for Video Object Recognition.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Feature Pyramid Grids.
CoRR, 2020

Side-Aware Boundary Localization for More Precise Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

Prime Sample Attention in Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
MMDetection: Open MMLab Detection Toolbox and Benchmark.
CoRR, 2019

CARAFE: Content-Aware ReAssembly of FEatures.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Region Proposal by Guided Anchoring.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Libra R-CNN: Towards Balanced Learning for Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Hybrid Task Cascade for Instance Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Optimizing Video Object Detection via a Scale-Time Lattice.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Video Object Segmentation with Re-identification.
CoRR, 2017

Discover and Learn New Objects from Documentaries.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
