Zhengkai Jiang

Orcid: 0000-0003-4064-994X

Affiliations:

Tencent, Youtu Lab, Shanghai, China
Chinese Academy of Sciences, Institute of Automation, Beijing, China (former)
University of Chinese Academy of Sciences, School of Artificial Intelligence, Beijing, China (former)

According to our database¹, Zhengkai Jiang authored at least 41 papers between 2019 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2024

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models.

CoRR, 2024

SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation.

CoRR, 2024

OSV: One Step is Enough for High-Quality Image to Video Generation.

CoRR, 2024

Temporal and Interactive Modeling for Efficient Human-Human Motion Generation.

CoRR, 2024

NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models.

CoRR, 2024

VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation.

CoRR, 2024

AdapNet: Adaptive Noise-Based Network for Low-Quality Image Retrieval.

CoRR, 2024

Efficient Multimodal Large Language Models: A Survey.

CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.

CoRR, 2024

ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models.

CoRR, 2024

PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models.

CoRR, 2024

MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Personalize Segment Anything Model with One Shot.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

DiffuMatting: Synthesizing Arbitrary Objects with Matting-Level Annotation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection.

Proceedings of the Computer Vision - ECCV 2024, 2024

SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Density Matters: Improved Core-Set for Active Domain Adaptive Segmentation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.

CoRR, 2023

Dual Path Transformer with Partition Attention.

CoRR, 2023

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model.

CoRR, 2023

Personalize Segment Anything Model with One Shot.

CoRR, 2023

Rethinking Mobile Block for Efficient Neural Models.

CoRR, 2023

Rethinking Mobile Block for Efficient Attention-based Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

Illumination Adaptive Transformer.

CoRR, 2022

Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation.

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

DIRL: Domain-Invariant Representation Learning for Generalizable Semantic Segmentation.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

SiamRCR: Reciprocal Classification and Regression for Visual Object Tracking.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Contrastive Visual-Linguistic Pretraining.

CoRR, 2020

AutoAssign: Differentiable Label Assignment for Dense Object Detection.

CoRR, 2020

Fine-Grained Dynamic Head for Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Rethinking Learnable Tree Filter for Generic Feature Transform.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Learning Where to Focus for Efficient Video Object Detection.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Learning Motion Priors for Efficient Video Object Detection.

CoRR, 2019

Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection.

CoRR, 2019

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Video Object Detection with Locally-Weighted Deformable Neighbors.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
