Kaipeng Zhang

Orcid: 0000-0001-6105-6532

According to our database, Kaipeng Zhang authored at least 70 papers between 2016 and 2024.

Bibliography

2024
Open-Vocabulary Animal Keypoint Detection with Semantic-Feature Matching.
Int. J. Comput. Vis., December, 2024

HF-HRNet: A Simple Hardware Friendly High-Resolution Network.
IEEE Trans. Circuits Syst. Video Technol., August, 2024

Semantic Image Segmentation by Dynamic Discriminative Prototypes.
IEEE Trans. Multim., 2024

FMGNet: An efficient feature-multiplex group network for real-time vision task.
Pattern Recognit., 2024

TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts.
CoRR, 2024

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping.
CoRR, 2024

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression.
CoRR, 2024

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation.
CoRR, 2024

HRVMamba: High-Resolution Visual State Space Model for Dense Prediction.
CoRR, 2024

Prioritize Alignment in Dataset Distillation.
CoRR, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
CoRR, 2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model.
CoRR, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
CoRR, 2024

PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models.
CoRR, 2024

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality.
CoRR, 2024

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices.
CoRR, 2024

Needle In A Multimodal Haystack.
CoRR, 2024

UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge.
CoRR, 2024

Adapting LLaMA Decoder to Vision Transformer.
CoRR, 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models.
CoRR, 2024

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions.
CoRR, 2024

Towards Implicit Prompt For Text-To-Image Models.
CoRR, 2024

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation.
CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024

Two Trades is not Baffled: Condensing Graph via Crafting Rational Gradient Matching.
CoRR, 2024

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
CoRR, 2024

T3M: Text Guided 3D Human Motion Synthesis from Speech.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Position: Towards Implicit Prompt For Text-To-Image Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OneLLM: One Framework to Align All Modalities with Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP without Training.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Toward High-quality Face-Mask Occluded Restoration.
ACM Trans. Multim. Comput. Commun. Appl., January, 2023

MLLMs-Augmented Visual-Language Representation Learning.
CoRR, 2023

DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching.
CoRR, 2023

Towards Unified and Effective Domain Generalization.
CoRR, 2023

Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face.
CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.
CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.
CoRR, 2023

Meta-Transformer: A Unified Framework for Multimodal Learning.
CoRR, 2023

Align, Adapt and Inject: Sound-guided Unified Image Generation.
CoRR, 2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.
CoRR, 2023

Foundation Model is Efficient Multimodal Multitask Model Selector.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

RaMLP: Vision MLP via Region-aware Mixing.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2021
Neural Routing by Memory.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020
A Dual-Thread Method for Time-Optimal Trajectory Planning in Joint Space Based on Improved NGA.
J. Robotics, 2020

FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

2019
A Comprehensive Study on Center Loss for Deep Face Recognition.
Int. J. Comput. Vis., 2019

Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression.
Proceedings of the International Conference on Multimodal Interaction, 2019

Exploring Regularizations with Face, Body and Image Cues for Group Cohesion Prediction.
Proceedings of the International Conference on Multimodal Interaction, 2019

2018
Cascade Attention Networks For Group Emotion Recognition with Face, Body and Image Cues.
Proceedings of the 2018 International Conference on Multimodal Interaction, 2018

Super-Identity Convolutional Neural Network for Face Hallucination.
Proceedings of the Computer Vision - ECCV 2018, 2018

Deep Disguised Faces Recognition.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Attribute Augmented Convolutional Neural Network for Face Hallucination.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

PIVTONS: Pose Invariant Virtual Try-On Shoe with Conditional Image Completion.
Proceedings of the Computer Vision - ACCV 2018, 2018

2017
Group emotion recognition with individual facial emotion CNNs and global image based CNNs.
Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017

Detecting Faces Using Inside Cascaded Contextual CNN.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.
IEEE Signal Process. Lett., 2016

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks.
CoRR, 2016

A Discriminative Feature Learning Approach for Deep Face Recognition.
Proceedings of the Computer Vision - ECCV 2016, 2016

Gender and Smile Classification Using Deep Convolutional Neural Networks.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016

