Kaipeng Zhang

Orcid: 0000-0001-6105-6532

According to our database¹, Kaipeng Zhang authored at least 72 papers between 2016 and 2024.

Collaborative distances:

Dijkstra number² of three.
Erdős number³ of four.

Timeline

Legend:

Book

In proceedings

Article

PhD thesis

Dataset

Other

Links

On csauthors.net:

Bibliography

2024

Open-Vocabulary Animal Keypoint Detection with Semantic-Feature Matching.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., December, 2024

HF-HRNet: A Simple Hardware Friendly High-Resolution Network.

[BibT_eX]

[DOI]

IEEE Trans. Circuits Syst. Video Technol., August, 2024

Semantic Image Segmentation by Dynamic Discriminative Prototypes.

[BibT_eX]

[DOI]

Kaipeng Zhang

Yoichi Sato

IEEE Trans. Multim., 2024

FMGNet: An efficient feature-multiplex group network for real-time vision task.

[BibT_eX]

[DOI]

Pattern Recognit., 2024

ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality.

[BibT_eX]

[DOI]

CoRR, 2024

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation.

[BibT_eX]

[DOI]

CoRR, 2024

TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts.

[BibT_eX]

[DOI]

CoRR, 2024

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping.

[BibT_eX]

[DOI]

CoRR, 2024

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression.

[BibT_eX]

[DOI]

CoRR, 2024

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation.

[BibT_eX]

[DOI]

CoRR, 2024

HRVMamba: High-Resolution Visual State Space Model for Dense Prediction.

[BibT_eX]

[DOI]

CoRR, 2024

Prioritize Alignment in Dataset Distillation.

[BibT_eX]

[DOI]

Konstantinos N. Plataniotis

Kai Wang

Yang You

CoRR, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.

[BibT_eX]

[DOI]

CoRR, 2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model.

[BibT_eX]

[DOI]

CoRR, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models.

[BibT_eX]

[DOI]

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

[BibT_eX]

[DOI]

CoRR, 2024

PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models.

[BibT_eX]

[DOI]

CoRR, 2024

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality.

[BibT_eX]

[DOI]

CoRR, 2024

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices.

[BibT_eX]

[DOI]

CoRR, 2024

Needle In A Multimodal Haystack.

[BibT_eX]

[DOI]

CoRR, 2024

UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge.

[BibT_eX]

[DOI]

CoRR, 2024

Adapting LLaMA Decoder to Vision Transformer.

[BibT_eX]

[DOI]

CoRR, 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models.

[BibT_eX]

[DOI]

CoRR, 2024

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions.

[BibT_eX]

[DOI]

CoRR, 2024

Towards Implicit Prompt For Text-To-Image Models.

[BibT_eX]

[DOI]

CoRR, 2024

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation.

[BibT_eX]

[DOI]

CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

[BibT_eX]

[DOI]

CoRR, 2024

Two Trades is not Baffled: Condensing Graph via Crafting Rational Gradient Matching.

[BibT_eX]

[DOI]

CoRR, 2024

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.

[BibT_eX]

[DOI]

CoRR, 2024

T3M: Text Guided 3D Human Motion Synthesis from Speech.

[BibT_eX]

[DOI]

Wenshuo Peng

Kaipeng Zhang

Sai Qian Zhang

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.

[BibT_eX]

[DOI]

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Position: Towards Implicit Prompt For Text-To-Image Models.

[BibT_eX]

[DOI]

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

[BibT_eX]

[DOI]

Proceedings of the Forty-first International Conference on Machine Learning, 2024

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.

[BibT_eX]

[DOI]

Proceedings of the Twelfth International Conference on Learning Representations, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.

[BibT_eX]

[DOI]

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching.

[BibT_eX]

[DOI]

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OneLLM: One Framework to Align All Modalities with Language.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.

[BibT_eX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP without Training.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Toward High-quality Face-Mask Occluded Restoration.

[BibT_eX]

[DOI]

ACM Trans. Multim. Comput. Commun. Appl., January, 2023

MLLMs-Augmented Visual-Language Representation Learning.

[BibT_eX]

[DOI]

CoRR, 2023

DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching.

[BibT_eX]

[DOI]

CoRR, 2023

Towards Unified and Effective Domain Generalization.

[BibT_eX]

[DOI]

CoRR, 2023

Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face.

[BibT_eX]

[DOI]

CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.

[BibT_eX]

[DOI]

CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.

[BibT_eX]

[DOI]

CoRR, 2023

Meta-Transformer: A Unified Framework for Multimodal Learning.

[BibT_eX]

[DOI]

CoRR, 2023

Align, Adapt and Inject: Sound-guided Unified Image Generation.

[BibT_eX]

[DOI]

CoRR, 2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.

[BibT_eX]

[DOI]

CoRR, 2023

Foundation Model is Efficient Multimodal Multitask Model Selector.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

RaMLP: Vision MLP via Region-aware Mixing.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

DiffRate : Differentiable Compression Rate for Efficient Vision Transformers.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2021

Neural Routing by Memory.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020

A Dual-Thread Method for Time-Optimal Trajectory Planning in Joint Space Based on Improved NGA.

[BibT_eX]

[DOI]

Kaipeng Zhang

Ning Liu

Gao Wang

J. Robotics, 2020

FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution.

[BibT_eX]

[DOI]

Zhanpeng Zhang

Kaipeng Zhang

Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

2019

A Comprehensive Study on Center Loss for Deep Face Recognition.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., 2019

Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Multimodal Interaction, 2019

Exploring Regularizations with Face, Body and Image Cues for Group Cohesion Prediction.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Multimodal Interaction, 2019

2018

Cascade Attention Networks For Group Emotion Recognition with Face, Body and Image Cues.

[BibT_eX]

[DOI]

Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018

Super-Identity Convolutional Neural Network for Face Hallucination.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2018, 2018

Deep Disguised Faces Recognition.

[BibT_eX]

[DOI]

Kaipeng Zhang

Ya-Liang Chang

Winston H. Hsu

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Attribute Augmented Convolutional Neural Network for Face Hallucination.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

PIVTONS: Pose Invariant Virtual Try-On Shoe with Conditional Image Completion.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ACCV 2018, 2018

2017

Group emotion recognition with individual facial emotion CNNs and global image based CNNs.

[BibT_eX]

[DOI]

Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017

Detecting Faces Using Inside Cascaded Contextual CNN.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.

[BibT_eX]

[DOI]

IEEE Signal Process. Lett., 2016

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks.

[BibT_eX]

[DOI]

CoRR, 2016

A Discriminative Feature Learning Approach for Deep Face Recognition.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2016, 2016

Gender and Smile Classification Using Deep Convolutional Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016

Kaipeng Zhang

Timeline

Legend:

Links

On csauthors.net:

Bibliography

Loading...