Kaipeng Zhang
ORCID: 0000-0001-6105-6532
According to our database, Kaipeng Zhang authored at least 70 papers between 2016 and 2024.
Bibliography
2024
Int. J. Comput. Vis., December, 2024
IEEE Trans. Circuits Syst. Video Technol., August, 2024
IEEE Trans. Multim., 2024
Pattern Recognit., 2024
CoRR, 2024
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping.
CoRR, 2024
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression.
CoRR, 2024
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation.
CoRR, 2024
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality.
CoRR, 2024
CoRR, 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models.
CoRR, 2024
AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions.
CoRR, 2024
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation.
CoRR, 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024
CoRR, 2024
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
CoRR, 2024
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP without Training.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
ACM Trans. Multim. Comput. Commun. Appl., January, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
2020
A Dual-Thread Method for Time-Optimal Trajectory Planning in Joint Space Based on Improved NGA.
J. Robotics, 2020
FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020
2019
Int. J. Comput. Vis., 2019
Proceedings of the International Conference on Multimodal Interaction, 2019
Exploring Regularizations with Face, Body and Image Cues for Group Cohesion Prediction.
Proceedings of the International Conference on Multimodal Interaction, 2019
2018
Cascade Attention Networks For Group Emotion Recognition with Face, Body and Image Cues.
Proceedings of the 2018 International Conference on Multimodal Interaction, 2018
Proceedings of the Computer Vision - ECCV 2018, 2018
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018
Proceedings of the Computer Vision - ACCV 2018, 2018
2017
Group emotion recognition with individual facial emotion CNNs and global image based CNNs.
Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017
Proceedings of the IEEE International Conference on Computer Vision, 2017
2016
IEEE Signal Process. Lett., 2016
CoRR, 2016
Proceedings of the Computer Vision - ECCV 2016, 2016
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016