Peng Gao
ORCID: 0009-0005-7881-712X
Affiliations:
- Shanghai Artificial Intelligence Laboratory, China
- Chinese University of Hong Kong, Hong Kong (PhD 2021)
According to our database, Peng Gao authored at least 163 papers between 2014 and 2024.
Bibliography
2024
IEEE Trans. Pattern Anal. Mach. Intell., September, 2024
Int. J. Comput. Vis., May, 2024
Int. J. Comput. Vis., February, 2024
I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow.
CoRR, 2024
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models.
CoRR, 2024
SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation.
CoRR, 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.
CoRR, 2024
CoRR, 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.
CoRR, 2024
CoRR, 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.
CoRR, 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want.
CoRR, 2024
CoRR, 2024
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models.
CoRR, 2024
Searching a Lightweight Network Architecture for Thermal Infrared Pedestrian Tracking.
CoRR, 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024
Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models.
CoRR, 2024
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
CoRR, 2024
Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024
No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023
Improving drug-target affinity prediction via feature fusion and knowledge distillation.
Briefings Bioinform., May, 2023
P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification.
Remote. Sens., April, 2023
Object-Centric Masked Image Modeling-Based Self-Supervised Pretraining for Remote Sensing Object Detection.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2023
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding.
CoRR, 2023
ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023
CoRR, 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.
CoRR, 2023
Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks.
CoRR, 2023
CoRR, 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
CoRR, 2023
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model.
CoRR, 2023
CoRR, 2023
Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
CoRR, 2023
CoRR, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain.
Remote. Sens., 2022
Hierarchical Disentangling Network for Building Extraction from Very High Resolution Optical Remote Sensing Imagery.
Remote. Sens., 2022
Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain.
CoRR, 2022
PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking.
CoRR, 2022
CandidateDrug4Cancer: An Open Molecular Graph Learning Benchmark on Drug Discovery for Cancer.
CoRR, 2022
Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.
CoRR, 2022
CoRR, 2022
RestoreDet: Degradation Equivariant Representation for Object Detection in Low Resolution Images.
CoRR, 2022
Proceedings of the Semantic Web - ISWC 2022, 2022
SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models.
Proceedings of the 16th International Workshop on Semantic Evaluation, SemEval@NAACL 2022, 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Adaptive Local Context Embedding for Small Vehicle Detection from Aerial Optical Remote Sensing Images.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2022
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022
Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022
You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022
2021
Automated vertebral landmarks and spinal curvature estimation using non-directional part affinity fields.
Neurocomputing, 2021
Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results.
CoRR, 2021
Superpixel-Based Building Damage Detection from Post-earthquake Very High Resolution Imagery Using Deep Neural Networks.
CoRR, 2021
CoRR, 2021
Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks.
CoRR, 2021
Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model.
CoRR, 2021
An effective self-supervised framework for learning expressive molecular global representations to drug discovery.
Briefings Bioinform., 2021
PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking.
Proceedings of the Thirtieth Text REtrieval Conference, 2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the 32nd British Machine Vision Conference 2021, 2021
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Learn molecular representations from large-scale unlabeled molecules for drug discovery.
CoRR, 2020
CoRR, 2020
CoRR, 2020
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020
Unsupervised Domain Adaptation for Cross-Device OCT Lesion Detection via Learning Adaptive Features.
Proceedings of the 17th IEEE International Symposium on Biomedical Imaging, 2020
Proceedings of the 25th International Conference on Pattern Recognition, 2020
Multi-Layer Content Interaction Through Quaternion Product for Visual Question Answering.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
Proceedings of the Computer Vision - ECCV 2020, 2020
Proceedings of the 31st British Machine Vision Conference 2020, 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Structure-Aware Noise Reduction Generative Adversarial Network for Optical Coherence Tomography Image.
Proceedings of the Ophthalmic Medical Image Analysis - 6th International Workshop, 2019
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
2018
Proceedings of the Computer Vision - ECCV 2018, 2018
2017
Proceedings of the 2017 IEEE International Conference on Web Services, 2017
2016
Proceedings of the 2016 IEEE International Conference on Service Operations and Logistics, 2016
Moving object map analytics: A framework enabling contextual spatial-temporal analytics of Internet of Things applications.
Proceedings of the 2016 IEEE International Conference on Service Operations and Logistics, 2016
2014
Proceedings of the IEEE Third International Conference on Mobile Services, Anchorage, AK, USA, June 27, 2014
Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, 2014