Peng Gao

Orcid: 0009-0005-7881-712X

Affiliations:
  • Shanghai Artificial Intelligence Laboratory, China
  • Chinese University of Hong Kong, Hong Kong (PhD 2021)


According to our database, Peng Gao authored at least 163 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
FeatAug-DETR: Enriching One-to-Many Matching for DETRs With Feature Augmentation.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
Int. J. Comput. Vis., May, 2024

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
Int. J. Comput. Vis., February, 2024

POS-BERT: Point cloud one-stage BERT pre-training.
Expert Syst. Appl., 2024

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow.
CoRR, 2024

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models.
CoRR, 2024

SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation.
CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.
CoRR, 2024

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.
CoRR, 2024

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners.
CoRR, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.
CoRR, 2024

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.
CoRR, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models.
CoRR, 2024

MAVIS: Mathematical Visual Instruction Tuning.
CoRR, 2024

VEnhancer: Generative Space-Time Enhancement for Video Generation.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
CoRR, 2024

A3VLM: Actionable Articulation-Aware Vision Language Model.
CoRR, 2024

Phased Consistency Model.
CoRR, 2024

TerDiT: Ternary Diffusion Models with Transformers.
CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.
CoRR, 2024

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want.
CoRR, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
CoRR, 2024

ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models.
CoRR, 2024

Searching a Lightweight Network Architecture for Thermal Infrared Pedestrian Tracking.
CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024

Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models.
CoRR, 2024

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
CoRR, 2024

Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructSpeech: Following Speech Editing Instructions via Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Personalize Segment Anything Model with One Shot.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Proceedings of the Computer Vision - ECCV 2024, 2024

SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Any2Point: Empowering Any-Modality Large Models for Efficient 3D Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OneLLM: One Framework to Align All Modalities with Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Digital Life Project: Autonomous 3D Characters with Social Intelligence.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Hybrid token transformer for deep face recognition.
Pattern Recognit., July, 2023

Improving drug-target affinity prediction via feature fusion and knowledge distillation.
Briefings Bioinform., May, 2023

P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification.
Remote. Sens., April, 2023

Object-Centric Masked Image Modeling-Based Self-Supervised Pretraining for Remote Sensing Object Detection.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2023

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding.
CoRR, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.
CoRR, 2023

3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V.
CoRR, 2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023

Improving Compositional Text-to-image Generation with Large Vision-Language Models.
CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.
CoRR, 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.
CoRR, 2023

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks.
CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.
CoRR, 2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.
CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
CoRR, 2023

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model.
CoRR, 2023

Personalize Segment Anything Model with One Shot.
CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.
CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
CoRR, 2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
CoRR, 2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Hybrid Transformer Network for Change Detection Under Self-Supervised Pretraining.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseMAE: Sparse Training Meets Masked Autoencoders.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Filter Pruning Via Filters Similarity in Consecutive Layers.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Stare at What You See: Masked Image Modeling without Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Q-DETR: An Efficient Low-Bit Quantized Detection Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Resilient Binary Neural Network.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain.
Remote. Sens., 2022

Hierarchical Disentangling Network for Building Extraction from Very High Resolution Optical Remote Sensing Imagery.
Remote. Sens., 2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning.
CoRR, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.
CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.
CoRR, 2022

Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain.
CoRR, 2022

Illumination Adaptive Transformer.
CoRR, 2022

PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking.
CoRR, 2022

ConvMAE: Masked Convolution Meets Masked Autoencoders.
CoRR, 2022

POS-BERT: Point Cloud One-Stage BERT Pre-Training.
CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.
CoRR, 2022

CandidateDrug4Cancer: An Open Molecular Graph Learning Benchmark on Drug Discovery for Cancer.
CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.
CoRR, 2022

TerViT: An Efficient Ternary Vision Transformer.
CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.
CoRR, 2022

RestoreDet: Degradation Equivariant Representation for Object Detection in Low Resolution Images.
CoRR, 2022

HCL: Improving Graph Representation with Hierarchical Contrastive Learning.
Proceedings of the Semantic Web - ISWC 2022, 2022

SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models.
Proceedings of the 16th International Workshop on Semantic Evaluation, SemEval@NAACL 2022, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MCMAE: Masked Convolution Meets Masked Autoencoders.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Adaptive Local Context Embedding for Small Vehicle Detection from Aerial Optical Remote Sensing Images.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors.
Proceedings of the Computer Vision - ECCV 2022, 2022

Recurrent Bilinear Optimization for Binary Neural Networks.
Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.
Proceedings of the Computer Vision - ECCV 2022, 2022

Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Automated vertebral landmarks and spinal curvature estimation using non-directional part affinity fields.
Neurocomputing, 2021

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results.
CoRR, 2021

Superpixel-Based Building Damage Detection from Post-earthquake Very High Resolution Imagery Using Deep Neural Networks.
CoRR, 2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model.
CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.
CoRR, 2021

Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks.
CoRR, 2021

Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model.
CoRR, 2021

Oriented Object Detection with Transformer.
CoRR, 2021

Scalable Transformers for Neural Machine Translation.
CoRR, 2021

Container: Context Aggregation Network.
CoRR, 2021

Dual-stream Network for Visual Recognition.
CoRR, 2021

RomeBERT: Robust Training of Multi-Exit BERT.
CoRR, 2021

An effective self-supervised framework for learning expressive molecular global representations to drug discovery.
Briefings Bioinform., 2021

PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking.
Proceedings of the Thirtieth Text REtrieval Conference, 2021

Dual-stream Network for Visual Recognition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Container: Context Aggregation Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dense Contrastive Visual-Linguistic Pretraining.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Fast Convergence of DETR with Spatially Modulated Co-Attention.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Learn molecular representations from large-scale unlabeled molecules for drug discovery.
CoRR, 2020

End-to-End Object Detection with Adaptive Clustering Transformer.
CoRR, 2020

Multi-Pass Transformer for Machine Translation.
CoRR, 2020

Contrastive Visual-Linguistic Pretraining.
CoRR, 2020

Gradient Regularized Contrastive Learning for Continual Domain Adaptation.
CoRR, 2020

Spatio-Temporal Scene Graphs for Video Dialog.
CoRR, 2020

Character Matters: Video Story Understanding with Character-Aware Relations.
CoRR, 2020

Extreme Low-Light Imaging with Multi-granulation Cooperative Networks.
CoRR, 2020

PASH at TREC 2020 Deep Learning Track: Dense Matching for Nested Ranking.
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020

A Multiple Models Ensembling Method in TREC Deep Learning.
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020

Unsupervised Domain Adaptation for Cross-Device OCT Lesion Detection via Learning Adaptive Features.
Proceedings of the 17th IEEE International Symposium on Biomedical Imaging, 2020

Automatic Student Network Search for Knowledge Distillation.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Multi-Layer Content Interaction Through Quaternion Product for Visual Question Answering.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Pre-training Entity Relation Encoder with Intra-span and Inter-span Information.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Learning Where to Focus for Efficient Video Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

Semi-supervised Active Learning for Instance Segmentation via Scoring Predictions.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

Region Focus Network for Joint Optic Disc and Cup Segmentation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Structure-Aware Noise Reduction Generative Adversarial Network for Optical Coherence Tomography Image.
Proceedings of the Ophthalmic Medical Image Analysis - 6th International Workshop, 2019

Multi-Modality Latent Interaction Network for Visual Question Answering.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Video Object Detection with Locally-Weighted Deformable Neighbors.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Question-Guided Hybrid Convolution for Visual Question Answering.
Proceedings of the Computer Vision - ECCV 2018, 2018

2017
Towards Reliable Online Services Analyzing Mobile Sensor Big Data.
Proceedings of the 2017 IEEE International Conference on Web Services, 2017

2016
Space-map-matching-based candidate selection for GPS map matching.
Proceedings of the 2016 IEEE International Conference on Service Operations and Logistics, and Informatics, 2016

Moving object map analytics: A framework enabling contextual spatial-temporal analytics of Internet of Things applications.
Proceedings of the 2016 IEEE International Conference on Service Operations and Logistics, and Informatics, 2016

2014
Scalable Mobile Data Streaming with Trajectory Preserving Partitioning.
Proceedings of the IEEE Third International Conference on Mobile Services, Anchorage, AK, USA, June 27, 2014

Maximizing Multi-scale Spatial Statistical Discrepancy.
Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 2014

