Peng Gao

Orcid: 0009-0005-7881-712X

Affiliations:

Shanghai Artificial Intelligence Laboratory, China
Chinese University of Hong Kong, Hong Kong (PhD 2021)

According to our database¹, Peng Gao authored at least 168 papers between 2014 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

LVLM-EHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation.

CoRR, January, 2025

2024

FeatAug-DETR: Enriching One-to-Many Matching for DETRs With Feature Augmentation.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.

Int. J. Comput. Vis., May, 2024

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.

Int. J. Comput. Vis., February, 2024

POS-BERT: Point cloud one-stage BERT pre-training.

Expert Syst. Appl., 2024

TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction.

CoRR, 2024

Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling.

CoRR, 2024

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow.

CoRR, 2024

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models.

CoRR, 2024

SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation.

CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

CoRR, 2024

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.

CoRR, 2024

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners.

CoRR, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.

CoRR, 2024

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.

CoRR, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models.

CoRR, 2024

MAVIS: Mathematical Visual Instruction Tuning.

CoRR, 2024

VEnhancer: Generative Space-Time Enhancement for Video Generation.

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

CoRR, 2024

A3VLM: Actionable Articulation-Aware Vision Language Model.

CoRR, 2024

Phased Consistency Model.

Fu-Yun Wang

Zhaoyang Huang

Alexander William Bergman

CoRR, 2024

TerDiT: Ternary Diffusion Models with Transformers.

CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.

CoRR, 2024

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want.

CoRR, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

CoRR, 2024

Searching a Lightweight Network Architecture for Thermal Infrared Pedestrian Tracking.

CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

CoRR, 2024

Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models.

CoRR, 2024

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Phased Consistency Models.

Fu-Yun Wang

Zhaoyang Huang

Alexander William Bergman

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructSpeech: Following Speech Editing Instructions via Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Personalize Segment Anything Model with One Shot.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Proceedings of the Computer Vision - ECCV 2024, 2024

SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

Any2Point: Empowering Any-Modality Large Models for Efficient 3D Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OneLLM: One Framework to Align All Modalities with Language.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Digital Life Project: Autonomous 3D Characters with Social Intelligence.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Hybrid token transformer for deep face recognition.

Pattern Recognit., July, 2023

Improving drug-target affinity prediction via feature fusion and knowledge distillation.

Briefings Bioinform., May, 2023

P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification.

Remote. Sens., April, 2023

Object-Centric Masked Image Modeling-Based Self-Supervised Pretraining for Remote Sensing Object Detection.

IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2023

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding.

CoRR, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.

CoRR, 2023

3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V.

CoRR, 2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.

CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.

CoRR, 2023

Improving Compositional Text-to-image Generation with Large Vision-Language Models.

CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.

CoRR, 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.

CoRR, 2023

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks.

CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.

CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.

CoRR, 2023

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model.

CoRR, 2023

Personalize Segment Anything Model with One Shot.

CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.

CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.

CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.

CoRR, 2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.

CoRR, 2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Hybrid Transformer Network for Change Detection Under Self-Supervised Pretraining.

Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseMAE: Sparse Training Meets Masked Autoencoders.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Filter Pruning Via Filters Similarity in Consecutive Layers.

Proceedings of the IEEE International Conference on Acoustics, 2023

Starting from Non-Parametric Networks for 3D Point Cloud Analysis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Stare at What You See: Masked Image Modeling without Reconstruction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Q-DETR: An Efficient Low-Bit Quantized Detection Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Resilient Binary Neural Network.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain.

Remote. Sens., 2022

Hierarchical Disentangling Network for Building Extraction from Very High Resolution Optical Remote Sensing Imagery.

Remote. Sens., 2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning.

CoRR, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.

CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.

CoRR, 2022

Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain.

CoRR, 2022

Illumination Adaptive Transformer.

CoRR, 2022

PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking.

CoRR, 2022

ConvMAE: Masked Convolution Meets Masked Autoencoders.

CoRR, 2022

POS-BERT: Point Cloud One-Stage BERT Pre-Training.

CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.

CoRR, 2022

CandidateDrug4Cancer: An Open Molecular Graph Learning Benchmark on Drug Discovery for Cancer.

CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.

CoRR, 2022

TerViT: An Efficient Ternary Vision Transformer.

CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.

CoRR, 2022

RestoreDet: Degradation Equivariant Representation for Object Detection in Low Resolution Images.

CoRR, 2022

HCL: Improving Graph Representation with Hierarchical Contrastive Learning.

Proceedings of the Semantic Web - ISWC 2022, 2022

SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models.

Proceedings of the 16th International Workshop on Semantic Evaluation, SemEval@NAACL 2022, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MCMAE: Masked Convolution Meets Masked Autoencoders.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Adaptive Local Context Embedding for Small Vehicle Detection from Aerial Optical Remote Sensing Images.

Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning.

Proceedings of the IEEE International Conference on Acoustics, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.

Proceedings of the Computer Vision - ECCV 2022, 2022

IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors.

Proceedings of the Computer Vision - ECCV 2022, 2022

Recurrent Bilinear Optimization for Binary Neural Networks.

Proceedings of the Computer Vision, 2022

Frozen CLIP Models are Efficient Video Learners.

Proceedings of the Computer Vision - ECCV 2022, 2022

Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Automated vertebral landmarks and spinal curvature estimation using non-directional part affinity fields.

Neurocomputing, 2021

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results.

Francisco Gómez Fernández

Qinlong Wang

Yang Yang

CoRR, 2021

Superpixel-Based Building Damage Detection from Post-earthquake Very High Resolution Imagery Using Deep Neural Networks.

CoRR, 2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model.

CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.

CoRR, 2021

Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks.

CoRR, 2021

Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model.

CoRR, 2021

Oriented Object Detection with Transformer.

CoRR, 2021

Scalable Transformers for Neural Machine Translation.

CoRR, 2021

Container: Context Aggregation Network.

CoRR, 2021

Dual-stream Network for Visual Recognition.

CoRR, 2021

RomeBERT: Robust Training of Multi-Exit BERT.

CoRR, 2021

An effective self-supervised framework for learning expressive molecular global representations to drug discovery.

Briefings Bioinform., 2021

PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking.

Proceedings of the Thirtieth Text REtrieval Conference, 2021

Dual-stream Network for Visual Recognition.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Container: Context Aggregation Networks.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dense Contrastive Visual-Linguistic Pretraining.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Fast Convergence of DETR with Spatially Modulated Co-Attention.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Learn molecular representations from large-scale unlabeled molecules for drug discovery.

CoRR, 2020

End-to-End Object Detection with Adaptive Clustering Transformer.

CoRR, 2020

Multi-Pass Transformer for Machine Translation.

CoRR, 2020

Contrastive Visual-Linguistic Pretraining.

CoRR, 2020

Gradient Regularized Contrastive Learning for Continual Domain Adaptation.

CoRR, 2020

Spatio-Temporal Scene Graphs for Video Dialog.

CoRR, 2020

Character Matters: Video Story Understanding with Character-Aware Relations.

CoRR, 2020

Extreme Low-Light Imaging with Multi-granulation Cooperative Networks.

CoRR, 2020

PASH at TREC 2020 Deep Learning Track: Dense Matching for Nested Ranking.

Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020

A Multiple Models Ensembling Method in TREC Deep Learning.

Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020

Unsupervised Domain Adaptation for Cross-Device OCT Lesion Detection via Learning Adaptive Features.

Proceedings of the 17th IEEE International Symposium on Biomedical Imaging, 2020

Automatic Student Network Search for Knowledge Distillation.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Multi-Layer Content Interaction Through Quaternion Product for Visual Question Answering.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Pre-training Entity Relation Encoder with Intra-span and Inter-span Information.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Learning Where to Focus for Efficient Video Object Detection.

Proceedings of the Computer Vision - ECCV 2020, 2020

Semi-supervised Active Learning for Instance Segmentation via Scoring Predictions.

Proceedings of the 31st British Machine Vision Conference 2020, 2020

Region Focus Network for Joint Optic Disc and Cup Segmentation.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Structure-Aware Noise Reduction Generative Adversarial Network for Optical Coherence Tomography Image.

Proceedings of the Ophthalmic Medical Image Analysis - 6th International Workshop, 2019

Multi-Modality Latent Interaction Network for Visual Question Answering.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Video Object Detection with Locally-Weighted Deformable Neighbors.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Question-Guided Hybrid Convolution for Visual Question Answering.

Proceedings of the Computer Vision - ECCV 2018, 2018

2017

Towards Reliable Online Services Analyzing Mobile Sensor Big Data.

Proceedings of the 2017 IEEE International Conference on Web Services, 2017

2016

Space-map-matching-based candidate selection for GPS map matching.

Proceedings of the 2016 IEEE International Conference on Service Operations and Logistics, 2016

Moving object map analytics: A framework enabling contextual spatial-temporal analytics of Internet of Things applications.

Proceedings of the 2016 IEEE International Conference on Service Operations and Logistics, 2016

2014

Scalable Mobile Data Streaming with Trajectory Preserving Partitioning.

Proceedings of the IEEE Third International Conference on Mobile Services, Anchorage, AK, USA, June 27, 2014

Maximizing Multi-scale Spatial Statistical Discrepancy.

Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 2014
