Zhe Gan

According to our database, Zhe Gan authored at least 156 papers between 2009 and 2024.

Bibliography

2024
Multimodal Foundation Models: From Specialists to General-Purpose Assistants.
Found. Trends Comput. Graph. Vis., 2024

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms.
CoRR, 2024

Improve Vision Language Model Chain-of-thought Reasoning.
CoRR, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs.
CoRR, 2024

Contrastive Localized Language-Image Pre-Training.
CoRR, 2024

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models.
CoRR, 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning.
CoRR, 2024

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models.
CoRR, 2024

Understanding Alignment in Multimodal LLMs: A Comprehensive Study.
CoRR, 2024

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs.
CoRR, 2024

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models.
CoRR, 2024

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training.
CoRR, 2024

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts.
CoRR, 2024

Ferret: Refer and Ground Anything Anywhere at Any Granularity.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Compressing LLMs: The Truth is Rarely Pure and Never Simple.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Guiding Instruction-based Image Editing via Multimodal Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs.
Proceedings of the Computer Vision - ECCV 2024, 2024

GRiT: A Generative Region-to-Text Transformer for Object Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

VeCLIP: Improving CLIP Training via Visual-Enriched Captions.
Proceedings of the Computer Vision - ECCV 2024, 2024

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models.
CoRR, 2023

From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions.
CoRR, 2023

MOFI: Learning Image Representations from Noisy Entity Annotated Images.
CoRR, 2023

Prompting GPT-3 To Be Reliable.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation.
Proceedings on "I Can't Believe It's Not Better: Failure Modes in the Age of Foundation Models" at NeurIPS 2023 Workshops, 2023

An Empirical Study of Multimodal Model Merging.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Generalized Decoding for Pixel, Image, and Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Non-Contrastive Learning Meets Language-Image Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ReCo: Region-Controlled Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
GIT: A Generative Image-to-text Transformer for Vision and Language.
Trans. Mach. Learn. Res., 2022

Adversarial Feature Augmentation and Normalization for Visual Recognition.
Trans. Mach. Learn. Res., 2022

Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends.
Found. Trends Comput. Graph. Vis., 2022

Exploring Discrete Diffusion Models for Image Captioning.
CoRR, 2022

K-LITE: Learning Transferable Visual Models with External Knowledge.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling.
Proceedings of the Computer Vision - ECCV 2022, 2022

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Injecting Semantic Concepts into End-to-End Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

An Empirical Study of Training End-to-End Vision-and-Language Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scaling Up Vision-Language Pretraining for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Playing Lottery Tickets with Vision and Language.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Efficient Robust Training via Backward Smoothing.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
MLP Architectures for Vision-and-Language Modeling: An Empirical Study.
CoRR, 2021

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling.
CoRR, 2021

Scaling Up Vision-Language Pre-training for Image Captioning.
CoRR, 2021

Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling.
CoRR, 2021

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning.
CoRR, 2021

Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE.
CoRR, 2021

Playing Lottery Tickets with Vision and Language.
CoRR, 2021

CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning.
CoRR, 2021

Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly.
CoRR, 2021

Meta Module Network for Compositional Visual Reasoning.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients.
Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2021

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

The Elastic Lottery Ticket Hypothesis.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Chasing Sparsity in Vision Transformers: An End-to-End Exploration.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

APo-VAE: Text Generation in Hyperbolic Space.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective.
Proceedings of the 9th International Conference on Learning Representations, 2021

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Wasserstein Contrastive Representation Distillation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Cluster-Former: Clustering-based Sparse Transformer for Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models.
CoRR, 2020

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding.
CoRR, 2020

Accelerating Real-Time Question Answering via Question Generation.
CoRR, 2020

Adaptive Learning Rates with Maximum Variation Averaging.
CoRR, 2020

POINTER: Constrained Text Generation via Insertion-based Generative Pre-training.
CoRR, 2020

Large-Scale Adversarial Training for Vision-and-Language Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Sequential Attention GAN for Interactive Image Editing.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information.
Proceedings of the 37th International Conference on Machine Learning, 2020

Graph Optimal Transport for Cross-Domain Alignment.
Proceedings of the 37th International Conference on Machine Learning, 2020

FreeLB: Enhanced Adversarial Training for Natural Language Understanding.
Proceedings of the 8th International Conference on Learning Representations, 2020

POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Cross-Thought for Sentence Encoder Pre-training.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Contrastive Distillation on Intermediate Representations for Language Model Compression.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Hierarchical Graph Network for Multi-hop Question Answering.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Multi-Fact Correction in Abstractive Text Summarization.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Contextual Text Style Transfer.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

UNITER: UNiversal Image-TExt Representation Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models.
Proceedings of the Computer Vision - ECCV 2020, 2020

Violin: A Large-Scale Dataset for Video-and-Language Inference.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

BachGAN: High-Resolution Image Synthesis From Salient Object Layout.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Nested-Wasserstein Self-Imitation Learning for Sequence Generation.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Improving Adversarial Text Generation by Modeling the Distant Future.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Discourse-Aware Neural Extractive Text Summarization.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Distilling Knowledge Learned in BERT for Text Generation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

Graph-Driven Generative Models for Heterogeneous Multi-Task Learning.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

What Makes A Good Story? Designing Composite Rewards for Visual Storytelling.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Distilling the Knowledge of BERT for Text Generation.
CoRR, 2019

Discourse-Aware Neural Extractive Model for Text Summarization.
CoRR, 2019

FreeLB: Enhanced Adversarial Training for Language Understanding.
CoRR, 2019

UNITER: Learning UNiversal Image-TExt Representations.
CoRR, 2019

Topic-Guided Variational Autoencoders for Text Generation.
CoRR, 2019

Improving Textual Network Learning with Variational Homophilic Embeddings.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Topic-Guided Variational Auto-Encoder for Text Generation.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Improving Sequence-to-Sequence Learning via Optimal Transport.
Proceedings of the 7th International Conference on Learning Representations, 2019

Relation-Aware Graph Attention Network for Visual Question Answering.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Adversarial Domain Adaptation for Machine Reading Comprehension.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Patient Knowledge Distillation for BERT Model Compression.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Domain Adaptive Text Style Transfer.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

TIGEr: Text-to-Image Grounding for Image Caption Evaluation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

StoryGAN: A Sequential Conditional GAN for Story Visualization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Deep Generative Models for Vision and Language Intelligence.
PhD thesis, 2018

Sequential Attention GAN for Interactive Image Editing via Dialogue.
CoRR, 2018

Sequence Generation with Guider Network.
CoRR, 2018

Adversarial Text Generation via Feature-Mover's Distance.
CoRR, 2018

Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Adversarial Text Generation via Feature-Mover's Distance.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Multi-Label Learning from Medical Plain Text with Convolutional Residual Models.
Proceedings of the Machine Learning for Healthcare Conference, 2018

JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets.
Proceedings of the 35th International Conference on Machine Learning, 2018

AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Topic Compositional Neural Language Model.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Adaptive Feature Abstraction for Translating Video to Text.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Stein Variational Autoencoder.
CoRR, 2017

Deconvolutional Paragraph Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Adversarial Symmetric Variational Autoencoder.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

VAE Learning via Stein Variational Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Triangle Generative Adversarial Networks.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Adversarial Feature Matching for Text Generation.
Proceedings of the 34th International Conference on Machine Learning, 2017

Stochastic Gradient Monomial Gamma Sampler.
Proceedings of the 34th International Conference on Machine Learning, 2017

Adaptive Feature Abstraction for Translating Video to Language.
Proceedings of the 5th International Conference on Learning Representations, 2017

Adaptive DCTNet for audio signal classification.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Character-level deep conflation for business data analytics.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Learning Generic Sentence Representations Using Convolutional Neural Networks.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Semantic Compositional Networks for Visual Captioning.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

StyleNet: Generating Attractive Visual Captions with Styles.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Unsupervised Learning with Truncated Gaussian Graphical Models.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Unsupervised Learning of Sentence Representations using Convolutional Neural Networks.
CoRR, 2016

Variational Autoencoder for Deep Learning of Images, Labels and Captions.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Factored Temporal Sigmoid Belief Networks for Sequence Learning.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

Inference of gene networks associated with the host response to infectious disease.
Proceedings of the Big Data over Networks, 2016

2015
Deep Poisson Factor Modeling.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Deep Temporal Sigmoid Belief Networks for Sequence Modeling.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Scalable Deep Poisson Factor Analysis for Topic Modeling.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Learning Deep Sigmoid Belief Networks with Data Augmentation.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2009
A General Geo-spatial Multi-scale Conceptual Model for Automatic Generalization.
Proceedings of the 2009 International Conference on Environmental Science and Information Application Technology, 2009

Research on the Integration Techniques of Task-Oriented Geospatial Information Service for Battlefield.
Proceedings of the 2009 International Conference on Environmental Science and Information Application Technology, 2009

