Chunyuan Li

ORCID: 0009-0000-6608-7469

According to our database, Chunyuan Li authored at least 167 papers between 2011 and 2024.

Bibliography

2024
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models.
Int. J. Comput. Vis., April, 2024

Multimodal Foundation Models: From Specialists to General-Purpose Assistants.
Found. Trends Comput. Graph. Vis., 2024

Video Instruction Tuning With Synthetic Data.
CoRR, 2024

LLaVA-Critic: Learning to Evaluate Multimodal Models.
CoRR, 2024

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.
CoRR, 2024

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners.
CoRR, 2024

LLaVA-OneVision: Easy Visual Task Transfer.
CoRR, 2024

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models.
CoRR, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
CoRR, 2024

Long Context Transfer from Language to Vision.
CoRR, 2024

Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model.
CoRR, 2024

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding.
CoRR, 2024

Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment.
CoRR, 2024

Graphic Design with Large Multimodal Model.
CoRR, 2024

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward.
CoRR, 2024

Training Small Multimodal Models to Bridge Biomedical Competency Gap: A Case Study in Radiology Imaging.
CoRR, 2024

TrustLLM: Trustworthiness in Large Language Models.
CoRR, 2024

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Towards Building The FederatedGPT: Federated Instruction Tuning.
Proceedings of the IEEE International Conference on Acoustics, 2024

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Grounding DINO: Marrying DINO with Grounded Pre-training for Open-Set Object Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents.
Proceedings of the Computer Vision - ECCV 2024, 2024

Segment and Recognize Anything at Any Granularity.
Proceedings of the Computer Vision - ECCV 2024, 2024

Improved Baselines with Visual Instruction Tuning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Visual in-Context Prompting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Aligning Large Multimodal Models with Factually Augmented RLHF.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Calibration and Uncertainty in Neural Time-to-Event Modeling.
IEEE Trans. Neural Networks Learn. Syst., April, 2023

Contrastive Attraction and Contrastive Repulsion for Representation Learning.
Trans. Mach. Learn. Res., 2023

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models.
CoRR, 2023

Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images.
CoRR, 2023

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing.
CoRR, 2023

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V.
CoRR, 2023

BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys.
CoRR, 2023

MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models.
CoRR, 2023

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models.
CoRR, 2023

Benchmarking and Analyzing Generative Data for Visual Recognition.
CoRR, 2023

Semantic-SAM: Segment and Recognize Anything at Any Granularity.
CoRR, 2023

Large Multimodal Models: Notes on CVPR 2023 Tutorial.
CoRR, 2023

MIMIC-IT: Multi-Modal In-Context Instruction Tuning.
CoRR, 2023

On the Hidden Mystery of OCR in Large Multimodal Models.
CoRR, 2023

Towards Building the Federated GPT: Federated Instruction Tuning.
CoRR, 2023

Instruction Tuning with GPT-4.
CoRR, 2023

A Simple Framework for Open-Vocabulary Segmentation and Detection.
CoRR, 2023

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
CoRR, 2023

Visual Instruction Tuning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Large Language Models are Visual Reasoning Coordinators.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Simple Framework for Open-Vocabulary Segmentation and Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Research on multi-factor quantitative investment strategy of SVM model based on machine learning.
Proceedings of the 4th International Conference on Artificial Intelligence and Computer Engineering, 2023

Scaling Vision-Language Models with Sparse Mixture of Experts.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Generalized Decoding for Pixel, Image, and Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Customized Visual Models with Retrieval-Augmented Knowledge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GLIGEN: Open-Set Grounded Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Parameter-Efficient Model Adaptation for Vision Transformers.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends.
Found. Trends Comput. Graph. Vis., 2022

Application of Two Novel Acoustic Emission Parameters on Identifying the Instability of Granite.
Entropy, 2022

Lafite2: Few-shot Text-to-Image Generation.
CoRR, 2022

Parameter-efficient Fine-tuning for Vision Transformers.
CoRR, 2022

Focal Modulation Networks.
CoRR, 2022

Focal Modulation Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

K-LITE: Learning Transferable Visual Models with External Knowledge.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

Efficient Self-supervised Vision Transformers for Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

STT: Soft Template Tuning for Few-Shot Adaptation.
Proceedings of the IEEE International Conference on Data Mining Workshops, 2022

Towards Language-Free Training for Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RegionCLIP: Region-based Language-Image Pretraining.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unified Contrastive Learning in Image-Text-Label Space.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Grounded Language-Image Pre-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MoDNA: motif-oriented pre-training for DNA language model.
Proceedings of the BCB '22: 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Northbrook, Illinois, USA, August 7, 2022

2021
SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching.
Trans. Assoc. Comput. Linguistics, 2021

A Generic Approach for Enhancing GANs by Regularized Latent Optimization.
CoRR, 2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation.
CoRR, 2021

Florence: A New Foundation Model for Computer Vision.
CoRR, 2021

SYNERGY: Building Task Bots at Scale Using Symbolic Knowledge and Machine Teaching.
CoRR, 2021

Focal Self-attention for Local-Global Interactions in Vision Transformers.
CoRR, 2021

Contrastive Conditional Transport for Representation Learning.
CoRR, 2021

SDA: Improving Text Generation with Self Data Augmentation.
CoRR, 2021

Leveraging User Behavior History for Personalized Email Search.
Proceedings of the WWW '21: The Web Conference 2021, 2021

Focal Attention for Long-Range Interactions in Vision Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Rethinking Sentiment Style Transfer.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Few-Shot Named Entity Recognition: An Empirical Baseline Study.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

ReMP: Rectified Metric Propagation for Few-Shot Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Partition-Guided GANs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Hierarchical Graph Capsule Network.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Few-Shot Named Entity Recognition: A Comprehensive Study.
CoRR, 2020

Self-supervised Pre-training with Hard Examples Improves Visual Representations.
CoRR, 2020

Robust Conversational AI with Grounded Text Generation.
CoRR, 2020

Weakly supervised cross-domain alignment with optimal transport.
CoRR, 2020

SOLOIST: Few-shot Task-Oriented Dialog with A Single Pre-trained Auto-regressive Model.
CoRR, 2020

POINTER: Constrained Text Generation via Insertion-based Generative Pre-training.
CoRR, 2020

Shape retrieval of non-rigid 3d human models.
CoRR, 2020

Multi-View Learning for Vision-and-Language Navigation.
CoRR, 2020

Feature Quantization Improves GAN Training.
Proceedings of the 37th International Conference on Machine Learning, 2020

Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

RaCT: Toward Amortized Ranking-Critical Training For Collaborative Filtering.
Proceedings of the 8th International Conference on Learning Representations, 2020

POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Few-shot Natural Language Generation for Task-Oriented Dialog.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Improving Text Generation with Student-Forcing Optimal Transport.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Structure-Aware Human-Action Generation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks.
Proceedings of the Computer Vision - ECCV 2020, 2020

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Survival cluster analysis.
Proceedings of the ACM CHIL '20: ACM Conference on Health, 2020

Advancing weakly supervised cross-domain alignment with optimal transport.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

Complementary Auxiliary Classifiers for Label-Conditional Text Generation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Straight-Through Estimator as Projected Wasserstein Gradient Flow.
CoRR, 2019

Twin Auxiliary Classifiers GAN.
CoRR, 2019

Towards Amortized Ranking-Critical Training for Collaborative Filtering.
CoRR, 2019

Twin Auxilary Classifiers GAN.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Robust Navigation with Language Pretraining and Stochastic Sampling.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Implicit Deep Latent Variable Models for Text Generation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain.
Proceedings of the 18th BioNLP Workshop and Shared Task, 2019

Adversarial Learning of a Sampler Based on an Unnormalized Distribution.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Communication-Efficient Stochastic Gradient MCMC for Neural Networks.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Towards Better Representations with Deep/Bayesian Learning.
PhD thesis, 2018

Localization of a high-speed train using a speed model based on the gradient descent algorithm.
Future Gener. Comput. Syst., 2018

Generative Adversarial Network Training is a Continual Learning Problem.
CoRR, 2018

Car-following behavior of connected vehicles in a mixed traffic flow: modeling and stability analysis.
CoRR, 2018

Policy Optimization as Wasserstein Gradient Flows.
Proceedings of the 35th International Conference on Machine Learning, 2018

Continuous-Time Flows for Efficient Inference and Density Estimation.
Proceedings of the 35th International Conference on Machine Learning, 2018

Adversarial Time-to-Event Modeling.
Proceedings of the 35th International Conference on Machine Learning, 2018

Measuring the Intrinsic Dimension of Objective Landscapes.
Proceedings of the 6th International Conference on Learning Representations, 2018

Learning Structural Weight Uncertainty for Sequential Decision-Making.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Symmetric Variational Autoencoder and Connections to Adversarial Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Joint Embedding of Words and Labels for Text Classification.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Symmetric Variational Autoencoder and Connections to Adversarial Learning.
CoRR, 2017

Towards Understanding Adversarial Learning for Joint Distribution Matching.
CoRR, 2017

Stein Variational Autoencoder.
CoRR, 2017

Adversarial Symmetric Variational Autoencoder.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

VAE Learning via Stein Variational Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Triangle Generative Adversarial Networks.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Learning Generic Sentence Representations Using Convolutional Neural Networks.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Unsupervised Learning with Truncated Gaussian Graphical Models.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
A spectral graph wavelet approach for nonrigid 3D shape retrieval.
Pattern Recognit. Lett., 2016

Unsupervised Learning of Sentence Representations using Convolutional Neural Networks.
CoRR, 2016

Variational Autoencoder for Deep Learning of Images, Labels and Captions.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Stochastic Gradient MCMC with Stale Gradients.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Bayesian Dictionary Learning with Gaussian Processes and Sigmoid Belief Networks.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

A Deep Generative Deconvolutional Image Model.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

A Unifying Variational Inference Framework for Hierarchical Graph-Coupled HMM with an Application to Influenza Infection.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries.
Comput. Vis. Image Underst., 2015

Deep Temporal Sigmoid Belief Networks for Sequence Modeling.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

2014
Online redundant image elimination and its application to wireless capsule endoscopy.
Signal Image Video Process., 2014

Symmetry discovery and retrieval of nonrigid 3D shapes using geodesic skeleton paths.
Multim. Tools Appl., 2014

Spatially aggregating spectral descriptors for nonrigid 3D shape retrieval: a comparative survey.
Multim. Syst., 2014

Persistence-Based Structural Recognition.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Automatic Location of Landmarks used in Manual Anthropometry.
Proceedings of the 7th Eurographics Workshop on 3D Object Retrieval, 2014

2013
A multiresolution descriptor for deformable 3D shape retrieval.
Vis. Comput., 2013

Intrinsic spatial pyramid matching for deformable 3D shape retrieval.
Int. J. Multim. Inf. Retr., 2013

2012
Fast shape retrieval using a graph theoretic approach.
Int. J. Multim. Inf. Retr., 2012

2011
Research on the Message-Oriented Middleware for Wireless Sensor Networks.
J. Comput., 2011

Skeleton Path Based Approach for Nonrigid 3D Shape Analysis and Retrieval.
Proceedings of the Combinatorial Image Analysis - 14th International Workshop, 2011

Minimum near-convex decomposition for robust shape representation.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Fast Shape Re-ranking with Neighborhood Induced Similarity Measure.
Proceedings of the Computer Analysis of Images and Patterns, 2011
