Junnan Li

Orcid: 0000-0002-1405-2034

Affiliations:
  • National University of Singapore, Graduate School for Integrative Sciences and Engineering, Singapore
  • Salesforce Research Asia, Singapore


According to our database, Junnan Li authored at least 56 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-Modal Reasoning.
Proceedings of the Computer Vision - ECCV 2024, 2024

ULIP-2: Towards Scalable Multimodal Pre-Training for 3D Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Improving Tail-Class Representation with Centroid Contrastive Learning.
Pattern Recognit. Lett., April, 2023

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning.
CoRR, 2023

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM.
CoRR, 2023

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding.
CoRR, 2023

Efficient Text-to-Code Retrieval with Cascaded Fast and Slow Transformer Models.
Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models.
Proceedings of the International Conference on Machine Learning, 2023

Masked Unsupervised Self-training for Label-free Image Classification.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

CodeT5+: Open Code Large Language Models for Code Understanding and Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LAVIS: A One-stop Library for Language-Vision Intelligence.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

Tackling Data Heterogeneity in Federated Learning with Class Prototypes.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models.
CoRR, 2022

BotSIM: An End-to-End Bot Simulation Toolkit for Commercial Task-Oriented Dialog Systems.
CoRR, 2022

LAVIS: A Library for Language-Vision Intelligence.
CoRR, 2022

Masked Unsupervised Self-training for Zero-shot Image Classification.
CoRR, 2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation.
Proceedings of the International Conference on Machine Learning, 2022

Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels.
Proceedings of the Computer Vision - ECCV 2022, 2022

Align and Prompt: Video-and-Language Pre-training with Entity Prompts.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes.
CoRR, 2021

Cascaded Fast and Slow Models for Efficient Semantic Code Search.
CoRR, 2021

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Prototypical Contrastive Learning of Unsupervised Representations.
Proceedings of the 9th International Conference on Learning Representations, 2021

MoPro: Webly Supervised Learning with Momentum Prototypes.
Proceedings of the 9th International Conference on Learning Representations, 2021

Learning from Noisy Data with Robust Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Interact as You Intend: Intention-Driven Human-Object Interaction Detection.
IEEE Trans. Multim., 2020

Video Storytelling: Textual Summaries for Events.
IEEE Trans. Multim., 2020

Visual Social Relationship Recognition.
Int. J. Comput. Vis., 2020

Prototypical Contrastive Learning of Unsupervised Representations.
CoRR, 2020

Improving out-of-distribution generalization via multi-task self-supervised pretraining.
CoRR, 2020

Towards Noise-resistant Object Detection with Noisy Annotations.
CoRR, 2020

GradMix: Multi-source Transfer across Domains and Tasks.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Weakly-Supervised Multi-Person Action Recognition in 360° Videos.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

DivideMix: Learning with Noisy Labels as Semi-supervised Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Learning on the Fly: An RNN-Based Online Throughput Prediction Framework for UAV Communications.
Proceedings of the 2020 IEEE International Conference on Communications Workshops, 2020

The Devil Is in Classification: A Simple Framework for Long-Tail Instance Segmentation.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
A Multi-sensor Framework for Personal Presentation Analytics.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Deep Reinforcement Learning in Soft Viscoelastic Actuator of Dielectric Elastomer.
IEEE Robotics Autom. Lett., 2019

LSTM-based multi-label video event detection.
Multim. Tools Appl., 2019

Classification Calibration for Long-tail Instance Segmentation.
CoRR, 2019

Self-supervised Representation Learning Using 360° Data.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Learning to Detect Human-Object Interactions With Knowledge.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Learning to Learn From Noisy Labeled Data.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Video Storytelling.
CoRR, 2018

Unsupervised Learning of View-invariant Action Representations.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017
Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language.
Comput. Vis. Image Underst., 2017

Attention Transfer from Web Images for Video Recognition.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Dual-Glance Model for Deciphering Social Relationships.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Demo Paper: PreSense - An Assistive Presentation Self-Quantification System.
Proceedings of the IEEE International Symposium on Multimedia, 2016

Multi-stream Deep Learning Framework for Automated Presentation Assessment.
Proceedings of the IEEE International Symposium on Multimedia, 2016

