Junnan Li

Orcid: 0000-0002-1405-2034

Affiliations:

National University of Singapore, Graduate School for Integrative Sciences and Engineering, Singapore
Salesforce Research Asia, Singapore

According to our database¹, Junnan Li authored at least 56 papers between 2016 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2024

What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases.

Anthony Meng Huat Tiong

,

,

,

,

Steven C. H. Hoi

,

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-Modal Reasoning.

Artemis Panagopoulou

,

,

,

,

,

,

,

Silvio Savarese

,

,

Juan Carlos Niebles

Proceedings of the Computer Vision - ECCV 2024, 2024

ULIP-2: Towards Scalable Multimodal Pre-Training for 3D Understanding.

,

,

,

Artemis Panagopoulou

,

,

Roberto Martín-Martín

,

,

,

,

Juan Carlos Niebles

,

Silvio Savarese

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Improving Tail-Class Representation with Centroid Contrastive Learning.

Anthony Meng Huat Tiong

,

,

,

,

,

Steven C. H. Hoi

Pattern Recognit. Lett., April, 2023

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning.

Artemis Panagopoulou

,

,

,

,

,

,

,

Silvio Savarese

,

,

Juan Carlos Niebles

CoRR, 2023

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM.

,

,

,

,

Akhilesh Deepak Gotmare

,

Steven C. H. Hoi

CoRR, 2023

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding.

,

,

,

,

Roberto Martín-Martín

,

,

,

,

Juan Carlos Niebles

,

Silvio Savarese

CoRR, 2023

Efficient Text-to-Code Retrieval with Cascaded Fast and Slow Transformer Models.

Akhilesh Deepak Gotmare

,

,

,

Steven C. H. Hoi

Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing.

,

,

Steven C. H. Hoi

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.

,

,

,

Anthony Meng Huat Tiong

,

,

,

,

,

Steven C. H. Hoi

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models.

,

,

Silvio Savarese

,

Steven C. H. Hoi

Proceedings of the International Conference on Machine Learning, 2023

Masked Unsupervised Self-training for Label-free Image Classification.

,

Silvio Savarese

,

Steven C. H. Hoi

Proceedings of the Eleventh International Conference on Learning Representations, 2023

CodeT5+: Open Code Large Language Models for Code Understanding and Generation.

,

,

Akhilesh Gotmare

,

,

,

Steven C. H. Hoi

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models.

,

,

,

Anthony Meng Huat Tiong

,

,

,

Steven C. H. Hoi

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LAVIS: A One-stop Library for Language-Vision Intelligence.

,

,

,

,

Silvio Savarese

,

Steven C. H. Hoi

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

Tackling Data Heterogeneity in Federated Learning with Class Prototypes.

,

,

,

Shelby Heinecke

,

,

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models.

,

,

,

Anthony Meng Huat Tiong

,

,

,

Steven C. H. Hoi

CoRR, 2022

BotSIM: An End-to-End Bot Simulation Toolkit for Commercial Task-Oriented Dialog Systems.

,

,

,

Steven C. H. Hoi

CoRR, 2022

LAVIS: A Library for Language-Vision Intelligence.

,

,

,

,

Silvio Savarese

,

Steven C. H. Hoi

CoRR, 2022

Masked Unsupervised Self-training for Zero-shot Image Classification.

,

Silvio Savarese

,

Steven C. H. Hoi

CoRR, 2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation.

,

,

,

Steven C. H. Hoi

Proceedings of the International Conference on Machine Learning, 2022

Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training.

Anthony Meng Huat Tiong

,

,

,

Silvio Savarese

,

Steven C. H. Hoi

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels.

,

,

Juan Carlos Niebles

,

,

,

,

Proceedings of the Computer Vision - ECCV 2022, 2022

Align and Prompt: Video-and-Language Pre-training with Entity Prompts.

,

,

,

Juan Carlos Niebles

,

Steven C. H. Hoi

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes.

,

,

Juan Carlos Niebles

,

,

,

,

CoRR, 2021

Cascaded Fast and Slow Models for Efficient Semantic Code Search.

Akhilesh Deepak Gotmare

,

,

,

Steven C. H. Hoi

CoRR, 2021

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation.

,

Ramprasaath R. Selvaraju

,

Akhilesh Gotmare

,

,

,

Steven Chu-Hong Hoi

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Prototypical Contrastive Learning of Unsupervised Representations.

,

,

,

Steven C. H. Hoi

Proceedings of the 9th International Conference on Learning Representations, 2021

MoPro: Webly Supervised Learning with Momentum Prototypes.

,

,

Steven C. H. Hoi

Proceedings of the 9th International Conference on Learning Representations, 2021

Learning from Noisy Data with Robust Representation Learning.

,

,

Steven C. H. Hoi

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization.

,

,

Steven C. H. Hoi

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Interact as You Intend: Intention-Driven Human-Object Interaction Detection.

,

,

,

,

Mohan S. Kankanhalli

IEEE Trans. Multim., 2020

Video Storytelling: Textual Summaries for Events.

,

,

,

Mohan S. Kankanhalli

IEEE Trans. Multim., 2020

Visual Social Relationship Recognition.

,

,

,

Mohan S. Kankanhalli

Int. J. Comput. Vis., 2020

Prototypical Contrastive Learning of Unsupervised Representations.

,

,

,

,

Steven C. H. Hoi

CoRR, 2020

Improving out-of-distribution generalization via multi-task self-supervised pretraining.

Isabela Albuquerque

,

,

,

Nitish Shirish Keskar

,

CoRR, 2020

Towards Noise-resistant Object Detection with Noisy Annotations.

,

,

,

Steven C. H. Hoi

CoRR, 2020

GradMix: Multi-source Transfer across Domains and Tasks.

,

,

,

,

Mohan S. Kankanhalli

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Weakly-Supervised Multi-Person Action Recognition in 360° Videos.

,

,

,

Shoji Nishimura

,

Mohan S. Kankanhalli

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

DivideMix: Learning with Noisy Labels as Semi-supervised Learning.

,

,

Steven C. H. Hoi

Proceedings of the 8th International Conference on Learning Representations, 2020

Learning on the Fly: An RNN-Based Online Throughput Prediction Framework for UAV Communications.

,

,

,

Hiroshi Yoshida

,

Proceedings of the 2020 IEEE International Conference on Communications Workshops, 2020

The Devil Is in Classification: A Simple Framework for Long-Tail Instance Segmentation.

,

,

,

,

,

,

Steven C. H. Hoi

,

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

A Multi-sensor Framework for Personal Presentation Analytics.

,

,

,

Mohan S. Kankanhalli

ACM Trans. Multim. Comput. Commun. Appl., 2019

Deep Reinforcement Learning in Soft Viscoelastic Actuator of Dielectric Elastomer.

,

,

,

,

Mohan S. Kankanhalli

,

IEEE Robotics Autom. Lett., 2019

LSTM-based multi-label video event detection.

,

,

,

,

,

Mohan S. Kankanhalli

Multim. Tools Appl., 2019

Classification Calibration for Long-tail Instance Segmentation.

,

,

,

,

,

,

Steven C. H. Hoi

,

CoRR, 2019

Self-supervised Representation Learning Using 360° Data.

,

,

,

Shoji Nishimura

,

Mohan S. Kankanhalli

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Learning to Detect Human-Object Interactions With Knowledge.

,

,

,

,

Mohan S. Kankanhalli

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Learning to Learn From Noisy Labeled Data.

,

,

,

Mohan S. Kankanhalli

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Video Storytelling.

,

,

,

Mohan S. Kankanhalli

CoRR, 2018

Unsupervised Learning of View-invariant Action Representations.

,

,

,

Mohan S. Kankanhalli

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language.

,

,

,

,

,

Mohan S. Kankanhalli

Comput. Vis. Image Underst., 2017

Attention Transfer from Web Images for Video Recognition.

,

,

,

Mohan S. Kankanhalli

Proceedings of the 2017 ACM on Multimedia Conference, 2017

Dual-Glance Model for Deciphering Social Relationships.

,

,

,

Mohan S. Kankanhalli

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Demo Paper: PreSense - An Assistive Presentation Self-Quantification System.

,

,

Mohan S. Kankanhalli

Proceedings of the IEEE International Symposium on Multimedia, 2016

Multi-stream Deep Learning Framework for Automated Presentation Assessment.

,

,

Mohan S. Kankanhalli

Proceedings of the IEEE International Symposium on Multimedia, 2016
