Linjie Li

ORCID: 0000-0003-0867-8863

According to our database, Linjie Li authored at least 84 papers between 2016 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2024

Multimodal Foundation Models: From Specialists to General-Purpose Assistants.

Found. Trends Comput. Graph. Vis., 2024

An Iterative Resampling Deep Decoupling Domain Adaptation method for class-imbalance bearing fault diagnosis under variant working conditions.

Expert Syst. Appl., 2024

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing.

CoRR, 2024

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models.

CoRR, 2024

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities.

CoRR, 2024

Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness.

Khyathi Raghavi Chandu

CoRR, 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos.

CoRR, 2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos.

CoRR, 2024

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation.

CoRR, 2024

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning.

CoRR, 2024

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs.

CoRR, 2024

Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition.

CoRR, 2024

TaE: Task-aware Expandable Representation for Long Tail Class Incremental Learning.

CoRR, 2024

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training.

CoRR, 2024

OpenLEAF: A Novel Benchmark for Open-Domain Interleaved Image-Text Generation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Bring Metric Functions into Diffusion Models.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

The Generative AI Paradox: "What It Can Create, It May Not Understand".

Abhilasha Ravichander

Khyathi Raghavi Chandu

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Enhancing Human-to-Robot Skill Transfer: A Framework Integrating Movement and Variable Impedance Based on EMG.

Proceedings of the IEEE International Conference on Industrial Technology, 2024

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Idea2Img: Iterative Self-refinement with GPT-4V for Automatic Image Design and Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Disco: Disentangled Control for Realistic Human Dance Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Interfacing Foundation Models' Embeddings.

CoRR, 2023

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation.

CoRR, 2023

MM-VID: Advancing Video Understanding with GPT-4V(ision).

CoRR, 2023

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design.

CoRR, 2023

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation.

CoRR, 2023

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation.

CoRR, 2023

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).

CoRR, 2023

Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models.

CoRR, 2023

DisCo: Disentangled Control for Referring Human Dance Generation in Real World.

CoRR, 2023

Aligning Large Multi-Modal Model with Robust Instruction Tuning.

CoRR, 2023

MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos.

CoRR, 2023

Segment Everything Everywhere All at Once.

CoRR, 2023

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation.

CoRR, 2023

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action.

CoRR, 2023

Segment Everything Everywhere All at Once.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Equivariant Similarity for Vision-Language Foundation Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

An Empirical Study of Multimodal Model Merging.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Generalized Decoding for Pixel, Image, and Language.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ReCo: Region-Controlled Text-to-Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Adaptive Human Matting for Dynamic Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Global Profiling of 2-hydroxyisobutyrylome in Common Wheat.

Genom. Proteom. Bioinform., August, 2022

GIT: A Generative Image-to-text Transformer for Vision and Language.

Trans. Mach. Learn. Res., 2022

Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends.

Found. Trends Comput. Graph. Vis., 2022

Cross-modal Representation Learning for Zero-shot Action Recognition.

CoRR, 2022

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Multiple Z-Complementary Code Sets With Low Inter-Set Cross-Correlation.

Proceedings of the 10th International Workshop on Signal Design and Its Applications in Communications, 2022

Crossmodal Representation Learning for Zero-shot Action Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

PREVAIL: Pre-trained Variational Adversarial Active Learning for Molecular Property Prediction.

Proceedings of the 8th IEEE International Conference on Cloud Computing and Intelligent Systems, 2022

Playing Lottery Tickets with Vision and Language.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

MLP Architectures for Vision-and-Language Modeling: An Empirical Study.

CoRR, 2021

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling.

CoRR, 2021

Playing Lottery Tickets with Vision and Language.

CoRR, 2021

Meta Module Network for Compositional Visual Reasoning.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

A Fault Diagnostic Scheme Based on Capsule Network for Rolling Bearing under Different Rotational Speeds.

Linjie Li

Mian Zhang

Kesheng Wang

Sensors, 2020

A Closer Look at the Robustness of Vision-and-Language Pre-trained Models.

Linjie Li

Zhe Gan

Jingjing Liu

CoRR, 2020

Large-Scale Adversarial Training for Vision-and-Language Representation Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Graph Optimal Transport for Cross-Domain Alignment.

Proceedings of the 37th International Conference on Machine Learning, 2020

HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

UNITER: UNiversal Image-TExt Representation Learning.

Proceedings of the Computer Vision - ECCV 2020, 2020

Analysis of Vibration Characteristics of Rolling Linear Guides.

Proceedings of the AIAM2020: 2nd International Conference on Artificial Intelligence and Advanced Manufacture, 2020

2019

UNITER: Learning UNiversal Image-TExt Representations.

CoRR, 2019

Configuration Design and Simulation of Novel Petal Tooth Nutation Joint Drive for Robot.

Proceedings of the Intelligent Robotics and Applications - 12th International Conference, 2019

Relation-Aware Graph Attention Network for Visual Question Answering.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2017

Learning to see people like people.

CoRR, 2017

Learning to See People like People: Predicting Social Perceptions of Faces.

Proceedings of the 39th Annual Meeting of the Cognitive Science Society, 2017

2016

Understanding human facial attractiveness from multiple views.

Proceedings of the 38th Annual Meeting of the Cognitive Science Society, 2016

Extracting Human Face Similarity Judgments: Pairs or Triplets?

Proceedings of the 38th Annual Meeting of the Cognitive Science Society, 2016
