Linjie Li

Orcid: 0000-0003-0867-8863

According to our database, Linjie Li authored at least 84 papers between 2016 and 2024.

Bibliography

2024
Multimodal Foundation Models: From Specialists to General-Purpose Assistants.
Found. Trends Comput. Graph. Vis., 2024

An Iterative Resampling Deep Decoupling Domain Adaptation method for class-imbalance bearing fault diagnosis under variant working conditions.
Expert Syst. Appl., 2024

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing.
CoRR, 2024

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models.
CoRR, 2024

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities.
CoRR, 2024

Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness.
CoRR, 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos.
CoRR, 2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos.
CoRR, 2024

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation.
CoRR, 2024

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning.
CoRR, 2024

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs.
CoRR, 2024

Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition.
CoRR, 2024

TaE: Task-aware Expandable Representation for Long Tail Class Incremental Learning.
CoRR, 2024

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training.
CoRR, 2024

OpenLEAF: A Novel Benchmark for Open-Domain Interleaved Image-Text Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Bring Metric Functions into Diffusion Models.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

The Generative AI Paradox: "What It Can Create, It May Not Understand".
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Enhancing Human-to-Robot Skill Transfer: A Framework Integrating Movement and Variable Impedance Based on EMG.
Proceedings of the IEEE International Conference on Industrial Technology, 2024

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Idea2Img: Iterative Self-refinement with GPT-4V for Automatic Image Design and Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Disco: Disentangled Control for Realistic Human Dance Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Interfacing Foundation Models' Embeddings.
CoRR, 2023

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation.
CoRR, 2023

MM-VID: Advancing Video Understanding with GPT-4V(ision).
CoRR, 2023

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design.
CoRR, 2023

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation.
CoRR, 2023

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation.
CoRR, 2023

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).
CoRR, 2023

Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models.
CoRR, 2023

DisCo: Disentangled Control for Referring Human Dance Generation in Real World.
CoRR, 2023

Aligning Large Multi-Modal Model with Robust Instruction Tuning.
CoRR, 2023

MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos.
CoRR, 2023

Segment Everything Everywhere All at Once.
CoRR, 2023

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation.
CoRR, 2023

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action.
CoRR, 2023

Segment Everything Everywhere All at Once.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Equivariant Similarity for Vision-Language Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

An Empirical Study of Multimodal Model Merging.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Generalized Decoding for Pixel, Image, and Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ReCo: Region-Controlled Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Adaptive Human Matting for Dynamic Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Global Profiling of 2-hydroxyisobutyrylome in Common Wheat.
Genom. Proteom. Bioinform., August, 2022

GIT: A Generative Image-to-text Transformer for Vision and Language.
Trans. Mach. Learn. Res., 2022

Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends.
Found. Trends Comput. Graph. Vis., 2022

Cross-modal Representation Learning for Zero-shot Action Recognition.
CoRR, 2022

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Multiple Z-Complementary Code Sets With Low Inter-Set Cross-Correlation.
Proceedings of the 10th International Workshop on Signal Design and Its Applications in Communications, 2022

Crossmodal Representation Learning for Zero-shot Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

PREVAIL: Pre-trained Variational Adversarial Active Learning for Molecular Property Prediction.
Proceedings of the 8th IEEE International Conference on Cloud Computing and Intelligent Systems, 2022

Playing Lottery Tickets with Vision and Language.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
MLP Architectures for Vision-and-Language Modeling: An Empirical Study.
CoRR, 2021

VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling.
CoRR, 2021

Playing Lottery Tickets with Vision and Language.
CoRR, 2021

Meta Module Network for Compositional Visual Reasoning.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
A Fault Diagnostic Scheme Based on Capsule Network for Rolling Bearing under Different Rotational Speeds.
Sensors, 2020

A Closer Look at the Robustness of Vision-and-Language Pre-trained Models.
CoRR, 2020

Large-Scale Adversarial Training for Vision-and-Language Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Graph Optimal Transport for Cross-Domain Alignment.
Proceedings of the 37th International Conference on Machine Learning, 2020

HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

UNITER: UNiversal Image-TExt Representation Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

Analysis of Vibration Characteristics of Rolling Linear Guides.
Proceedings of the AIAM2020: 2nd International Conference on Artificial Intelligence and Advanced Manufacture, 2020

2019
UNITER: Learning UNiversal Image-TExt Representations.
CoRR, 2019

Configuration Design and Simulation of Novel Petal Tooth Nutation Joint Drive for Robot.
Proceedings of the Intelligent Robotics and Applications - 12th International Conference, 2019

Relation-Aware Graph Attention Network for Visual Question Answering.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2017
Learning to see people like people.
CoRR, 2017

Learning to See People like People: Predicting Social Perceptions of Faces.
Proceedings of the 39th Annual Meeting of the Cognitive Science Society, 2017

2016
Understanding human facial attractiveness from multiple views.
Proceedings of the 38th Annual Meeting of the Cognitive Science Society, 2016

Extracting Human Face Similarity Judgments: Pairs or Triplets?
Proceedings of the 38th Annual Meeting of the Cognitive Science Society, 2016

