Xiaoyi Dong

Orcid: 0000-0002-4654-835X

According to our database, Xiaoyi Dong authored at least 61 papers between 2008 and 2024.

Bibliography

2024
PersonMAE: Person Re-Identification Pre-Training With Masked AutoEncoders.
IEEE Trans. Multim., 2024

PointCAT: Contrastive Adversarial Training for Robust Point Cloud Recognition.
IEEE Trans. Image Process., 2024

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.
CoRR, 2024

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations.
CoRR, 2024

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.
CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.
CoRR, 2024

MotionClone: Training-Free Motion Cloning for Controllable Video Generation.
CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.
CoRR, 2024

Bootstrap3D: Improving 3D Content Creation with Synthetic Data.
CoRR, 2024

Streaming Long Video Understanding with Large Language Models.
CoRR, 2024

ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing.
CoRR, 2024

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites.
CoRR, 2024

Unified Scene Representation and Reconstruction for 3D Large Language Models.
CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
CoRR, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?
CoRR, 2024

InternLM2 Technical Report.
CoRR, 2024

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition.
CoRR, 2024

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation.
CoRR, 2024

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.
CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024

VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Long-CLIP: Unlocking the Long-Text Capability of CLIP.
Proceedings of the Computer Vision - ECCV 2024, 2024

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VIGC: Visual Instruction Generation and Correction.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Feature Fusion Based Adversarial Example Detection Against Second-Round Adversarial Attacks.
IEEE Trans. Artif. Intell., October, 2023

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization.
CoRR, 2023

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions.
CoRR, 2023

Emotional Listener Portrait: Neural Listener Head Generation with Emotion.
CoRR, 2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023

MLLM-DataEngine: An Iterative Refinement Approach for MLLM.
CoRR, 2023

Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

NTIRE 2023 Image Shadow Removal Challenge Report.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Diversity-Aware Meta Visual Prompting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet.
CoRR, 2022

PointCAT: Contrastive Adversarial Training for Robust Point Cloud Recognition.
CoRR, 2022

Protecting Celebrities with Identity Consistency Transformer.
CoRR, 2022

Adaptive Face Forgery Detection in Cross Domain.
Proceedings of the Computer Vision - ECCV 2022, 2022

RISPNet: A Network for Reversed Image Signal Processing.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Bootstrapped Masked Autoencoders for Vision BERT Pretraining.
Proceedings of the Computer Vision - ECCV 2022, 2022

Shape-invariant 3D Adversarial Point Clouds.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Protecting Celebrities from DeepFake with Identity Consistency Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Mobile-Former: Bridging MobileNet and Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Local Geometric Distortions Resilient Watermarking Scheme Based on Symmetry.
IEEE Trans. Circuits Syst. Video Technol., 2021

Adversarial steganography based on sparse cover enhancement.
J. Vis. Commun. Image Represent., 2021

Adversarial defense via self-orthogonal randomization super-network.
Neurocomputing, 2021

PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers.
CoRR, 2021

TACR-Net: Editing on Deep Video and Voice Portraits.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020
Identity-Driven DeepFake Detection.
CoRR, 2020

GreedyFool: Distortion-Aware Sparse Adversarial Attack.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud Based Deep Networks.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Robust Superpixel-Guided Attentional Adversarial Attack.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Self-Robust 3D Point Recognition via Gather-Vector Guidance.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018
CAAD 2018: Powerful None-Access Black-Box Attack Based on Adversarial Transformation Network.
CoRR, 2018

2008
Microstructured optical fiber Bragg gratings and their applications.
Proceedings of the 2008 International Conference on Advanced Infocomm Technology, 2008

