Xiaoyi Dong
Orcid: 0000-0002-4654-835X
According to our database, Xiaoyi Dong authored at least 72 papers between 2008 and 2025.
Timeline (chart): publications per year by type. Legend: Book, In proceedings, Article, PhD thesis, Dataset, Other.
Bibliography
2025
CoRR, January, 2025
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning.
CoRR, January, 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction.
CoRR, January, 2025
2024
IEEE Trans. Multim., 2024
IEEE Trans. Image Process., 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
CoRR, 2024
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models.
CoRR, 2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.
CoRR, 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.
CoRR, 2024
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.
CoRR, 2024
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.
CoRR, 2024
CoRR, 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.
CoRR, 2024
CoRR, 2024
CoRR, 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites.
CoRR, 2024
CoRR, 2024
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation.
CoRR, 2024
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.
CoRR, 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
Feature Fusion Based Adversarial Example Detection Against Second-Round Adversarial Attacks.
IEEE Trans. Artif. Intell., October, 2023
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization.
CoRR, 2023
CoRR, 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet.
CoRR, 2022
CoRR, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
IEEE Trans. Circuits Syst. Video Technol., 2021
J. Vis. Commun. Image Represent., 2021
Neurocomputing, 2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud Based Deep Networks.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
2018
CAAD 2018: Powerful None-Access Black-Box Attack Based on Adversarial Transformation Network.
CoRR, 2018
2008
Proceedings of the 2008 International Conference on Advanced Infocomm Technology, 2008