Yunlong Tang

ORCID: 0000-0003-2796-1787

Affiliations:

University of Rochester, NY, USA
Tencent (China), Shenzhen, China (former)
Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China (former)

According to our database, Yunlong Tang authored at least 16 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Generative AI for Cel-Animation: A Survey.

CoRR, January, 2025

2024

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.

CoRR, 2024

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

CoRR, 2024

Scaling Concept With Text-Guided Diffusion Models.

CoRR, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.

CoRR, 2024

AIM 2024 Challenge on Video Saliency Prediction: Methods and Results.

CoRR, 2024

CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion.

CoRR, 2024

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?

CoRR, 2024

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning.

CoRR, 2024

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.

CoRR, 2024

EAGLE: Egocentric AGgregated Language-video Engine.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

2023

Video Understanding with Large Language Models: A Survey.

CoRR, 2023

LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad.

Siting Xu, Yunlong Tang, Feng Zheng

CoRR, 2023

LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning.

CoRR, 2023

Caption Anything: Interactive Image Description with Diverse Multimodal Controls.

CoRR, 2023

2022

Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward.

Proceedings of the Computer Vision - ACCV 2022, 2022
