Yansong Tang

ORCID: 0000-0002-1534-4549

According to our database, Yansong Tang authored at least 83 papers between 2017 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2024

StableSwap: Stable Face Swapping in a Shared and Controllable Latent Space.

IEEE Trans. Multim., 2024

DOVE: Doodled vessel enhancement for photoacoustic angiography super resolution.

Medical Image Anal., 2024

A Multitask Fourier Transformer Network for Seismic Source Characterization Estimation From a Single-Station Waveform.

IEEE Geosci. Remote. Sens. Lett., 2024

Q-VLM: Post-training Quantization for Large Vision-Language Models.

CoRR, 2024

Fully Aligned Network for Referring Image Segmentation.

Yong Liu

Ruihao Xu

Yansong Tang

CoRR, 2024

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model.

CoRR, 2024

Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena.

CoRR, 2024

RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models.

CoRR, 2024

Hierarchical Memory for Long Video QA.

CoRR, 2024

LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing.

CoRR, 2024

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation.

CoRR, 2024

VoCo-LLaMA: Towards Vision Compression with Large Language Models.

CoRR, 2024

Localizing Events in Videos with Multimodal Queries.

CoRR, 2024

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams.

CoRR, 2024

ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation.

CoRR, 2024

OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models.

CoRR, 2024

GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling.

CoRR, 2024

Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution.

CoRR, 2024

1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation.

CoRR, 2024

Language-Free Compositional Action Generation via Decoupling Refinement.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Plan, Posture and Go: Towards Open-Vocabulary Text-to-Motion Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

MotionLCM: Real-Time Controllable Motion Generation via Latent Consistency Model.

Proceedings of the Computer Vision - ECCV 2024, 2024

FlowIE: Efficient Image Enhancement via Rectified Flow.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Accurate Post-Training Quantization for Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Universal Segmentation at Arbitrary Granularity with Language Instruction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Open-Vocabulary Segmentation with Semantic-Assisted Calibration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Segment and Caption Anything.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Plan, Posture and Go: Towards Open-World Text-to-Motion Generation.

CoRR, 2023

OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields.

CoRR, 2023

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning.

CoRR, 2023

Fine-tuning vision foundation model for crack segmentation in civil infrastructures.

CoRR, 2023

Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search.

CoRR, 2023

Language-free Compositional Action Generation via Decoupling Refinement.

CoRR, 2023

Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution.

CoRR, 2023

Towards Accurate Data-free Quantization for Diffusion Models.

CoRR, 2023

Self-similarity-based super-resolution of photoacoustic angiography from hand-drawn doodles.

CoRR, 2023

Efficient Meshy Neural Fields for Animatable Human Avatars.

CoRR, 2023

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

LUNA: Language as Continuing Anchors for Referring Expression Comprehension.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

HOI-aware Adaptive Network for Weakly-supervised Action Segmentation.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

GAIN: On the Generalization of Instructional Action Understanding.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Context-Aware Inpainter-Refiner for Skeleton-Based Human Motion Completion.

Proceedings of the IEEE International Conference on Image Processing, 2023

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Global Knowledge Calibration for Fast Open-Vocabulary Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

LOGO: A Long-Form Video Dataset for Group Action Quality Assessment.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FLAG3D: A 3D Fitness Activity Dataset with Language Instruction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition.

ACM Trans. Multim. Comput. Commun. Appl., 2022

VideoABC: A Real-World Video Dataset for Abductive Visual Reasoning.

IEEE Trans. Image Process., 2022

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer.

Proceedings of the Computer Vision - ECCV 2022, 2022

Global Spectral Filter Memory Network for Video Object Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

Semantic-Aware Auto-Encoders for Self-supervised Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion.

Kejie Li

Yansong Tang

Victor Adrian Prisacariu

Philip H. S. Torr

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

YouMVOS: An Actor-centric Multi-shot Video Object Segmentation Dataset.

Anirudh Srinivasan Chakravarthy

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation.

Yansong Tang

Jiwen Lu

Jie Zhou

IEEE Trans. Pattern Anal. Mach. Intell., 2021

Unsupervised Embedding Learning from Uncertainty Momentum Modeling.

CoRR, 2021

Breaking Shortcut: Exploring Fully Convolutional Cycle-Consistency for Video Correspondence Learning.

CoRR, 2021

Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Graph Interaction Networks for Relation Transfer in Human Activity Videos.

IEEE Trans. Circuits Syst. Video Technol., 2020

Uncertainty-Aware Score Distribution Learning for Action Quality Assessment.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Learning Semantics-Preserving Attention and Contextual Interaction for Group Activity Recognition.

IEEE Trans. Image Process., 2019

Multi-Stream Deep Neural Networks for RGB-D Egocentric Action Recognition.

IEEE Trans. Circuits Syst. Video Technol., 2019

COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Mining Semantics-Preserving Attention for Group Activity Recognition.

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Action recognition in RGB-D egocentric videos.

Proceedings of the 2017 IEEE International Conference on Image Processing, 2017
