2025
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models.
CoRR, April, 2025

Temporal Regularization Makes Your Video Generator Stronger.
CoRR, March, 2025

Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View.
CoRR, March, 2025

VideoMerge: Towards Training-free Long Video Generation.
CoRR, March, 2025

LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization.
CoRR, March, 2025

VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer.
CoRR, February, 2025

Encrypted Large Model Inference: The Equivariant Encryption Paradigm.
CoRR, February, 2025

2024
Next Patch Prediction for Autoregressive Visual Generation.
CoRR, 2024

VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation.
CoRR, 2024

OmniCreator: Self-Supervised Unified Generation with Universal Editing.
CoRR, 2024

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses.
CoRR, 2024

Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments.
CoRR, 2024

Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference.
CoRR, 2024

Complete Security and Privacy for AI Inference in Decentralized Systems.
CoRR, 2024

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks.
CoRR, 2024

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation.
CoRR, 2024

2023
Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation.
CoRR, 2023

Make-A-Video: Text-to-Video Generation without Text-Video Data.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness.
CoRR, 2022

Using Mixup as a Regularizer Can Surprisingly Improve Accuracy & Out-of-Distribution Robustness.
Proceedings of the Annual Conference on Neural Information Processing Systems 2022 (Advances in Neural Information Processing Systems 35), 2022

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration.
Proceedings of the European Conference on Computer Vision (ECCV 2022), 2022

Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer.
Proceedings of the European Conference on Computer Vision (ECCV 2022), 2022

2021
Robustness and Generalization via Generative Adversarial Training.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2019
Fine-grained Synthesis of Unrestricted Adversarial Examples.
CoRR, 2019

2014
Low-rank SIFT: An affine invariant feature for place recognition.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014