Qi Dai

Orcid: 0000-0002-4693-2968

Affiliations:

Microsoft Research Asia, Beijing, China
Fudan University, School of Computer Science, Shanghai, China (PhD 2017)

According to our database, Qi Dai authored at least 63 papers between 2012 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

A Survey on Video Diffusion Models.

ACM Comput. Surv., February, 2025

2024

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval.

CoRR, 2024

MageBench: Bridging Large Multimodal Models to Agents.

CoRR, 2024

StableAnimator: High-Quality Identity-Preserving Human Image Animation.

CoRR, 2024

REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents.

CoRR, 2024

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction.

CoRR, 2024

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion.

CoRR, 2024

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions.

Alexander G. Hauptmann

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

SimDA: Simple Diffusion Adapter for Efficient Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MotionEditor: Editing Video Motion via Content-Aware Diffusion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Deep Uncoupled Discrete Hashing via Similarity Matrix Decomposition.

ACM Trans. Multim. Comput. Commun. Appl., January, 2023

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models.

CoRR, 2023

ART·V: Auto-Regressive Text-to-Video Generation with Diffusion Models.

CoRR, 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules.

Alexander G. Hauptmann

CoRR, 2023

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token.

CoRR, 2023

HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules.

Zhi-Qi Cheng

Qi Dai

Alexander G. Hauptmann

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Parallel Sentence-Level Explanation Generation for Real-World Low-Resource Scenarios.

Yan Liu

Xiaokang Chen

Qi Dai

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

SVFormer: Semi-supervised Video Transformer for Action Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

On Data Scaling in Masked Image Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ResFormer: Scaling ViTs with Multi-Resolution Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling.

CoRR, 2022

Deeper Insights into ViTs Robustness towards Common Corruptions.

CoRR, 2022

GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement.

Proceedings of MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

On the Connection between Local Attention and Dynamic Depth-wise Convolution.

Proceedings of the Tenth International Conference on Learning Representations, 2022

SimMIM: a Simple Framework for Masked Image Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Rethinking Spatial Invariance of Convolutional Networks for Object Counting.

Alexander G. Hauptmann

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MPII: Multi-Level Mutual Promotion for Inference and Interpretation.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Reinforced Short-Length Hashing.

IEEE Trans. Circuits Syst. Video Technol., 2021

A novel class restriction loss for unsupervised domain adaptation.

Neurocomputing, 2021

Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning.

CoRR, 2021

Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight.

CoRR, 2021

Self-Supervised Learning with Swin Transformers.

CoRR, 2021

Temporal Action Detection with Multi-level Supervision.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Reinforcing Short-Length Hashing.

CoRR, 2020

Informative Dropout for Robust Representation Learning: A Shape-bias Perspective.

Proceedings of the 37th International Conference on Machine Learning, 2020

Weakly-Supervised Action Localization by Generative Attention Modeling.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting.

Alexander G. Hauptmann

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Decoupling Localization and Classification in Single Shot Temporal Action Detection.

Yupan Huang

Qi Dai

Yutong Lu

Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Learning Spatial Awareness to Improve Crowd Counting.

Alexander G. Hauptmann

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Deep Incremental Hashing Network for Efficient Image Retrieval.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Deep Domain Adaptation Hashing with Adversarial Learning.

Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018

Recurrent Tubelet Proposal and Recognition Networks for Action Detection.

Proceedings of Computer Vision - ECCV 2018, 2018

2016

A Bayesian Hashing approach and its application to face recognition.

Neurocomputing, 2016

Binary Optimized Hashing.

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

2015

Super Fast Event Recognition in Internet Videos.

IEEE Trans. Multim., 2015

Human Action Recognition in Unconstrained Videos by Explicit Motion Modeling.

IEEE Trans. Image Process., 2015

Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning.

Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Optimal Bayesian Hashing for Efficient Face Recognition.

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

2014

Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Networks.

Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Challenge Huawei challenge: Fusing multimodal features with deep neural networks for Mobile Video Annotation.

Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2014

2013

Beauty is here: evaluating aesthetics in videos using multimodal features and free training data.

Proceedings of the ACM Multimedia Conference, 2013

Fudan at MediaEval 2013: Violent Scenes Detection Using Motion Features and Part-Level Attributes.

Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

2012

Fast Semantic Diffusion for Large-Scale Context-Based Image and Video Annotation.

IEEE Trans. Image Process., 2012

A fast video event recognition system and its application to video search.

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features.

Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Trajectory-Based Modeling of Human Actions with Motion Reference Points.

Proceedings of Computer Vision - ECCV 2012, 2012
