Qi Dai

Orcid: 0000-0002-4693-2968

Affiliations:
  • Microsoft Research Asia, Beijing, China
  • Fudan University, School of Computer Science, Shanghai, China (PhD 2017)


According to our database, Qi Dai authored at least 59 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions.
CoRR, 2024

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms.
CoRR, 2024

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction.
CoRR, 2024

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion.
CoRR, 2024

SimDA: Simple Diffusion Adapter for Efficient Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ART·V: Auto-Regressive Text-to-Video Generation with Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MotionEditor: Editing Video Motion via Content-Aware Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Deep Uncoupled Discrete Hashing via Similarity Matrix Decomposition.
ACM Trans. Multim. Comput. Commun. Appl., January, 2023

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models.
CoRR, 2023

ART·V: Auto-Regressive Text-to-Video Generation with Diffusion Models.
CoRR, 2023

A Survey on Video Diffusion Models.
CoRR, 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules.
CoRR, 2023

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token.
CoRR, 2023

HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Parallel Sentence-Level Explanation Generation for Real-World Low-Resource Scenarios.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

SVFormer: Semi-supervised Video Transformer for Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

On Data Scaling in Masked Image Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ResFormer: Scaling ViTs with Multi-Resolution Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling.
CoRR, 2022

Deeper Insights into ViTs Robustness towards Common Corruptions.
CoRR, 2022

GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

On the Connection between Local Attention and Dynamic Depth-wise Convolution.
Proceedings of the Tenth International Conference on Learning Representations, 2022

SimMIM: a Simple Framework for Masked Image Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Rethinking Spatial Invariance of Convolutional Networks for Object Counting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MPII: Multi-Level Mutual Promotion for Inference and Interpretation.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Reinforced Short-Length Hashing.
IEEE Trans. Circuits Syst. Video Technol., 2021

A novel class restriction loss for unsupervised domain adaptation.
Neurocomputing, 2021

Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning.
CoRR, 2021

Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight.
CoRR, 2021

Self-Supervised Learning with Swin Transformers.
CoRR, 2021

Temporal Action Detection with Multi-level Supervision.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Reinforcing Short-Length Hashing.
CoRR, 2020

Informative Dropout for Robust Representation Learning: A Shape-bias Perspective.
Proceedings of the 37th International Conference on Machine Learning, 2020

Weakly-Supervised Action Localization by Generative Attention Modeling.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Decoupling Localization and Classification in Single Shot Temporal Action Detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Learning Spatial Awareness to Improve Crowd Counting.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Deep Incremental Hashing Network for Efficient Image Retrieval.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Deep Domain Adaptation Hashing with Adversarial Learning.
Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018

Recurrent Tubelet Proposal and Recognition Networks for Action Detection.
Proceedings of the Computer Vision - ECCV 2018, 2018

2016
A Bayesian Hashing approach and its application to face recognition.
Neurocomputing, 2016

Binary Optimized Hashing.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

2015
Super Fast Event Recognition in Internet Videos.
IEEE Trans. Multim., 2015

Human Action Recognition in Unconstrained Videos by Explicit Motion Modeling.
IEEE Trans. Image Process., 2015

Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning.
Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Optimal Bayesian Hashing for Efficient Face Recognition.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

2014
Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Networks.
Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Challenge Huawei challenge: Fusing multimodal features with deep neural networks for Mobile Video Annotation.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2014

2013
Beauty is here: evaluating aesthetics in videos using multimodal features and free training data.
Proceedings of the ACM Multimedia Conference, 2013

Fudan at MediaEval 2013: Violent Scenes Detection Using Motion Features and Part-Level Attributes.
Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

2012
Fast Semantic Diffusion for Large-Scale Context-Based Image and Video Annotation.
IEEE Trans. Image Process., 2012

A fast video event recognition system and its application to video search.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features.
Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Trajectory-Based Modeling of Human Actions with Motion Reference Points.
Proceedings of the Computer Vision - ECCV 2012, 2012

