2025
Exploring Vision-Language Foundation Model for Novel Object Captioning.
IEEE Trans. Circuits Syst. Video Technol., January, 2025
2024
HIRI-ViT: Scaling Vision Transformer With High Resolution Inputs.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2024
End-to-End Video Scene Graph Generation With Temporal Propagation Transformer.
IEEE Trans. Multim., 2024
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer.
CoRR, 2024
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024
Improving Virtual Try-On with Garment-Focused Diffusion Models.
Proceedings of the Computer Vision - ECCV 2024, 2024
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning.
Proceedings of the Computer Vision - ECCV 2024, 2024
Improving Text-Guided Object Inpainting with Semantic Pre-inpainting.
Proceedings of the Computer Vision - ECCV 2024, 2024
SD-DiT: Unleashing the Power of Self-Supervised Discrimination in Diffusion Transformer*.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Prompt Refinement with Image Pivot for Text-to-Image Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
IEEE Trans. Pattern Anal. Mach. Intell., September, 2023
A Low Rank Promoting Prior for Unsupervised Contrastive Learning.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023
Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning.
ACM Trans. Multim. Comput. Commun. Appl., February, 2023
Boosting Scene Graph Generation with Visual Relation Saliency.
ACM Trans. Multim. Comput. Commun. Appl., January, 2023
Boosting Vision-and-Language Navigation with Direction Guiding and Backtracing.
ACM Trans. Multim. Comput. Commun. Appl., January, 2023
Bottom-up and Top-down Object Inference Networks for Image Captioning.
ACM Trans. Multim. Comput. Commun. Appl., 2023
Boosting Relationship Detection in Images with Multi-Granular Self-Supervised Learning.
ACM Trans. Multim. Comput. Commun. Appl., 2023
Contextual Transformer Networks for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2023
3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Control3D: Towards Controllable Text-to-3D Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
3D Creation at Your Fingertips: From Text or Image to 3D Assets.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Learning Neural Implicit Surfaces with Object-Aware Radiance Fields.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
HGNet: Learning Hierarchical Geometry from Points, Edges, and Surfaces.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Modality-Agnostic Debiasing for Single Domain Generalization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Semantic-Conditional Diffusion Networks for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Learning to Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training.
ACM Trans. Multim. Comput. Commun. Appl., 2022
Unpaired Image Captioning With Semantic-Constrained Self-Learning.
IEEE Trans. Multim., 2022
3D Cascade RCNN: High Quality Object Detection in Point Clouds.
IEEE Trans. Image Process., 2022
Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization.
CoRR, 2022
Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation.
CoRR, 2022
Contextual and Selective Attention Networks for Image Captioning.
Sci. China Inf. Sci., 2022
Out-of-Distribution Detection via Conditional Kernel Independence Model.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022
SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement.
Proceedings of the Computer Vision - ECCV 2022, 2022
Dynamic Temporal Filtering in Video Models.
Proceedings of the Computer Vision - ECCV 2022, 2022
Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Stand-Alone Inter-Frame Attention in Video Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Comprehending and Ordering Semantics for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
3D-Producer: A Hybrid and User-Friendly 3D Reconstruction System.
Proceedings of the Artificial Intelligence - Second CAAI International Conference, 2022
2021
Smart Director: An Event-Driven Directing System for Live Broadcasting.
ACM Trans. Multim. Comput. Commun. Appl., 2021
Single Shot Video Object Detector.
IEEE Trans. Multim., 2021
MINet: Meta-Learning Instance Identifiers for Video Object Detection.
IEEE Trans. Image Process., 2021
A Style and Semantic Memory Mechanism for Domain Generalization.
CoRR, 2021
A Low Rank Promoting Prior for Unsupervised Contrastive Learning.
CoRR, 2021
Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Transferrable Contrastive Learning for Visual Domain Adaptation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Core-Text: Improving Scene Text Detection with Contrastive Relational Reasoning.
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021
A Style and Semantic Memory Mechanism for Domain Generalization*.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Representing Videos As Discriminative Sub-Graphs for Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Deep Metric Learning With Density Adaptivity.
IEEE Trans. Multim., 2020
Pre-training for Video Captioning Challenge 2020 Summary.
CoRR, 2020
Joint Contrastive Learning with Infinite Possibilities.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
iDirector: An Intelligent Directing System for Live Broadcast.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Exploring Depth Information for Spatial Relation Recognition.
Proceedings of the 3rd IEEE Conference on Multimedia Information Processing and Retrieval, 2020
Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
X-Linear Attention Networks for Image Captioning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Learning a Unified Sample Weighting Network for Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention.
ACM Trans. Multim. Comput. Commun. Appl., 2019
Multi-Source Domain Adaptation and Semi-Supervised Domain Adaptation with Focus on Visual Domain Adaptation Challenge 2019.
CoRR, 2019
vireoJD-MM at Activity Detection in Extended Videos.
CoRR, 2019
Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019.
CoRR, 2019
VireoJD-MM @ TRECVid 2019: Activities in Extended Video (ActEV).
Proceedings of the 2019 TREC Video Retrieval Evaluation, 2019
daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices.
Proceedings of the 27th ACM International Conference on Multimedia, 2019
Animating Your Life: Real-Time Video-to-Animation Translation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019
Mocycle-GAN: Unpaired Video-to-Video Translation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Hierarchy Parsing for Image Captioning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Relation Distillation Networks for Video Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Transferrable Prototypical Networks for Unsupervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
Pointing Novel Objects in Image Captioning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
Exploring Object Relation in Mean Teacher for Cross-Domain Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
2018
Exploring Visual Relationship for Image Captioning.
Proceedings of the Computer Vision - ECCV 2018, 2018
Jointly Localizing and Describing Events for Dense Video Captioning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
Memory Matching Networks for One-Shot Image Recognition.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
2017
Deep Semantic Hashing with Generative Adversarial Networks.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017
To Create What You Tell: Generating Videos from Captions.
Proceedings of the 2017 ACM on Multimedia Conference, 2017
Boosting Image Captioning with Attributes.
Proceedings of the IEEE International Conference on Computer Vision, 2017
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Video Captioning with Transferred Semantic Attributes.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
2016
Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016
Jointly Modeling Embedding and Translation to Bridge Video and Language.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016
2015
Semi-supervised Hashing with Semantic Confidence for Large Scale Visual Search.
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015
Semi-supervised Domain Adaptation with Subspace Learning for Visual Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015
2014
Click-through-based Cross-View Learning for Image Search.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014
Click-through-based Subspace Learning for Image Search.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014
2013
Image Search by Graph-based Label Propagation with Image Representation from DNN.
Proceedings of the ACM Multimedia Conference, 2013