Hanwang Zhang

Orcid: 0000-0001-7374-8739

According to our database, Hanwang Zhang authored at least 240 papers between 2009 and 2025.

Bibliography

2025
Debiasing vision-language models for vision tasks: a survey.
Frontiers Comput. Sci., January, 2025

2024
NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2024

Learning to Double-Check Model Prediction From a Causal Perspective.
IEEE Trans. Neural Networks Learn. Syst., April, 2024

Fine-Tuning for Few-Shot Image Classification by Multimodal Prototype Regularization.
IEEE Trans. Multim., 2024

Blessing few-shot segmentation via semi-supervised learning with noisy support images.
Pattern Recognit., 2024

Unified Generative and Discriminative Training for Multi-modal Large Language Models.
CoRR, 2024

Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting.
CoRR, 2024

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration.
CoRR, 2024

Visual Prompt Selection for In-Context Learning Segmentation.
CoRR, 2024

ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models.
CoRR, 2024

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts.
CoRR, 2024

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling.
CoRR, 2024

Towards Semantic Equivalence of Tokenization in Multimodal LLM.
CoRR, 2024

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training.
CoRR, 2024

Dual-Modal Prompting for Sketch-Based Image Retrieval.
CoRR, 2024

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction.
CoRR, 2024

Controllable Relation Disentanglement for Few-Shot Class-Incremental Learning.
CoRR, 2024

Selective Vision-Language Subspace Projection for Few-shot CLIP.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

From Multimodal LLM to Human-level AI: Modality, Instruction, Reasoning and Beyond.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Auto-Encoding Morph-Tokens for Multimodal LLM.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Non-confusing Generation of Customized Concepts in Diffusion Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Exploring Diffusion Time-steps for Unsupervised Representation Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Few-Shot NeRF by Adaptive Rendering Loss Regularization.
Proceedings of the Computer Vision - ECCV 2024, 2024

View-Consistent 3D Editing with Gaussian Splatting.
Proceedings of the Computer Vision - ECCV 2024, 2024

Instruction Tuning-Free Visual Token Complement for Multimodal LLMs.
Proceedings of the Computer Vision - ECCV 2024, 2024

Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Distributionally Generative Augmentation for Fair Facial Attribute Classification.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Few-Shot Learner Parameterization by Diffusion Time-Steps.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Diffusion Time-step Curriculum for One Image to 3D Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Disco: Disentangled Control for Realistic Human Dance Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Doubly Abductive Counterfactual Inference for Text-Based Image Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Discriminative Probing and Tuning for Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Dysen-VDM: Empowering Dynamics-Aware Text-to-Video Diffusion with LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Dual-Perspective Knowledge Enrichment for Semi-supervised 3D Object Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

MGNet: Learning Correspondences via Multiple Graphs.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Deconfounded Image Captioning: A Causal Retrospect.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Editorial for Special Issue on Large-scale Pre-training: Data, Models, and Fine-tuning.
Mach. Intell. Res., April, 2023

Compositional Prompting Video-language Models to Understand Procedure in Instructional Videos.
Mach. Intell. Res., April, 2023

VL-NMS: Breaking Proposal Bottlenecks in Two-stage Visual-language Matching.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Causal Interventional Training for Image Recognition.
IEEE Trans. Multim., 2023

Cross-GCN: Enhancing Graph Convolutional Network with k-Order Feature Interactions.
IEEE Trans. Knowl. Data Eng., 2023

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning.
Int. J. Comput. Vis., 2023

ICD-LM: Configuring Vision-Language In-Context Demonstrations by Language Modeling.
CoRR, 2023

ChartLlama: A Multimodal LLM for Chart Understanding and Generation.
CoRR, 2023

Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models.
CoRR, 2023

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions.
CoRR, 2023

DisCo: Disentangled Control for Referring Human Dance Generation in Real World.
CoRR, 2023

Fast Diffusion Model.
CoRR, 2023

An Overview of Challenges in Egocentric Text-Video Retrieval.
CoRR, 2023

Decoupled Kullback-Leibler Divergence Loss.
CoRR, 2023

Adaptively Clustering Neighbor Elements for Image Captioning.
CoRR, 2023

Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Tuning Multi-mode Token-level Prompt Alignment across Modalities.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Semi-Supervised Few-Shot Segmentation with Noisy Support Images.
Proceedings of the IEEE International Conference on Image Processing, 2023

Prompt-aligned Gradient for Prompt Tuning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Trajectory-Word Alignments for Video-Language Tasks.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Random Boxes Are Open-world Object Detectors.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Equivariant Similarity for Vision-Language Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Invariant Feature Regularization for Fair Face Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantic Scene Completion with Cleaner Self.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

Hypothetical Training for Robust Machine Reading Comprehension of Tabular Context.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Counterfactual Active Learning for Out-of-Distribution Generalization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Align R-CNN: A Pairwise Head Network for Visual Relationship Detection.
IEEE Trans. Multim., 2022

Discriminative Style Learning for Cross-Domain Image Captioning.
IEEE Trans. Image Process., 2022

Guest Editorial Introduction to the Special Section on Video and Language.
IEEE Trans. Circuits Syst. Video Technol., 2022

Editorial paper for Pattern Recognition Letters VSI on cross model understanding for visual question answering.
Pattern Recognit. Lett., 2022

Context-Aware Visual Policy Network for Fine-Grained Image Captioning.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Auto-Encoding and Distilling Scene Graphs for Image Captioning.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Learning to Compose and Reason with Language Tree Structures for Visual Grounding.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

PR-NET: Progressively-refined neural network for image manipulation localization.
Int. J. Intell. Syst., 2022

Evaluating and Mitigating Static Bias of Action Representations in the Background and the Foreground.
CoRR, 2022

Attention-based Class Activation Diffusion for Weakly-Supervised Semantic Segmentation.
CoRR, 2022

Exploiting Semantic Role Contextualized Video Features for Multi-Instance Text-Video Retrieval EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.
CoRR, 2022

RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval.
CoRR, 2022

Respecting Transfer Gap in Knowledge Distillation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Certified Robustness Against Natural Language Attacks by Causal Intervention.
Proceedings of the International Conference on Machine Learning, 2022

On Non-Random Missing Labels in Semi-Supervised Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

NICO Challenge: Out-of-Distribution Generalization for Image Recognition Challenges.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Identifying Hard Noise in Long-Tailed Sample Distribution.
Proceedings of the Computer Vision - ECCV 2022, 2022

Equivariance and Invariance Inductive Bias for Learning from Insufficient Data.
Proceedings of the Computer Vision - ECCV 2022, 2022

Invariant Feature Learning for Generalized Long-Tailed Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization.
Proceedings of the Computer Vision - ECCV 2022, 2022

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Cross-Domain Empirical Risk Minimization for Unbiased Long-Tailed Classification.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Deconfounded Visual Grounding.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Introduction to the Special Issue on Fine-grained Visual Computing.
ACM Trans. Multim. Comput. Commun. Appl., 2021

Self-Adaptive Neural Module Transformer for Visual Question Answering.
IEEE Trans. Multim., 2021

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Adversarial Visual Robustness by Causal Intervention.
CoRR, 2021

VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching.
CoRR, 2021

Clicks can be Cheating: Counterfactual Recommendation for Mitigating Clickbait Issue.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Self-Supervised Learning Disentangled Group Representation as Feature.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Introspective Distillation for Robust Question Answering.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Self-Regulation for Semantic Segmentation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Transporting Causal Mechanisms for Unsupervised Domain Adaptation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Auto-Parsing Network for Image Captioning and Visual Question Answering.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Causal Attention for Unbiased Visual Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Counterfactual Zero-Shot and Open-Set Visual Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Causal Attention for Vision-Language Tasks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Counterfactual VQA: A Cause-Effect Look at Language Bias.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

The Blessings of Unlabeled Background in Untrimmed Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Distilling Causal Effect of Data in Class-Incremental Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Empowering Language Understanding with Counterfactual Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Multi-Level Policy and Reward-Based Deep Reinforcement Learning Framework for Image Captioning.
IEEE Trans. Multim., 2020

Fast Discrete Collaborative Multi-Modal Hashing for Large-Scale Multimedia Retrieval.
IEEE Trans. Knowl. Data Eng., 2020

"Click" Is Not Equal to "Like": Counterfactual Recommendation for Mitigating Clickbait Issue.
CoRR, 2020

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding.
CoRR, 2020

KQA Pro: A Large Diagnostic Dataset for Complex Question Answering over Knowledge Base.
CoRR, 2020

Cross-GCN: Enhancing Graph Convolutional Network with k-Order Feature Interactions.
CoRR, 2020

Stochastic Dynamics for Video Infilling.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Causal Intervention for Weakly-Supervised Semantic Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Interventional Few-Shot Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Occlusion-Aware GAN for Face De-Occlusion in the Wild.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2020

Feature Pyramid Transformer.
Proceedings of the Computer Vision - ECCV 2020, 2020

More Grounded Image Captioning by Distilling Image-Text Matching Model.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Visual Commonsense R-CNN.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Visual Commonsense Representation Learning via Causal Inference.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Unbiased Scene Graph Generation From Biased Training.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Two Causal Principles for Improving Visual Dialog.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning to Segment the Tail.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Iterative Context-Aware Graph Inference for Visual Dialog.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Counterfactual Samples Synthesizing for Robust Visual Question Answering.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

General Partial Label Learning via Dual Bipartite Graph Autoencoder.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
More is Better: Precise and Detailed Image Captioning Using Online Positive Recall and Missing Concepts Mining.
IEEE Trans. Image Process., 2019

Special issue on multimedia recommendation and multi-modal data analysis.
Multim. Syst., 2019

Referring Expression Grounding by Marginalizing Scene Graph Likelihood.
CoRR, 2019

Making History Matter: Gold-Critic Sequence Training for Visual Dialog.
CoRR, 2019

Question-Aware Tube-Switch Network for Video Question Answering.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Single-shot Semantic Image Inpainting with Densely Connected Generative Networks.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Learning Using Privileged Information for Food Recognition.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Making History Matter: History-Advantage Sequence Training for Visual Dialog.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Learning to Collocate Neural Modules for Image Captioning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Learning to Assemble Neural Module Tree Networks for Visual Grounding.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Counterfactual Critic Multi-Agent Training for Scene Graph Generation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Auto-Encoding Scene Graphs for Image Captioning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Learning to Compose Dynamic Tree Structures for Visual Contexts.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Explainable and Explicit Visual Reasoning Over Scene Graphs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Recursive Visual Attention in Visual Dialog.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

DeepChannel: Salience Estimation by Contrastive Learning for Extractive Document Summarization.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Learning to Embed Sentences Using Attentive Recursive Trees.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Attributed Social Network Embedding.
IEEE Trans. Knowl. Data Eng., 2018

Self-Supervised Video Hashing With Hierarchical Binary Auto-Encoder.
IEEE Trans. Image Process., 2018

Explainability by Parsing: Neural Module Tree Networks for Natural Language Visual Grounding.
CoRR, 2018

Scene Dynamics: Counterfactual Critic Multi-Agent Training for Scene Graph Generation.
CoRR, 2018

Stochastic Video Long-term Interpolation.
CoRR, 2018

Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Venue Prediction for Social Images by Exploiting Rich Temporal Patterns in LBSNs.
Proceedings of the MultiMedia Modeling - 24th International Conference, 2018

Context-Aware Visual Policy Network for Sequence-Level Image Captioning.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Objects, Relationships, and Context in Visual Data.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

Recommendation Technologies for Multimedia Content.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

Multi-Level Policy and Reward Reinforcement Learning for Image Captioning.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Discrete Factorization Machines for Fast Feature-based Recommendation.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features.
Proceedings of the Computer Vision - ECCV 2018, 2018

Grounding Referring Expressions in Images by Variational Context.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Learning to Guide Decoding for Image Captioning.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
VideoWhisper: Toward Discriminative Unsupervised Video Feature Learning With Attention-Based Recurrent Neural Networks.
IEEE Trans. Multim., 2017

Matryoshka Peek: Toward Learning Fine-Grained, Robust, Discriminative Features for Product Search.
IEEE Trans. Multim., 2017

Video Captioning With Attention-Based LSTM and Semantic Consistency.
IEEE Trans. Multim., 2017

I Know What You Want to Express: Sentence Element Inference by Incorporating External Knowledge Base.
IEEE Trans. Knowl. Data Eng., 2017

Event Classification in Microblogs via Social Tracking.
ACM Trans. Intell. Syst. Technol., 2017

Erratum to: Multi-view feature selection and classification for Alzheimer's Disease Diagnosis.
Multim. Tools Appl., 2017

Multi-view feature selection and classification for Alzheimer's Disease diagnosis.
Multim. Tools Appl., 2017

Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Network.
CoRR, 2017

Neural Collaborative Filtering.
Proceedings of the 26th International Conference on World Wide Web, 2017

Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Improving Event Extraction via Multimodal Integration.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Video Question Answering via Gradually Refined Attention over Appearance and Motion.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Video Visual Relation Detection.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Enhancing Micro-video Understanding by Harnessing External Sounds.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Extracting Generic Features of Artistic Style via Deep Convolutional Neural Network.
Proceedings of the International Conference on Video and Image Processing, 2017

VIDEOWHISPER: Towards unsupervised learning of discriminative features of videos with RNN.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

Object trajectory proposal.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Visual Translation Embedding Network for Visual Relation Detection.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Deep Semantic Indexing Using Convolutional Localization Network with Region-Based Visual Attention for Image Database.
Proceedings of the Databases Theory and Applications, 2017

2016
Learning from Collective Intelligence: Feature Learning Using Social Images and Tags.
ACM Trans. Multim. Comput. Commun. Appl., 2016

Deep Aging Face Verification With Large Gaps.
IEEE Trans. Multim., 2016

Deep Fusion of Multiple Semantic Cues for Complex Event Recognition.
IEEE Trans. Image Process., 2016

Robust regression based face recognition with fast outlier removal.
Multim. Tools Appl., 2016

L2,p-norm and sample constraint based feature selection and classification for AD diagnosis.
Neurocomputing, 2016

Binary representation learning in computer vision.
Neurocomputing, 2016

Learning content-social influential features for influence analysis.
Int. J. Multim. Inf. Retr., 2016

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning.
CoRR, 2016

Discrete Collaborative Filtering.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

Fast Matrix Factorization for Online Recommendation with Implicit Feedback.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

Deep Learning Generic Features for Cross-Media Retrieval.
Proceedings of the MultiMedia Modeling - 22nd International Conference, 2016

Mental Visual Browsing.
Proceedings of the MultiMedia Modeling - 22nd International Conference, 2016

Play and Rewind: Optimizing Binary Representations of Videos by Self-Supervised Temporal Hashing.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Mental Visual Indexing: Towards Fast Video Browsing.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

An Intention-Aware Interactive System for Mobile Video Browsing.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Micro Tells Macro: Predicting the Popularity of Micro-Videos via a Transductive Model.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Saliency meets spatial quantization: A practical framework for large scale product search.
Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016

Online Collaborative Learning for Open-Vocabulary Visual Classifiers.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Discrete Image Hashing Using Large Weakly Annotated Photo Collections.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Enhancing Video Event Recognition Using Automatically Constructed Semantic-Visual Knowledge Base.
IEEE Trans. Multim., 2015

Multimedia Summarization for Social Events in Microblog Stream.
IEEE Trans. Multim., 2015

Hashing with Inductive Supervised Learning.
Proceedings of the Advances in Multimedia Information Processing - PCM 2015, 2015

Learning Features from Large-Scale, Noisy and Social Image-Tag Collection.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Visual Coding in a Semantic Hierarchy.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Multi-view Semi-supervised Learning for Web Image Annotation.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Learning Image and User Features for Recommendation in Social Networks.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

2014
Attribute-Augmented Semantic Hierarchy: Towards a Unified Framework for Content-Based Image Retrieval.
ACM Trans. Multim. Comput. Commun. Appl., 2014

Robust (Semi) Nonnegative Graph Embedding.
IEEE Trans. Image Process., 2014

Start from Scratch: Towards Automatically Identifying, Modeling, and Naming Visual Attributes.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Perception-Guided Multimodal Feature Fusion for Photo Aesthetics Assessment.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

One of a Kind: User Profiling by Social Curation.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Image Tagging with Social Assistance.
Proceedings of the International Conference on Multimedia Retrieval, 2014

2013
Detecting Group Activities With Multi-Camera Context.
IEEE Trans. Circuits Syst. Video Technol., 2013

Attribute-augmented semantic hierarchy: towards bridging semantic gap and intention gap in image retrieval.
Proceedings of the ACM Multimedia Conference, 2013

2012
Attribute feedback.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Attribute feedback.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Visual query attributes suggestion.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Robust Non-negative Graph Embedding: Towards noisy data, unreliable graphs, and noisy labels.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2009
Web image interpretation: semi-supervised mining annotated words.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

