Tao Mei

ORCID: 0000-0002-5990-7307

Affiliations:
  • JD AI Research, Beijing, China
  • Microsoft Research Asia, Beijing, China
  • University of Science and Technology of China, Department of Automation, Hefei, China (PhD 2006)


According to our database, Tao Mei authored at least 523 papers between 2005 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Visualizing and Understanding Patch Interactions in Vision Transformer.
IEEE Trans. Neural Networks Learn. Syst., October, 2024

CoSeg: Cognitively Inspired Unsupervised Generic Event Segmentation.
IEEE Trans. Neural Networks Learn. Syst., September, 2024

HIRI-ViT: Scaling Vision Transformer With High Resolution Inputs.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

Explaining Cross-domain Recognition with Interpretable Deep Classifier.
ACM Trans. Multim. Comput. Commun. Appl., March, 2024

End-to-End Video Scene Graph Generation With Temporal Propagation Transformer.
IEEE Trans. Multim., 2024

Cross-Modal Quantization for Co-Speech Gesture Generation.
IEEE Trans. Multim., 2024

Learning Temporal Dynamics in Videos With Image Transformer.
IEEE Trans. Multim., 2024

Bidirectional Knowledge Reconfiguration for Lightweight Point Cloud Analysis.
IEEE Trans. Multim., 2024

Learning 3D Shape Latent for Point Cloud Completion.
IEEE Trans. Multim., 2024

Motion Capture from Inertial and Vision Sensors.
CoRR, 2024

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer.
CoRR, 2024

VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM.
CoRR, 2024

FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

CLaM: An Open-Source Library for Performance Evaluation of Text-driven Human Motion Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Improving Virtual Try-On with Garment-Focused Diffusion Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

VideoStudio: Generating Consistent-Content and Multi-scene Videos.
Proceedings of the Computer Vision - ECCV 2024, 2024

Improving Text-Guided Object Inpainting with Semantic Pre-inpainting.
Proceedings of the Computer Vision - ECCV 2024, 2024

SD-DiT: Unleashing the Power of Self-Supervised Discrimination in Diffusion Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Boosting Diffusion Models with Moving Average Sampling in Frequency Domain.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Prompt Refinement with Image Pivot for Text-to-Image Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Deep Person Generation: A Survey from the Perspective of Face, Pose, and Cloth Synthesis.
ACM Comput. Surv., December, 2023

Dual Vision Transformer.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2023

Augmentation Pathways Network for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

Lightweight and Progressively-Scalable Networks for Semantic Segmentation.
Int. J. Comput. Vis., August, 2023

Bi-calibration Networks for Weakly-Supervised Video Representation Learning.
Int. J. Comput. Vis., July, 2023

A Low Rank Promoting Prior for Unsupervised Contrastive Learning.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning.
ACM Trans. Multim. Comput. Commun. Appl., February, 2023

A Survey on Learning to Reject.
Proc. IEEE, February, 2023

Boosting Scene Graph Generation with Visual Relation Saliency.
ACM Trans. Multim. Comput. Commun. Appl., January, 2023

Boosting Vision-and-Language Navigation with Direction Guiding and Backtracing.
ACM Trans. Multim. Comput. Commun. Appl., January, 2023

Bottom-up and Top-down Object Inference Networks for Image Captioning.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Boosting Relationship Detection in Images with Multi-Granular Self-Supervised Learning.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Boosting Generic Visual-Linguistic Representation With Dynamic Contexts.
IEEE Trans. Multim., 2023

Contextual Transformer Networks for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

Recent Advances of Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective.
ACM Comput. Surv., 2023

Selective Volume Mixup for Video Action Recognition.
CoRR, 2023

Deep Equilibrium Multimodal Fusion.
CoRR, 2023

Visual-Aware Text-to-Speech.
CoRR, 2023

Learning and Evaluating Human Preferences for Conversational Head Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

FastReID: A Pytorch Toolbox for General Instance Re-identification.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Control3D: Towards Controllable Text-to-3D Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

3D Creation at Your Fingertips: From Text or Image to 3D Assets.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Learning Neural Implicit Surfaces with Object-Aware Radiance Fields.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Visual-Aware Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

HGNet: Learning Hierarchical Geometry from Points, Edges, and Surfaces.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Modality-Agnostic Debiasing for Single Domain Generalization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantic-Conditional Diffusion Networks for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PointClustering: Unsupervised Point Cloud Pre-training using Transformation Invariance in Clustering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

AnchorFormer: Point Cloud Completion from Discriminative Nodes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning to Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
msr-vtt.
Dataset, December, 2022

The Elements of End-to-end Deep Face Recognition: A Survey of Recent Advances.
ACM Comput. Surv., January, 2022

Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training.
ACM Trans. Multim. Comput. Commun. Appl., 2022

FasterPose: A Faster Simple Baseline for Human Pose Estimation.
ACM Trans. Multim. Comput. Commun. Appl., 2022

Unpaired Image Captioning With Semantic-Constrained Self-Learning.
IEEE Trans. Multim., 2022

3D Cascade RCNN: High Quality Object Detection in Point Clouds.
IEEE Trans. Image Process., 2022

Dual Spoof Disentanglement Generation for Face Anti-Spoofing With Depth Uncertainty Learning.
IEEE Trans. Circuits Syst. Video Technol., 2022

Guest Editorial Introduction to the Special Section on Video and Language.
IEEE Trans. Circuits Syst. Video Technol., 2022

Boosting Semi-Supervised Face Recognition With Noise Robustness.
IEEE Trans. Circuits Syst. Video Technol., 2022

Column-Spatial Correction Network for Remote Sensing Image Destriping.
Remote. Sens., 2022

Optimal synthesis of mechanisms using repellency evolutionary algorithm.
Knowl. Based Syst., 2022

Long-tailed visual recognition with deep models: A methodological survey and evaluation.
Neurocomputing, 2022

Weakly Supervised Semantic Segmentation for Large-Scale Point Cloud.
CoRR, 2022

Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization.
CoRR, 2022

Generalized One-shot Domain Adaption of Generative Adversarial Networks.
CoRR, 2022

Dual Vision Transformer.
CoRR, 2022

Video2StyleGAN: Encoding Video in Latent Space for Manipulation.
CoRR, 2022

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation.
CoRR, 2022

A-ACT: Action Anticipation through Cycle Transformations.
CoRR, 2022

Freeform Body Motion Generation from Speech.
CoRR, 2022

Motion-Focused Contrastive Learning of Video Representations.
CoRR, 2022

Contextual and selective attention networks for image captioning.
Sci. China Inf. Sci., 2022

Generalized One-shot Domain Adaptation of Generative Adversarial Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Out-of-Distribution Detection via Conditional Kernel Independence Model.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

WOC: A Handy Webcam-based 3D Online Chatroom.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

Cross-modal Contrastive Distillation for Instructional Activity Anticipation.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

Responsive Listening Head Generation: A Benchmark Dataset and Baseline.
Proceedings of the Computer Vision - ECCV 2022, 2022

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

CAViT: Contextual Alignment Vision Transformer for Video Object Re-identification.
Proceedings of the Computer Vision - ECCV 2022, 2022

SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement.
Proceedings of the Computer Vision - ECCV 2022, 2022

Dynamic Temporal Filtering in Video Models.
Proceedings of the Computer Vision - ECCV 2022, 2022

Gait Recognition in the Wild with Dense 3D Representations and A Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Memory-Augmented Non-Local Attention for Video Super-Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Putting People in their Place: Monocular Regression of 3D People in Depth.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Stand-Alone Inter-Frame Attention in Video Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Comprehending and Ordering Semantics for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Directional Self-supervised Learning for Heavy Image Augmentations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Smart Director: An Event-Driven Directing System for Live Broadcasting.
ACM Trans. Multim. Comput. Commun. Appl., 2021

Virtual Guidance-Based Coordinated Tracking Control of Multi-Autonomous Underwater Vehicles Using Composite Neural Learning.
IEEE Trans. Neural Networks Learn. Syst., 2021

VehicleNet: Learning Robust Visual Representation for Vehicle Re-Identification.
IEEE Trans. Multim., 2021

Hierarchical Soft Quantization for Skeleton-Based Human Action Recognition.
IEEE Trans. Multim., 2021

Single Shot Video Object Detector.
IEEE Trans. Multim., 2021

Pose-Guided Tracking-by-Detection: Robust Multi-Person Pose Tracking.
IEEE Trans. Multim., 2021

AGRNet: Adaptive Graph Representation Learning and Reasoning for Face Parsing.
IEEE Trans. Image Process., 2021

MINet: Meta-Learning Instance Identifiers for Video Object Detection.
IEEE Trans. Image Process., 2021

Group Reidentification with Multigrained Matching and Integration.
IEEE Trans. Cybern., 2021

Deep Transfer Hashing for Image Retrieval.
IEEE Trans. Circuits Syst. Video Technol., 2021

Noise Augmented Double-Stream Graph Convolutional Networks for Image Captioning.
IEEE Trans. Circuits Syst. Video Technol., 2021

Towards NIR-VIS Masked Face Recognition.
IEEE Signal Process. Lett., 2021

Unpaired Person Image Generation With Semantic Parsing Transformation.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Responsive Listening Head Generation: A Benchmark Dataset and Baseline.
CoRR, 2021

A Style and Semantic Memory Mechanism for Domain Generalization.
CoRR, 2021

Directional Self-supervised Learning for Risky Image Augmentations.
CoRR, 2021

A Baseline Framework for Part-level Action Parsing and Action Recognition.
CoRR, 2021

CoSeg: Cognitively Inspired Unsupervised Generic Event Segmentation.
CoRR, 2021

Semi-Supervised Domain Generalizable Person Re-Identification.
CoRR, 2021

A Low Rank Promoting Prior for Unsupervised Contrastive Learning.
CoRR, 2021

Augmentation Pathways Network for Visual Recognition.
CoRR, 2021

Multi-Agent Semi-Siamese Training for Long-tail and Shallow Face Learning.
CoRR, 2021

AttriMeter: An Attribute-guided Metric Interpreter for Person Re-Identification.
CoRR, 2021

CM-NAS: Rethinking Cross-Modality Neural Architectures for Visible-Infrared Person Re-Identification.
CoRR, 2021

Adaptive Graph Representation Learning and Reasoning for Face Parsing.
CoRR, 2021

Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Flat and Shallow: Understanding Fake Image Detection Models by Architecture Profiling.
Proceedings of the MMAsia '21: ACM Multimedia Asia, Gold Coast, Australia, December 1, 2021

FaceX-Zoo: A PyTorch Toolbox for Face Recognition.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

ViDA-MAN: Visual Dialog with Digital Humans.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

The Next Generation Multimodal Conversational Search and Recommendation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

One-stage Context and Identity Hallucination Network.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Transferrable Contrastive Learning for Visual Domain Adaptation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

TraND: Transferable Neighborhood Discovery for Unsupervised Cross-Domain Gait Recognition.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

Design of a deployable underwater robot for the recovery of autonomous underwater vehicles based on origami technique.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Optimization Planning for 3D ConvNets.
Proceedings of the 38th International Conference on Machine Learning, 2021

Monocular, One-stage, Regression of Multiple 3D People.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Condensing a Sequence to One Informative Frame for Video Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Motion-Focused Contrastive Learning of Video Representations.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Explainable Person Re-Identification with Attribute-guided Metric Distillation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

A Style and Semantic Memory Mechanism for Domain Generalization.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Group-aware Label Transfer for Domain Adaptive Person Re-identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dive Into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Boosting Video Representation Learning With Multi-Faceted Integration.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Action Unit Memory Network for Weakly Supervised Temporal Action Localization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Representing Videos As Discriminative Sub-Graphs for Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Weakly Supervised Semantic Segmentation for Large-Scale Point Cloud.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Exploiting Relationship for Complex-scene Image Generation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Listen, Look, and Find the One: Robust Person Search with Multimodality Index.
ACM Trans. Multim. Comput. Commun. Appl., 2020

Coarse-to-Fine Localization of Temporal Action Proposals.
IEEE Trans. Multim., 2020

Deep Metric Learning With Density Adaptivity.
IEEE Trans. Multim., 2020

Unlocking Author Power: On the Exploitation of Auxiliary Author-Retweeter Relations for Predicting Key Retweeters.
IEEE Trans. Knowl. Data Eng., 2020

Learning Rich Part Hierarchies With Progressive Attention Networks for Fine-Grained Image Recognition.
IEEE Trans. Image Process., 2020

MetaSearch: Incremental Product Search via Deep Meta-Learning.
IEEE Trans. Image Process., 2020

Collaborative online ranking algorithms for multitask learning.
Knowl. Inf. Syst., 2020

Synthetic Training for Monocular Human Mesh Recovery.
CoRR, 2020

The Elements of End-to-end Deep Face Recognition: A Survey of Recent Advances.
CoRR, 2020

CenterHMR: a Bottom-up Single-shot Method for Multi-person 3D Mesh Recovery from a Single Image.
CoRR, 2020

Pre-training for Video Captioning Challenge 2020 Summary.
CoRR, 2020

NPCFace: A Negative-Positive Cooperation Supervision for Training Large-scale Face Recognition.
CoRR, 2020

Joint Contrastive Learning with Infinite Possibilities.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

iDirector: An Intelligent Directing System for Live Broadcast.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Hierarchical Gumbel Attention Network for Text-based Person Search.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

AI-SAS: Automated In-match Soccer Analysis System.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Black Re-ID: A Head-shoulder Descriptor for the Challenging Problem of Person Re-Identification.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Down to the Last Detail: Virtual Try-on with Fine-grained Details.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Beyond the Parts: Learning Multi-view Cross-part Correlation for Vehicle Re-identification.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

SketchMan: Learning to Create Professional Sketches.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

PyAnomaly: A Pytorch-based Toolkit for Video Anomaly Detection.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

A Cross-modality and Progressive Person Search System.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Pose-native Network Architecture Search for Multi-person Human Pose Estimation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Learning the Compositional Visual Coherence for Complementary Recommendations.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Loss Function Search for Face Recognition.
Proceedings of the 37th International Conference on Machine Learning, 2020

Classes Matter: A Fine-Grained Adversarial Approach to Cross-Domain Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition.
Proceedings of the Computer Vision - ECCV 2020, 2020

Edge-Aware Graph Representation Learning and Reasoning for Face Parsing.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning to Localize Actions from Moments.
Proceedings of the Computer Vision - ECCV 2020, 2020

Semi-Siamese Training for Shallow Face Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

Look-Into-Object: Self-Supervised Structure Modeling for Object Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Transferring and Regularizing Prediction for Semantic Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

X-Linear Attention Networks for Image Captioning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning a Unified Sample Weighting Network for Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Mis-Classified Vector Guided Softmax Loss for Face Recognition.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

A New Dataset and Boundary-Attention Semantic Segmentation for Face Parsing.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Multi-source Multi-level Attention Networks for Visual Question Answering.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Show, Reward, and Tell: Adversarial Visual Story Generation.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Exploring Users' Internal Influence from Reviews for Social Recommendation.
IEEE Trans. Multim., 2019

Unified Spatio-Temporal Attention Networks for Action Recognition in Videos.
IEEE Trans. Multim., 2019

Video Summarization by Learning Deep Side Semantic Embedding.
IEEE Trans. Circuits Syst. Video Technol., 2019

Subspace Clustering by Block Diagonal Representation.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Deep Collaborative Embedding for Social Image Understanding.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Toward efficient indexing structure for scalable content-based music retrieval.
Multim. Syst., 2019

Exploiting hierarchical visual features for visual question answering.
Neurocomputing, 2019

Vision and Language: from Visual Perception to Content Creation.
CoRR, 2019

Theme-Matters: Fashion Compatibility Learning via Theme Attention.
CoRR, 2019

Zooming into Face Forensics: A Pixel-level Analysis.
CoRR, 2019

Scheduled Differentiable Architecture Search for Visual Recognition.
CoRR, 2019

Regularizing Proxies with Multi-Adversarial Training for Unsupervised Domain-Adaptive Semantic Segmentation.
CoRR, 2019

Hard-Aware Fashion Attribute Classification.
CoRR, 2019

Group Re-Identification with Multi-grained Matching and Integration.
CoRR, 2019

A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark.
CoRR, 2019

Predictive Ensemble Learning with Application to Scene Text Detection.
CoRR, 2019

WIDER Face and Pedestrian Challenge 2018: Methods and Results.
CoRR, 2019

Rethinking Visual Relationships for High-level Image Understanding.
CoRR, 2019

Improved Selective Refinement Network for Face Detection.
CoRR, 2019

daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Adaptive Semantic-Visual Tree for Hierarchical Embeddings.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

POINet: Pose-Guided Ovonic Insight Network for Multi-Person Pose Tracking.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

BraidNet: Braiding Semantics and Details for Accurate Human Parsing.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Long Short-Term Relation Networks for Video Action Detection.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Animating Your Life: Real-Time Video-to-Animation Translation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Mocycle-GAN: Unpaired Video-to-Video Translation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Multi-Granularity Reasoning for Social Relation Recognition From Images.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

A Single-Shot Oriented Scene Text Detector with Learnable Anchors.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Everyone is a Cartoonist: Selfie Cartoonization with Attentive Adversarial Networks.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Hierarchy Parsing for Image Captioning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Co-Mining: Deep Face Recognition With Noisy Labels.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Human Mesh Recovery From Monocular Images via a Skeleton-Disentangled Representation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Sampling Wisely: Deep Image Embedding by Top-K Precision Optimization.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

VrR-VG: Refocusing Visually-Relevant Relationships.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Relation Distillation Networks for Video Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

ScratchDet: Training Single-Shot Object Detectors From Scratch.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Customizable Architecture Search for Semantic Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Unsupervised Person Image Generation With Semantic Parsing Transformation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Learning Spatio-Temporal Representation With Local and Global Diffusion.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Transferrable Prototypical Networks for Unsupervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Gaussian Temporal Awareness Networks for Action Localization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Pointing Novel Objects in Image Captioning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Destruction and Construction Learning for Fine-Grained Image Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Structured Two-Stream Attention Network for Video Question Answering.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Image Similarity.
Encyclopedia of Database Systems, Second Edition, 2018

Automatic Data Augmentation from Massive Web Images for Deep Visual Recognition.
ACM Trans. Multim. Comput. Commun. Appl., 2018

Learning Deep Spatio-Temporal Dependence for Semantic Video Segmentation.
IEEE Trans. Multim., 2018

PROVID: Progressive and Multimodal Vehicle Reidentification for Large-Scale Urban Surveillance.
IEEE Trans. Multim., 2018

Exploiting Web Images for Video Highlight Detection With Triplet Deep Ranking.
IEEE Trans. Multim., 2018

Multigranular Event Recognition of Personal Photo Albums.
IEEE Trans. Multim., 2018

Automatic Generation of Social Event Storyboard From Image Click-Through Data.
IEEE Trans. Circuits Syst. Video Technol., 2018

PageSense: Toward Stylewise Contextual Advertising via Visual Analysis of Web Pages.
IEEE Trans. Circuits Syst. Video Technol., 2018

Exploiting spatial-temporal context for trajectory based action video retrieval.
Multim. Tools Appl., 2018

Boosting image sentiment analysis with visual attention.
Neurocomputing, 2018

When Multimedia Meets Fashion.
IEEE Multim., 2018

Support Vector Guided Softmax Loss for Face Recognition.
CoRR, 2018

ScratchDet: Exploring to Train Single-Shot Object Detectors from Scratch.
CoRR, 2018

KTAN: Knowledge Transfer Adversarial Network.
CoRR, 2018

DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Networks (with Supplementary Materials).
CoRR, 2018

Deep Domain Adaptation Hashing with Adversarial Learning.
Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018

Deep Video Understanding: Representation Learning, Action Recognition, and Language Generation.
Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, 2018

Session details: Keynote 2.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Session details: Best Paper Session.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Learning from History and Present: Next-item Recommendation via Discriminatively Exploiting User Behaviors.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

Greedy Layer-Wise Training of Long Short Term Memory Networks.
Proceedings of the 2018 IEEE International Conference on Multimedia & Expo Workshops, 2018

Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Exploring Visual Relationship for Image Captioning.
Proceedings of the Computer Vision - ECCV 2018, 2018

Part-Aligned Bilinear Representations for Person Re-identification.
Proceedings of the Computer Vision - ECCV 2018, 2018

Recurrent Tubelet Proposal and Recognition Networks for Action Detection.
Proceedings of the Computer Vision - ECCV 2018, 2018

Deep Attention Neural Tensor Network for Visual Question Answering.
Proceedings of the Computer Vision - ECCV 2018, 2018

Fully Convolutional Adaptation Networks for Semantic Segmentation.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

DA-GAN: Instance-Level Image Translation by Deep Attention Generative Adversarial Networks.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Jointly Localizing and Describing Events for Dense Video Captioning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Memory Matching Networks for One-Shot Image Recognition.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Show, Reward and Tell: Automatic Generation of Narrative Paragraph From Photo Stream by Adversarial Training.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Who Are Your "Real" Friends: Analyzing and Distinguishing Between Offline and Online Friendships From Social Multimedia Data.
IEEE Trans. Multim., 2017

Large-Scale Online Feature Selection for Ultra-High Dimensional Sparse Data.
ACM Trans. Knowl. Discov. Data, 2017

Editorial for special section of video analytics with deep learning.
Pattern Recognit., 2017

CrossbowCam: a handheld adjustable multi-camera system.
Multim. Tools Appl., 2017

Detecting shot boundary with sparse coding for video summarization.
Neurocomputing, 2017

Learning hierarchical video representation for action recognition.
Int. J. Multim. Inf. Retr., 2017

Deep Semantic Hashing with Generative Adversarial Networks.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Seeing Bot.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Searching Personal Photos on the Phone with Instant Visual Query Suggestion and Joint Text-Image Hashing.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Learning Multimodal Attention LSTM Networks for Video Captioning.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Learning Deep Contextual Attention Network for Narrative Photo Stream Captioning.
Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

To Create What You Tell: Generating Videos from Captions.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Deep Learning for Intelligent Video Analysis.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Learning Social Image Embedding with Deep Multimodal Attention Networks.
Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Boosting Image Captioning with Attributes.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Multi-level Attention Networks for Visual Question Answering.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Deep Quantization: Encoding Convolutional Activations with Deep Generative Model.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Video Captioning with Transferred Semantic Attributes.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Let Your Photos Talk: Generating Narrative Paragraph for Photo Stream via Bidirectional Attention Recurrent Neural Networks.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Landmark Reranking for Smart Travel Guide Systems by Combining and Analyzing Diverse Media.
IEEE Trans. Syst. Man Cybern. Syst., 2016

Automatic Generation of Visual-Textual Presentation Layout.
ACM Trans. Multim. Comput. Commun. Appl., 2016

Monet: A System for Reliving Your Memories by Theme-Based Photo Storytelling.
IEEE Trans. Multim., 2016

Service Quality Evaluation by Exploring Social Users' Contextual Information.
IEEE Trans. Knowl. Data Eng., 2016

A Scalable Approach for Content-Based Image Retrieval in Peer-to-Peer Networks.
IEEE Trans. Knowl. Data Eng., 2016

Shop-Type Recommendation Leveraging the Data from Social Media and Location-Based Services.
ACM Trans. Knowl. Discov. Data, 2016

Web Image Search Re-Ranking With Click-Based Similarity and Typicality.
IEEE Trans. Image Process., 2016

A Diffusion and Clustering-Based Approach for Finding Coherent Motions and Understanding Crowd Scenes.
IEEE Trans. Image Process., 2016

Tree-Based Visualization and Optimization for Image Collection.
IEEE Trans. Cybern., 2016

Adaptive Content Condensation Based on Grid Optimization for Thumbnail Image Generation.
IEEE Trans. Circuits Syst. Video Technol., 2016

Personalized Travel Sequence Recommendation on Multi-Source Big Social Media.
IEEE Trans. Big Data, 2016

Guest Editorial: Learning Multimedia for Real World Applications.
Multim. Tools Appl., 2016

High-order local ternary patterns with locality preserving projection for smoke detection and image classification.
Inf. Sci., 2016

Social media analytics and learning.
Neurocomputing, 2016

Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network.
CoRR, 2016

Time Matters: Multi-scale Temporalization of Social Media Popularity.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Multi-Scale Triplet CNN for Person Re-Identification.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Video ChatBot: Triggering Live Social Interactions by Automatic Video Commenting.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation.
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Deep Semantic-Preserving and Ranking-Based Hashing for Image Retrieval.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Beyond Object Recognition: Visual Sentiment Analysis with Deep Coupled Adjective and Noun Neural Networks.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Automatic suggestion of presentation image for storytelling.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

A representative-based framework for parsing and summarizing events in surveillance videos.
Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016

A Deep Learning-Based Approach to Progressive Vehicle Re-identification for Urban Surveillance.
Proceedings of the Computer Vision - ECCV 2016, 2016

Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

MSR-VTT: A Large Video Description Dataset for Bridging Video and Language.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Jointly Modeling Embedding and Translation to Bridge Video and Language.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

A Framework of Enlarging Face Datasets Used for Makeup Face Analysis.
Proceedings of the IEEE Second International Conference on Multimedia Big Data, 2016

Unfolding Temporal Dynamics: Predicting Social Media Popularity Using Multi-scale Temporal Decomposition.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Query Difficulty Estimation for Image Search With Query Reconstruction Error.
IEEE Trans. Multim., 2015

Query-Dependent Aesthetic Model With Deep Learning for Photo Quality Assessment.
IEEE Trans. Multim., 2015

Author Topic Model-Based Collaborative Filtering for Personalized POI Recommendations.
IEEE Trans. Multim., 2015

Super Fast Event Recognition in Internet Videos.
IEEE Trans. Multim., 2015

Activity Sensor: Check-In Usage Mining for Local Recommendation.
ACM Trans. Intell. Syst. Technol., 2015

Supporting Serendipitous Social Interaction Using Human Mobility Prediction.
IEEE Trans. Hum. Mach. Syst., 2015

MoVieUp: Automatic Mobile Video Mashup.
IEEE Trans. Circuits Syst. Video Technol., 2015

Landmark Summarization With Diverse Viewpoints.
IEEE Trans. Circuits Syst. Video Technol., 2015

Exploratory Product Image Search With Circle-to-Search Interaction.
IEEE Trans. Circuits Syst. Video Technol., 2015

Image Tag Refinement With View-Dependent Concept Representations.
IEEE Trans. Circuits Syst. Video Technol., 2015

Learning salient visual word for scalable mobile image retrieval.
Pattern Recognit., 2015

Click-boosting multi-modality graph-based reranking for image search.
Multim. Syst., 2015

Accurate sensing of scene geo-context via mobile visual localization.
Multim. Syst., 2015

TapTell: Interactive visual search for mobile task recommendation.
J. Vis. Commun. Image Represent., 2015

Tagging Personal Photos with Transfer Deep Learning.
Proceedings of the 24th International Conference on World Wide Web, 2015

Semi-supervised Hashing with Semantic Confidence for Large Scale Visual Search.
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015

Power of Tags: Predicting Popularity of Social Media in Geo-Spatial and Temporal Contexts.
Proceedings of the Advances in Multimedia Information Processing - PCM 2015, 2015

Travel Recommendation via Author Topic Model Based Collaborative Filtering.
Proceedings of the MultiMedia Modeling - 21st International Conference, 2015

EMIF: Towards a Scalable and Effective Indexing Framework for Large Scale Music Retrieval.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

On the selection of trending image from the web.
Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

Learning Query and Image Similarities with Ranking Canonical Correlation Analysis.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Relaxing from Vocabulary: Robust Weakly-Supervised Deep Learning for Vocabulary-Free Image Tagging.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Semi-supervised Domain Adaptation with Subspace Learning for visual recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Multi-task deep visual-semantic embedding for video thumbnail selection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

User-curated image collections: Modeling and recommendation.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

Vision-Based Fine-Grained Location Estimation.
Proceedings of the Multimodal Location Estimation of Videos and Images, 2015

2014
Circle & Search: Attribute-Aware Shoe Retrieval.
ACM Trans. Multim. Comput. Commun. Appl., 2014

Personalized Video Recommendation through Graph Propagation.
ACM Trans. Multim. Comput. Commun. Appl., 2014

Browse-to-Search: Interactive Exploratory Search with Visual Entities.
ACM Trans. Inf. Syst., 2014

Socialized Mobile Photography: Learning to Photograph With Social Context via Mobile Devices.
IEEE Trans. Multim., 2014

A Bag-of-Importance Model With Locality-Constrained Coding Based Feature Learning for Video Summarization.
IEEE Trans. Multim., 2014

Instant Mobile Video Search With Layered Audio-Video Indexing and Progressive Transmission.
IEEE Trans. Multim., 2014

Predicting Failing Queries in Video Search.
IEEE Trans. Multim., 2014

Guest Editorial Special Section on Socio-Mobile Media Analysis and Retrieval.
IEEE Trans. Multim., 2014

Personalized Recommendation Combining User Interest and Social Circle.
IEEE Trans. Knowl. Data Eng., 2014

Community Discovery from Social Media by Low-Rank Matrix Recovery.
ACM Trans. Intell. Syst. Technol., 2014

Image Search Reranking With Query-Dependent Click-Based Relevance Feedback.
IEEE Trans. Image Process., 2014

Social Image Tagging With Diverse Semantics.
IEEE Trans. Cybern., 2014

Retrieval-Based Face Annotation by Weak Label Regularized Local Coordinate Coding.
IEEE Trans. Pattern Anal. Mach. Intell., 2014

Image tag refinement by regularized latent Dirichlet allocation.
Comput. Vis. Image Underst., 2014

Multimedia search reranking: A literature survey.
ACM Comput. Surv., 2014

Massive-scale Online Feature Selection for Sparse Ultra-high Dimensional Data.
CoRR, 2014

Learning to personalize trending image search suggestion.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Click-through-based cross-view learning for image search.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Just-for-me: an adaptive personalization system for location-aware social music recommendation.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

The Evolution of Research on Multimedia Travel Guide Search and Recommender Systems.
Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

Rescue Tail Queries: Learning to Image Search Re-rank via Click-wise Multimodal Fusion.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Mobile visual search via hierarchical sparse coding.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Personalized image recommendation for web search engine users.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Query difficulty estimation via pseudo relevance feedback for image search.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

User specific friend recommendation in social media community.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Predicting activity attendance in event-based social networks: content, context and social influence.
Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2014

Bridging Human-Centered Social Media Content Across Web Domains.
Proceedings of the Human-Centered Social Media Analytics, 2014

2013
Near-lossless semantic video summarization and its applications to video analysis.
ACM Trans. Multim. Comput. Commun. Appl., 2013

Robust and accurate mobile visual localization and its applications.
ACM Trans. Multim. Comput. Commun. Appl., 2013

Interaction Design for Mobile Visual Search.
IEEE Trans. Multim., 2013

Towards Cross-Domain Learning for Social Video Popularity Prediction.
IEEE Trans. Multim., 2013

Interactive Multimodal Visual Search on Mobile Device.
IEEE Trans. Multim., 2013

GPS Estimation for Places of Interest From Social Users' Uploaded Photos.
IEEE Trans. Multim., 2013

Circular Reranking for Visual Search.
IEEE Trans. Image Process., 2013

Discriminative Exemplar Coding for Sign Language Recognition With Kinect.
IEEE Trans. Cybern., 2013

Marginalized multi-layer multi-instance kernel for video concept detection.
Signal Process., 2013

Video archaeology: understanding video manipulation history.
Multim. Tools Appl., 2013

Unified entity search in social media community.
Proceedings of the 22nd International World Wide Web Conference, 2013

Automatic generation of social media snippets for mobile browsing.
Proceedings of the ACM Multimedia Conference, 2013

Annotation for free: video tagging by mining user search behavior.
Proceedings of the ACM Multimedia Conference, 2013

Image search by graph-based label propagation with image representation from DNN.
Proceedings of the ACM Multimedia Conference, 2013

LAVES: an instant mobile video search system based on layered audio-video indexing.
Proceedings of the ACM Multimedia Conference, 2013

Listen, look, and gotcha: instant video search with mobile phones by layered audio-video indexing.
Proceedings of the ACM Multimedia Conference, 2013

Image search reranking with multi-latent topical graph.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Mobile multimedia travelogue generation by exploring geo-locations and image tags.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Friend transfer: Cold-start friend recommendation with cross-platform transfer learning of social knowledge.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

A bag-of-importance model for video summarization.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2013

2012
ImageSense: Towards contextual image advertising.
ACM Trans. Multim. Comput. Commun. Appl., 2012

Societally connected multimedia across cultures.
J. Zhejiang Univ. Sci. C, 2012

A comprehensive representation scheme for video semantic ontology and its applications in semantic concept detection.
Neurocomputing, 2012

Assessing photo quality with geo-context and crowdsourced photos.
Proceedings of the 2012 Visual Communications and Image Processing, 2012

Interactive mobile visual search for social activities completion using query image contextual model.
Proceedings of the 14th IEEE International Workshop on Multimedia Signal Processing, 2012

Local visual words coding for low bit rate mobile visual search.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

SocialTransfer: cross-domain transfer learning from social streams for media applications.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Browse-to-search.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Finding perfect rendezvous on the go: accurate mobile visual localization and its applications to routing.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

When video search goes wrong: predicting query failure using search engine logs and visual search results.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Personalized video recommendation through tripartite graph propagation.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Crowdsourced Learning to Photograph via Mobile Devices.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

Empowering Cross-Domain Internet Media with Real-Time Topic Learning from Social Streams.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

Predicting Image Popularity in an Incomplete Social Media Community by a Weighted Bi-partite Graph.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

AMIGO: accurate mobile image geotagging.
Proceedings of the 4th International Conference on Internet Multimedia Computing and Service, 2012

Accelerometer-based single-handed video browsing on mobile devices: design and user studies.
Proceedings of the 4th International Conference on Internet Multimedia Computing and Service, 2012

Kinect-based visual communication system.
Proceedings of the 4th International Conference on Internet Multimedia Computing and Service, 2012

Probabilistic sequential POIs recommendation via check-in data.
Proceedings of the SIGSPATIAL 2012 International Conference on Advances in Geographic Information Systems (formerly known as GIS), 2012

Image search results refinement via outlier detection using deep contexts.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011
Contextual Video Recommendation by Multimodal Relevance and User Feedback.
ACM Trans. Inf. Syst., 2011

Optimizing Visual Search Reranking via Pairwise Learning.
IEEE Trans. Multim., 2011

Image Decomposition With Multilabel Context: Algorithms and Applications.
IEEE Trans. Image Process., 2011

Contextual Bag-of-Words for Visual Categorization.
IEEE Trans. Circuits Syst. Video Technol., 2011

Visual search reranking via adaptive particle swarm optimization.
Pattern Recognit., 2011

Clip-based hierarchical representation for near-duplicate video detection.
Int. J. Comput. Math., 2011

Tap-to-search: Interactive and contextual visual search on mobile devices.
Proceedings of the IEEE 13th International Workshop on Multimedia Signal Processing (MMSP 2011), 2011

Community Discovery from Movie and Its Application to Poster Generation.
Proceedings of the Advances in Multimedia Modeling, 2011

Modeling social strength in social media community via kernel-based learning.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

TapTell: understanding visual intents on-the-go.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Context-based friend suggestion in online photo-sharing community.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

JIGSAW: interactive mobile visual search with multimodal queries.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Internet multimedia advertising: techniques and technologies.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Million-scale near-duplicate video retrieval system.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Photosense: Make sense of your photos with enriched harmonic music via emotion association.
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

When recommendation meets mobile: contextual and personalized recommendation on the go.
Proceedings of the UbiComp 2011: Ubiquitous Computing, 13th International Conference, 2011

2010
Multiview Spectral Embedding.
IEEE Trans. Syst. Man Cybern. Part B, 2010

Visual query suggestion: Towards capturing user intent in internet image search.
ACM Trans. Multim. Comput. Commun. Appl., 2010

Video Annotation Through Search and Graph Reinforcement Mining.
IEEE Trans. Multim., 2010

Typicality-Based Visual Search Reranking.
IEEE Trans. Circuits Syst. Video Technol., 2010

Contextual Internet Multimedia Advertising.
Proc. IEEE, 2010

GameSense: game-like in-image advertising.
Multim. Tools Appl., 2010

Introduction to the special issue on multimedia intelligent services and technologies.
Multim. Syst., 2010

AdOn: toward contextual overlay in-video advertising.
Multim. Syst., 2010

Visual quality assessment for web videos.
J. Vis. Commun. Image Represent., 2010

Large-scale image and video search: Challenges, technologies, and trends.
J. Vis. Commun. Image Represent., 2010

Knowledge Discovery from Community-Contributed Multimedia.
IEEE Multim., 2010

PageSense: style-wise web page advertising.
Proceedings of the 19th International Conference on World Wide Web, 2010

Dynamic Video Collage.
Proceedings of the Advances in Multimedia Modeling, 2010

Automatic video archaeology: tracing your online videos.
Proceedings of the Second ACM SIGMM Workshop on Social Media, 2010

Co-reranking by mutual reinforcement for image search.
Proceedings of the 9th ACM International Conference on Image and Video Retrieval, 2010

Scalable clip-based near-duplicate video detection with ordinal measure.
Proceedings of the 9th ACM International Conference on Image and Video Retrieval, 2010

2009
Image Similarity.
Encyclopedia of Database Systems, 2009

Video collage: presenting a video sequence using a single image.
Vis. Comput., 2009

VideoSense: A Contextual In-Video Advertising System.
IEEE Trans. Circuits Syst. Video Technol., 2009

Multigraph-Based Query-Independent Learning for Video Search.
IEEE Trans. Circuits Syst. Video Technol., 2009

Multi-video synopsis for video representation.
Signal Process., 2009

Combining global, regional and contextual features for automatic image annotation.
Pattern Recognit., 2009

Graph-based semi-supervised learning with multiple labels.
J. Vis. Commun. Image Represent., 2009

Semi-supervised kernel density estimation for video annotation.
Comput. Vis. Image Underst., 2009

GameSense.
Proceedings of the 18th International Conference on World Wide Web, 2009

CrowdReranking: exploring multiple search engines for visual search reranking.
Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009

AdOn: an intelligent overlay video advertising system.
Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009

Graph-Based Pairwise Learning to Rank for Video Search.
Proceedings of the Advances in Multimedia Modeling, 2009

Visual query suggestion.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

NLVS: a near-lossless video summarization system.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Near-lossless video summarization.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Robust Distance Metric Learning with Auxiliary Knowledge.
Proceedings of IJCAI 2009, 2009

Local-driven semi-supervised learning with multi-label.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Knowledge discovery over community-sharing media: From signal to intelligence.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Contextual decomposition of multi-label images.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

2008
Correlative multilabel video annotation with temporal kernels.
ACM Trans. Multim. Comput. Commun. Appl., 2008

Multi-Layer Multi-Instance Learning for Video Concept Detection.
IEEE Trans. Multim., 2008

Structure and event mining in sports video with efficient mosaic.
Multim. Tools Appl., 2008

Optimizing Training Set Construction for Video Semantic Classification.
EURASIP J. Adv. Signal Process., 2008

Learning Optimal Compact Codebook for Efficient Object Categorization.
Proceedings of the 9th IEEE Workshop on Applications of Computer Vision (WACV 2008), 2008

MSRA at TRECVID 2008: High-Level Feature Extraction and Automatic Search.
Proceedings of the TRECVID 2008 workshop participants notebook papers, 2008

When multimedia advertising meets the new Internet era.
Proceedings of the International Workshop on Multimedia Signal Processing, 2008

Free-Shaped Video Collage.
Proceedings of the Advances in Multimedia Modeling, 2008

MILC²: A Multi-Layer Multi-Instance Learning Approach to Video Concept Detection.
Proceedings of the Advances in Multimedia Modeling, 2008

Contextual in-image advertising.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

ImageSense.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Optimizing video search reranking via minimum incremental information loss.
Proceedings of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, 2008

Graph-based semi-supervised learning with multi-label.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Automatic video annotation through search and mining.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Query-independent learning for video search.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Learning to video search rerank via pseudo preference feedback.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Videoᴹ: Multi-video Synopsis.
Workshops Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), 2008

Joint multi-label multi-instance learning for image classification.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

Coherent image annotation by learning semantic distance.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

2007
Modeling and Mining of Users' Capture Intention for Home Videos.
IEEE Trans. Multim., 2007

Home Video Visual Quality Assessment With Spatiotemporal Factors.
IEEE Trans. Circuits Syst. Video Technol., 2007

Interactive Video Annotation by Multi-Concept Multi-Modality Active Learning.
Int. J. Semantic Comput., 2007

MSRA-USTC-SJTU at TRECVID 2007: High-Level Feature Extraction and Search.
Proceedings of the TRECVID 2007 workshop participants notebook papers, 2007

VideoReach: an online video recommendation system.
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2007), 2007

Refining video annotation by exploiting pairwise concurrent relation.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Video annotation by graph-based learning with neighborhood similarity.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Structure-sensitive manifold ranking for video concept detection.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Correlative multi-label video annotation.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

VideoSense: a contextual video advertising system.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

VideoSense: towards effective online video advertising.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Video collage.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Multi-layer multi-instance kernel for video concept detection.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Building a comprehensive ontology to refine video concept detection.
Proceedings of the 9th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2007

Video Collage: A Novel Presentation of Video Sequence.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Anisotropic Manifold Ranking for Video Annotation.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

EMS: Energy Minimization Based Video Scene Segmentation.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Temporally Consistent Gaussian Random Field for Video Semantic Analysis.
Proceedings of the International Conference on Image Processing, 2007

Concurrent Multiple Instance Learning for Image Categorization.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

Online video recommendation based on multimodal fusion and relevance feedback.
Proceedings of the 6th ACM International Conference on Image and Video Retrieval, 2007

2006
Microsoft Research Asia TRECVID 2006 High-Level Feature Extraction and Rushes Exploitation.
Proceedings of the 2006 TREC Video Retrieval Evaluation, 2006

To construct optimal training set for video annotation.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

Probabilistic Multimodality Fusion for Event based Home Photo Clustering.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Automatic Video Genre Categorization using Hierarchical SVM.
Proceedings of the International Conference on Image Processing, 2006

2005
Sports Video Mining with Mosaic.
Proceedings of the 11th International Conference on Multi Media Modeling (MMM 2005), 2005

Natural video browsing.
Proceedings of the 13th ACM International Conference on Multimedia, 2005

Spatio-temporal quality assessment for home videos.
Proceedings of the 13th ACM International Conference on Multimedia, 2005

Tracking users' capture intention: a novel complementary view for home video content analysis.
Proceedings of the 13th ACM International Conference on Multimedia, 2005

Intention-based home video browsing.
Proceedings of the 13th ACM International Conference on Multimedia, 2005

Video booklet: a natural video searching and browsing interface.
Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2005

Efficient video mosaicing based on motion analysis.
Proceedings of the 2005 International Conference on Image Processing, 2005
