Qingming Huang

ORCID: 0000-0001-7542-296X

Affiliations:
  • University of Chinese Academy of Sciences, School of Computer Science and Technology, Beijing, China
  • Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China
  • Harbin Institute of Technology, Harbin, China (PhD 1994)


According to our database, Qingming Huang authored at least 704 papers between 2004 and 2025.

Awards

IEEE Fellow 2018, "For contributions to multimedia content analysis and visual perceptual processing".

Bibliography

2025
Towards scalable topic detection on web via simulating Lévy walks nature of topics in similarity space.
Inf. Sci., 2025

Bundle fragments into a whole: Mining more complete clusters via submodular selection of interesting webpages for web topic detection.
Expert Syst. Appl., 2025

2024
Linguistic Hallucination for Text-Based Video Retrieval.
IEEE Trans. Circuits Syst. Video Technol., October, 2024

Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning.
ACM Trans. Multim. Comput. Commun. Appl., August, 2024

Unsupervised Low-Light Image Enhancement via Luminance Mask and Luminance-Independent Representation Decoupling.
IEEE Trans. Emerg. Top. Comput. Intell., August, 2024

Dynamic Hypergraph Structure Learning for Multivariate Time Series Forecasting.
IEEE Trans. Big Data, August, 2024

Multiple object tracking based on appearance and motion graph convolutional neural networks with an explainer.
Neural Comput. Appl., August, 2024

Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360° Omnidirectional Image.
IEEE Trans. Neural Networks Learn. Syst., July, 2024

Self Supervised Progressive Network for High Performance Video Object Segmentation.
IEEE Trans. Neural Networks Learn. Syst., June, 2024

Multiple-Level Distillation for Video Fine-Grained Accident Detection.
IEEE Trans. Circuits Syst. Video Technol., June, 2024

Self-Supervised Pretraining for Stereoscopic Image Super-Resolution With Parallax-Aware Masking.
IEEE Trans. Broadcast., June, 2024

A 9-10-Bit Adjustable and Energy-Efficient Switching Scheme for Successive Approximation Register Analog-to-Digital Converter with One Least Significant Bit Common-Mode Voltage Variation.
Sensors, June, 2024

Progressive Multi-Resolution Loss for Crowd Counting.
IEEE Trans. Circuits Syst. Video Technol., May, 2024

CenterNet++ for Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

PIPC-3Ddet: Harnessing Perspective Information and Proposal Correlation for 3D Point Cloud Object Detection.
IEEE Trans. Circuits Syst. Video Technol., March, 2024

A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes.
IEEE Trans. Circuits Syst. Video Technol., March, 2024

Unsupervised Single-View Synthesis Network via Style Guidance and Prior Distillation.
IEEE Trans. Circuits Syst. Video Technol., March, 2024

Learning Hierarchical Modular Networks for Video Captioning.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

Mitigating Confounding Bias in Practical Recommender Systems With Partially Inaccessible Exposure Status.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

Multi-Task Paired Masking With Alignment Modeling for Medical Vision-Language Pre-Training.
IEEE Trans. Multim., 2024

Semi-Supervised Medical Report Generation via Graph-Guided Hybrid Feature Consistency.
IEEE Trans. Multim., 2024

Limb-Aware Virtual Try-On Network With Progressive Clothing Warping.
IEEE Trans. Multim., 2024

Query-Guided Prototype Evolution Network for Few-Shot Segmentation.
IEEE Trans. Multim., 2024

Fine-Grained Accident Detection: Database and Algorithm.
IEEE Trans. Image Process., 2024

Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection.
IEEE Trans. Image Process., 2024

Multi-Granularity Contrastive Cross-Modal Collaborative Generation for End-to-End Long-Term Video Question Answering.
IEEE Trans. Image Process., 2024

Rethink video retrieval representation for video captioning.
Pattern Recognit., 2024

Stereo Image Restoration via Attention-Guided Correspondence Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2024

Algorithm-Dependent Generalization of AUPRC Optimization: Theory and Algorithm.
IEEE Trans. Pattern Anal. Mach. Intell., 2024

SMART: Syntax-Calibrated Multi-Aspect Relation Transformer for Change Captioning.
IEEE Trans. Pattern Anal. Mach. Intell., 2024

Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features.
CoRR, 2024

AUCSeg: AUC-oriented Pixel-level Long-tail Semantic Segmentation.
CoRR, 2024

Bilateral Sharpness-Aware Minimization for Flatter Minima.
CoRR, 2024

Improved Diversity-Promoting Collaborative Metric Learning for Recommendation.
CoRR, 2024

Decorrelating Structure via Adapters Makes Ensemble Learning Practical for Semi-supervised Learning.
CoRR, 2024

Scalable Graph Compressed Convolutions.
CoRR, 2024

Downstream-Pretext Domain Knowledge Traceback for Active Learning.
CoRR, 2024

Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification.
CoRR, 2024

Sequential Manipulation Against Rank Aggregation: Theory and Algorithm.
CoRR, 2024

Retrieval Enhanced Zero-Shot Video Captioning.
CoRR, 2024

Uncertainty-boosted Robust Video Activity Anticipation.
CoRR, 2024

Finding A Taxi with Illegal Driver Substitution Activity via Behavior Modelings.
CoRR, 2024

A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification.
CoRR, 2024

Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization.
CoRR, 2024

Regularized Contrastive Partial Multi-view Outlier Detection.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Unsupervised Image-to-Video Adaptation via Category-aware Flow Memory Bank and Realistic Video Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

MovingColor: Seamless Fusion of Fine-grained Video Color Enhancement.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Data-free Neural Representation Compression with Riemannian Neural Dynamics.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Modeling Language Tokens as Functionals of Semantic Fields.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

ReconBoost: Boosting Can Achieve Modality Reconcilement.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

ESNet: Evolution and Succession Network for High-Resolution Salient Object Detection.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Multimodal Knowledge Graph Embeddings via Lorentz-based Contrastive Learning.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Distractors-Immune Representation Learning with Cross-Modal Contrastive Regularization for Change Captioning.
Proceedings of the Computer Vision - ECCV 2024, 2024

Weakly Supervised Video Individual Counting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Feature-based Perturbation Makes a Better Ensemble Learning for SSL Classification.
Proceedings of the 2024 2nd Asia Conference on Computer Vision, 2024

Ensemble of Distinct Students for SSL 2D Pose Estimation.
Proceedings of the 2024 2nd Asia Conference on Computer Vision, 2024

Context-aware Difference Distilling for Multi-change Captioning.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Deep Gradual-Conversion and Cycle Network for Single-View Synthesis.
IEEE Trans. Emerg. Top. Comput. Intell., December, 2023

Revisiting AUC-Oriented Adversarial Training With Loss-Agnostic Perturbations.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

AUC-Oriented Domain Adaptation: From Theory to Algorithm.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Positive-Unlabeled Learning With Label Distribution Alignment.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Temporal Dynamic Concept Modeling Network for Explainable Video Event Recognition.
ACM Trans. Multim. Comput. Commun. Appl., November, 2023

Multi-Modal Multi-Grained Embedding Learning for Generalized Zero-Shot Video Classification.
IEEE Trans. Circuits Syst. Video Technol., October, 2023

Multiple Instance Differentiation Learning for Active Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

ZS-SBPRnet: A Zero-Shot Sketch-Based Point Cloud Retrieval Network Based on Feature Projection and Cross-Reconstruction.
IEEE Trans. Ind. Informatics, August, 2023

RGB-D Human Matting: A Real-World Benchmark Dataset and a Baseline Method.
IEEE Trans. Circuits Syst. Video Technol., August, 2023

Optimizing Two-Way Partial AUC With an End-to-End Framework.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

General Greedy De-Bias Learning.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

Rethinking Label Flipping Attack: From Sample Masking to Sample Thresholding.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Self-Regulated Learning for Egocentric Video Activity Anticipation.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Learning Enriched Hop-Aware Correlation for Robust 3D Human Pose Estimation.
Int. J. Comput. Vis., June, 2023

Recurrent Interaction Network for Stereoscopic Image Super-Resolution.
IEEE Trans. Circuits Syst. Video Technol., May, 2023

MaxMatch: Semi-Supervised Learning With Worst-Case Consistency.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

Optimizing Partial Area Under the Top-k Curve: Theory and Practice.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2023

A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation is the Fixed Point of Adversarial Game.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2023

Global-and-Local Collaborative Learning for Co-Salient Object Detection.
IEEE Trans. Cybern., March, 2023

Entity-Enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

Weakly Supervised Text-based Actor-Action Video Segmentation by Clip-level Multi-instance Learning.
ACM Trans. Multim. Comput. Commun. Appl., January, 2023

Uncertainty Modeling for Robust Domain Adaptation Under Noisy Environments.
IEEE Trans. Multim., 2023

Joint Embedding of Deep Visual and Semantic Features for Medical Image Report Generation.
IEEE Trans. Multim., 2023

Neighborhood Contrastive Transformer for Change Captioning.
IEEE Trans. Multim., 2023

Automatic Shadow Generation via Exposure Fusion.
IEEE Trans. Multim., 2023

Viewpoint Alignment and Discriminative Parts Enhancement in 3D Space for Vehicle ReID.
IEEE Trans. Multim., 2023

Does Thermal Really Always Matter for RGB-T Salient Object Detection?
IEEE Trans. Multim., 2023

Viewpoint-Adaptive Representation Disentanglement Network for Change Captioning.
IEEE Trans. Image Process., 2023

Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer.
IEEE Trans. Image Process., 2023

Fine-Grained Feature Generation for Generalized Zero-Shot Video Classification.
IEEE Trans. Image Process., 2023

PUGAN: Physical Model-Guided Underwater Image Enhancement Using GAN With Dual-Discriminators.
IEEE Trans. Image Process., 2023

Graph-Based Structural Deep Spectral-Spatial Clustering for Hyperspectral Image.
IEEE Trans. Instrum. Meas., 2023

Spatial-Temporal Graph Network for Video Crowd Counting.
IEEE Trans. Circuits Syst. Video Technol., 2023

Rethinking Collaborative Metric Learning: Toward an Efficient Alternative Without Negative Sampling.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

Text-driven Face Image Generation and Manipulation via Multi-level Residual Mapper.
Int. J. Softw. Informatics, 2023

Correction to: Learning Enriched Hop-Aware Correlation for Robust 3D Human Pose Estimation.
Int. J. Comput. Vis., 2023

Subject-Oriented Video Captioning.
CoRR, 2023

Weakly Supervised Video Individual Counting.
CoRR, 2023

Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection.
CoRR, 2023

Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression.
CoRR, 2023

Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation.
CoRR, 2023

Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges.
CoRR, 2023

Open-Set Knowledge-Based Visual Question Answering with Inference Paths.
CoRR, 2023

Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training.
CoRR, 2023

A Study of Neural Collapse Phenomenon: Grassmannian Frame, Symmetry, Generalization.
CoRR, 2023

Stable Attribute Group Editing for Reliable Few-shot Image Generation.
CoRR, 2023

A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Weighted ROC Curve in Cost Space: Extending AUC to Cost-Sensitive Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DRAUC: An Instance-wise Distributionally Robust AUC Optimization Framework.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Semantic-Aware Dynamic Feature Selection and Fusion for Object Detection in UAV Videos.
Proceedings of the ACM Multimedia Asia 2023, 2023

Synthesizing Videos from Images for Image-to-Video Adaptation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Adaptive Feature Swapping for Unsupervised Domain Adaptation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Conversational Composed Retrieval with Iterative Sequence Refinement.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-k Multi-Label Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

PSNEA: Pseudo-Siamese Network for Entity Alignment between Multi-modal Knowledge Graphs.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

MaTCR: Modality-Aligned Thought Chain Reasoning for Multimodal Task-Oriented Dialogue Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Augmented Spatial Context Fusion Network for Scene Graph Generation.
Proceedings of the International Joint Conference on Neural Networks, 2023

All in a Row: Compressed Convolution Networks for Graphs.
Proceedings of the International Conference on Machine Learning, 2023

Feature Directions Matter: Long-Tailed Learning via Rotated Balanced Representation.
Proceedings of the International Conference on Machine Learning, 2023

Self-supervised Cross-view Representation Reconstruction for Change Captioning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Building Bridge Across the Time: Disruption and Restoration of Murals In the Wild.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Text-Driven Generative Domain Adaptation with Spectral Consistency Regularization.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning to Dub Movies via Hierarchical Prosody Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Decision-Friendly AUC: Learning Multi-Classifier with AUCµ.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Introduction to the Special Issue on Fine-Grained Visual Recognition and Re-Identification.
ACM Trans. Multim. Comput. Commun. Appl., 2022

Continuation Multiple Instance Learning for Weakly and Fully Supervised Object Detection.
IEEE Trans. Neural Networks Learn. Syst., 2022

Weakly Supervised Anomaly Detection in Videos Considering the Openness of Events.
IEEE Trans. Intell. Transp. Syst., 2022

Toward Understanding and Boosting Adversarial Transferability From a Distribution Perspective.
IEEE Trans. Image Process., 2022

I²Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning.
IEEE Trans. Image Process., 2022

Long Short-Term Relation Transformer With Global Gating for Video Captioning.
IEEE Trans. Image Process., 2022

C2FNet: A Coarse-to-Fine Network for Multi-View 3D Point Cloud Generation.
IEEE Trans. Image Process., 2022

CIR-Net: Cross-Modality Interaction and Refinement for RGB-D Salient Object Detection.
IEEE Trans. Image Process., 2022

SIEV-Net: A Structure-Information Enhanced Voxel Network for 3D Object Detection From LiDAR Point Clouds.
IEEE Trans. Geosci. Remote. Sens., 2022

Fine-Grained Image Quality Assessment: A Revisit and Further Thinking.
IEEE Trans. Circuits Syst. Video Technol., 2022

LVE-S2D: Low-Light Video Enhancement From Static to Dynamic.
IEEE Trans. Circuits Syst. Video Technol., 2022

Deep Affine Motion Compensation Network for Inter Prediction in VVC.
IEEE Trans. Circuits Syst. Video Technol., 2022

Syntax-Guided Hierarchical Attention Network for Video Captioning.
IEEE Trans. Circuits Syst. Video Technol., 2022

Learning With Multiclass AUC: Theory and Algorithms.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Not All Samples are Trustworthy: Towards Deep Robust SVP Prediction.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Poisoning Attack Against Estimating From Pairwise Comparisons.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Consistency-Aware Anchor Pyramid Network for Crowd Localization.
CoRR, 2022

Towards Understanding and Boosting Adversarial Transferability from a Distribution Perspective.
CoRR, 2022

Video frame prediction with dual-stream deep network emphasizing motions and content details.
Appl. Soft Comput., 2022

Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

OpenAUC: Towards AUC-Oriented Open-Set Recognition.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

OTKGE: Multi-modal Knowledge Graph Embeddings via Optimal Transport.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Minority Matters: A Diversity-Promoting Collaborative Metric Learning Algorithm.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Zero-shot Video Classification with Appropriate Web and Task Knowledge Transfer.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Pay Attention to Your Positive Pairs: Positive Pair Aware Contrastive Knowledge Distillation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Span-based Audio-Visual Localization.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Confederated Learning: Going Beyond Centralization.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Concept Propagation via Attentional Knowledge Graph Reasoning for Video-Text Retrieval.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

A Unified Framework against Topology and Class Imbalance.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Recurrent Meta-Learning against Generalized Cold-start Problem in CTR Prediction.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Multi-Attention Network for Compressed Video Referring Object Segmentation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Inferential Visual Question Generation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

CRNet: Collaborative Refinement Network for Self-Supervised Video Object Segmentation.
Proceedings of the 5th IEEE International Conference on Multimedia Information Processing and Retrieval, 2022

A Sparse-Motif Ensemble Graph Convolutional Network against Over-smoothing.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Quaternion Ordinal Embedding.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems.
Proceedings of the International Conference on Machine Learning, 2022

Enhanced Semantic Head for Cascade Instance Segmentation.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

Think Beyond Words: Exploring Context-Relevant Visual Commonsense for Diverse Dialogue Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Learning Linguistic Association Towards Efficient Text-Video Retrieval.
Proceedings of the Computer Vision - ECCV 2022, 2022

Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Hierarchical Modular Network for Video Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Attribute Group Editing for Reliable Few-shot Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Automatic Relation-aware Graph Network Proliferation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

ER: Equivariance Regularizer for Knowledge Graph Completion.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Geometry Interaction Knowledge Graph Embeddings.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Graph Regularized Encoder-Decoder Networks for Image Representation Learning.
IEEE Trans. Multim., 2021

Self-Supervised Deep TripleNet for Video Object Segmentation.
IEEE Trans. Multim., 2021

Augmented Adversarial Training for Cross-Modal Retrieval.
IEEE Trans. Multim., 2021

Learning Feature Representation and Partial Correlation for Multimodal Multi-Label Data.
IEEE Trans. Multim., 2021

Neural Collaborative Preference Learning With Pairwise Comparisons.
IEEE Trans. Multim., 2021

Embedding Perspective Analysis Into Multi-Column Convolutional Neural Network for Crowd Counting.
IEEE Trans. Image Process., 2021

Learning Self-Supervised Space-Time CNN for Fast Video Style Transfer.
IEEE Trans. Image Process., 2021

Decomposition and Completion Network for Salient Object Detection.
IEEE Trans. Image Process., 2021

DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection.
IEEE Trans. Image Process., 2021

Toward Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs.
IEEE Trans. Cybern., 2021

ASIF-Net: Attention Steered Interweave Fusion Network for RGB-D Salient Object Detection.
IEEE Trans. Cybern., 2021

Long-Term Video Question Answering via Multimodal Hierarchical Memory Attentive Networks.
IEEE Trans. Circuits Syst. Video Technol., 2021

Multi-View Spatial Attention Embedding for Vehicle Re-Identification.
IEEE Trans. Circuits Syst. Video Technol., 2021

Deep Stereoscopic Image Super-Resolution via Interaction Module.
IEEE Trans. Circuits Syst. Video Technol., 2021

Deep Spatial-Spectral Subspace Clustering for Hyperspectral Image.
IEEE Trans. Circuits Syst. Video Technol., 2021

Stereoscopic Image Retargeting Based on Deep Convolutional Neural Network.
IEEE Trans. Circuits Syst. Video Technol., 2021

Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Harmonized Multimodal Learning with Gaussian Process Latent Variable Models.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Local-binarized very deep residual network for visual categorization.
Neurocomputing, 2021

Evaluating Visual Properties via Robust HodgeRank.
Int. J. Comput. Vis., 2021

Viewpoint and Scale Consistency Reinforcement for UAV Vehicle Re-Identification.
Int. J. Comput. Vis., 2021

Introduction to the Special Issue on MMAC: Multimodal Affective Computing of Large-Scale Multimedia Data.
IEEE Multim., 2021

DVCFlow: Modeling Information Flow Towards Human-like Video Captioning.
CoRR, 2021

Edge-featured Graph Neural Architecture Search.
CoRR, 2021

Fast Batch Nuclear-norm Maximization and Minimization for Robust Domain Adaptation.
CoRR, 2021

Location-Sensitive Visual Recognition with Cross-IOU Loss.
CoRR, 2021

Rethinking Graph Neural Network Search from Message-passing.
CoRR, 2021

When False Positive is Intolerant: End-to-End Optimization with Low FPR for Multipartite Ranking.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Semi-Autoregressive Image Captioning.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Implicit Feedbacks are Not Always Favorable: Iterative Relabeled One-Class Collaborative Filtering against Noisy Interactions.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Pareto Optimality for Fairness-constrained Collaborative Filtering.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Learning Unified Embeddings for Recommendation via Meta-path Semantics.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Multimodal Entity Linking: A New Dataset and A Baseline.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Cascade Cross-modal Attention Network for Video Actor and Action Segmentation from a Sentence.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

One-Shot Example Videos Localization Network for Weakly-Supervised Temporal Action Localization.
Proceedings of the 4th IEEE International Conference on Multimedia Information Processing and Retrieval, 2021

When All We Need is a Piece of the Pie: A Generic Framework for Optimizing Two-way Partial AUC.
Proceedings of the 38th International Conference on Machine Learning, 2021

Action Category and Phase Consistency Regularization for High-Quality Temporal Action Proposal Generation.
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

DBAM: Dense Boundary and Actionness Map for Action Localization in Videos via Sentence Query.
Proceedings of the Image and Graphics - 11th International Conference, 2021

Two-Stage Polishing Network for Camouflaged Object Detection.
Proceedings of the Image and Graphics - 11th International Conference, 2021

Exploiting sample correlation for crowd counting with multi-expert network.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Greedy Gradient Ensemble for Robust Visual Question Answering.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Message from the DIKW 2021 Program Chairs.
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Rethinking Graph Neural Architecture Search From Message-Passing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Deep Partial Rank Aggregation for Personalized Attributes.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Nearest Neighbor Classifier Embedded Network for Active Learning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

What to Select: Pursuing Consistent Motion Segmentation from Multiple Geometric Models.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Dual Quaternion Knowledge Graph Embeddings.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Proposal Complementary Action Detection.
ACM Trans. Multim. Comput. Commun. Appl., 2020

Spatial Pyramid-Enhanced NetVLAD With Weighted Triplet Loss for Place Recognition.
IEEE Trans. Neural Networks Learn. Syst., 2020

Online Fast Adaptive Low-Rank Similarity Learning for Cross-Modal Retrieval.
IEEE Trans. Multim., 2020

Stereoscopic Image Stitching via Disparity-Constrained Warping and Blending.
IEEE Trans. Multim., 2020

A Recursive Constrained Framework for Unsupervised Video Action Clustering.
IEEE Trans. Ind. Informatics, 2020

Going From RGB to RGBD Saliency: A Depth-Guided Transformation Model.
IEEE Trans. Cybern., 2020

Multimodal Transformer With Multi-View Visual Representation for Image Captioning.
IEEE Trans. Circuits Syst. Video Technol., 2020

Detecting Small Objects Using a Channel-Aware Deconvolutional Network.
IEEE Trans. Circuits Syst. Video Technol., 2020

Discrete Probability Distribution Prediction of Image Emotions with Shared Sparse Learning.
IEEE Trans. Affect. Comput., 2020

Conditional GAN based individual and global motion fusion for multiple object tracking in UAV videos.
Pattern Recognit. Lett., 2020

Intra- and Inter-modal Multilinear Pooling with Multitask Learning for Video Grounding.
Neural Process. Lett., 2020

Deep neural networks for emerging multimedia computing and applications.
Neurocomputing, 2020

Style-adaptive photo aesthetic rating via convolutional neural networks and multi-task learning.
Neurocomputing, 2020

The Unmanned Aerial Vehicle Benchmark: Object Detection, Tracking and Baseline.
Int. J. Comput. Vis., 2020

Two-stream deep sparse network for accurate and efficient image restoration.
Comput. Vis. Image Underst., 2020

Semantic Editing On Segmentation Map Via Multi-Expansion Loss.
CoRR, 2020

Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection.
CoRR, 2020

CSCNet: A Shallow Single Column Network for Crowd Counting.
Proceedings of the 2020 IEEE International Conference on Visual Communications and Image Processing, 2020

Video Anomaly Detection Using Open Data Filter and Domain Adaptation.
Proceedings of the 2020 IEEE International Conference on Visual Communications and Image Processing, 2020

Heuristic Domain Adaptation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Fixation guided network for salient object detection.
Proceedings of the MMAsia 2020: ACM Multimedia Asia, 2020

Structural Semantic Adversarial Active Learning for Image Captioning.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

DMVOS: Discriminative Matching for Real-time Video Object Segmentation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Image Inpainting Based on Multi-frequency Probabilistic Inference Model.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Towards More Explainability: Concept Knowledge Mining Network for Event Recognition.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Fine-grained Feature Alignment with Part Perspective Transformation for Vehicle ReID.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Transferrable Referring Expression Grounding with Concept Transfer and Context Inheritance.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Diverter-Guider Recurrent Network for Diverse Poems Generation from Image.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Quaternion-Based Knowledge Graph Network for Recommendation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Generalized Zero-Shot Video Classification via Generative Adversarial Networks.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Task-distribution-aware Meta-learning for Cold-start CTR Prediction.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

A Structured Latent Variable Recurrent Network With Stochastic Attention For Generating Weibo Comments.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Siamese Dynamic Mask Estimation Network for Fast Video Object Segmentation.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Weakly-Supervised Crowd Counting Learns from Sorting Rather Than Locations.
Proceedings of the Computer Vision - ECCV 2020, 2020

Interpretable Visual Reasoning via Probabilistic Formulation Under Natural Supervision.
Proceedings of the Computer Vision - ECCV 2020, 2020

Corner Proposal Network for Anchor-Free, Two-Stage Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

State-Relabeling Adversarial Active Learning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Reverse Perspective Network for Perspective-Aware Object Counting.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Label Decoupling Framework for Salient Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Parsing-Based View-Aware Embedding Network for Vehicle Re-Identification.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Gradually Vanishing Bridge for Adversarial Domain Adaptation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Towards Discriminability and Diversity: Batch Nuclear-Norm Maximization Under Label Insufficient Situations.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Release the Power of Online-Training for Robust Visual Tracking.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Who Likes What? - SplitLBI in Exploring Preferential Diversity of Ratings.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

F³Net: Fusion, Feedback and Focus for Salient Object Detection.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Global Context-Aware Progressive Aggregation Network for Salient Object Detection.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Two Birds With One Stone: A Coupled Poisson Deconvolution for Detecting and Describing Topics From Multimodal Web Data.
IEEE Trans. Neural Networks Learn. Syst., 2019

SkeletonNet: A Hybrid Network With a Skeleton-Embedding Process for Multi-View Image Representation Learning.
IEEE Trans. Multim., 2019

HSCS: Hierarchical Sparsity Based Co-saliency Detection for RGBD Images.
IEEE Trans. Multim., 2019

Learning to Predict Bus Arrival Time From Heterogeneous Measurements via Recurrent Neural Network.
IEEE Trans. Intell. Transp. Syst., 2019

Split Multiplicative Multi-View Subspace Clustering.
IEEE Trans. Image Process., 2019

Online Asymmetric Metric Learning With Multi-Layer Similarity Aggregation for Cross-Modal Retrieval.
IEEE Trans. Image Process., 2019

Video Saliency Detection via Sparsity-Based Reconstruction and Propagation.
IEEE Trans. Image Process., 2019

Increasing Interpretation of Web Topic Detection via Prototype Learning From Sparse Poisson Deconvolution.
IEEE Trans. Cybern., 2019

An Iterative Co-Saliency Framework for RGBD Images.
IEEE Trans. Cybern., 2019

Learning Coupled Convolutional Networks Fusion for Video Saliency Prediction.
IEEE Trans. Circuits Syst. Video Technol., 2019

Person Re-Identification by Semantic Region Representation and Topology Constraint.
IEEE Trans. Circuits Syst. Video Technol., 2019

Review of Visual Saliency Detection With Comprehensive Information.
IEEE Trans. Circuits Syst. Video Technol., 2019

Deep Constrained Low-Rank Subspace Learning for Multi-View Semi-Supervised Classification.
IEEE Signal Process. Lett., 2019

From Social to Individuals: A Parsimonious Path of Multi-Level Models for Crowdsourced Preference Aggregation.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Hedging Deep Features for Visual Tracking.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Beyond global fusion: A group-aware fusion approach for multi-view image clustering.
Inf. Sci., 2019

Improving multi-label classification with missing labels by learning label-specific features.
Inf. Sci., 2019

Multi-modal semantic autoencoder for cross-modal retrieval.
Neurocomputing, 2019

Robust visual tracking via scale-and-state-awareness.
Neurocomputing, 2019

Regularized topic-aware latent influence propagation in dynamic relational networks.
GeoInformatica, 2019

F3Net: Fusion, Feedback and Focus for Salient Object Detection.
CoRR, 2019

Label Correlation Guided Deep Multi-View Image Annotation.
IEEE Access, 2019

Multi-View Multi-Label Learning With View-Label-Specific Features.
IEEE Access, 2019

Generalized Block-Diagonal Structure Pursuit: Learning Soft Latent Task Assignment against Negative Transfer.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

iSplit LBI: Individualized Partial Ranking with Ties via Split LBI.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

DM2C: Deep Mixed-Modal Clustering.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Accelerating Topic Detection on Web for a Large-Scale Data Set via Stochastic Poisson Deconvolution.
Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

Self-balance Motion and Appearance Model for Multi-object Tracking in UAV.
Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

Active Perception Network for Salient Object Detection.
Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

Fast and Accurately Measuring Crack Width via Cascade Principal Component Analysis.
Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

Domain Specific and Idiom Adaptive Video Summarization.
Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

Training Efficient Saliency Prediction Models with Knowledge Distillation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Structured Stochastic Recurrent Network for Linguistic Video Prediction.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Learning Fragment Self-Attention Embeddings for Image-Text Matching.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Adversarial Preference Learning with Pairwise Comparisons.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Duet Robust Deep Subspace Clustering.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Collaborative Preference Embedding against Sparse Labels.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Channel-wise Temporal Attention Network for Video Action Recognition.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Two-Stream Sparse Network for Accurate Image Super-Resolution.
Proceedings of the IEEE International Conference on Multimedia & Expo Workshops, 2019

Stacked Cross Refinement Network for Edge-Aware Salient Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

CenterNet: Keypoint Triplets for Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Unsupervised Open Domain Recognition by Semantic Discrepancy Minimization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Deep Robust Subjective Visual Property Prediction in Crowdsourcing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Cascaded Partial Decoder for Fast and Accurate Salient Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Spatiotemporal CNN for Video Object Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Learning Personalized Attribute Preference via Multi-Task AUC Optimization.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Learning Attribute-Specific Representations for Visual Tracking.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Generalized Semi-supervised and Structured Subspace Learning for Cross-Modal Retrieval.
IEEE Trans. Multim., 2018

Discovering Fine-Grained Spatial Pattern From Taxi Trips: Where Point Process Meets Matrix Decomposition and Factorization.
IEEE Trans. Intell. Transp. Syst., 2018

Structure-Aware Local Sparse Coding for Visual Tracking.
IEEE Trans. Image Process., 2018

Iterative Graph Seeking for Object Tracking.
IEEE Trans. Image Process., 2018

Co-Saliency Detection for RGBD Images Based on Multi-Constraint Feature Matching and Cross Label Propagation.
IEEE Trans. Image Process., 2018

Joint Feature Selection and Classification for Multilabel Learning.
IEEE Trans. Cybern., 2018

Image Class Prediction by Joint Object, Context, and Background Modeling.
IEEE Trans. Circuits Syst. Video Technol., 2018

Bilevel Multiview Latent Space Learning.
IEEE Trans. Circuits Syst. Video Technol., 2018

Blind image quality prediction by exploiting multi-level deep representations.
Pattern Recognit., 2018

Click data guided query modeling with click propagation and sparse coding.
Multim. Tools Appl., 2018

Joint multi-view representation and image annotation via optimal predictive subspace learning.
Inf. Sci., 2018

Online multiple object tracking via exchanging object context.
Neurocomputing, 2018

A two-step approach to describing web topics via probable keywords and prototype images from background-removed similarities.
Neurocomputing, 2018

Semantic invariant cross-domain image generation with generative adversarial networks.
Neurocomputing, 2018

Multi-label double-layer learning for cross-modal retrieval.
Neurocomputing, 2018

Weakly Supervised Local Attention Network for Fine-Grained Visual Classification.
CoRR, 2018

SCAN: Spatial and Channel Attention Network for Vehicle Re-Identification.
Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

A Margin-based MLE for Crowdsourced Partial Ranking.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Learning Semantic Structure-preserved Embeddings for Cross-modal Retrieval.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Joint Global and Co-Attentive Representation Learning for Image-Sentence Retrieval.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

When to Learn What: Deep Cognitive Subspace Clustering.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

ASMMC-MMAC 2018: The Joint Workshop of 4th the Workshop on Affective Social Multimedia Computing and first Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Attentive Recurrent Neural Network for Weak-supervised Multi-label Image Classification.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Who to Ask: An Intelligent Fashion Consultant.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

Vehicle Detection in UAV Traffic Video Based on Convolution Neural Network.
Proceedings of the IEEE 1st Conference on Multimedia Information Processing and Retrieval, 2018

Affective Image Content Analysis: A Comprehensive Survey.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

S2L: Single-Streamline For Complex Video Event Detection.
Proceedings of the 2018 IEEE International Conference on Multimedia & Expo Workshops, 2018

Edge Guided Generation Network for Video Prediction.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

RAM: A Region-Aware Deep Model for Vehicle Re-Identification.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Semantic Manifold Alignment in Visual Feature Space for Zero-Shot Learning.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Multi-Stream Region Proposal Network for Pedestrian Detection.
Proceedings of the 2018 IEEE International Conference on Multimedia & Expo Workshops, 2018

VisDrone-DET2018: The Vision Meets Drone Object Detection in Image Challenge Results.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking.
Proceedings of the Computer Vision - ECCV 2018, 2018

Less Is More: Picking Informative Frames for Video Captioning.
Proceedings of the Computer Vision - ECCV 2018, 2018

Active Sampling for Subjective Video Quality Assessment.
Proceedings of the Fourth IEEE International Conference on Multimedia Big Data, 2018

Saliency-Based Spatiotemporal Attention for Video Captioning.
Proceedings of the Fourth IEEE International Conference on Multimedia Big Data, 2018

Reverse Densely Connected Feature Pyramid Network for Object Detection.
Proceedings of the Computer Vision - ACCV 2018, 2018

From Common to Special: When Multi-Attribute Learning Meets Personalized Opinions.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

HodgeRank With Information Maximization for Crowdsourced Pairwise Ranking Aggregation.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Facial Landmarks Detection by Self-Iterative Regression Based Landmarks-Attention Network.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Fine-Grained Image Classification via Low-Rank Sparse Coding With General and Class-Specific Codebooks.
IEEE Trans. Neural Networks Learn. Syst., 2017

Cross-Modal Retrieval Using Multiordered Discriminative Structured Subspace Learning.
IEEE Trans. Multim., 2017

Location-Based Parallel Tag Completion for Geo-Tagged Social Image Retrieval.
ACM Trans. Intell. Syst. Technol., 2017

Multimodal Similarity Gaussian Process Latent Variable Model.
IEEE Trans. Image Process., 2017

Geometric Hypergraph Learning for Visual Tracking.
IEEE Trans. Cybern., 2017

Exploring Coherent Motion Patterns via Structured Trajectory Learning for Crowd Mood Modeling.
IEEE Trans. Circuits Syst. Video Technol., 2017

Contextual Exemplar Classifier-Based Image Representation for Classification.
IEEE Trans. Circuits Syst. Video Technol., 2017

A Bit-Plane Decomposition Matrix-Based VLSI Integer Transform Architecture for HEVC.
IEEE Trans. Circuits Syst. II Express Briefs, 2017

Justify role of Similarity Diffusion Process in cross-media topic ranking: an empirical evaluation.
Multim. Tools Appl., 2017

Cross-media analysis and reasoning: advances and directions.
Frontiers Inf. Technol. Electron. Eng., 2017

Image classification by search with explicitly and implicitly semantic representations.
Inf. Sci., 2017

Rotative maximal pattern: A local coloring descriptor for object classification and recognition.
Inf. Sci., 2017

Hierarchical deep semantic representation for visual categorization.
Neurocomputing, 2017

Multi-label classification by exploiting local positive and negative pairwise label correlation.
Neurocomputing, 2017

Composition-aided Sketch-realistic Portrait Generation.
CoRR, 2017

JEREMIE: Joint Semantic Feature Learning via Multi-relational Matrix Completion.
Proceedings of the Mobility Analytics for Spatio-Temporal and Social Data, 2017

Efficient Cross-Modal Retrieval Using Social Tag Information Towards Mobile Applications.
Proceedings of the Mobility Analytics for Spatio-Temporal and Social Data, 2017

Deep Unsupervised Convolutional Domain Adaptation.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Multi-Networks Joint Learning for Large-Scale Cross-Modal Retrieval.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Exploring Outliers in Crowdsourced Ranking for QoE.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Dependency Exploitation: A Unified CNN-RNN Approach for Visual Emotion Recognition.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Adaptively Unified Semi-supervised Learning for Cross-Modal Retrieval.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Metric based on multi-order spaces for cross-modal retrieval.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

Cross-media retrieval with semantics clustering and enhancement.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

Saliency detection with two-level fully convolutional networks.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

Online low-rank similarity function learning with adaptive relative margin for cross-modal retrieval.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

The Visual Object Tracking VOT2017 Challenge Results.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Multimodal Gaussian Process Latent Variable Models with Harmonization.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Online multi-target tracking via depth range segmentation.
Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing, 2017

A Graph Regularized Deep Neural Network for Unsupervised Image Representation Learning.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Online Asymmetric Similarity Learning for Cross-Modal Retrieval.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Multi-view Subspace Learning with Diversity Enforced Skeleton Embedding.
Proceedings of the Third IEEE International Conference on Multimedia Big Data, 2017

2016
Robust Latent Poisson Deconvolution From Multiple Features for Web Topic Detection.
IEEE Trans. Multim., 2016

Corrections to "Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation".
IEEE Trans. Multim., 2016

Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation.
IEEE Trans. Multim., 2016

Learning Label-Specific Features and Class-Dependent Labels for Multi-Label Classification.
IEEE Trans. Knowl. Data Eng., 2016

Online Deformable Object Tracking Based on Structure-Aware Hyper-Graph.
IEEE Trans. Image Process., 2016

Coupling Reranking and Structured Output SVM Co-Train for Multitarget Tracking.
IEEE Trans. Circuits Syst. Video Technol., 2016

Effective Multimodality Fusion Framework for Cross-Media Topic Detection.
IEEE Trans. Circuits Syst. Video Technol., 2016

Saliency Detection for Stereoscopic Images Based on Depth Confidence Analysis and Multiple Cues Fusion.
IEEE Signal Process. Lett., 2016

Online web video topic detection and tracking with semi-supervised learning.
Multim. Syst., 2016

Boosted random contextual semantic space based representation for visual recognition.
Inf. Sci., 2016

Socio-mobile landmark recognition using local features with adaptive region selection.
Neurocomputing, 2016

Distributed image understanding with semantic dictionary and semantic expansion.
Neurocomputing, 2016

Beyond appearance model: Learning appearance variations for object tracking.
Neurocomputing, 2016

Tri-level Combination for Image Representation.
Proceedings of the Advances in Multimedia Information Processing - PCM 2016, 2016

From Seed Discovery to Deep Reconstruction: Predicting Saliency in Crowd via Deep Networks.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

PL-ranking: A Novel Ranking Method for Cross-Modal Retrieval.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Cross-modal Retrieval by Real Label Partial Least Squares.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Crowd video retrieval via deep attribute-embedding graph ranking.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Video saliency prediction with optimized optical flow and gravity center bias.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Robust latent poisson deconvolution from multiple imperfect features for web topic detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Accelerate convolutional neural networks for binary classification via cascading cost-sensitive feature.
Proceedings of the 2016 IEEE International Conference on Image Processing, 2016

Webpage saliency prediction with multi-features fusion.
Proceedings of the 2016 IEEE International Conference on Image Processing, 2016

Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks.
Proceedings of the Computer Vision - ECCV 2016, 2016

The Visual Object Tracking VOT2016 Challenge Results.
Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

Hedged Deep Tracking.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Joint Multi-View Representation Learning and Image Tagging.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Unsupervised Web Topic Detection Using A Ranked Clustering-Like Pattern Across Similarity Cascades.
IEEE Trans. Multim., 2015

Beyond Explicit Codebook Generation: Visual Representation Using Implicitly Transferred Codebooks.
IEEE Trans. Image Process., 2015

Multi-Level Discriminative Dictionary Learning With Application to Large Scale Image Classification.
IEEE Trans. Image Process., 2015

Local Laplacian Coding From Theoretical Analysis of Local Coding Schemes for Locally Linear Classification.
IEEE Trans. Cybern., 2015

Social Attribute-Aware Force Model: Exploiting Richness of Interaction for Abnormal Crowd Detection.
IEEE Trans. Circuits Syst. Video Technol., 2015

ALID: Scalable Dominant Cluster Detection.
Proc. VLDB Endow., 2015

Polysemious visual representation based on feature aggregation for large scale image applications.
Multim. Tools Appl., 2015

Multi-order visual phrase for scalable partial-duplicate visual search.
Multim. Syst., 2015

LSH-based semantic dictionary learning for large scale image understanding.
J. Vis. Commun. Image Represent., 2015

Image classification using boosted local features with random orientation and location selection.
Inf. Sci., 2015

Joint image representation and classification in random semantic spaces.
Neurocomputing, 2015

Fusing cross-media for topic detection by dense keyword groups.
Neurocomputing, 2015

Cluster-sensitive Structured Correlation Analysis for Web cross-modal retrieval.
Neurocomputing, 2015

Online dictionary learning for Local Coordinate Coding with Locality Coding Adaptors.
Neurocomputing, 2015

Set-label modeling and deep metric learning on person re-identification.
Neurocomputing, 2015

Online learning affinity measure with CovBoost for multi-target tracking.
Neurocomputing, 2015

Strategy for aesthetic photography recommendation via collaborative composition model.
IET Comput. Vis., 2015

The Face Object based HEVC System for Video Call.
EAI Endorsed Trans. Future Intell. Educ. Environ., 2015

Learning Deep Convolutional Neural Networks for Places2 Scene Recognition.
CoRR, 2015

Cross-media Topic Detection with Refined CNN based Image-Dominant Topic Model.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Formation Period Matters: Towards Socially Consistent Group Detection via Dense Subgraph Seeking.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Semantic-aware Hashing for Social Image Retrieval.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Adaptive Sharing for Image Classification.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

GOMES: A group-aware multi-view fusion approach towards real-world image clustering.
Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

Improving cross-modal correlation learning with hyperlinks.
Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

Group sensitive Classifier Chains for multi-label classification.
Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

Image-regulated graph topic model for cross-media topic detection.
Proceedings of the 7th International Conference on Internet Multimedia Computing and Service, 2015

Learning Label Specific Features for Multi-label Classification.
Proceedings of the 2015 IEEE International Conference on Data Mining, 2015

Similarity Gaussian Process Latent Variable Model for Multi-modal Data Analysis.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Latent influence propagation on dynamic networks.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Abnormal Event Detection Based on Multi-scale Markov Random Field.
Proceedings of the Computer Vision - CCF Chinese Conference, 2015

2014
Online HodgeRank on Random Graphs for Crowdsourceable QoE Evaluation.
IEEE Trans. Multim., 2014

Face Distortion Recovery Based on Online Learning Database for Conversational Video.
IEEE Trans. Multim., 2014

A Simulation Analysis on the Existence of Network Traffic Flow Equilibria.
IEEE Trans. Intell. Transp. Syst., 2014

USB: Ultrashort Binary Descriptor for Fast Visual Matching and Retrieval.
IEEE Trans. Image Process., 2014

Cascade Category-Aware Visual Search.
IEEE Trans. Image Process., 2014

Undoing the codebook bias by linear transformation with sparsity and F-norm constraints for image classification.
Pattern Recognit. Lett., 2014

Web video thumbnail recommendation with content-aware analysis and query-sensitive matching.
Multim. Tools Appl., 2014

Relative image similarity learning with contextual information for Internet cross-media retrieval.
Multim. Syst., 2014

Beyond visual word ambiguity: Weighted local feature encoding with governing region.
J. Vis. Commun. Image Represent., 2014

Fusing multi-cues description for partial-duplicate image retrieval.
J. Vis. Commun. Image Represent., 2014

Object categorization in sub-semantic space.
Neurocomputing, 2014

Recognizing human group action by layered model with multiple cues.
Neurocomputing, 2014

Topic detection in cross-media: a semi-supervised co-clustering approach.
Int. J. Multim. Inf. Retr., 2014

Embedding Multi-Order Spatial Clues for Scalable Visual Matching and Retrieval.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2014

ObjectPatchNet: Towards scalable and semantic image annotation and retrieval.
Comput. Vis. Image Underst., 2014

Image classification by non-negative sparse coding, correlation constrained low-rank and sparse decomposition.
Comput. Vis. Image Underst., 2014

Robust Statistical Ranking: Theory and Algorithms.
CoRR, 2014

Cross media topic analytics based on synergetic content and user behavior modeling.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Web topic detection using a ranked clustering-like pattern across similarity cascades.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Graph-Density-based visual word vocabulary for image retrieval.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Sharing model with multi-level feature representations.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Structure-aware multi-object discovery for weakly supervised tracking.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Weakly supervised cross-view action recognition via sequential motion accumulation.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

DA-CCD: A novel action representation by Deep Architecture of local depth feature.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Cross modal metric learning with multi-level semantic relevance.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Categorizing Social Multimedia by Neighborhood Decision Using Local Pairwise Label Correlation.
Proceedings of the 2014 IEEE International Conference on Data Mining Workshops, 2014

TINA: Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation.
Proceedings of the 2014 IEEE International Conference on Data Mining, 2014

Learning Sparse Prototypes for Crowd Perception via Ensemble Coding Mechanisms.
Proceedings of the Human Behavior Understanding - 5th International Workshop, 2014

Large scale image understanding with non-convex multi-task learning.
Proceedings of the 2014 5th International Conference on Game Theory for Networks, 2014

Coupling Multiple Alignments and Re-ranking for Low-Latency Online Multi-target Tracking.
Proceedings of the Computer Vision - ACCV 2014, 2014

2013
Accurate and efficient cross-domain visual matching leveraging multiple feature representations.
Vis. Comput., 2013

Robust Spatial Consistency Graph Model for Partial Duplicate Image Retrieval.
IEEE Trans. Multim., 2013

Edge-SIFT: Discriminative Binary Descriptor for Scalable Partial-Duplicate Mobile Search.
IEEE Trans. Image Process., 2013

SSOCBT: A Robust Semisupervised Online CovBoost Tracker That Uses Samples Differently.
IEEE Trans. Circuits Syst. Video Technol., 2013

Image classification using Harr-like transformation of local features with coding residuals.
Signal Process., 2013

Image classification using spatial pyramid robust sparse coding.
Pattern Recognit. Lett., 2013

Laplacian affine sparse coding with tilt and orientation consistency for image classification.
J. Vis. Commun. Image Represent., 2013

Beyond visual features: A weak semantic image representation using exemplar classifiers for classification.
Neurocomputing, 2013

Weighted visual vocabulary to balance the descriptive ability on general dataset.
Neurocomputing, 2013

Partial-Duplicate Image Retrieval via Saliency-Guided Visual Matching.
IEEE Multim., 2013

Fine-Grained Image Classification Using Color Exemplar Classifiers.
Proceedings of the Advances in Multimedia Information Processing - PCM 2013, 2013

Cross Concept Local Fisher Discriminant Analysis for Image Classification.
Proceedings of the Advances in Multimedia Modeling, 19th International Conference, 2013

Undo the codebook bias by linear transformation for visual applications.
Proceedings of the ACM Multimedia Conference, 2013

Beyond bag of words: image representation in sub-semantic space.
Proceedings of the ACM Multimedia Conference, 2013

Robust evaluation for quality of experience in crowdsourcing.
Proceedings of the ACM Multimedia Conference, 2013

Optimization of Number of Diffusion Gradient Directions on anisotropy indices and deterministic fiber tracking for diffusion tensor imaging.
Proceedings of the Ninth International Conference on Natural Computation, 2013

Cross-media topic detection: A multi-modality fusion framework.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

WIKI-CMR: A web cross modality dataset for studying and evaluation of cross modality retrieval models.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

Abnormal event detection in crowded scenes based on Structural Multi-scale Motion Interrelated Patterns.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

Beyond particle flow: Bag of Trajectory Graphs for dense crowd event recognition.
Proceedings of the IEEE International Conference on Image Processing, 2013

An efficient occlusion detection method to improve object trackers.
Proceedings of the IEEE International Conference on Image Processing, 2013

Stochastic boosting for large-scale image classification.
Proceedings of the IEEE International Conference on Image Processing, 2013

Set-based classification for person re-identification utilizing mutual-information.
Proceedings of the IEEE International Conference on Image Processing, 2013

Multi-order visual phrase for scalable image search.
Proceedings of the International Conference on Internet Multimedia Computing and Service, 2013

Cross-media topic detection associated with hot search queries.
Proceedings of the International Conference on Internet Multimedia Computing and Service, 2013

Discriminative Spatial Codebook Generation for Image Classification.
Proceedings of the Seventh International Conference on Image and Graphics, 2013

Semantically-Based Human Scanpath Estimation with HMMs.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Online Learning Based Face Distortion Recovery for Conversational Video Coding.
Proceedings of the 2013 Data Compression Conference, 2013

Multi-level Discriminative Dictionary Learning towards Hierarchical Visual Categorization.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012
HodgeRank on Random Graphs for Subjective Video Quality Assessment.
IEEE Trans. Multim., 2012

S³MKL: Scalable Semi-Supervised Multiple Kernel Learning for Real-World Image Applications.
IEEE Trans. Multim., 2012

Learning Hierarchical Semantic Description Via Mixed-Norm Regularization for Image Understanding.
IEEE Trans. Multim., 2012

A Generic Approach for Systematic Analysis of Sports Videos.
ACM Trans. Intell. Syst. Technol., 2012

A Multiple Targets Appearance Tracker Based on Object Interaction Models.
IEEE Trans. Circuits Syst. Video Technol., 2012

@ICT: attention-based virtual content insertion.
Multim. Syst., 2012

Online selection of the best k-feature subset for object tracking.
J. Vis. Commun. Image Represent., 2012

Nearest-neighbor method using multiple neighborhood similarities for social media data mining.
Neurocomputing, 2012

Visual Saliency and Distortion Weighting Based Video Quality Assessment.
Proceedings of the Advances in Multimedia Information Processing - PCM 2012, 2012

Improving Image Distance Metric Learning by Embedding Semantic Relations.
Proceedings of the Advances in Multimedia Information Processing - PCM 2012, 2012

Spatio-temporal Visual Distortion and Rate Optimization for Video Coding.
Proceedings of the Advances in Multimedia Information Processing - PCM 2012, 2012

Online crowdsourcing subjective image quality assessment.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

An effective multi-clue fusion approach for web video topic detection.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

A Novel Framework for Web Video Thumbnail Generation.
Proceedings of the Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2012

Color Maximal-Dissimilarity Pattern for pedestrian detection.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

Theoretical analysis of learning local anchors for classification.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

Motion Based Perceptual Distortion and Rate Optimization for Video Coding.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

Aesthetic composition representation for portrait photographing recommendation.
Proceedings of the 19th IEEE International Conference on Image Processing, 2012

Abnormal crowd behavior detection based on social attribute-aware force model.
Proceedings of the 19th IEEE International Conference on Image Processing, 2012

Interactive event detection in crowd scenes.
Proceedings of the 4th International Conference on Internet Multimedia Computing and Service, 2012

Cross community news event summary generation based on collaborative ranking.
Proceedings of the 4th International Conference on Internet Multimedia Computing and Service, 2012

Human Daily Action Analysis with Multi-view and Color-Depth Data.
Proceedings of the Computer Vision - ECCV 2012. Workshops and Demonstrations, 2012

Multi-feature metric learning with knowledge transfer among semantics and social tagging.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011
Generating Descriptive Visual Words and Visual Phrases for Large-Scale Image Applications.
IEEE Trans. Image Process., 2011

Transferring Boosted Detectors Towards Viewpoint and Scene Adaptiveness.
IEEE Trans. Image Process., 2011

A Novel Rate Control Technique for Multiview Video Plus Depth Based 3D Video Coding.
IEEE Trans. Broadcast., 2011

Modeling spatial and semantic cues for large-scale near-duplicated image retrieval.
Comput. Vis. Image Underst., 2011

ObjectBook construction for large-scale semantic-aware image retrieval.
Proceedings of the IEEE 13th International Workshop on Multimedia Signal Processing (MMSP 2011), 2011

Random partial paired comparison for subjective video quality assessment via HodgeRank.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Human group activity analysis with fusion of motion and appearance information.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Detection and location of near-duplicate video sub-clips by finding dense subgraphs.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

News video story sentiment classification and ranking.
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

Matching Content-based Saliency Regions for partial-duplicate image retrieval.
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

Human tracking by structured body parts.
Proceedings of the 18th IEEE International Conference on Image Processing, 2011

Visual perception based Lagrangian rate distortion optimization for video coding.
Proceedings of the 18th IEEE International Conference on Image Processing, 2011

Query sensitive dynamic web video thumbnail generation.
Proceedings of the 18th IEEE International Conference on Image Processing, 2011

Online Vicept learning for web-scale image understanding.
Proceedings of the 18th IEEE International Conference on Image Processing, 2011

Fast common visual pattern detection via radiate geometric model.
Proceedings of the 18th IEEE International Conference on Image Processing, 2011

Content-based intelligent video recorder with its implementation on sports video.
Proceedings of the ICIMCS 2011, 2011

Treat samples differently: Object tracking with semi-supervised online CovBoost.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Learning image Vicept description via mixed-norm regularization for large scale semantic image search.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

Efficient l_p-norm multiple feature metric learning for image categorization.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

2010
Error-resistance and Low-complexity Integer Inverse Discrete Cosine Transform.
J. Signal Process. Syst., 2010

Affective Visualization and Retrieval for Music Video.
IEEE Trans. Multim., 2010

A Low-Cost Very Large Scale Integration Architecture for Multistandard Inverse Transform.
IEEE Trans. Circuits Syst. II Express Briefs, 2010

RD-optimized interactive streaming of multiview video with multiple encodings.
J. Vis. Commun. Image Represent., 2010

MOCC: A Fast and Robust Correlation-Based Method for Interest Point Matching under Large Scale Changes.
EURASIP J. Adv. Signal Process., 2010

A fast intra 4×4 mode decision algorithm for H.264/AVC down rate transcoding.
Proceedings of the Visual Communications and Image Processing 2010, 2010

Building contextual visual vocabulary for large-scale image applications.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Memory matrix: a novel user experience for home video.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Vicept: link visual features to concepts for large-scale image understanding.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

S3MKL: scalable semi-supervised multiple kernel learning for image data mining.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Nearest-neighbor classification using unlabeled data for real world image application.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

The third eye: mining the visual cognition across multi-language communities.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Localized Image Matte Evaluation by Gradient Correlation.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Adding Affine Invariant Geometric Constraint for Partial-Duplicate Image Retrieval.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Multiple Kernel Learning with High Order Kernels.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Action Recognition Using Spatial-Temporal Context.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Group Activity Recognition by Gaussian Processes Estimation.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Bridging the gap between objective score and subjective preference in video quality assessment.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

Event based news video people classification and ranking using multimodality features.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

Fast copy detection based on Slice Entropy Scattergraph.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

A close-up detection method for movies.
Proceedings of the International Conference on Image Processing, 2010

Real-time interactive multi-target tracking using kernel-based trackers.
Proceedings of the International Conference on Image Processing, 2010

Multi-description of local interest point for partial-duplicate image retrieval.
Proceedings of the International Conference on Image Processing, 2010

Building pair-wise visual word tree for efficient image re-ranking.
Proceedings of the IEEE International Conference on Acoustics, 2010

Measuring visual saliency by Site Entropy Rate.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

Novel observation model for probabilistic object tracking.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

Automatic video genre categorization and event detection techniques on large-scale sports data.
Proceedings of the 2010 conference of the Centre for Advanced Studies on Collaborative Research, 2010

2009
Event Tactic Analysis Based on Broadcast Sports Video.
IEEE Trans. Multim., 2009

Joint video/depth rate allocation for 3D video coding based on view synthesis distortion model.
Signal Process. Image Commun., 2009

A configurable method for multi-style license plate recognition.
Pattern Recognit., 2009

Pornographic Image Detection Based on Multilevel Representation.
Int. J. Pattern Recognit. Artif. Intell., 2009

A framework for flexible summarization of racquet sports video using multiple modalities.
Comput. Vis. Image Underst., 2009

A Useful Visualization Technique: A Literature Review for Augmented Reality and its Application, limitation & future direction.
Proceedings of the Visual Information Communication, 2009

Estimating the value of θ in the intra frame for ρ-domain rate control algorithms.
Proceedings of the 2009 Picture Coding Symposium, 2009

Video Shrinking by Auditory and Visual Cues.
Proceedings of the Advances in Multimedia Information Processing, 2009

Descriptive visual words and visual phrases for image applications.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Friend recommendation according to appearances on photos.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Near-duplicate video matching with transformation recognition.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Visual ContextRank for web image re-ranking.
Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining, 2009

Automatic sports genre categorization and view-type classification over large-scale dataset.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Robust copy detection by mining temporal self-similarities.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Spatial-temporal video browsing for mobile environment based on visual attention analysis.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

A hybrid text segmentation approach.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Utilizing affective analysis for efficient movie browsing.
Proceedings of the International Conference on Image Processing, 2009

Joint learning for side information and correlation model based on linear regression model in distributed video coding.
Proceedings of the International Conference on Image Processing, 2009

Advertise gently - in-image advertising with low intrusiveness.
Proceedings of the International Conference on Image Processing, 2009

Personalized online video recommendation by neighborhood score propagation based global ranking.
Proceedings of the First International Conference on Internet Multimedia Computing and Service, 2009

A generic approach to classify sports video shots and its application in event detection.
Proceedings of the First International Conference on Internet Multimedia Computing and Service, 2009

Transfer pedestrian detector towards view-adaptiveness and efficiency.
Proceedings of the 12th IEEE International Conference on Computer Vision Workshops, 2009

Compression-Induced Rendering Distortion Analysis for Texture/Depth Rate Allocation in 3D Video Compression.
Proceedings of the 2009 Data Compression Conference (DCC 2009), 2009

Content-Based Video Semantic Analysis.
Proceedings of the Semantic Mining Technologies for Multimedia Databases, 2009

2008
Using Webcast Text for Semantic Event Detection in Broadcast Sports Video.
IEEE Trans. Multim., 2008

Unsupervised texture classification: Automatically discover and classify texture patterns.
Image Vis. Comput., 2008

Highlight Ranking for Broadcast Tennis Video Based on Multi-modality Analysis and Relevance Feedback.
Proceedings of the Advances in Multimedia Information Processing, 2008

Personalized MTV Affective Analysis Using User Profile.
Proceedings of the Advances in Multimedia Information Processing, 2008

Detecting Violent Scenes in Movies by Auditory and Visual Cues.
Proceedings of the Advances in Multimedia Information Processing, 2008

A Two-Stage Approach to Highlight Extraction in Sports Video by Using AdaBoost and Multi-modal.
Proceedings of the Advances in Multimedia Information Processing, 2008

i.MTV: an integrated system for MTV affective analysis.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

A generic virtual content insertion system based on visual attention analysis.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Naming faces in broadcast news video by image google.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Matching images more efficiently with local descriptors.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Symmetric segment-based stereo matching of motion blurred images with illumination variations.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Effective scene matching with local feature representatives.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Human reappearance detection based on on-line learning.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Affective MTV analysis based on arousal and valence features.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

A pixel-wise local information-based background subtraction approach.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Spatial-temporal attention analysis for home video.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Coarse-to-fine video text detection.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Lower attentive region detection for virtual content insertion in broadcast video.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Shot classification for action movies based on motion characteristics.
Proceedings of the International Conference on Image Processing, 2008

Pedestrian detection via logistic multiple instance boosting.
Proceedings of the International Conference on Image Processing, 2008

Fast and effective text detection.
Proceedings of the International Conference on Image Processing, 2008

Object tracking using incremental 2D-LDA learning and Bayes inference.
Proceedings of the International Conference on Image Processing, 2008

People re-detection using AdaBoost with SIFT and color correlogram.
Proceedings of the International Conference on Image Processing, 2008

Visual-aural attention modeling for talk show video highlight detection.
Proceedings of the IEEE International Conference on Acoustics, 2008

Multiple Instance Boost Using Graph Embedding Based Decision Stump for Pedestrian Detection.
Proceedings of the Computer Vision, 2008

Event tactic analysis based on player and ball trajectory in broadcast video.
Proceedings of the 7th ACM International Conference on Image and Video Retrieval, 2008

2007
Human Behavior Analysis for Highlight Ranking in Broadcast Racket Sports Video.
IEEE Trans. Multim., 2007

Joint Source-Channel Rate-Distortion Optimization for H.264 Video Coding Over Error-Prone Networks.
IEEE Trans. Multim., 2007

Video2Cartoon: A System for Converting Broadcast Soccer Video into 3D Cartoon Animation.
IEEE Trans. Consumer Electron., 2007

Effective algorithms for fast transcoding of AVS to H.264/AVC in the spatial domain.
Multim. Tools Appl., 2007

Drift-compensated coding optimization for fast bit-rate reduction transcoding.
Proceedings of the Visual Communications and Image Processing 2007, 2007

Trajectory based event tactics analysis in broadcast sports video.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Region-based visual attention analysis with its application in image browsing on small displays.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Story Unit Segmentation with Friendly Acoustic Perception.
Proceedings of the Multimedia Content Analysis and Mining, International Workshop, 2007

Low-delay View Random Access for Multi-view Video Coding.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

Macroblock-level Reduced Resolution Video Coding Allowing Adaptive DCT Coefficients Selection.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

Highlight Ranking for Racquet Sports Video in User Attention Subspaces Based on Relevance Feedback.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

A Fast Approach for Natural Image Matting using Structure Information.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

The Demo: A Real-Time Score Detection and Recognition Approach in Broadcast Basketball Sports Video.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

A Real-Time Score Detection and Recognition Approach for Broadcast Basketball Video.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Generating Video Sequence from Photo Image for Mobile Screens by Content Analysis.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Mining Information of Attack-Defense Status from Soccer Video Based on Scene Analysis.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Monocular Tracking 3D People By Gaussian Process Spatio-Temporal Variable Model.
Proceedings of the International Conference on Image Processing, 2007

Mean-Shift Blob Tracking with Adaptive Feature Selection and Scale Adaptation.
Proceedings of the International Conference on Image Processing, 2007

2006
Statistical model, analysis and approximation of rate-distortion function in MPEG-4 FGS videos.
IEEE Trans. Circuits Syst. Video Technol., 2006

An effective method to detect and categorize digitized traditional Chinese paintings.
Pattern Recognit. Lett., 2006

Extracting 3D information from broadcast soccer video.
Image Vis. Comput., 2006

JDL at TRECVID 2006 Shot Boundary Detection.
Proceedings of the 2006 TREC Video Retrieval Evaluation, 2006

Multi-view Video Coding with Flexible View-Temporal Prediction Structure for Fast Random Access.
Proceedings of the Advances in Multimedia Information Processing, 2006

Online Selection of Discriminative Features Using Bayes Error Rate for Visual Tracking.
Proceedings of the Advances in Multimedia Information Processing, 2006

Player action recognition in broadcast tennis video with applications to semantic analysis of sports game.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

Action Recognition in Broadcast Tennis Video.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Unsupervised Texture Classification: Automatically Discover and Classify Texture Patterns.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Automatic Multi-Player Detection and Tracking in Broadcast Sports Video using Support Vector Machine and Particle Filter.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Highlight Summarization in Sports Video Based on Replay Detection.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

A Fast Intra Mode Decision Algorithm for AVS to H.264 Transcoding.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Extracting Story Units in Sports Video Based on Unsupervised Video Scene Clustering.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

An Edge-Based Median Filtering Algorithm with Consideration of Motion Vector Reliability for Adaptive Video Deinterlacing.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Video Shot Detection Using Hidden Markov Models with Complementary Features.
Proceedings of the First International Conference on Innovative Computing, Information and Control (ICICIC 2006), 30 August, 2006

Image Matching by Normalized Cross-Correlation.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Action Recognition in Broadcast Tennis Video Using Optical Flow and Support Vector Machine.
Proceedings of the Computer Vision in Human-Computer Interaction, 2006

Image Matching by Multiscale Oriented Corner Correlation.
Proceedings of the Computer Vision, 2006

Self-calibration Based 3D Information Extraction and Application in Broadcast Soccer Video.
Proceedings of the Computer Vision, 2006

2005
Robust moving object segmentation on H.264/AVC compressed video using the block-based MRF model.
Real Time Imaging, 2005

Robust real-time transmission of scalable multimedia for heterogeneous client bandwidths.
Real Time Imaging, 2005

Thresholding technique with adaptive window selection for uneven lighting image.
Pattern Recognit. Lett., 2005

Visual Ontology Construction for Digitized Art Image Retrieval.
J. Comput. Sci. Technol., 2005

Fast and robust text detection in images and video frames.
Image Vis. Comput., 2005

A Scheme for Ball Detection and Tracking in Broadcast Soccer Video.
Proceedings of the Advances in Multimedia Information Processing, 2005

Reducing Spatial Resolution for MPEG-2 to H.264/AVC Transcoding.
Proceedings of the Advances in Multimedia Information Processing, 2005

Exciting event detection in broadcast soccer video with mid-level description and incremental learning.
Proceedings of the 13th ACM International Conference on Multimedia, 2005

Video2Cartoon: generating 3D cartoon from broadcast soccer video.
Proceedings of the 13th ACM International Conference on Multimedia, 2005

Linear transform based motion compensated prediction for luminance intensity changes.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

Viewpoint switching in multiview video streaming.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

A System for Automatic Generation of Music Sports-Video.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Improving particle filter with support vector regression for efficient visual tracking.
Proceedings of the 2005 International Conference on Image Processing, 2005

Playfield Detection Using Adaptive GMM and Its Application.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Bandwidth Adaptive Quality Smoothing for Unequal Error Protected Scalable Video Streaming.
Proceedings of the 2005 Data Compression Conference (DCC 2005), 2005

2004
Optimum End-to-End Distortion Estimation for Error Resilient Video Coding.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

Key Techniques of Bit Rate Reduction for H.264 Streams.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

Moving Object Segmentation: A Block-Based Moving Region Detection Approach.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

A New Text Detection Algorithm in Images/Video Frames.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

MULTFRC-LERD: An Improved Rate Control Scheme for Video Streaming over Wireless.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

Embedded Packetization Framework for Layered Multiple Description Coding.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

Multiview Video Coding Based on Global Motion Model.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

Context-based 2D-VLC for video coding.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Improved error concealment algorithms based on H.264/AVC non-normative decoder.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

New bi-prediction techniques for B pictures coding.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Mode mapping method for H.264/AVC spatial downscaling transcoding.
Proceedings of the 2004 International Conference on Image Processing, 2004

Error resilience video coding in H.264 encoder with potential distortion tracking.
Proceedings of the 2004 International Conference on Image Processing, 2004

Automatic text segmentation from complex background.
Proceedings of the 2004 International Conference on Image Processing, 2004

A novel rate control scheme for video streaming over wireless networks.
Proceedings of the Third International Conference on Image and Graphics, 2004

A novel FGS base-layer encoding model and weight-based rate adaptation for constant-quality streaming.
Proceedings of the Third International Conference on Image and Graphics, 2004

FEC-based multiple description coding for heterogeneous client bandwidths.
Proceedings of the Third International Conference on Image and Graphics, 2004

