Yi Yang

ORCID: 0000-0001-5528-0546

Affiliations:
  • Zhejiang University, College of Computer Science and Technology, Hangzhou, China (since 2021)
  • University of Technology Sydney, Centre for Artificial Intelligence, ReLER Lab, Ultimo, Australia
  • University of Technology Sydney, Centre for Quantum Computation and Intelligent Systems, Sydney, Australia
  • The University of Queensland, School of Information Technology and Electrical Engineering, Brisbane, QLD, Australia (former)
  • Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, USA (former)
  • Zhejiang University, Hangzhou, China (PhD 2010)


According to our database, Yi Yang authored at least 756 papers between 2005 and 2025.

Bibliography

2025
Cut-and-Paste: Subject-driven video editing with attention control.
Neural Networks, 2025

Aggregating nearest sharp features via hybrid transformers for video deblurring.
Inf. Sci., 2025

2024
Knowledge-Guided Causal Intervention for Weakly-Supervised Object Localization.
IEEE Trans. Knowl. Data Eng., November, 2024

Noise-Tolerant Hybrid Prototypical Learning with Noisy Web Data.
ACM Trans. Multim. Comput. Commun. Appl., October, 2024

Divide and Retain: A Dual-Phase Modeling for Long-Tailed Visual Recognition.
IEEE Trans. Neural Networks Learn. Syst., October, 2024

Active Learning for Deep Visual Tracking.
IEEE Trans. Neural Networks Learn. Syst., October, 2024

Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2024

NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2024

Scalable Video Object Segmentation With Identification Mechanism.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification.
Int. J. Comput. Vis., September, 2024

High Fidelity Makeup via 2D and 3D Identity Preservation Net.
ACM Trans. Multim. Comput. Commun. Appl., August, 2024

Bilaterally Normalized Scale-Consistent Sinkhorn Distance for Few-Shot Image Classification.
IEEE Trans. Neural Networks Learn. Syst., August, 2024

StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition.
ACM Trans. Multim. Comput. Commun. Appl., July, 2024

Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation.
ACM Trans. Multim. Comput. Commun. Appl., June, 2024

Parameter-Efficient Person Re-Identification in the 3D Space.
IEEE Trans. Neural Networks Learn. Syst., June, 2024

MuscleParseNet: A Novel Framework for Parsing Muscles of Drosophila Larva in Light-Sheet Fluorescence Microscopy Images.
IEEE Trans. Circuits Syst. Video Technol., June, 2024

Penalizing the Hard Example But Not Too Much: A Strong Baseline for Fine-Grained Visual Classification.
IEEE Trans. Neural Networks Learn. Syst., May, 2024

Learning to Follow and Generate Instructions for Language-Capable Navigation.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Semantic Hierarchy-Aware Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Large language model and domain-specific model collaboration for smart education.
Frontiers Inf. Technol. Electron. Eng., March, 2024

FRC-Net: A Simple Yet Effective Architecture for Low-Light Image Enhancement.
IEEE Trans. Consumer Electron., February, 2024

Understanding and Accelerating Neural Architecture Search With Training-Free and Theory-Grounded Metrics.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation.
IEEE Trans. Circuits Syst. Video Technol., January, 2024

Method for calculating pressure losses in the pipelines of slurry shield tunneling based on coupled simulation of computational fluid dynamics and discrete element method.
Comput. Aided Civ. Infrastructure Eng., January, 2024

Context Matters: Distilling Knowledge Graph for Enhanced Object Detection.
IEEE Trans. Multim., 2024

Progressive Stereo Image Dehazing Network via Cross-View Region Interaction.
IEEE Trans. Multim., 2024

Taking a Closer Look At Visual Relation: Unbiased Video Scene Graph Generation With Decoupled Label Learning.
IEEE Trans. Multim., 2024

Show Me a Video: A Large-Scale Narrated Video Dataset for Coherent Story Illustration.
IEEE Trans. Multim., 2024

SKIM: Skeleton-Based Isolated Sign Language Recognition With Part Mixing.
IEEE Trans. Multim., 2024

IcoCap: Improving Video Captioning by Compounding Images.
IEEE Trans. Multim., 2024

ReGO: Reference-Guided Outpainting for Scenery Image.
IEEE Trans. Image Process., 2024

Zero-Shot Video Grounding With Pseudo Query Lookup and Verification.
IEEE Trans. Image Process., 2024

Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection.
IEEE Trans. Image Process., 2024

Learning Cross-View Geo-Localization Embeddings via Dynamic Weighted Decorrelation Regularization.
IEEE Trans. Geosci. Remote. Sens., 2024

Multiple-environment Self-adaptive Network for aerial-view geo-localization.
Pattern Recognit., 2024

Collaborative group: Composed image retrieval via consensus learning from noisy annotations.
Knowl. Based Syst., 2024

Differentially Private Neural Tangent Kernels (DP-NTK) for Privacy-Preserving Data Generation.
J. Artif. Intell. Res., 2024

Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis.
CoRR, 2024

Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models.
CoRR, 2024

Vision-Language Navigation with Energy-Based Policy.
CoRR, 2024

3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation.
CoRR, 2024

MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs.
CoRR, 2024

Point-Calibrated Spectral Neural Operators.
CoRR, 2024

CktGen: Specification-Conditioned Analog Circuit Generation.
CoRR, 2024

Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation.
CoRR, 2024

DICS: Find Domain-Invariant and Class-Specific Features for Out-of-Distribution Generalization.
CoRR, 2024

ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning.
CoRR, 2024

Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion.
CoRR, 2024

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention.
CoRR, 2024

PiPa++: Towards Unification of Domain Adaptive Semantic Segmentation via Self-supervised Learning.
CoRR, 2024

MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis.
CoRR, 2024

DeltaPhi: Learning Physical Trajectory Residual for PDE Solving.
CoRR, 2024

Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting.
CoRR, 2024

PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting.
CoRR, 2024

Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models.
CoRR, 2024

TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment.
CoRR, 2024

BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations.
CoRR, 2024

AudioScenic: Audio-Driven Video Scene Editing.
CoRR, 2024

Joint Conditional Diffusion Model for Image Restoration with Mixed Degradations.
CoRR, 2024

Visual Knowledge in the Big Model Era: Retrospect and Prospect.
CoRR, 2024

EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing.
CoRR, 2024

Ghost Sentence: A Tool for Everyday Users to Copyright Data from Large Language Models.
CoRR, 2024

ProtChatGPT: Towards Understanding Proteins with Large Language Models.
CoRR, 2024

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting.
CoRR, 2024

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis.
CoRR, 2024

Retrosynthesis prediction enhanced by in-silico reaction data augmentation.
CoRR, 2024

Explore Synergistic Interaction Across Frames for Interactive Video Object Segmentation.
CoRR, 2024

Product-Level Try-on: Characteristics-preserving Try-on with Realistic Clothes Shading and Wrinkles.
CoRR, 2024

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models.
CoRR, 2024

AntEval: Quantitatively Evaluating Informativeness and Expressiveness of Agent Social Interactions.
CoRR, 2024

GD^2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields.
CoRR, 2024

Nodule-CLIP: Lung nodule classification based on multi-modal contrastive learning.
Comput. Biol. Medicine, 2024

A multi-stage approach for high-precision measurement of cervical curvature in X-ray images.
Biomed. Signal Process. Control., 2024

Pyramid-attentive GAN for multimodal brain image complementation in Alzheimer's disease classification.
Biomed. Signal Process. Control., 2024

GG-Editor: Locally Editing 3D Avatars with Multimodal Large Language Model Guidance.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Neural Interaction Energy for Multi-Agent Trajectory Prediction.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

MoS^2: Mixture of Scale and Shift Experts for Text-Only Video Captioning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

MAC 2024: Micro-Action Analysis Grand Challenge.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Progressive Point Cloud Denoising with Cross-Stage Cross-Coder Adaptive Edge Graph Convolution Network.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Prototype Learning for Micro-gesture Classification.
Proceedings of IJCAI 2024 Workshop&Challenge on Micro-gesture Analysis for Hidden Emotion Understanding (MiGA 2024) co-located with 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024), 2024

PLTON: Product-Level Try-on with Realistic Clothes Shading and Wrinkles.
Proceedings of the International Joint Conference on Neural Networks, 2024

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent).
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

VividDreamer: Invariant Score Distillation for Hyper-Realistic Text-to-3D Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting.
Proceedings of the Computer Vision - ECCV 2024, 2024

Nonverbal Interaction Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

Depth-Aware Blind Image Decomposition for Real-World Adverse Weather Recovery.
Proceedings of the Computer Vision - ECCV 2024, 2024

Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-Driven Diffusion.
Proceedings of the Computer Vision - ECCV 2024, 2024

Controllable Navigation Instruction Generation with Chain of Thought Prompting.
Proceedings of the Computer Vision - ECCV 2024, 2024

Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data.
Proceedings of the Computer Vision - ECCV 2024, 2024

Navigation Instruction Generation with BEV Perception and Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

General and Task-Oriented Video Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Improving Bird's Eye View Semantic Segmentation by Task Decomposition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MS-DETR: Efficient DETR Training with Mixed Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Knowledge-Enhanced Dual-Stream Zero-Shot Composed Image Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Clustering for Protein Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Vista-llama: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Volumetric Environment Representation for Vision-Language Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

CapHuman: Capture Your Moments in Parallel Universes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Clustering Propagation for Universal Medical Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Neural Clustering Based Visual Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Stitching Segments and Sentences towards Generalization in Video-Text Pre-training.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Interpretable3D: An Ad-Hoc Interpretable Classifier for 3D Point Clouds.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Personas-based Student Grouping using reinforcement learning and linear programming.
Knowl. Based Syst., December, 2023

Data-Driven single image deraining: A Comprehensive review and new perspectives.
Pattern Recognit., November, 2023

Filter Pruning by Switching to Neighboring CNNs With Good Attributes.
IEEE Trans. Neural Networks Learn. Syst., October, 2023

Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

A knowledge-guided and traditional Chinese medicine informed approach for herb recommendation.
Frontiers Inf. Technol. Electron. Eng., October, 2023

Temporal Pixel-Level Semantic Understanding Through the VSPW Dataset.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2023

Local-Global Context Aware Transformer for Language-Guided Video Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

Multi-view Consistent Generative Adversarial Networks for Compositional 3D-Aware Image Synthesis.
Int. J. Comput. Vis., August, 2023

Differentiable Multi-Granularity Human Parsing.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

DeepAlgPro: an interpretable deep neural network model for predicting allergenic proteins.
Briefings Bioinform., July, 2023

Symbiotic Attention for Egocentric Action Recognition With Object-Centric Alignment.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

DMRNet++: Learning Discriminative Features With Decoupled Networks and Enriched Pairs for One-Step Person Search.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

U-Turn: Crafting Adversarial Queries with Opposite-Direction Features.
Int. J. Comput. Vis., April, 2023

Video Scene Parsing in the Wild.
Dataset, April, 2023

Video Pivoting Unsupervised Multi-Modal Machine Translation.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

A Differentiable Parallel Sampler for Efficient Video Classification.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Align and Tell: Boosting Text-Video Retrieval With Local Alignment and Fine-Grained Supervision.
IEEE Trans. Multim., 2023

Progressive Local Filter Pruning for Image Retrieval Acceleration.
IEEE Trans. Multim., 2023

Self-Supervised Point Cloud Representation Learning via Separating Mixed Shapes.
IEEE Trans. Multim., 2023

Cross-Modal Data Augmentation for Tasks of Different Modalities.
IEEE Trans. Multim., 2023

Cyclic Self-Training With Proposal Weight Modulation for Cross-Supervised Object Detection.
IEEE Trans. Image Process., 2023

Co-Learning Meets Stitch-Up for Noisy Multi-Label Visual Recognition.
IEEE Trans. Image Process., 2023

Collaborative Content-Dependent Modeling: A Return to the Roots of Salient Object Detection.
IEEE Trans. Image Process., 2023

Dynamic Slimmable Denoising Network.
IEEE Trans. Image Process., 2023

Query-Efficient Black-Box Adversarial Attack With Customized Iteration and Sampling.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

Switchable Novel Object Captioner.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

Fast data-free model compression via dictionary-pair reconstruction.
Knowl. Inf. Syst., 2023

Exploring viewport features for semi-supervised saliency prediction in omnidirectional images.
Image Vis. Comput., 2023

Human101: Training 100+FPS Human Gaussians in 100s from 1 View.
CoRR, 2023

SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance.
CoRR, 2023

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens.
CoRR, 2023

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction.
CoRR, 2023

SPROUT: Authoring Programming Tutorials with Interactive Visualization of Large Language Model Generation Process.
CoRR, 2023

AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text.
CoRR, 2023

FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax.
CoRR, 2023

Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement.
CoRR, 2023

Combating Label Noise With A General Surrogate Model For Sample Selection.
CoRR, 2023

LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning.
CoRR, 2023

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction.
CoRR, 2023

Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring.
CoRR, 2023

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion.
CoRR, 2023

DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation.
CoRR, 2023

Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation.
CoRR, 2023

Tachikuma: Understading Complex Interactions with Multi-Character and Novel Objects by Large Language Models.
CoRR, 2023

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking.
CoRR, 2023

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation.
CoRR, 2023

Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023.
CoRR, 2023

Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval.
CoRR, 2023

Whitening-based Contrastive Learning of Sentence Embeddings.
CoRR, 2023

Mitigating Biased Activation in Weakly-supervised Object Localization via Counterfactual Learning.
CoRR, 2023

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model.
CoRR, 2023

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending.
CoRR, 2023

Learning Structured Components: Towards Modular and Interpretable Multivariate Time Series Forecasting.
CoRR, 2023

Segment and Track Anything.
CoRR, 2023

Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining.
CoRR, 2023

Feature-compatible Progressive Learning for Video Copy Detection.
CoRR, 2023

TransHP: Image Classification with Hierarchical Prompting.
CoRR, 2023

Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning.
CoRR, 2023

Decomposed Prototype Learning for Few-Shot Scene Graph Generation.
CoRR, 2023

Exploring Expression-related Self-supervised Learning for Affective Behaviour Analysis.
CoRR, 2023

Unsupervised Facial Expression Representation Learning with Contrastive Local Warping.
CoRR, 2023

Temporal Perceiving Video-Language Pre-training.
CoRR, 2023

Further Improving Weakly-supervised Object Localization via Causal Knowledge Distillation.
CoRR, 2023

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Hyperbolic Space with Hierarchical Margin Boosts Fine-Grained Learning from Coarse Labels.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Neural-Logic Human-Object Interaction Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DAC-DETR: Divide the Attention Layers and Conquer.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Decoupled Cross-Scale Cross-View Interaction for Stereo Image Enhancement in the Dark.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Dark Knowledge Balance Learning for Unbiased Scene Graph Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Pyramid Diffusion Models for Low-light Image Enhancement.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Video Object Segmentation in Panoptic Wild Scenes.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Suppressing the Heterogeneity: A Strong Feature Extractor for Few-shot Segmentation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Continuous-Discrete Convolution for Geometry-Sequence Modeling in Proteins.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Decompose to Generalize: Species-Generalized Animal Pose Estimation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Symmetry-Aware Geometry Correspondences for 6D Object Pose Estimation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

GETAvatar: Generative Textured Meshes for Animatable Human Avatars.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Action Sensitivity Learning for Temporal Action Localization.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Bird's-Eye-View Scene Graph for Vision-Language Navigation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MAAL: Multimodality-Aware Autoencoder-based Affordance Learning for 3D Articulated Objects.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Compositional Feature Augmentation for Unbiased Scene Graph Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Clustering based Point Cloud Representation Learning for 3D Analysis.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Shuffled Autoregression for Motion Interpolation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Unsupervised Multi-Hashing for Image Retrieval in Non-stationary Environments.
Proceedings of the 15th International Conference on Advanced Computational Intelligence, 2023

Text Augmented Spatial Aware Zero-shot Referring Image Segmentation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Exploiting Contrastive Learning and Numerical Evidence for Confusing Legal Judgment Prediction.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Exploring Expression-related Self-supervised Learning and Spatial Reserve Pooling for Affective Behaviour Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Context-Aware Pretraining for Efficient Blind Image Decomposition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LANA: A Language-Capable Navigator for Instruction Following and Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ProD: Prompting-to-disentangle Domain Knowledge for Cross-domain Few-shot Image Classification.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Efficient Multimodal Fusion via Interactive Prompting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PointListNet: Deep Learning on 3D Point Lists.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Adversarially Masking Synthetic to Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

WhitenedCSE: Whitening-based Contrastive Learning of Sentence Embeddings.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Gloss-Free End-to-End Sign Language Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Semi-attention Partition for Occluded Person Re-identification.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Rich Embedding Features for One-Shot Semantic Segmentation.
IEEE Trans. Neural Networks Learn. Syst., 2022

Identifying Visible Parts via Pose Estimation for Occluded Person Re-Identification.
IEEE Trans. Neural Networks Learn. Syst., 2022

Learning With Noisy Labels via Self-Reweighting From Class Centroids.
IEEE Trans. Neural Networks Learn. Syst., 2022

Temporal Cross-Layer Correlation Mining for Action Recognition.
IEEE Trans. Multim., 2022

Zero-Shot Video Event Detection With High-Order Semantic Concept Discovery and Matching.
IEEE Trans. Multim., 2022

Infrared Action Detection in the Dark via Cross-Stream Attention Mechanism.
IEEE Trans. Multim., 2022

Adaptive Boosting for Domain Adaptation: Toward Robust Predictions in Scene Segmentation.
IEEE Trans. Image Process., 2022

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images.
IEEE Trans. Image Process., 2022

Joint Representation Learning and Keypoint Detection for Cross-View Geo-Localization.
IEEE Trans. Image Process., 2022

Action Keypoint Network for Efficient Video Recognition.
IEEE Trans. Image Process., 2022

Soft Person Reidentification Network Pruning via Blockwise Adjacent Filter Decaying.
IEEE Trans. Cybern., 2022

Point Adversarial Self-Mining: A Simple Method for Facial Expression Recognition.
IEEE Trans. Cybern., 2022

Unsupervised Visual Representation Learning via Dual-Level Progressive Similar Instance Selection.
IEEE Trans. Cybern., 2022

Each Part Matters: Local Patterns Facilitate Cross-View Geo-Localization.
IEEE Trans. Circuits Syst. Video Technol., 2022

IDBP: Image Dehazing Using Blended Priors Including Non-Local, Local, and Global Priors.
IEEE Trans. Circuits Syst. Video Technol., 2022

Partial Alignment for Object Detection in the Wild.
IEEE Trans. Circuits Syst. Video Technol., 2022

Understanding Atomic Hand-Object Interaction With Human Intention.
IEEE Trans. Circuits Syst. Video Technol., 2022

SemGloVe: Semantic Co-Occurrences for GloVe From BERT.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Single image based 3D human pose estimation via uncertainty learning.
Pattern Recognit., 2022

Label Independent Memory for Semi-Supervised Few-Shot Video Classification.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Saying the Unseen: Video Descriptions via Dialog Agents.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Tasks Integrated Networks: Joint Detection and Retrieval for Image Search.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Instance-Invariant Domain Adaptive Object Detection Via Progressive Disentanglement.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Category-Level Adversarial Adaptation for Semantic Segmentation Using Purified Features.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Self-Correction for Human Parsing.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Contrastive Adaptation Network for Single- and Multi-Source Domain Adaptation.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Deep Hierarchical Representation of Point Cloud Videos via Spatio-Temporal Decomposition.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

AutoESD: a web tool for automatic editing sequence design for genetic manipulation of microorganisms.
Nucleic Acids Res., 2022

NAP: Neural architecture search with pruning.
Neurocomputing, 2022

Weakly Supervised Moment Localization with Decoupled Consistent Concept Prediction.
Int. J. Comput. Vis., 2022

StepNet: Spatial-temporal Part-aware Network for Sign Language Recognition.
CoRR, 2022

Stereo Image Rain Removal via Dual-View Mutual Attention.
CoRR, 2022

ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022.
CoRR, 2022

Exploiting Contrastive Learning and Numerical Evidence for Improving Confusing Legal Judgment Prediction.
CoRR, 2022

Learning Cross-view Geo-localization Embeddings via Dynamic Weighted Decorrelation Regularization.
CoRR, 2022

NoiSER: Noise is All You Need for Low-Light Image Enhancement.
CoRR, 2022

Simple Primitives with Feasibility- and Contextuality-Dependence for Open-World Compositional Zero-shot Learning.
CoRR, 2022

Towards Data- and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing.
CoRR, 2022

Seeing Through The Noisy Dark: Toward Real-world Low-Light Image Enhancement and Denoising.
CoRR, 2022

Slimmable Networks for Contrastive Self-supervised Learning.
CoRR, 2022

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation.
CoRR, 2022

V^2L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval.
CoRR, 2022

Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion.
CoRR, 2022

ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022.
CoRR, 2022

A Benchmark and Asymmetrical-Similarity Learning for Practical Image Copy Detection.
CoRR, 2022

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective.
CoRR, 2022

Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives.
CoRR, 2022

Associating Objects with Scalable Transformers for Video Object Segmentation.
CoRR, 2022

Bridging the Source-to-target Gap for Cross-domain Person Re-Identification with Intermediate Domains.
CoRR, 2022

CenterCLIP: Token Clustering for Efficient Text-Video Retrieval.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Feature-Robust Optimal Transport for High-Dimensional Data.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2022

Feature-Proxy Transformer for Few-Shot Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Decoupling Features in Hierarchical Propagation for Video Object Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Triggerless Backdoor Attack for NLP Tasks with Clean Labels.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Deep Multi-Resolution Mutual Learning for Image Inpainting.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

FCL-GAN: A Lightweight and Real-Time Baseline for Unsupervised Blind Image Deblurring.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SGINet: Toward Sufficient Interaction Between Single Image Deraining and Semantic Segmentation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

In-N-Out Generative Learning for Dense Unsupervised Video Segmentation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Bidirectional Self-Training with Multiple Anisotropic Prototypes for Domain Adaptive Semantic Segmentation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Switch to Generalize: Domain-Switch Learning for Cross-Domain Few-Shot Classification.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Instance as Identity: A Generic Online Paradigm for Video Instance Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views.
Proceedings of the Computer Vision - ECCV 2022, 2022

The Tenth Visual Object Tracking VOT2022 Challenge Results.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Large-scale Video Panoptic Segmentation in the Wild: A Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unified Transformer Tracker for Object Tracking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning Memory-Augmented Unidirectional Metrics for Cross-modality Person Re-identification.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

A Simple Episodic Linear Probe Improves Visual Recognition in the Wild.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Visual Abductive Reasoning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SEEG: Semantic Energized Co-speech Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Deep Hierarchical Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Automated Progressive Learning for Efficient Training of Vision Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning to Learn by Jointly Optimizing Neural Architecture and Weights.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DUDA: Online-Offline Dual Domain Adaption for Semantic Segmentation.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Boost CTR Prediction for New Advertisements via Modeling Visual Content.
Proceedings of the IEEE International Conference on Big Data, 2022

Divide-and-Regroup Clustering for Domain Adaptive Person Re-identification.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Monocular Camera-Based Point-Goal Navigation by Learning Depth Channel and Cross-Modality Pyramid Fusion.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
ART-UP: A Novel Method for Generating Scanning-Robust Aesthetic QR Codes.
ACM Trans. Multim. Comput. Commun. Appl., 2021

VehicleNet: Learning Robust Visual Representation for Vehicle Re-Identification.
IEEE Trans. Multim., 2021

Few-Shot Common-Object Reasoning Using Common-Centric Localization Network.
IEEE Trans. Image Process., 2021

Pyramidal Multiple Instance Detection Network With Mask Guided Self-Correction for Weakly Supervised Object Detection.
IEEE Trans. Image Process., 2021

Training Robust Object Detectors From Noisy Category Labels and Imprecise Bounding Boxes.
IEEE Trans. Image Process., 2021

Learning to Anticipate Egocentric Actions by Imagination.
IEEE Trans. Image Process., 2021

DerainCycleGAN: Rain Attentive CycleGAN for Single Image Deraining and Rainmaking.
IEEE Trans. Image Process., 2021

Sketch-Guided Scenery Image Outpainting.
IEEE Trans. Image Process., 2021

Holistic LSTM for Pedestrian Trajectory Prediction.
IEEE Trans. Image Process., 2021

Progressive Transfer Learning for Face Anti-Spoofing.
IEEE Trans. Image Process., 2021

Semantics-Aware Spatial-Temporal Binaries for Cross-Modal Video Retrieval.
IEEE Trans. Image Process., 2021

IDE: Image Dehazing and Exposure Using an Enhanced Atmospheric Scattering Model.
IEEE Trans. Image Process., 2021

Discriminative Feature Learning for Thorax Disease Classification in Chest X-ray Images.
IEEE Trans. Image Process., 2021

Hierarchical Memory Decoder for Visual Narrating.
IEEE Trans. Circuits Syst. Video Technol., 2021

Hierarchical Temporal Modeling With Mutual Distance Matching for Video Based Person Re-Identification.
IEEE Trans. Circuits Syst. Video Technol., 2021

Learning to Adapt Invariance in Memory for Person Re-Identification.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Learning Part-based Convolutional Features for Person Re-Identification.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Supervision by Registration and Triangulation for Landmark Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

An integrated classification model for incremental learning.
Multim. Tools Appl., 2021

Multiple knowledge representation for big data artificial intelligence: framework, applications, and case studies.
Frontiers Inf. Technol. Electron. Eng., 2021

Visual commonsense reasoning with directional visual connections.
Frontiers Inf. Technol. Electron. Eng., 2021

A Survey on Concept Factorization: From Shallow to Deep Representation Learning.
Inf. Process. Manag., 2021

Rectifying Pseudo Label Learning via Uncertainty Estimation for Domain Adaptive Semantic Segmentation.
Int. J. Comput. Vis., 2021

Bag of Tricks and A Strong baseline for Image Copy Detection.
CoRR, 2021

D^2LV: A Data-Driven and Local-Verification Approach for Image Copy Detection.
CoRR, 2021

Dynamic Slimmable Denoising Network.
CoRR, 2021

Contrastive Video-Language Segmentation.
CoRR, 2021

Point Cloud Pre-training by Mixing and Disentangling.
CoRR, 2021

Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics.
CoRR, 2021

Generating Superpixels for High-resolution Images with Decoupled Patch Calibration.
CoRR, 2021

Less is More: Sparse Sampling for Dense Reaction Predictions.
CoRR, 2021

Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation.
CoRR, 2021

Prior-Enhanced Few-Shot Segmentation with Meta-Prototypes.
CoRR, 2021

Divide and Rule: Recurrent Partitioned Network for Dynamic Processes.
CoRR, 2021

VidFace: A Full-Transformer Solver for Video Face Hallucination with Unaligned Tiny Snapshots.
CoRR, 2021

OR-Net: Pointwise Relational Inference for Data Completion under Partial Observation.
CoRR, 2021

Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems.
CoRR, 2021

Adaptive Boosting for Domain Adaptation: Towards Robust Predictions in Scene Segmentation.
CoRR, 2021

ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation.
CoRR, 2021

Decoupled Spatial Temporal Graphs for Generic Visual Grounding.
CoRR, 2021

Universal-Prototype Augmentation for Few-Shot Object Detection.
CoRR, 2021

Modeling the Probabilistic Distribution of Unlabeled Data for One-shot Medical Image Segmentation.
CoRR, 2021

Auto-Navigator: Decoupled Neural Architecture Search for Visual Navigation.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

LSMI-Sinkhorn: Semi-supervised Mutual Information Estimation with Optimal Transport.
Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2021

Few-Shot Segmentation via Cycle-Consistent Transformer.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Associating Objects with Transformers for Video Object Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Results and findings of the 2021 Image Similarity Challenge.
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

WAB'21: 1st Workshop on Multimodal Product Identification in Livestreaming and WAB Challenge.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

GCM-Net: Towards Effective Global Context Modeling for Image Inpainting.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Improving Weakly Supervised Object Localization via Causal Intervention.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Latent Memory-augmented Graph Transformer for Visual Storytelling.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Video-to-Image Casting: A Flatting Method for Video Analysis.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences.
Proceedings of the 9th International Conference on Learning Representations, 2021

Triplet Deep Subspace Clustering via Self-Supervised Data Augmentation.
Proceedings of the IEEE International Conference on Data Mining, 2021

Dictionary Pair-based Data-Free Fast Deep Neural Network Compression.
Proceedings of the IEEE International Conference on Data Mining, 2021

PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Vector-Decomposed Disentanglement for Domain-Invariant Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Universal-Prototype Enhancing for Few-Shot Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Interactive Prototype Learning for Egocentric Action Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

AINet: Association Implantation for Superpixel Segmentation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

A Multi-Mode Modulator for Multi-Domain Few-Shot Classification.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Super-Resolving Cross-Domain Face Miniatures by Peeking at One-Shot Exemplar.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Weakly Supervised Person Search with Region Siamese Networks.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

RFNet: Region-aware Fusion Network for Incomplete Multi-modal Brain Tumor Segmentation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Audio-Visual Correlations From Variational Cross-Modal Generation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

OpenMix: Reviving Known Knowledge for Discovering Novel Visual Categories in an Open World.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Grounded, Controllable and Debiased Image Completion With Lexical Semantics.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-Scale Consistency.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Faster Meta Update Strategy for Noise-Robust Deep Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Removing Raindrops and Rain Streaks in One Go.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Robust Vehicle Re-Identification via Rigid Structure Prior.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

DOTS: Decoupling Operation and Topology in Differentiable Architecture Search.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Connecting Language and Vision for Natural Language-Based Vehicle Retrieval.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Domain Consensus Clustering for Universal Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Judgment Prediction via Injecting Legal Knowledge into Neural Networks.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Modeling the Probabilistic Distribution of Unlabeled Data for One-shot Medical Image Segmentation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Dual-path Convolutional Image-Text Embeddings with Instance Loss.
ACM Trans. Multim. Comput. Commun. Appl., 2020

Recurrent Attention Network with Reinforced Generator for Visual Dialog.
ACM Trans. Multim. Comput. Commun. Appl., 2020

Adaptive Exploration for Unsupervised Person Re-identification.
ACM Trans. Multim. Comput. Commun. Appl., 2020

Person Reidentification via Multi-Feature Fusion With Adaptive Graph Learning.
IEEE Trans. Neural Networks Learn. Syst., 2020

Deep Top-$k$ Ranking for Image-Sentence Matching.
IEEE Trans. Multim., 2020

Fast and Low Memory Cost Matrix Factorization: Algorithm, Analysis, and Case Study.
IEEE Trans. Knowl. Data Eng., 2020

Learning Distilled Graph for Large-Scale Social Network Data Clustering.
IEEE Trans. Knowl. Data Eng., 2020

Personalized Video Recommendation Using Rich Contents from Videos.
IEEE Trans. Knowl. Data Eng., 2020

Pair-based Uncertainty and Diversity Promoting Early Active Learning for Person Re-identification.
ACM Trans. Intell. Syst. Technol., 2020

Revisiting EmbodiedQA: A Simple Baseline and Beyond.
IEEE Trans. Image Process., 2020

Unsupervised Person Re-identification via Cross-Camera Similarity Exploration.
IEEE Trans. Image Process., 2020

SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation.
IEEE Trans. Cybern., 2020

Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks.
IEEE Trans. Cybern., 2020

Convolutional Reconstruction-to-Sequence for Video Captioning.
IEEE Trans. Circuits Syst. Video Technol., 2020

Cascaded Revision Network for Novel Object Captioning.
IEEE Trans. Circuits Syst. Video Technol., 2020

Bayesian query expansion for multi-camera person re-identification.
Pattern Recognit. Lett., 2020

Thorax disease classification with attention guided convolutional neural network.
Pattern Recognit. Lett., 2020

Every node counts: Self-ensembling graph convolutional networks for semi-supervised learning.
Pattern Recognit., 2020

Self-paced Multi-view Co-training.
J. Mach. Learn. Res., 2020

Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective.
CoRR, 2020

LID 2020: The Learning from Imperfect Data Challenge Results.
CoRR, 2020

DOTS: Decoupling Operation and Topology in Differentiable Architecture Search.
CoRR, 2020

Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization.
CoRR, 2020

Point Adversarial Self Mining: A Simple Method for Facial Expression Recognition in the Wild.
CoRR, 2020

DONet: Dual Objective Networks for Skin Lesion Segmentation.
CoRR, 2020

Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps.
CoRR, 2020

Person Re-identification in the 3D Space.
CoRR, 2020

Feature Robust Optimal Transport for High-dimensional Data.
CoRR, 2020

Omni-supervised Facial Expression Recognition: A Simple Baseline.
CoRR, 2020

Grounded and Controllable Image Completion by Incorporating Lexical Semantics.
CoRR, 2020

UTS Submission at the TRECVID 2020 Disaster Scene Description and Indexing Task.
Proceedings of the 2020 TREC Video Retrieval Evaluation, 2020

Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Consistent Structural Relation Learning for Zero-Shot Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Few-Shot Ensemble Learning for Video Classification with SlowFast Memory Networks.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Meta Parsing Networks: Towards Generalized Few-shot Scene Parsing with Adaptive Metric Learning.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Unsupervised Scene Adaptation with Memory Regularization in vivo.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Query-efficient Meta Attack to Deep Neural Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search.
Proceedings of the 8th International Conference on Learning Representations, 2020

Describing Unseen Videos via Multi-modal Cooperative Dialog Agents.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning to Transfer Learn: Reinforcement Learning-Based Selection for Adaptive Transfer Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior.
Proceedings of the Computer Vision - ECCV 2020, 2020

Inter-Image Communication for Weakly Supervised Localization.
Proceedings of the Computer Vision - ECCV 2020, 2020

Collaborative Video Object Segmentation by Foreground-Background Integration.
Proceedings of the Computer Vision - ECCV 2020, 2020

SF-Net: Single-Frame Supervision for Temporal Action Localization.
Proceedings of the Computer Vision - ECCV 2020, 2020

Content-Consistent Matching for Domain Adaptive Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2020, 2020

ActBERT: Learning Global-Local Video-Text Representations.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Inflated Episodic Memory With Region Self-Attention for Long-Tailed Visual Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Going Beyond Real Data: A Robust Visual Representation for Vehicle Re-identification.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Gated Channel Transformation for Visual Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Imitative Non-Autoregressive Modeling for Trajectory Forecasting and Imputation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Memory Aggregation Networks for Efficient Interactive Video Object Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Semantic Correspondence as an Optimal Transport Problem.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Salience-Guided Cascaded Suppression Network for Person Re-Identification.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

When Humans Meet Machines: Towards Efficient Segmentation Networks.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

FASTER Recurrent Networks for Efficient Video Classification.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

EEMEFN: Low-Light Image Enhancement via Edge-Enhanced Multi-Exposure Fusion Network.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Random Erasing Data Augmentation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Symbiotic Attention with Privileged Information for Egocentric Action Recognition.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Adversarial Localized Energy Network for Structured Prediction.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Person Tube Retrieval via Language Description.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Modality-Invariant Image-Text Embedding for Image-Sentence Matching.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Exploiting Combination Effect for Unsupervised Feature Selection by $\ell_{2,0}$ Norm.
IEEE Trans. Neural Networks Learn. Syst., 2019

Graph Structure Fusion for Multiview Clustering.
IEEE Trans. Knowl. Data Eng., 2019

CamStyle: A Novel Data Augmentation Method for Person Re-Identification.
IEEE Trans. Image Process., 2019

Pose-Invariant Embedding for Deep Person Re-Identification.
IEEE Trans. Image Process., 2019

Multiview Consensus Graph Clustering.
IEEE Trans. Image Process., 2019

Progressive Learning for Person Re-Identification With One Example.
IEEE Trans. Image Process., 2019

Late Fusion via Subspace Search With Consistency Preservation.
IEEE Trans. Image Process., 2019

Adaptive Structure Discovery for Multimedia Analysis Using Multiple Features.
IEEE Trans. Cybern., 2019

Learning Latent Stable Patterns for Image Understanding With Weak and Noisy Labels.
IEEE Trans. Cybern., 2019

Pedestrian Alignment Network for Large-scale Person Re-Identification.
IEEE Trans. Circuits Syst. Video Technol., 2019

Improving person re-identification by attribute and identity learning.
Pattern Recognit., 2019

Few-Example Object Detection with Model Communication.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing.
CoRR, 2019

Multi-scale discriminative Region Discovery for Weakly-Supervised Object Localization.
CoRR, 2019

LSMI-Sinkhorn: Semi-supervised Squared-Loss Mutual Information Estimation with Optimal Transport.
CoRR, 2019

Learning to Transfer Learn.
CoRR, 2019

Cascaded Revision Network for Novel Object Captioning.
CoRR, 2019

Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019.
CoRR, 2019

FASTER Recurrent Networks for Video Classification.
CoRR, 2019

Meta Filter Pruning to Accelerate Deep Convolutional Neural Networks.
CoRR, 2019

Connective Cognition Network for Directional Visual Commonsense Reasoning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Network Pruning via Transformable Architecture Search.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Generalized Majorization-Minimization for Non-Convex Optimization.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Video Interactive Captioning with Human Prompts.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Truncated Gradient Confidence-Weighted Based Online Learning for Imbalance Streaming Data.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Learning to Propagate Labels: Transductive Propagation Network for Few-Shot Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

Going Deeper Into Embedding Learning for Video Object Segmentation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Dual Embedding Learning for Video Instance Segmentation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Very Long Natural Scenery Image Prediction by Outpainting.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Dual Attention Matching for Audio-Visual Event Localization.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Auto-ReID: Searching for a Part-Aware ConvNet for Person Re-Identification.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Pose-Guided Feature Alignment for Occluded Person Re-Identification.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Significance-Aware Information Bottleneck for Domain Adaptive Semantic Segmentation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Entangled Transformer for Image Captioning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Attract or Distract: Exploit the Margin of Open Set.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

One-Shot Neural Architecture Search via Self-Evaluated Template Network.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Sim-Real Joint Reinforcement Transfer for 3D Indoor Navigation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Invariance Matters: Exemplar Memory for Domain Adaptive Person Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Joint Discriminative and Generative Learning for Person Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

VehicleNet: Learning Robust Feature Representation for Vehicle Re-identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Taking a Closer Look at Domain Shift: Category-Level Adversaries for Semantics Consistent Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Contrastive Adaptation Network for Unsupervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Searching for a Robust Neural Architecture in Four GPU Hours.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Adaptive Sparse Confidence-Weighted Learning for Online Feature Selection.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

A Bottom-Up Clustering Approach to Unsupervised Person Re-Identification.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Cubic LSTMs for Video Prediction.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Exploiting detected visual objects for frame-level video filtering.
World Wide Web, 2018

A Discriminatively Learned CNN Embedding for Person Reidentification.
ACM Trans. Multim. Comput. Commun. Appl., 2018

Unsupervised Person Re-identification: Clustering and Fine-tuning.
ACM Trans. Multim. Comput. Commun. Appl., 2018

Identifying Objective and Subjective Words via Topic Modeling.
IEEE Trans. Neural Networks Learn. Syst., 2018

Adaptive Unsupervised Feature Selection With Structure Regularization.
IEEE Trans. Neural Networks Learn. Syst., 2018

Dynamic Affinity Graph Construction for Spectral Clustering Using Multiple Features.
IEEE Trans. Neural Networks Learn. Syst., 2018

Rank-Constrained Spectral Clustering With Flexible Embedding.
IEEE Trans. Neural Networks Learn. Syst., 2018

Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks.
IEEE Trans. Multim., 2018

Twitter100k: A Real-World Dataset for Weakly Supervised Cross-Media Retrieval.
IEEE Trans. Multim., 2018

Few-Shot Text and Image Classification via Analogical Transfer Learning.
ACM Trans. Intell. Syst. Technol., 2018

Two-Stream Multirate Recurrent Neural Network for Video-Based Pedestrian Reidentification.
IEEE Trans. Ind. Informatics, 2018

An Adaptive Semisupervised Feature Analysis for Video Semantic Recognition.
IEEE Trans. Cybern., 2018

SIFT Meets CNN: A Decade Survey of Instance Retrieval.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Indexing of the CNN features for the large scale image search.
Multim. Tools Appl., 2018

Similarity-preserving Image-image Domain Adaptation for Person Re-identification.
CoRR, 2018

Pruning Filter via Geometric Median for Deep Convolutional Neural Networks Acceleration.
CoRR, 2018

Learning Discriminators as Energy Networks in Adversarial Learning.
CoRR, 2018

Every Node Counts: Self-Ensembling Graph Convolutional Networks for Semi-Supervised Learning.
CoRR, 2018

Open Set Adversarial Examples.
CoRR, 2018

Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions.
CoRR, 2018

Progressive Deep Neural Networks Acceleration via Soft Filter Pruning.
CoRR, 2018

Transductive Propagation Network for Few-shot Learning.
CoRR, 2018

Diagnose like a Radiologist: Attention Guided Convolutional Neural Network for Thorax Disease Classification.
CoRR, 2018

A query execution scheduling scheme for Impala system.
Concurr. Comput. Pract. Exp., 2018

UTS_CAI submission at TRECVID 2018 Ad-hoc Video Search Task.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Activities in Extended Video.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

On the Large-Scale Transferability of Convolutional Neural Networks.
Proceedings of the Trends and Applications in Knowledge Discovery and Data Mining, 2018

Decoupled Novel Object Captioner.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

A Unified Analysis of Stochastic Momentum Methods for Deep Learning.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Uncertainty Sampling for Action Recognition via Maximizing Expected Average Precision.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Watching a Small Portion could be as Good as Watching All: Towards Efficient Video Classification.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Compound Memory Networks for Few-Shot Video Classification.
Proceedings of the Computer Vision - ECCV 2018, 2018

Generalizing a Person Retrieval Model Hetero- and Homogeneously.
Proceedings of the Computer Vision - ECCV 2018, 2018

Self-produced Guidance for Weakly-Supervised Object Localization.
Proceedings of the Computer Vision - ECCV 2018, 2018

Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline).
Proceedings of the Computer Vision - ECCV 2018, 2018

Macro-Micro Adversarial Network for Human Parsing.
Proceedings of the Computer Vision - ECCV 2018, 2018

Deep Adversarial Attention Alignment for Unsupervised Domain Adaptation: The Benefit of Target Expectation Maximization.
Proceedings of the Computer Vision - ECCV 2018, 2018

RCAA: Relational Context-Aware Agents for Person Search.
Proceedings of the Computer Vision - ECCV 2018, 2018

Camera Style Adaptation for Person Re-Identification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Adversarial Complementary Learning for Weakly Supervised Object Localization.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Style Aggregated Network for Facial Landmark Detection.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Image-Image Domain Adaptation With Preserved Self-Similarity and Domain-Dissimilarity for Person Re-Identification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Semi-Supervised Bayesian Attribute Learning for Person Re-Identification.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Balanced Clustering via Exclusive Lasso: A Pragmatic Approach.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Semisupervised Feature Analysis by Mining Correlations Among Multiple Tasks.
IEEE Trans. Neural Networks Learn. Syst., 2017

The Many Shades of Negativity.
IEEE Trans. Multim., 2017

Bag-of-Discriminative-Words (BoDW) Representation via Topic Modeling.
IEEE Trans. Knowl. Data Eng., 2017

Data-Driven Answer Selection in Community QA Systems.
IEEE Trans. Knowl. Data Eng., 2017

Beyond Trace Ratio: Weighted Harmonic Mean of Trace Ratios for Multiclass Discriminant Analysis.
IEEE Trans. Knowl. Data Eng., 2017

Feature Interaction Augmented Sparse Learning for Fast Kinect Motion Detection.
IEEE Trans. Image Process., 2017

Bi-Level Semantic Representation Analysis for Multimedia Event Detection.
IEEE Trans. Cybern., 2017

Regularized Deep Belief Network for Image Attribute Detection.
IEEE Trans. Circuits Syst. Video Technol., 2017

Semantic Pooling for Complex Event Analysis in Untrimmed Videos.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Avoiding Optimal Mean $\ell_{2,1}$-Norm Maximization-Based Robust PCA for Reconstruction.
Neural Comput., 2017

Special issue on cross-media big data analytics.
J. Vis. Commun. Image Represent., 2017

Logical query optimization for Cloudera Impala system.
J. Syst. Softw., 2017

Uncovering the Temporal Context for Video Question Answering.
Int. J. Comput. Vis., 2017

Beyond Part Models: Person Retrieval with Refined Part Pooling.
CoRR, 2017

Dual-Path Convolutional Image-Text Embedding.
CoRR, 2017

EraseReLU: A Simple Way to Ease the Training of Deep Convolution Neural Networks.
CoRR, 2017

UTS submission to Google YouTube-8M Challenge 2017.
CoRR, 2017

An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning.
CoRR, 2017

Simple to Complex Cross-modal Learning to Rank.
CoRR, 2017

A New Evaluation Protocol and Benchmarking Results for Extendable Cross-media Retrieval.
CoRR, 2017

Improving Person Re-identification by Attribute and Identity Learning.
CoRR, 2017

PatchShuffle Regularization.
CoRR, 2017

Unsupervised Person Re-identification: Clustering and Fine-tuning.
CoRR, 2017

Few-shot Object Detection.
CoRR, 2017

UTS CAI submission at TRECVID 2017 Video to Text Description Task.
Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Early Active Learning with Pairwise Constraint for Person Re-identification.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2017

FastShrinkage: Perceptually-aware Retargeting Toward Mobile Platforms.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

A Dual-Network Progressive Approach to Weakly Supervised Object Detection.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Robust Top-k Multiclass SVM for Visual Category Recognition.
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13, 2017

Online compressed robust PCA.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Recursive Spatial Transformer (ReST) for Alignment-Free Face Recognition.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Discriminative Latent Attributes for Zero-Shot Classification.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Complex Event Detection by Identifying Reliable Shots from Untrimmed Videos.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Bidirectional Multirate Reconstruction for Temporal Modeling in Videos.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Person Re-identification in the Wild.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Few-Shot Object Recognition from Machine-Labeled Web Images.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

More is Less: A More Complicated Network with Less Inference Complexity.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

A Framework of Online Learning with Imbalanced Streaming Data.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

Probabilistic Non-Negative Matrix Factorization and Its Robust Extensions for Topic Modeling.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Guest editorial: web multimedia semantic inference using multi-cues.
World Wide Web, 2016

Compound Rank-k Projections for Bilinear Analysis.
IEEE Trans. Neural Networks Learn. Syst., 2016

Image Classification by Cross-Media Active Learning With Privileged Information.
IEEE Trans. Multim., 2016

Convex Sparse PCA for Unsupervised Feature Learning.
ACM Trans. Knowl. Discov. Data, 2016

Weakly Supervised Human Fixations Prediction.
IEEE Trans. Cybern., 2016

Weakly Supervised Multilabel Clustering and its Applications in Computer Vision.
IEEE Trans. Cybern., 2016

Aspect Learning for Multimedia Summarization via Nonparametric Bayesian.
IEEE Trans. Circuits Syst. Video Technol., 2016

Unsupervised discriminative hashing.
J. Vis. Commun. Image Represent., 2016

Guest editorial: Adaptation methods for multimedia analysis.
Neurocomputing, 2016

Recognizing an Action Using Its Name: A Knowledge-Based Approach.
Int. J. Comput. Vis., 2016

Guest editors' introduction: Perception, Aesthetics, and Emotion in Multimedia Quality Modeling.
IEEE Multim., 2016

Personal health indexing based on medical examinations: A data mining approach.
Decis. Support Syst., 2016

A Discriminatively Learned CNN Embedding for Person Re-identification.
CoRR, 2016

Person Re-identification: Past, Present and Future.
CoRR, 2016

Strategies for Searching Video Content with Text Queries or Video Examples.
CoRR, 2016

Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization.
CoRR, 2016

Exploiting Rich Contents for Personalized Video Recommendation.
CoRR, 2016

UTS-CMU-D2DCRC Submission at TRECVID 2016 Video Localization.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

WARD@TRECVID 2016.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Informedia @ TRECVID 2016.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Avoiding Optimal Mean Robust PCA/2DPCA with Non-greedy $\ell_{1}$-Norm Maximization.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

They are Not Equally Reliable: Semantic Event Search Using Differentiated Concept Classifiers.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Using Detected Visual Objects to Index Video Database.
Proceedings of the Databases Theory and Applications, 2016

Robust Semi-Supervised Learning through Label Aggregation.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Concepts Not Alone: Exploring Pairwise Relationships for Zero-Shot Video Activity Recognition.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Dynamic Concept Composition for Zero-Example Event Detection.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Semisupervised Feature Selection via Spline Regression for Video Semantic Recognition.
IEEE Trans. Neural Networks Learn. Syst., 2015

Event Oriented Dictionary Learning for Complex Event Detection.
IEEE Trans. Image Process., 2015

Compact and Discriminative Descriptor Inference Using Multi-Cues.
IEEE Trans. Image Process., 2015

Multitask Spectral Clustering by Exploring Intertask Correlation.
IEEE Trans. Cybern., 2015

Weakly Semi-Supervised Deep Learning for Multi-Label Image Annotation.
IEEE Trans. Big Data, 2015

Guest Editorial: Ad Hoc Web Multimedia Analysis with Limited Supervision.
Multim. Tools Appl., 2015

Multi-Class Active Learning by Uncertainty Sampling with Diversity Maximization.
Int. J. Comput. Vis., 2015

Uncovering Temporal Context for Video Question and Answering.
CoRR, 2015

Group $K$-Means.
CoRR, 2015


Beyond Doctors: Future Health Prediction from Multimedia and Multimodal Observations.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Fast and Accurate Content-based Semantic Search in 100M Internet Videos.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Searching Persuasively: Joint Event Detection and Evidence Recounting with Limited Supervision.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Fine-Grained Image Categorization by Localizing Tiny Object Parts from Unannotated Images.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Content-Based Video Search over 1 Million Videos with 1 Core in 1 Second.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Scalable Maximum Margin Matrix Factorization by Active Riemannian Subspace Search.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Inferring Painting Style with Multi-Task Dictionary Learning.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Semantic Concept Discovery for Large-Scale Zero-Shot Event Detection.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Complex Event Detection using Semantic Saliency and Nearly-Isotonic SVM.
Proceedings of the 32nd International Conference on Machine Learning, 2015

A discriminative CNN video representation for event detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

DevNet: A Deep Event Network for multimedia event detection and evidence recounting.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Complex Event Detection via Event Oriented Dictionary Learning.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

Structured Embedding via Pairwise Relations and Long-Range Interactions in Knowledge Base.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

Exploring Semantic Inter-Class Relationships (SIR) for Zero-Shot Action Recognition.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

A Convex Formulation for Spectral Shrunk Clustering.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

Monitoring and Coaching the Use of Home Medical Devices.
Proceedings of the Health Monitoring and Personalized Feedback using Multimedia Data, 2015

2014
Weakly Supervised Photo Cropping.
IEEE Trans. Multim., 2014

Sparse Multi-Modal Hashing.
IEEE Trans. Multim., 2014

Semi-Supervised Multiple Feature Analysis for Action Recognition.
IEEE Trans. Multim., 2014

Image Attribute Adaptation.
IEEE Trans. Multim., 2014

Augmenting Image Descriptions Using Structured Prediction Output.
IEEE Trans. Multim., 2014

On the Influence Propagation of Web Videos.
IEEE Trans. Knowl. Data Eng., 2014

Clustering-Guided Sparse Structural Learning for Unsupervised Feature Selection.
IEEE Trans. Knowl. Data Eng., 2014

A Probabilistic Associative Model for Segmenting Weakly Supervised Images.
IEEE Trans. Image Process., 2014

Robust Hashing With Local Models for Approximate Similarity Search.
IEEE Trans. Cybern., 2014

Knowledge Adaptation with Partially Shared Features for Event Detection Using Few Exemplars.
IEEE Trans. Pattern Anal. Mach. Intell., 2014

E-LAMP: integration of innovative ideas for multimedia event detection.
Mach. Vis. Appl., 2014

Harnessing Lab Knowledge for Real-World Action Recognition.
Int. J. Comput. Vis., 2014

Large-Scale Geosocial Multimedia [Guest editorial].
IEEE Multim., 2014

Discriminative Orthogonal Nonnegative matrix factorization with flexibility for data representation.
Expert Syst. Appl., 2014

Special section on learning from multiple evidences for large scale multimedia analysis.
Comput. Vis. Image Underst., 2014

Guest Editorial: Special issue on large scale multimedia semantic indexing.
Comput. Vis. Image Underst., 2014

Semi-supervised Feature Analysis by Mining Correlations among Multiple Tasks.
CoRR, 2014

Compound Rank-k Projections for Bilinear Analysis.
CoRR, 2014

A Convex Sparse PCA for Feature Analysis.
CoRR, 2014

Balanced k-Means and Min-Cut Clustering.
CoRR, 2014


Discriminative coupled dictionary hashing for fast cross-media retrieval.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Resource Constrained Multimedia Event Detection.
Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

Dynamic Background Learning through Deep Auto-encoder Networks.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Multiple Features But Few Labels?: A Symbiotic Solution Exemplified for Video Analysis.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Discriminative Cellets Discovery for Fine-Grained Image Categories Retrieval.
Proceedings of the International Conference on Multimedia Retrieval, 2014

Viral Video Style: A Closer Look at Viral Videos on YouTube.
Proceedings of the International Conference on Multimedia Retrieval, 2014

Interactive Surveillance Event Detection through Mid-level Discriminative Representation.
Proceedings of the International Conference on Multimedia Retrieval, 2014

Cross-media hashing with kernel regression.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Cross-media relevance mining for evaluating text-based image search engine.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2014

Overcoming Semantic Drift in Information Extraction.
Proceedings of the 17th International Conference on Extending Database Technology, 2014

Unsupervised Video Adaptation for Parsing Human Motion.
Proceedings of the Computer Vision - ECCV 2014, 2014

Event Detection Using Multi-level Relevance Labels and Multiple Features.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

A Convex Formulation for Semi-Supervised Multi-Label Feature Selection.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013
Effective transfer tagging from image to video.
ACM Trans. Multim. Comput. Commun. Appl., 2013

Multi-Feature Fusion via Hierarchical Regression for Multimedia Analysis.
IEEE Trans. Multim., 2013

Feature Selection for Multimedia Analysis by Sharing Information Among Multiple Tasks.
IEEE Trans. Multim., 2013

Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval.
IEEE Trans. Multim., 2013

Multimedia Event Detection Using A Classifier-Specific Intermediate Representation.
IEEE Trans. Multim., 2013

Discriminative Nonnegative Spectral Clustering with Out-of-Sample Extension.
IEEE Trans. Knowl. Data Eng., 2013

Discovering Discriminative Graphlets for Aerial Image Categories Recognition.
IEEE Trans. Image Process., 2013

Infrared Patch-Image Model for Small Target Detection in a Single Image.
IEEE Trans. Image Process., 2013

Indexing of large-scale multimedia signals.
Signal Process., 2013

Local image tagging via graph regularized joint group sparsity.
Pattern Recognit., 2013

Retrieval-based cartoon gesture recognition and applications via semi-supervised heterogeneous classifiers learning.
Pattern Recognit., 2013

Unified Dictionary Learning and Region Tagging with Hierarchical Sparse Representation.
Comput. Vis. Image Underst., 2013


Inter-media hashing for large-scale retrieval from heterogeneous data sources.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Fall detection in multi-camera surveillance videos: experimentations and observations.
Proceedings of the 1st ACM international workshop on Multimedia indexing and information retrieval for healthcare, 2013

We are not equally negative: fine-grained labeling for multimedia event detection.
Proceedings of the ACM Multimedia Conference, 2013

A cognitive assistive system for monitoring the use of home medical devices.
Proceedings of the 1st ACM international workshop on Multimedia indexing and information retrieval for healthcare, 2013

Thinking of Images as What They Are: Compound Matrix Regression for Image Classification.
Proceedings of the IJCAI 2013, 2013

Co-Regularized Ensemble for Feature Selection.
Proceedings of the IJCAI 2013, 2013

Robust Tensor Clustering with Non-Greedy Maximization.
Proceedings of the IJCAI 2013, 2013

Towards efficient search for activity trajectories.
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013

How Related Exemplars Help Complex Event Detection in Web Videos?
Proceedings of the IEEE International Conference on Computer Vision, 2013

Feature Weighting via Optimal Thresholding for Video Analysis.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Space-Time Robust Representation for Action Recognition.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Complex Event Detection via Multi-source Video Attributes.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Supervised Nonnegative Tensor Factorization with Maximum-Margin Constraint.
Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012
Interactive Video Indexing With Statistical Active Learning.
IEEE Trans. Multim., 2012

Discriminating Joint Feature Analysis for Multimedia Data Understanding.
IEEE Trans. Multim., 2012

Web Image Annotation Via Subspace-Sparsity Collaborated Feature Selection.
IEEE Trans. Multim., 2012

Web and Personal Image Annotation by Mining Label Correlation With Relaxed Visual Graph Embedding.
IEEE Trans. Image Process., 2012

Spline Regression Hashing for Fast Image Search.
IEEE Trans. Image Process., 2012

A Multimedia Retrieval Framework Based on Semi-Supervised Ranking and Relevance Feedback.
IEEE Trans. Pattern Anal. Mach. Intell., 2012

Active learning for social image retrieval using Locally Regressive Optimal Design.
Neurocomputing, 2012


Robust cross-media transfer for visual event detection.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Knowledge adaptation for ad hoc multimedia event detection with few exemplars.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Classifier-specific intermediate representation for multimedia tasks.
Proceedings of the International Conference on Multimedia Retrieval, 2012

Action recognition by exploring data distribution and feature correlation.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

Weakly supervised sparse coding with geometric consistency pooling.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

Learning to predict health status of geriatric patients from observational data.
Proceedings of the 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2012

Unsupervised Feature Selection Using Nonnegative Spectral Analysis.
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

2011
Learning a 3D Human Pose Distance Metric from Geometric Pose Descriptor.
IEEE Trans. Vis. Comput. Graph., 2011

3D human pose recovery from image by efficient visual feature selection.
Comput. Vis. Image Underst., 2011

Transfer tagging from image to video.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Learning frame relevance for video classification.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Multiple feature hashing for real-time large scale near-duplicate video retrieval.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Exploiting the entire feature space with sparsity for automatic image annotation.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

l2,1-Norm Regularized Discriminative Feature Selection for Unsupervised Learning.
Proceedings of the IJCAI 2011, 2011

Tag localization with spatial correlations and joint group sparsity.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

Nonnegative Spectral Clustering with Discriminative Regularization.
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2010
Image Clustering Using Local Discriminant Models and Global Integration.
IEEE Trans. Image Process., 2010

Recognizing Cartoon Image Gestures for Retrieval and Interactive Cartoon Clip Synthesis.
IEEE Trans. Circuits Syst. Video Technol., 2010

Cross-media retrieval using query dependent search methods.
Pattern Recognit., 2010

Classification by semi-supervised discriminative regularization.
Neurocomputing, 2010

Combining location and feature information for multimedia retrieval.
Int. J. Comput. Appl. Technol., 2010

Skin Region Tracking Using Hybrid Color Model and Gradient Vector Flow.
Proceedings of the 2010 International Conference on Machine Vision and Human-machine Interface, 2010

Local and Global Regressive Mapping for Manifold Learning with Out-of-Sample Extrapolation.
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010

2009
Retrieval based interactive cartoon synthesis via unsupervised bi-distance metric learning.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Ranking with local regression and global alignment for cross media retrieval.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

2008
Mining Semantic Correlation of Heterogeneous Multimedia Data for Cross-Media Retrieval.
IEEE Trans. Multim., 2008

Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval.
IEEE Trans. Multim., 2008

Heterogeneous multimedia data semantics mining using content and location context.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

2007
Manifold Learning Based Cross-media Retrieval: A Solution to Media Object Complementary Nature.
J. VLSI Signal Process., 2007

Boosting Cross-Media Retrieval by Learning with Positive and Negative Examples.
Proceedings of the Advances in Multimedia Modeling, 2007

2005
Understanding Multimedia Document Semantics for Cross-Media Retrieval.
Proceedings of the Advances in Multimedia Information Processing, 2005

