Yu-Gang Jiang

ORCID: 0000-0002-1907-8567

According to our database, Yu-Gang Jiang authored at least 385 papers between 2006 and 2025.

Bibliography

2025
A Survey on Video Diffusion Models.
ACM Comput. Surv., February, 2025

Dynamic Routing and Knowledge Re-Learning for Data-Free Black-Box Attack.
IEEE Trans. Pattern Anal. Mach. Intell., January, 2025

2024
MOSS: An Open Conversational Large Language Model.
Mach. Intell. Res., October, 2024

Adaptive Cross-Modal Transferable Adversarial Attacks From Images to Videos.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Imbalanced gradients: a subtle cause of overestimated adversarial robustness.
Mach. Learn., May, 2024

HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition.
ACM Trans. Multim. Comput. Commun. Appl., February, 2024

CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition.
Int. J. Comput. Vis., February, 2024

Locate Before Answering: Answer Guided Question Localization for Video Question Answering.
IEEE Trans. Multim., 2024

Building an Open-Vocabulary Video CLIP Model With Better Architectures, Optimization and Data.
IEEE Trans. Pattern Anal. Mach. Intell., 2024

IDEATOR: Jailbreaking Large Vision-Language Models Using Themselves.
CoRR, 2024

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks.
CoRR, 2024

Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders.
CoRR, 2024

Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models.
CoRR, 2024

UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation.
CoRR, 2024

Towards a Theoretical Understanding of Memorization in Diffusion Models.
CoRR, 2024

EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models.
CoRR, 2024

EventHallusion: Diagnosing Event Hallucinations in Video LLMs.
CoRR, 2024

GenRec: Unifying Video Generation and Recognition with Diffusion Models.
CoRR, 2024

ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack.
CoRR, 2024

EnJa: Ensemble Jailbreak on Large Language Models.
CoRR, 2024

Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers.
CoRR, 2024

Out of Length Text Recognition with Sub-String Matching.
CoRR, 2024

Infinite Motion: Extended Motion Generation via Long Text Instructions.
CoRR, 2024

PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer.
CoRR, 2024

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations.
CoRR, 2024

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models.
CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.
CoRR, 2024

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation.
CoRR, 2024

AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding.
CoRR, 2024

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation.
CoRR, 2024

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction.
CoRR, 2024

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs.
CoRR, 2024

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments.
CoRR, 2024

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion.
CoRR, 2024

Automating the Diagnosis of Human Vision Disorders by Cross-modal 3D Generation.
CoRR, 2024

Adaptive Retention & Correction for Continual Learning.
CoRR, 2024

FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning.
CoRR, 2024

PoseAnimate: Zero-shot high fidelity pose controllable character animation.
CoRR, 2024

Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models.
CoRR, 2024

The Dog Walking Theory: Rethinking Convergence in Federated Learning.
CoRR, 2024

Whose Side Are You On? Investigating the Political Stance of Large Language Models.
CoRR, 2024

FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model.
CoRR, 2024

From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios.
CoRR, 2024

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models.
CoRR, 2024

Instruction-Guided Scene Text Recognition.
CoRR, 2024

MouSi: Poly-Visual-Expert Vision-Language Models.
CoRR, 2024

Multi-Trigger Backdoor Attacks: More Triggers, More Threats.
CoRR, 2024

Secrets of RLHF in Large Language Models Part II: Reward Modeling.
CoRR, 2024

Fake Alignment: Are LLMs Really Aligned Well?
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Decoder Pre-Training with only Text for Scene Text Recognition.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Identity-Driven Multimedia Forgery Detection via Reference Assistance.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Highly Transferable Diffusion-based Unrestricted Adversarial Attack on Pre-trained Vision-Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

White-box Multimodal Jailbreaks Against Large Vision-Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Navigating Weight Prediction with Diet Diary.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ModelLock: Locking Your Model With a Spell.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Zero-shot High-fidelity and Pose-controllable Character Animation.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

MagDiff: Multi-alignment Diffusion for High-Fidelity Video Generation and Editing.
Proceedings of the Computer Vision - ECCV 2024, 2024

Adversarial Prompt Tuning for Vision-Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

SegIC: Unleashing the Emergent Correspondence for In-Context Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image.
Proceedings of the Computer Vision - ECCV 2024, 2024

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

PromptFusion: Decoupling Stability and Plasticity for Continual Learning.
Proceedings of the Computer Vision - ECCV 2024, 2024

SimDA: Simple Diffusion Adapter for Efficient Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OmniViD: A Generative Framework for Universal Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MotionEditor: Editing Video Motion via Content-Aware Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Doubly Abductive Counterfactual Inference for Text-Based Image Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Learning to Rank Patches for Unbiased Image Redundancy Reduction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Knowledge driven weights estimation for large-scale few-shot image recognition.
Pattern Recognit., October, 2023

Cross-Domain Contrastive Learning for Unsupervised Domain Adaptation.
IEEE Trans. Multim., 2023

FT-TDR: Frequency-Guided Transformer and Top-Down Refinement Network for Blind Face Inpainting.
IEEE Trans. Multim., 2023

Scene Graph Refinement Network for Visual Question Answering.
IEEE Trans. Multim., 2023

Self-Supervised Learning for Semi-Supervised Temporal Language Grounding.
IEEE Trans. Multim., 2023

Dynamic Mixup for Multi-Label Long-Tailed Food Ingredient Recognition.
IEEE Trans. Multim., 2023

Towards Transferable Adversarial Attacks on Image and Video Transformers.
IEEE Trans. Image Process., 2023

FoodLMM: A Versatile Food Assistant using Large Multi-modal Model.
CoRR, 2023

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models.
CoRR, 2023

VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model.
CoRR, 2023

AdaDiff: Adaptive Step Selection for Fast Diffusion.
CoRR, 2023

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning.
CoRR, 2023

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation.
CoRR, 2023

Context Perception Parallel Decoder for Scene Text Recognition.
CoRR, 2023

Prompting Large Language Models to Reformulate Queries for Moment Localization.
CoRR, 2023

NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
CoRR, 2023

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System.
CoRR, 2023

OmniTracker: Unifying Object Tracking by Tracking-with-Detection.
CoRR, 2023

DiffusionAD: Denoising Diffusion for Anomaly Detection.
CoRR, 2023

Meta Style Adversarial Training for Cross-Domain Few-Shot Learning.
CoRR, 2023

Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.
CoRR, 2023

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On the Importance of Spatial Relations for Few-shot Action Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Generalizing Face Forgery Detection via Uncertainty Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Relation Triplet Construction for Cross-modal Text-to-Video Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Suspected Objects Matter: Rethinking Model's Prediction for One-stage Visual Grounding.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.
Proceedings of the International Conference on Machine Learning, 2023

Reconstructive Neuron Pruning for Backdoor Defense.
Proceedings of the International Conference on Machine Learning, 2023

Adaptive Split-Fusion Transformer.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SVFormer: Semi-supervised Video Transformer for Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Enhancing the Self-Universality for Transferable Targeted Attacks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Look Before You Match: Instance Understanding Matters in Video Object Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ResFormer: Scaling ViTs with Multi-Resolution Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Bi-directional Feature Fusion Generative Adversarial Network for Ultra-high Resolution Pathological Image Virtual Re-staining.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prototypical Residual Networks for Anomaly Detection and Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PolarFormer: Multi-Camera 3D Object Detection with Polar Transformer.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
SAM: Modeling Scene, Object and Action With Semantics Attention Modules for Video Recognition.
IEEE Trans. Multim., 2022

Spatial-Temporal Graphs for Cross-Modal Text2Video Retrieval.
IEEE Trans. Multim., 2022

Generalized Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data.
IEEE Trans. Image Process., 2022

A Dynamic Frame Selection Framework for Fast Video Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection.
CoRR, 2022

Transferability Estimation Based On Principal Gradient Expectation.
CoRR, 2022

Text-driven Video Prediction.
CoRR, 2022

Locate before Answering: Answer Guided Question Localization for Video Question Answering.
CoRR, 2022

Multi-Prompt Alignment for Multi-source Unsupervised Domain Adaptation.
CoRR, 2022

Incorporating Locality of Images to Generate Targeted Transferable Adversarial Examples.
CoRR, 2022

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection.
CoRR, 2022

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling.
CoRR, 2022

PolarFormer: Multi-camera 3D Object Detection with Polar Transformers.
CoRR, 2022

Adaptive Split-Fusion Transformer.
CoRR, 2022

Deeper Insights into ViTs Robustness towards Common Corruptions.
CoRR, 2022

Video Moment Retrieval from Text Queries via Single Frame Annotation.
CoRR, 2022

Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning.
CoRR, 2022

Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding.
CoRR, 2022

Video Moment Retrieval from Text Queries via Single Frame Annotation.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

TGDM: Target Guided Dynamic Mixup for Cross-Domain Few-Shot Learning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Mix-DANN and Dynamic-Modal-Distillation for Video Domain Adaptation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Ingredient-enriched Recipe Generation from Cooking Videos.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

Adaptive Temporal Grouping for Black-box Adversarial Attacks on Videos.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

SVTR: Scene Text Recognition with a Single Visual Model.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Data-Free Network Debiasing for Long-Tailed Visual Recognition.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

Semi-supervised Single-View 3D Reconstruction via Prototype Shape Priors.
Proceedings of the Computer Vision - ECCV 2022, 2022

Semi-supervised Vision Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

Efficient Video Transformers with Spatial-Temporal Token Selection.
Proceedings of the Computer Vision - ECCV 2022, 2022

MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes.
Proceedings of the Computer Vision - ECCV 2022, 2022

Balanced Contrastive Learning for Long-Tailed Visual Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross-Modal Transferable Adversarial Attacks from Images to Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

ObjectFormer for Image Manipulation Detection and Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

BEVT: BERT Pretraining of Video Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Boosting the Transferability of Video Adversarial Examples via Temporal Translation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Towards Transferable Adversarial Attacks on Vision Transformers.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Attacking Video Recognition Models with Bullet-Screen Comments.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Story-driven Video Editing.
IEEE Trans. Multim., 2021

Co-Attention Memory Network for Multimodal Microblog's Hashtag Recommendation.
IEEE Trans. Knowl. Data Eng., 2021

A Study of Multi-Task and Region-Wise Deep Learning for Food Ingredient Recognition.
IEEE Trans. Image Process., 2021

Predicting Content Similarity via Multimodal Modeling for Video-In-Video Advertising.
IEEE Trans. Circuits Syst. Video Technol., 2021

Pixel2Mesh: 3D Mesh Model Generation via Image Guided Deformation.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition.
Neurocomputing, 2021

A Coarse-to-Fine Framework for Resource Efficient Video Recognition.
Int. J. Comput. Vis., 2021

Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation.
CoRR, 2021

Efficient Video Transformers with Spatial-Temporal Token Selection.
CoRR, 2021

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection.
CoRR, 2021

HMS: Hierarchical Modality Selection for Efficient Video Recognition.
CoRR, 2021

What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space.
CoRR, 2021

A Multimodal Framework for Video Ads Understanding.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Visual Co-Occurrence Alignment Learning for Weakly-Supervised Video Moment Retrieval.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Can Action be Imitated? Learn to Reconstruct and Transfer Human Dynamics from Videos.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Bag of Tricks for Building an Accurate and Slim Object Detector for Embedded Applications.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

VideoLT: Large-scale Long-tailed Video Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Motion Guided Region Message Passing for Video Captioning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
A Multi-Task Neural Approach for Emotion Attribution, Classification, and Summarization.
IEEE Trans. Multim., 2020

Re-Caption: Saliency-Enhanced Image Captioning Through Two-Phase Learning.
IEEE Trans. Image Process., 2020

Pose-Guided Person Image Synthesis in the Non-Iconic Views.
IEEE Trans. Image Process., 2020

Learning Layer-Skippable Inference Network.
IEEE Trans. Image Process., 2020

Deep Ranking for Image Zero-Shot Multi-Label Classification.
IEEE Trans. Image Process., 2020

Learning to Score Figure Skating Sport Videos.
IEEE Trans. Circuits Syst. Video Technol., 2020

Matching Image and Sentence With Multi-Faceted Representations.
IEEE Trans. Circuits Syst. Video Technol., 2020

Object Detection from Scratch with Deep Supervision.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Leader-Based Multi-Scale Attention Deep Architecture for Person Re-Identification.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Vocabulary-Informed Zero-Shot and Open-Set Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Extreme vocabulary learning.
Frontiers Comput. Sci., 2020

Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos.
CoRR, 2020

Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness.
CoRR, 2020

Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition.
CoRR, 2020

Recurrent Memory Reasoning Network for Expert Finding in Community Question Answering.
Proceedings of the WSDM '20: The Thirteenth ACM International Conference on Web Search and Data Mining, 2020

Instance Image Retrieval with Generative Adversarial Training.
Proceedings of the MultiMedia Modeling - 26th International Conference, 2020

WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Video Relation Detection via Multiple Hypothesis Association.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multi-modal Cooking Workflow Construction for Food Recipes.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Person-level Action Recognition in Complex Events via TSD-TSM Networks.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Visual Relations Augmented Cross-modal Retrieval.
Proceedings of the 2020 on International Conference on Multimedia Retrieval, 2020

Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos.
Proceedings of the Computer Vision - ECCV 2020, 2020

Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language.
Proceedings of the Computer Vision - ECCV 2020, 2020

Clean-Label Backdoor Attacks on Video Recognition Models.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Hyperbolic Visual Embedding Learning for Zero-Shot Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Sketch-BERT: Learning Sketch Bidirectional Encoder Representation From Transformers by Self-Supervised Learning of Sketch Gestalt.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Long-Term Cloth-Changing Person Re-identification.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

Heuristic Black-Box Adversarial Attacks on Video Recognition Models.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Feature Deformation Meta-Networks in Image Captioning of Novel Objects.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Visual Content Recognition by Exploiting Semantic Feature Map with Attention and Multi-task Learning.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Dense Dilated Network for Video Action Recognition.
IEEE Trans. Image Process., 2019

Multi-Level Semantic Feature Augmentation for One-Shot Learning.
IEEE Trans. Image Process., 2019

Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Reformulating natural language queries using sequence-to-sequence models.
Sci. China Inf. Sci., 2019

FDU Participation in TRECVID 2019 VTT Task.
Proceedings of the 2019 TREC Video Retrieval Evaluation, 2019

Hot Topic-Aware Retweet Prediction with Masked Self-attentive Model.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Comp-GAN: Compositional Generative Adversarial Network in Synthesizing and Recognizing Facial Expression.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

TC-Net for iSBIR: Triplet Classification Network for Instance-level Sketch Based Image Retrieval.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Black-box Adversarial Attacks on Video Recognition Models.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Towards Optimal CNN Descriptors for Large-Scale Image Retrieval.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Embodied One-Shot Video Recognition: Learning from Actions of a Virtual Embodied Agent.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Sparse Temporal Causal Convolution for Efficient Action Modeling.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

TC-GAN: Triangle Cycle-Consistent GANs for Face Frontalization with Facial Features Preserved.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Take Goods from Shelves: A Dataset for Class-Incremental Object Detection.
Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

CNN-Based Chinese NER with Lexicon Rethinking.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Deep Learning for Video Captioning: A Review.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Smart Advertising in Videos Based on Comprehensive Content Analytics.
Proceedings of the IEEE International Conference on Multimedia & Expo Workshops, 2019

An End-to-End Architecture for Class-Incremental Object Detection with Knowledge Distillation.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Composite Binary Decomposition Networks.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Trainable Undersampling for Class-Imbalance Learning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Semantic Proposal for Activity Localization in Videos via Sentence Query.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Motion Guided Spatial Attention for Video Captioning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Image Block Augmentation for One-Shot Learning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
DeepProduct: Mobile Product Search With Portable Deep Features.
ACM Trans. Multim. Comput. Commun. Appl., 2018

Editorial IEEE Transactions on Multimedia Special Section on Video Analytics: Challenges, Algorithms, and Applications.
IEEE Trans. Multim., 2018

Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification.
IEEE Trans. Multim., 2018

NAIS: Neural Attentive Item Similarity Model for Recommendation.
IEEE Trans. Knowl. Data Eng., 2018

Hookworm Detection in Wireless Capsule Endoscopy Images With Deep Learning.
IEEE Trans. Image Process., 2018

Image Classification With Tailored Fine-Grained Dictionaries.
IEEE Trans. Circuits Syst. Video Technol., 2018

Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization.
IEEE Trans. Affect. Comput., 2018

Recent Advances in Zero-Shot Recognition: Toward Data-Efficient Understanding of Visual Content.
IEEE Signal Process. Mag., 2018

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Stacked multichannel autoencoder - an efficient way of learning from synthetic data.
Multim. Tools Appl., 2018

Learning part-based mid-level representation for visual recognition.
Neurocomputing, 2018

Learning to Separate Domains in Generalized Zero-Shot and Open Set Learning: a probabilistic perspective.
CoRR, 2018

Semantic Feature Augmentation in Few-shot Learning.
CoRR, 2018

Learning to score and summarize figure skating sport videos.
CoRR, 2018

Dense Dilated Network for Few Shot Action Recognition.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

Harnessing Synthesized Abstraction Images to Improve Facial Attribute Recognition.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images.
Proceedings of the Computer Vision - ECCV 2018, 2018

Non-local NetVLAD Encoding for Video Classification.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Pose-Normalized Image Generation for Person Re-identification.
Proceedings of the Computer Vision - ECCV 2018, 2018

Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

Recurrent Fusion Network for Image Captioning.
Proceedings of the Computer Vision - ECCV 2018, 2018

Dual Skipping Networks.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Generating Keyword Queries for Natural Language Queries to Alleviate Lexical Chasm Problem.
Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018

Cross-Domain Sentiment Classification with Target Domain Specific Information.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Deep learning for video classification and captioning.
Proceedings of the Frontiers of Multimedia Research, 2018

2017
The THUMOS challenge on action recognition for videos "in the wild".
Comput. Vis. Image Underst., 2017

Left-Right Skip-DenseNets for Coarse-to-Fine Object Categorization.
CoRR, 2017

Recent Advances in Zero-shot Recognition.
CoRR, 2017

Aggregating Frame-level Features for Large-Scale Video Classification.
CoRR, 2017

Learning Semantic Feature Map for Visual Content Recognition.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Learning to Generate and Edit Hairstyles.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

LSVC2017: Large-Scale Video Classification Challenge.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

VSCC'2017: Visual Analysis for Smart and Connected Communities.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Sketch Recognition with Deep Visual-Sequential Fusion Model.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Adaptively Weighted Multi-task Deep Network for Person Attribute Classification.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Learning Fashion Compatibility with Bidirectional LSTMs.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Multi-task Deep Neural Network for Joint Face Recognition and Facial Attribute Prediction.
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 2017

Frame-Transformer Emotion Classification Network.
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 2017

Iterative object and part transfer for fine-grained recognition.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

DSOD: Learning Deeply Supervised Object Detectors from Scratch.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Multi-scale Deep Learning Architectures for Person Re-identification.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Weakly Supervised Dense Video Captioning.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Adaptive Proximal Average Approximation for Composite Convex Minimization.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Hierarchical Visualization of Video Search Results for Topic-Based Browsing.
IEEE Trans. Multim., 2016

Partial Copy Detection in Videos: A Benchmark and an Evaluation of Popular Methods.
IEEE Trans. Big Data, 2016

Flexible multi-task learning with latent task grouping.
Neurocomputing, 2016

Multiple task learning with flexible structure regularization.
Neurocomputing, 2016

A Bayesian Hashing approach and its application to face recognition.
Neurocomputing, 2016

Web video categorization using category-predictive classifiers and category-specific concept classifiers.
Neurocomputing, 2016

Fast Summarization of User-Generated Videos: Exploiting Semantic, Emotional, and Quality Clues.
IEEE Multim., 2016

Deep Learning for Video Classification and Captioning.
CoRR, 2016

NTTFudan Team @ TRECVID 2016: Multimedia Event Detection.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Exploiting Objects with LSTMs for Video Categorization.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Binary Optimized Hashing.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Emotion in Context: Deep Semantic Feature Fusion for Video Emotion Recognition.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Video Emotion Recognition with Transferred Deep Feature Encodings.
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Matching User Photos to Online Products with Robust Deep Features.
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

BigVid at MediaEval 2016: Predicting Interestingness in Images and Videos.
Proceedings of the Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

On Stochastic Primal-Dual Hybrid Gradient Approach for Compositely Regularized Minimization.
Proceedings of the ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 August-2 September 2016, The Hague, The Netherlands, 2016

Harnessing Object and Scene Semantics for Large-Scale Video Understanding.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Regional Gating Neural Networks for Multi-label Image Classification.
Proceedings of the British Machine Vision Conference 2016, 2016

2015
Super Fast Event Recognition in Internet Videos.
IEEE Trans. Multim., 2015

Human Action Recognition in Unconstrained Videos by Explicit Motion Modeling.
IEEE Trans. Image Process., 2015

CHCF: A Cloud-Based Heterogeneous Computing Framework for Large-Scale Image Retrieval.
IEEE Trans. Circuits Syst. Video Technol., 2015

GPU-based MapReduce for large-scale near-duplicate video retrieval.
Multim. Tools Appl., 2015

A relative similarity based method for interactive patient risk prediction.
Data Min. Knowl. Discov., 2015

Fusing Multi-Stream Deep Networks for Video Classification.
CoRR, 2015

Fudan at TRECVID 2015: Adaptive Feature Fusion for Multimedia Event Detection in Videos.
Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

NTT-Fudan Team @ TRECVID 2015: Multimedia Event Detection.
Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

ASM'15: The 1st International Workshop on Affect and Sentiment in Multimedia.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Evaluating Two-Stream CNN for Video Classification.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Portfolio Choices with Orthogonal Bandit Learning.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Optimal Bayesian Hashing for Efficient Face Recognition.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

VSD2014: A dataset for violent scenes detection in hollywood movies and web videos.
Proceedings of the 13th International Workshop on Content-Based Multimedia Indexing, 2015

Categorizing Big Video Data on the Web: Challenges and Opportunities.
Proceedings of the 2015 IEEE International Conference on Multimedia Big Data, BigMM 2015, 2015

2014
Placing Videos on a Semantic Hierarchy for Search Result Navigation.
ACM Trans. Multim. Comput. Commun. Appl., 2014

Video Event Detection Using Motion Relativity and Feature Selection.
IEEE Trans. Multim., 2014

Guest Editorial Special Section on Socio-Mobile Media Analysis and Retrieval.
IEEE Trans. Multim., 2014

Learning Multiple Relative Attributes With Humans in the Loop.
IEEE Trans. Image Process., 2014

Special issue on Multimedia Event Detection.
Mach. Vis. Appl., 2014

Discovering joint audio-visual codewords for video event detection.
Mach. Vis. Appl., 2014

Name-Face Association in Web Videos: A Large-Scale Dataset, Baselines, and Open Issues.
J. Comput. Sci. Technol., 2014

A Framework of Video Coding for Compressing Near-Duplicate Videos.
Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Organizing Video Search Results to Adapted Semantic Hierarchies for Topic-based Browsing.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Real-time summarization of user-generated videos based on semantic recognition.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

The MediaEval 2014 Affect Task: Violent Scenes Detection.
Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Networks.
Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Challenge Huawei challenge: Fusing multimodal features with deep neural networks for Mobile Video Annotation.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2014

News Credibility Evaluation on Microblog with a Hierarchical Propagation Model.
Proceedings of the 2014 IEEE International Conference on Data Mining, 2014

Which Looks Like Which: Exploring Inter-class Relationships in Fine-Grained Visual Categorization.
Proceedings of the Computer Vision - ECCV 2014, 2014

VCDB: A Large-Scale Database for Partial Copy Detection in Videos.
Proceedings of the Computer Vision - ECCV 2014, 2014

Benchmarking Violent Scenes Detection in movies.
Proceedings of the 12th International Workshop on Content-Based Multimedia Indexing, 2014

Predicting Emotions in User-Generated Videos.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013
Query-Adaptive Image Search With Hash Codes.
IEEE Trans. Multim., 2013

High-level event recognition in unconstrained videos.
Int. J. Multim. Inf. Retr., 2013

Strong geometrical consistency in large scale partial-duplicate image search.
Proceedings of the ACM Multimedia Conference, 2013

Beauty is here: evaluating aesthetics in videos using multimodal features and free training data.
Proceedings of the ACM Multimedia Conference, 2013

The MediaEval 2013 Affect Task: Violent Scenes Detection.
Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

Fudan at MediaEval 2013: Violent Scenes Detection Using Motion Features and Part-Level Attributes.
Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

Multiple Task Learning Using Iteratively Reweighted Least Square.
Proceedings of the IJCAI 2013, 2013

Learning Hash Codes with Listwise Supervision.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Understanding and Predicting Interestingness of Videos.
Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012
Sampling and Ontologically Pooling Web Images for Visual Concept Learning.
IEEE Trans. Multim., 2012

Fast Semantic Diffusion for Large-Scale Context-Based Image and Video Annotation.
IEEE Trans. Image Process., 2012

A fast video event recognition system and its application to video search.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Joint audio-visual bi-modal codewords for video event detection.
Proceedings of the International Conference on Multimedia Retrieval, 2012

SUPER: towards real-time event recognition in internet videos.
Proceedings of the International Conference on Multimedia Retrieval, 2012

The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features.
Proceedings of the Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Learning Hybrid Part Filters for Scene Recognition.
Proceedings of the Computer Vision - ECCV 2012, 2012

Trajectory-Based Modeling of Human Actions with Motion Reference Points.
Proceedings of the Computer Vision - ECCV 2012, 2012

Supervised hashing with kernels.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011
Concept-Driven Multi-Modality Fusion for Video Search.
IEEE Trans. Circuits Syst. Video Technol., 2011

Modeling Scene and Object Contexts for Human Action Retrieval With Few Examples.
IEEE Trans. Circuits Syst. Video Technol., 2011

The MediaMill TRECVID 2011 Semantic Video Search Engine.
Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011

On the pooling of positive examples with ontology for visual concept learning.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Towards textually describing complex video contents with audio-visual concept classifiers.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Consumer video understanding: a benchmark database and an evaluation of human and machine performance.
Proceedings of the 1st International Conference on Multimedia Retrieval, 2011

Lost in binarization: query-adaptive ranking for similar image search with compact codes.
Proceedings of the 1st International Conference on Multimedia Retrieval, 2011

Noise resistant graph ranking for improved web image search.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

2010
Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study.
IEEE Trans. Multim., 2010

Columbia-UCF TRECVID2010 Multimedia Event Detection: Combining Multiple Modalities, Contextual Concepts, and Temporal Matching.
Proceedings of the TRECVID 2010 workshop participants notebook papers, 2010

On the sampling of web images for learning visual concept classifiers.
Proceedings of the 9th ACM International Conference on Image and Video Retrieval, 2010

2009
Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval.
Comput. Vis. Image Underst., 2009

VIREO/DVMM at TRECVID 2009: High-Level Feature Extraction, Automatic Video Search, and Content-Based Copy Detection.
Proceedings of the TRECVID 2009 workshop participants notebook papers, 2009

Brain state decoding for rapid image retrieval.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Semantic context transfer across heterogeneous sources for domain adaptive video search.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Domain adaptive semantic diffusion for large scale context-based video annotation.
Proceedings of the IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27, 2009

Label diagnosis through self tuning for web image search.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Exploring inter-concept relationship with context space for semantic video indexing.
Proceedings of the 8th ACM International Conference on Image and Video Retrieval, 2009

2008
Selection of Concept Detectors for Video Search by Ontology-Enriched Semantic Spaces.
IEEE Trans. Multim., 2008

Beyond Semantic Search: What You Observe May Not Be What You Think.
Proceedings of the TRECVID 2008 workshop participants notebook papers, 2008

Columbia University/VIREO-CityU/IRIT TRECVID2008 High-Level Feature Extraction and Interactive Video Search.
Proceedings of the TRECVID 2008 workshop participants notebook papers, 2008

Bag-of-visual-words expansion using visual relatedness for video indexing.
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008

Video event detection using motion relativity and visual relatedness.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Ontology-based visual word matching for near-duplicate retrieval.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

2007
Experimenting VIREO-374: Bag-of-Visual-Words and Visual-Based Ontology for Semantic Video Indexing and search.
Proceedings of the TRECVID 2007 workshop participants notebook papers, 2007

Evaluating bag-of-visual-words representations in scene classification.
Proceedings of the 9th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2007

Towards optimal bag-of-features for object categorization and semantic video retrieval.
Proceedings of the 6th ACM International Conference on Image and Video Retrieval, 2007

2006
Modeling Local Interest Points for Semantic Detection and Video Search at TRECVID 2006.
Proceedings of the 2006 TREC Video Retrieval Evaluation, 2006

Fast tracking of near-duplicate keyframes in broadcast domain with transitivity propagation.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

Keyframe Retrieval by Keypoints: Can Point-to-Point Matching Help?
Proceedings of the Image and Video Retrieval, 5th International Conference, 2006

