Xiaoshuai Sun

Orcid: 0000-0003-3912-9306

According to our database, Xiaoshuai Sun authored at least 219 papers between 2008 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
M3ixup: A multi-modal data augmentation approach for image captioning.
Pattern Recognit., 2025

2024
Towards Language-Guided Visual Recognition via Dynamic Convolutions.
Int. J. Comput. Vis., January, 2024

A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension.
IEEE Trans. Multim., 2024

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models.
CoRR, 2024

DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion.
CoRR, 2024

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing.
CoRR, 2024

TraDiffusion: Trajectory-Based Training-Free Image Generation.
CoRR, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models.
CoRR, 2024

INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model.
CoRR, 2024

Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models.
CoRR, 2024

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model.
CoRR, 2024

Evaluating and Analyzing Relationship Hallucinations in LVLMs.
CoRR, 2024

Image Captioning via Dynamic Path Customization.
CoRR, 2024

DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis.
CoRR, 2024

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models.
CoRR, 2024

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models.
CoRR, 2024

StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

3D-GRES: Generalized 3D Referring Expression Segmentation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Deep Instruction Tuning for Segment Anything Model.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Towards Omni-supervised Referring Expression Segmentation.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

AnyTrans: Translate AnyText in the Image with Large Scale Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Multi-branch Collaborative Learning Network for 3D Visual Grounding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Toward Open-Set Human Object Interaction Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Towards local visual modeling for image captioning.
Pattern Recognit., June, 2023

A Real-Time Global Inference Network for One-Stage Referring Expression Comprehension.
IEEE Trans. Neural Networks Learn. Syst., 2023

Fast Monocular Depth Estimation via Side Prediction Aggregation with Continuous Spatial Refinement.
IEEE Trans. Multim., 2023

Knowing What it is: Semantic-Enhanced Dual Attention Transformer.
IEEE Trans. Multim., 2023

Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning.
IEEE Trans. Multim., 2023

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation.
CoRR, 2023

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning.
CoRR, 2023

JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues.
CoRR, 2023

Continual Face Forgery Detection via Historical Distribution Preserving.
CoRR, 2023

Towards General Visual-Linguistic Face Forgery Detection.
CoRR, 2023

Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer.
CoRR, 2023

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting.
CoRR, 2023

Towards End-to-end Semi-supervised Learning for One-stage Object Detection.
CoRR, 2023

Towards Efficient Visual Adaption via Structural Re-parameterization.
CoRR, 2023

HSM-QA: Question Answering System Based on Hierarchical Semantic Matching.
IEEE Access, 2023

Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Semi-Supervised Panoptic Narrative Grounding.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Clover: Towards A Unified Video-Language Alignment and Fusion Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis.
IEEE Trans. Multim., 2022

Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks.
IEEE Trans. Image Process., 2022

Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image Captioning.
IEEE Trans. Image Process., 2022

Plenty is Plague: Fine-Grained Learning for Visual Question Answering.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Fast Class-Wise Updating for Online Hashing.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Modeling long-term video semantic distribution for temporal action proposal generation.
Neurocomputing, 2022

Clover: Towards A Unified Video-Language Alignment and Fusion Model.
CoRR, 2022

What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study.
CoRR, 2022

End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation.
CoRR, 2022

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation.
CoRR, 2022

Global2Local: A Joint-Hierarchical Attention for Video Captioning.
CoRR, 2022

Differentiated Relevances Embedding for Group-based Referring Expression Comprehension.
CoRR, 2022

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Open-Ended Text-to-Face Generation, Combination and Manipulation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Learning Dynamic Prior Knowledge for Text-to-Face Pixel Synthesis.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SeqTR: A Simple Yet Universal Network for Visual Grounding.
Proceedings of the Computer Vision - ECCV 2022, 2022

An Information Theoretic Approach for Attention-Driven Face Forgery Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation.
Proceedings of the Computer Vision - ECCV 2022, 2022

DIFNet: Boosting Visual Information Flow for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Active Teacher for Semi-Supervised Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Deep Semantic Parsing of Freehand Sketches With Homogeneous Transformation, Soft-Weighted Loss, and Staged Learning.
IEEE Trans. Multim., 2021

Evolving Fully Automated Machine Learning via Life-Long Knowledge Anchors.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Sketch-specific data augmentation for freehand sketch recognition.
Neurocomputing, 2021

Towards Language-guided Visual Recognition via Dynamic Convolutions.
CoRR, 2021

TRAR: Routing the Attention Spans in Transformer for Visual Question Answering.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dual-level Collaborative Transformer for Image Captioning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Similarity-Preserving Linkage Hashing for Online Image Retrieval.
IEEE Trans. Image Process., 2020

Deep Saliency Hashing for Fine-Grained Retrieval.
IEEE Trans. Image Process., 2020

TVENet: Temporal variance embedding network for fine-grained action representation.
Pattern Recognit., 2020

Semi-Supervised Adversarial Monocular Depth Estimation.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

What is damaged: a benchmark dataset for abnormal traffic object classification.
Multim. Tools Appl., 2020

Actionness-pooled Deep-convolutional Descriptor for fine-grained action recognition.
Neurocomputing, 2020

Hadamard Matrix Guided Online Hashing.
Int. J. Comput. Vis., 2020

K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Exploring Language Prior for Mode-Sensitive Visual Attention Modeling.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Cascade Grouped Attention Network for Referring Expression Segmentation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

SSAH: Semi-Supervised Adversarial Deep Hashing with Self-Paced Hard Sample Generation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Discovering Latent Discriminative Patterns for Multi-Mode Event Representation.
IEEE Trans. Multim., 2019

Correntropy-Induced Robust Low-Rank Hypergraph.
IEEE Trans. Image Process., 2019

Gradual recovery based occluded digit images recognition.
Multim. Tools Appl., 2019

Action recognition with multi-scale trajectory-pooled 3D convolutional descriptors.
Multim. Tools Appl., 2019

Robust ℓ2-Hypergraph and its applications.
Inf. Sci., 2019

Unsupervised semantic deep hashing.
Neurocomputing, 2019

Hadamard Codebook Based Deep Hashing.
CoRR, 2019

Toward 3D Object Reconstruction from Stereo Images.
CoRR, 2019

Semantic-aware Image Deblurring.
CoRR, 2019

Scene-based Factored Attention for Image Captioning.
CoRR, 2019

Supervised Online Hashing via Similarity Distribution Learning.
CoRR, 2019

Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images.
CoRR, 2019

Social Media Based Topic Modeling for Smart Campus: A Deep Topical Correlation Analysis Method.
IEEE Access, 2019

Information Competing Process for Learning Diversified Representations.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Variational Structured Semantic Inference for Diverse Image Captioning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Multi-modal Multi-layer Fusion Network with Average Binary Center Loss for Face Anti-spoofing.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Hypergraph Induced Convolutional Manifold Networks.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

A Video Post-Filter Deblocking Method Based on Temporal Boosting Residual Networks.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Towards Cross-modality Topic Modelling via Deep Topical Correlation Analysis.
Proceedings of the IEEE International Conference on Acoustics, 2019

Dynamic Capsule Attention for Visual Question Answering.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Towards Optimal Fine Grained Retrieval via Decorrelated Centralized Loss with Normalize-Scale Layer.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Towards Optimal Discrete Online Hashing with Balanced Similarity.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Two-Stream 3-D convNet Fusion for Action Recognition in Videos With Arbitrary Size and Length.
IEEE Trans. Multim., 2018

Distinctive action sketch for human action recognition.
Signal Process., 2018

Event patches: Mining effective parts for event detection and understanding.
Signal Process., 2018

Exploring part-aware segmentation for fine-grained visual categorization.
Multim. Tools Appl., 2018

Rediscover flowers structurally.
Multim. Tools Appl., 2018

Hierarchical semantic image matching using CNN feature pyramid.
Comput. Vis. Image Underst., 2018

Semantic and Contrast-Aware Saliency.
CoRR, 2018

The Effectiveness of Instance Normalization: a Strong Baseline for Single Image Dehazing.
CoRR, 2018

Centralized Ranking Loss with Weakly Supervised Localization for Fine-Grained Object Retrieval.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Add: Actionness-Pooled Deep-Convolutional Descriptor.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Cycle-Consistency Based Hierarchical Dense Semantic Correspondence.
Proceedings of the 2018 IEEE International Conference on Image Processing, 2018

Illustrate your travel notes: web-based story visualization.
Proceedings of the 10th International Conference on Internet Multimedia Computing and Service, 2018

Weighted voxel: a novel voxel representation for 3D reconstruction.
Proceedings of the 10th International Conference on Internet Multimedia Computing and Service, 2018

Restricted Boltzmann Machine Based Active Learning for Sparse Recommendation.
Proceedings of the Database Systems for Advanced Applications, 2018

GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Strong Baseline for Single Image Dehazing with Deep Features and Instance Normalization.
Proceedings of the British Machine Vision Conference 2018, 2018

2017
Dancelets Mining for Video Recommendation Based on Dance Styles.
IEEE Trans. Multim., 2017

Hierarchical Latent Concept Discovery for Video Event Detection.
IEEE Trans. Image Process., 2017

Breaking video into pieces for action recognition.
Multim. Tools Appl., 2017

Anomaly detection based on spatio-temporal sparse representation and visual attention analysis.
Multim. Tools Appl., 2017

Exploiting the complementary strengths of multi-layer CNN features for image retrieval.
Neurocomputing, 2017

Actor identification via mining representative actions.
Neurocomputing, 2017

Shallow and Deep Model Investigation for Distinguishing Corn and Weeds.
Proceedings of the Advances in Multimedia Information Processing - PCM 2017, 2017

Object Discovery and Cosegmentation Based on Dense Correspondences.
Proceedings of the Advances in Multimedia Information Processing - PCM 2017, 2017

Multi-scale Discriminative Patches for Fined-Grained Visual Categorization.
Proceedings of the Advances in Multimedia Information Processing - PCM 2017, 2017

Trajectory-Pooled 3D Convolutional Descriptors for Action Recognition.
Proceedings of the Advances in Multimedia Information Processing - PCM 2017, 2017

Gated additive skip context connection for object detection.
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Dancing like a superstar: Action guidance based on pose estimation and conditional pose alignment.
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

SPTF: A Scalable Probabilistic Tensor Factorization Model for Semantic-Aware Behavior Prediction.
Proceedings of the 2017 IEEE International Conference on Data Mining, 2017

An Integrated Model for Effective Saliency Prediction.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

Web-Based Semantic Fragment Discovery for On-Line Lingual-Visual Similarity.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Robust spatial-temporal deep model for multimedia event detection.
Neurocomputing, 2016

Unsupervised discovery of crowd activities by saliency-based clustering.
Neurocomputing, 2016

Quartet-net Learning for Visual Instance Retrieval.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Mining representative actions for actor identification.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
深度学习中的自编码器的表达能力研究 (Representation Ability Research of Auto-encoders in Deep Learning).
计算机科学 (Computer Science), 2015

Strategy for dynamic 3D depth data matching towards robust action retrieval.
Neurocomputing, 2015

Strategy for aesthetic photography recommendation via collaborative composition model.
IET Comput. Vis., 2015

Part-Aware Segmentation for Fine-Grained Categorization.
Proceedings of the Advances in Multimedia Information Processing - PCM 2015, 2015

"Clustering of Dancelets": Towards Video Recommendation Based on Dance Styles.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Distinctive action sketch.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Predicting discrete probability distribution of image emotions.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Dual-mode video stabilization based on adaptive motion clustering.
Proceedings of the 7th International Conference on Internet Multimedia Computing and Service, 2015

Boost sparse coding based abnormal event detection via explicitly applying temporal continuity constraint.
Proceedings of the 7th International Conference on Internet Multimedia Computing and Service, 2015

2014
Toward Statistical Modeling of Saccadic Eye-Movement and Visual Saliency.
IEEE Trans. Image Process., 2014

Where should I stand? Learning based human position recommendation for mobile photographing.
Multim. Tools Appl., 2014

Using Label Propagation to Get Confidence Map for Segmentation.
Proceedings of the Advances in Multimedia Information Processing - PCM 2014, 2014

Exploring Principles-of-Art Features For Image Emotion Recognition.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Exploring covert attention for generic boosting of saliency models.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Structure-aware multi-object discovery for weakly supervised tracking.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

"Clustering by saliency" - Unsupervised discovery of crowd activities.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Discriminative Features for Bird Species Classification.
Proceedings of the International Conference on Internet Multimedia Computing and Service, 2014

2013
Bidirectional-isomorphic manifold learning at image semantic understanding & representation.
Multim. Tools Appl., 2013

Visual attention modeling based on short-term environmental adaption.
J. Vis. Commun. Image Represent., 2013

Video classification and recommendation based on affective analysis of viewers.
Neurocomputing, 2013

Flexible Presentation of Videos Based on Affective Content Analysis.
Proceedings of the Advances in Multimedia Modeling, 19th International Conference, 2013

On dense sampling size.
Proceedings of the IEEE International Conference on Image Processing, 2013

Exploring Implicit Image Statistics for Visual Representativeness Modeling.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012
Context-Aware Semi-Local Feature Detector.
ACM Trans. Intell. Syst. Technol., 2012

Task-Dependent Visual-Codebook Compression.
IEEE Trans. Image Process., 2012

Action retrieval based on generalized dynamic depth data matching.
Proceedings of the 2012 Visual Communications and Image Processing, 2012

Action Segmentation in Dance Videos.
Proceedings of the Advances in Multimedia Information Processing - PCM 2012, 2012

Real-Time Viewfinder Composition Assessment and Recommendation to Mobile Photographing.
Proceedings of the Advances in Multimedia Information Processing - PCM 2012, 2012

Memorable basis: towards human-centralized sparse representation.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Aesthetic composition represetation for portrait photographing recommendation.
Proceedings of the 19th IEEE International Conference on Image Processing, 2012

What are we looking for: Towards statistical modeling of saccadic eye movements and visual saliency.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011
Actor-independent action search using spatiotemporal vocabulary with appearance hashing.
Pattern Recognit., 2011

Video indexing and recommendation based on affective analysis of viewers.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Unsupervised fast anomaly detection in crowds.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Learning heterogeneous data for hierarchical web video classification.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Sparse representation based visual element analysis.
Proceedings of the 18th IEEE International Conference on Image Processing, 2011

Video stabilization based on saliency driven SIFT matching and discriminative RANSAC.
Proceedings of the ICIMCS 2011, 2011

Contextual dictionaries for image super resolution.
Proceedings of the ICIMCS 2011, 2011

A spatiotemporal context phrase description for general dynamic texture.
Proceedings of the ICIMCS 2011, 2011

Affective Video Classification Based on Spatio-temporal Feature Fusion.
Proceedings of the Sixth International Conference on Image and Graphics, 2011

Saliency Detection: A Self-Adaption Sparse Representation Approach.
Proceedings of the Sixth International Conference on Image and Graphics, 2011

2010
A rotation and scale invariant texture description approach.
Proceedings of the Visual Communications and Image Processing 2010, 2010

Saliency detection based on short-term sparse representation.
Proceedings of the International Conference on Image Processing, 2010

Visual saliency as sequential eye fixation probability.
Proceedings of the International Conference on Image Processing, 2010

A robust texture descriptor using multifractal analysis with Gabor filter.
Proceedings of the Second International Conference on Internet Multimedia Computing and Service, 2010

Visual topic model for web image annotation.
Proceedings of the Second International Conference on Internet Multimedia Computing and Service, 2010

Mining actor correlations with hierarchical concurrence parsing.
Proceedings of the IEEE International Conference on Acoustics, 2010

Towards semantic embedding in visual vocabulary.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009
Visual and textual fusion for semantically supervised region-based retrieval.
Multim. Syst., 2009

Photo assessment based on computational visual attention model.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

What is a complete set of keywords for image description & annotation on the web.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

VisualCor system: search actor correlations in TV series.
Proceedings of the First International Conference on Internet Multimedia Computing and Service, 2009

2008
Vision-Based Semi-supervised Homecare with Spatial Constraint.
Proceedings of the Advances in Multimedia Information Processing, 2008

Attention-driven action retrieval with DTW-based 3d descriptor matching.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Place retrieval with graph-based place-view model.
Proceedings of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, 2008

Cross-media manifold learning for image retrieval & annotation.
Proceedings of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, 2008

Directional correlation analysis of local Haar binary pattern for text detection.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Text Particles Multi-band Fusion for Robust Text Detection.
Proceedings of the Image Analysis and Recognition, 5th International Conference, 2008

