Yu Qiao

ORCID: 0000-0002-1889-2567

Affiliations:
  • Shanghai AI Laboratory, OpenGVLab, China
  • Chinese Academy of Sciences, Shenzhen Institutes of Advanced Technology, China
  • University of Tokyo, Graduate School of Information Science and Technology, Japan (former)
  • University of Electro-Communications, Tokyo, Japan (PhD 2006)


According to our database, Yu Qiao authored at least 663 papers between 2003 and 2024.

Bibliography

2024
Diff-Font: Diffusion Model for Robust One-Shot Font Generation.
Int. J. Comput. Vis., November, 2024

F2S-Net: learning frame-to-segment prediction for online action detection.
J. Real Time Image Process., May, 2024

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
Int. J. Comput. Vis., May, 2024

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Temporally consistent video colorization with deep feature propagation and self-regularization learning.
Comput. Vis. Media, April, 2024

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
Int. J. Comput. Vis., February, 2024

CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning.
IEEE Trans. Multim., 2024

Dual Masked Modeling for Weakly-Supervised Temporal Boundary Discovery.
IEEE Trans. Multim., 2024

Attentive Snippet Prompting for Video Retrieval.
IEEE Trans. Multim., 2024

Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection.
IEEE Trans. Image Process., 2024

AdaptBIR: Adaptive Blind Image Restoration with latent diffusion prior for higher fidelity.
Pattern Recognit., 2024

MixStyle Neural Networks for Domain Generalization and Adaptation.
Int. J. Comput. Vis., 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.
CoRR, 2024

ToMiE: Towards Modular Growth in Enhanced SMPL Skeleton for 3D Human with Animatable Garments.
CoRR, 2024

Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation.
CoRR, 2024

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation.
CoRR, 2024

MinerU: An Open-Source Solution for Precise Document Content Extraction.
CoRR, 2024

CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation.
CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.
CoRR, 2024

GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction.
CoRR, 2024

A Preliminary Exploration Towards General Image Restoration.
CoRR, 2024

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration.
CoRR, 2024

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
CoRR, 2024

VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge.
CoRR, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
CoRR, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.
CoRR, 2024

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving.
CoRR, 2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model.
CoRR, 2024

The Shadow of Fraud: The Emerging Danger of AI-powered Social Engineering and its Possible Cure.
CoRR, 2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity.
CoRR, 2024

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models.
CoRR, 2024

ViLLa: Video Reasoning Segmentation with Large Language Model.
CoRR, 2024

GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity.
CoRR, 2024

Navigating the Data Trading Crossroads: An Interdisciplinary Survey.
CoRR, 2024

Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond.
CoRR, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models.
CoRR, 2024

GRUtopia: Dream General Robots in a City at Scale.
CoRR, 2024

VEnhancer: Generative Space-Time Enhancement for Video Generation.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
CoRR, 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation.
CoRR, 2024

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI.
CoRR, 2024

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model.
CoRR, 2024

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.
CoRR, 2024

PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models.
CoRR, 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models.
CoRR, 2024

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality.
CoRR, 2024

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices.
CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks.
CoRR, 2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning.
CoRR, 2024

Needle In A Multimodal Haystack.
CoRR, 2024

Learning 1D Causal Visual Representation with De-focus Attention Networks.
CoRR, 2024

Parameter-Inverted Image Pyramid Networks.
CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.
CoRR, 2024

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion.
CoRR, 2024

Learning Manipulation by Predicting Interaction.
CoRR, 2024

Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models.
CoRR, 2024

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving.
CoRR, 2024

FLoRA: Low-Rank Core Space for N-dimension.
CoRR, 2024

UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge.
CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.
CoRR, 2024

Causal Evaluation of Language Models.
CoRR, 2024

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites.
CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
CoRR, 2024

Linear Attention Sequence Parallelism.
CoRR, 2024

VideoDistill: Language-aware Vision Distillation for Video Question Answering.
CoRR, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?
CoRR, 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models.
CoRR, 2024

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents.
CoRR, 2024

Assessment of Multimodal Large Language Models in Alignment with Human Values.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control.
CoRR, 2024

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions.
CoRR, 2024

Exploring Safety Generalization Challenges of Large Language Models via Code.
CoRR, 2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures.
CoRR, 2024

Towards Implicit Prompt For Text-To-Image Models.
CoRR, 2024

Efficient Action Counting with Dynamic Queries.
CoRR, 2024

WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset.
CoRR, 2024

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition.
CoRR, 2024

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation.
CoRR, 2024

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.
CoRR, 2024

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey.
CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024

Real-time Holistic Robot Pose Estimation with Unknown States.
CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.
CoRR, 2024

Latte: Latent Diffusion Transformer for Video Generation.
CoRR, 2024

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
CoRR, 2024

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2024

Learning A Low-Level Vision Generalist via Visual Task Prompt.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving.
Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

Safety of Multimodal Large Language Models on Images and Text.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Position: Towards Implicit Prompt For Text-To-Image Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Unifying Image Processing as Visual Prompting Question Answering.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Causal Discovery via Conditional Independence Testing with Proxy Variables.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Personalize Segment Anything Model with One Shot.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

CO2: Efficient Distributed Training with Full Communication-Computation Overlap.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization.
Proceedings of the IEEE International Conference on Acoustics, 2024

Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning.
Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling, 2024

Embodied Understanding of Driving Scenarios.
Proceedings of the Computer Vision - ECCV 2024, 2024

Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation.
Proceedings of the Computer Vision - ECCV 2024, 2024

MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Proceedings of the Computer Vision - ECCV 2024, 2024

Reg-TTA3D: Better Regression Makes Better Test-Time Adaptive 3D Object Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation.
Proceedings of the Computer Vision - ECCV 2024, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World.
Proceedings of the Computer Vision - ECCV 2024, 2024

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Distilling Knowledge from Large-Scale Image Models for Object Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Within the Dynamic Context: Inertia-Aware 3D Human Modeling with Pose Sequence.
Proceedings of the Computer Vision - ECCV 2024, 2024

A Comparative Study of Image Restoration Networks for General Backbone Network Design.
Proceedings of the Computer Vision - ECCV 2024, 2024

LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Language-aware Visual Semantic Distillation for Video Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Vlogger: Make Your Dream A Vlog.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Asymmetric Masked Distillation for Pre-Training Small Foundation Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-End Oriented Object Detection with Single Point Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Generalized Predictive Model for Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SinSR: Diffusion-Based Image Super-Resolution in a Single Step.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DiffInDScene: Diffusion-Based High-Quality 3D Indoor Scene Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OneLLM: One Framework to Align All Modalities with Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Real-world Video Face Restoration: A New Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoBooth: Diffusion-based Video Generation with Image Prompts.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Point Transformer V3: Simpler, Faster, Stronger.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Critic-Guided Decision Transformer for Offline Reinforcement Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

ConditionVideo: Training-Free Condition-Guided Video Generation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

M-BEV: Masked BEV Perception for Robust Autonomous Driving.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Evaluating the Generalization Ability of Super-Resolution Networks.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

Hybrid token transformer for deep face recognition.
Pattern Recognit., July, 2023

Blind Image Super-Resolution: A Survey and Beyond.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

COCAS+: Large-Scale Clothes-Changing Person Re-Identification With Clothes Templates.
IEEE Trans. Circuits Syst. Video Technol., April, 2023

Domain Generalization: A Survey.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2023

ActFloor-GAN: Activity-Guided Adversarial Networks for Human-Centric Floorplan Design.
IEEE Trans. Vis. Comput. Graph., March, 2023

Towards robustness and generalization of point cloud representation: A geometry coding method and a large-scale object-level dataset.
Comput. Vis. Media, February, 2023

Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments.
Briefings Bioinform., January, 2023

Hierarchical and Progressive Image Matting.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Blind Image Restoration Based on Cycle-Consistent Network.
IEEE Trans. Multim., 2023

Region-Aware Arbitrary-Shaped Text Detection With Progressive Fusion.
IEEE Trans. Multim., 2023

Very Lightweight Photo Retouching Network With Conditional Sequential Modulation.
IEEE Trans. Multim., 2023

Character-Aware Sampling and Rectification for Scene Text Recognition.
IEEE Trans. Multim., 2023

Dual Relation Network for Scene Text Recognition.
IEEE Trans. Multim., 2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
CoRR, 2023

Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey.
CoRR, 2023

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving.
CoRR, 2023

MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding.
CoRR, 2023

Towards Knowledge-driven Autonomous Driving.
CoRR, 2023

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future.
CoRR, 2023

MLLMs-Augmented Visual-Language Representation Learning.
CoRR, 2023

Query-Relevant Images Jailbreak Large Multi-Modal Models.
CoRR, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
CoRR, 2023

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision.
CoRR, 2023

DiffusionMat: Alpha Matting as Sequential Refinement Learning.
CoRR, 2023

SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks.
CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving.
CoRR, 2023

ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models.
CoRR, 2023

Octavius: Mitigating Task Interference in MLLMs via MoE.
CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.
CoRR, 2023

ControlLLM: Augment Language Models with Tools by Searching on Graphs.
CoRR, 2023

SAM-Med3D.
CoRR, 2023

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm.
CoRR, 2023

ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation.
CoRR, 2023

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models.
CoRR, 2023

REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets.
CoRR, 2023

Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face.
CoRR, 2023

Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization for Language Models.
CoRR, 2023

Exploring Counterfactual Alignment Loss towards Human-centered AI.
CoRR, 2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models.
CoRR, 2023

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding.
CoRR, 2023

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving.
CoRR, 2023

HAT: Hybrid Attention Transformer for Image Restoration.
CoRR, 2023

Towards Efficient SDRTV-to-HDRTV by Learning from Image Formation.
CoRR, 2023

A-Eval: A Benchmark for Cross-Dataset Evaluation of Abdominal Multi-Organ Segmentation.
CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.
CoRR, 2023

SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution.
CoRR, 2023

SAM-Med2D.
CoRR, 2023

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior.
CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.
CoRR, 2023

Scaling TransNormer to 175 Billion Parameters.
CoRR, 2023

Meta-Transformer: A Unified Framework for Multimodal Learning.
CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
CoRR, 2023

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.
CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
CoRR, 2023

Align, Adapt and Inject: Sound-guided Unified Image Generation.
CoRR, 2023

MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification.
CoRR, 2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.
CoRR, 2023

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling.
CoRR, 2023

DiffRoom: Diffusion-based High-Quality 3D Room Reconstruction and Generation with Occupancy Prior.
CoRR, 2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.
CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.
CoRR, 2023

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model.
CoRR, 2023

VideoChat: Chat-Centric Video Understanding.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

Causal Discovery with Unobserved Variables: A Proxy Variable Approach.
CoRR, 2023

LEO: Generative Latent Image Animator for Human Video Synthesis.
CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.
CoRR, 2023

Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving.
CoRR, 2023

Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles.
CoRR, 2023

STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training.
CoRR, 2023

Topology Reasoning for Driving Scenes.
CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
CoRR, 2023

Aleth-NeRF: Low-light Condition View Synthesis with Concealing Fields.
CoRR, 2023

Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling.
CoRR, 2023

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Real-World Image Super-Resolution as Multi-Task Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Foundation Model is Efficient Multimodal Multitask Model Selector.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Networks are Slacking Off: Understanding Generalization Problem in Image Deraining.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning Discriminative Feature Representation for Open Set Action Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Text-Guided Foundation Model Adaptation for Pathological Image Classification.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

LimSim: A Long-Term Interactive Multi-Scenario Traffic Simulator.
Proceedings of the 25th IEEE International Conference on Intelligent Transportation Systems, 2023

Parallelizable Simple Recurrent Units with Hierarchical Memory.
Proceedings of the Neural Information Processing - 30th International Conference, 2023

Long-Term Rhythmic Video Soundtracker.
Proceedings of the International Conference on Machine Learning, 2023

Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Vision Transformer Adapter for Dense Predictions.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scaling Data Generation in Vision-and-Language Navigation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Multi-view Spectral Polarization Propagation for Video Glass Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Rethinking Range View Representation for LiDAR Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MGMAE: Motion Guided Masking for Video Masked Autoencoding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Stare at What You See: Masked Image Modeling without Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SCPNet: Semantic Scene Completion on Point Cloud.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ResFormer: Scaling ViTs with Multi-Resolution Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Siamese Image Modeling for Self-Supervised Vision Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Fine-grained Audible Video Description.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Planning-oriented Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Activating More Pixels in Image Super-Resolution Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DegAE: A New Pretraining Paradigm for Low-Level Vision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Prior-Induced Information Alignment for Image Matting.
IEEE Trans. Multim., 2022

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization.
IEEE Trans. Image Process., 2022

Robust Image Forgery Detection Against Transmission Over Online Social Networks.
IEEE Trans. Inf. Forensics Secur., 2022

Temporal Weighting Appearance-Aligned Network for Nighttime Video Retrieval.
IEEE Signal Process. Lett., 2022

Unsupervised person re-identification with multi-label learning guided self-paced clustering.
Pattern Recognit., 2022

RankSRGAN: Super Resolution Generative Adversarial Networks With Learning to Rank.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Interactive Multi-Dimension Modulation for Image Restoration.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Author Correction: Development and clinical deployment of a smartphone-based visual field deep learning system for glaucoma detection.
npj Digit. Medicine, 2022

Joint 3D facial shape reconstruction and texture completion from a single image.
Comput. Vis. Media, 2022

ADAS: A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation.
CoRR, 2022

Goal-oriented Autonomous Driving.
CoRR, 2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.
CoRR, 2022

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE.
CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.
CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

Demystify Transformers & Convolutions in Modern Image Deep Networks.
CoRR, 2022

Hierarchical and Progressive Image Matting.
CoRR, 2022

Low-Resolution Action Recognition for Tiny Actions Challenge.
CoRR, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.
CoRR, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.
CoRR, 2022

Vision-Centric BEV Perception: A Survey.
CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.
CoRR, 2022

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm.
CoRR, 2022

Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot.
CoRR, 2022

Siamese Image Modeling for Self-Supervised Vision Representation Learning.
CoRR, 2022

Illumination Adaptive Transformer.
CoRR, 2022

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results.
CoRR, 2022

ConvMAE: Masked Convolution Meets Masked Autoencoders.
CoRR, 2022

POS-BERT: Point Cloud One-Stage BERT Pre-Training.
CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.
CoRR, 2022

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation.
CoRR, 2022

Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy.
CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.
CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.
CoRR, 2022

Asynchronous feature regularization and cross-modal distillation for OCT based glaucoma diagnosis.
Comput. Biol. Medicine, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MCMAE: Masked Convolution Meets Masked Autoencoders.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Cycle-Consistent Learning for Weakly Supervised Semantic Segmentation.
Proceedings of the HCMA@MM 2022: 3rd International Workshop on Human-Centric Multimedia Analysis, 2022

Visual Knowledge Graph for Human Action Reasoning in Videos.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Self-slimmed Vision Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

Efficient Image Super-Resolution Using Vast-Receptive-Field Attention.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Recurrent Bilinear Optimization for Binary Neural Networks.
Proceedings of the Computer Vision - ECCV 2022, 2022

PalGAN: Image Colorization with Palette Generative Adversarial Networks.
Proceedings of the Computer Vision - ECCV 2022, 2022

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.
Proceedings of the Computer Vision - ECCV 2022, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation.
Proceedings of the Computer Vision - ECCV 2022, 2022

PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark.
Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Blueprint Separable Residual Network for Efficient Image Super-Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Reflash Dropout in Image Super-Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross Domain Object Detection by Target-Perceived Dual Branch Distillation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach.
Proceedings of the Conference on Robot Learning, 2022

Wider and Higher: Intensive Integration and Global Foreground Perception for Image Matting.
Proceedings of the Advances in Computer Graphics, 2022

Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

CPRAL: Collaborative Panoptic-Regional Active Learning for Semantic Segmentation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Smart Scribbles for Image Matting.
ACM Trans. Multim. Comput. Commun. Appl., 2021

Wildfish++: A Comprehensive Fish Benchmark for Multimedia Research.
IEEE Trans. Multim., 2021

Deep Relation Transformer for Diagnosing Glaucoma With Optical Coherence Tomography and Visual Field Function.
IEEE Trans. Medical Imaging, 2021

Domain Adaptive Ensemble Learning.
IEEE Trans. Image Process., 2021

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos.
IEEE Trans. Image Process., 2021

Deep Learning-Based Chroma Prediction for Intra Versatile Video Coding.
IEEE Trans. Circuits Syst. Video Technol., 2021

Multi-view self-supervised learning for 3D facial texture reconstruction from single image.
Image Vis. Comput., 2021

TTPP: Temporal Transformer with Progressive Prediction for efficient action anticipation.
Neurocomputing, 2021

A Comprehensive Review of Group Activity Recognition in Videos.
Int. J. Autom. Comput., 2021

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results.
CoRR, 2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model.
CoRR, 2021

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.
CoRR, 2021

MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video.
CoRR, 2021

INTERN: A New Learning Paradigm Towards General Vision.
CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.
CoRR, 2021

Discovering "Semantics" in Super-Resolution Networks.
CoRR, 2021

Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021.
CoRR, 2021

Scalable Transformers for Neural Machine Translation.
CoRR, 2021

TSI: Temporal Saliency Integration for Video Action Recognition.
CoRR, 2021

Multiple Domain Experts Collaborative Learning: Multi-Source Domain Generalization For Person Re-Identification.
CoRR, 2021

FineAction: A Fined Video Dataset for Temporal Action Localization.
CoRR, 2021

Neighbourhood-guided Feature Reconstruction for Occluded Person Re-Identification.
CoRR, 2021

NTIRE 2021 Challenge on Perceptual Image Quality Assessment.
CoRR, 2021

Smart Scribbles for Image Matting.
CoRR, 2021

Self-speculation of clinical features based on knowledge distillation for accurate ocular disease classification.
Biomed. Signal Process. Control., 2021

Multi-label ocular disease classification with a dense correlation deep neural network.
Biomed. Signal Process. Control., 2021

Group Shift Pointwise Convolution for Volumetric Medical Image Segmentation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

A Novel Hybrid Convolutional Neural Network for Accurate Organ Segmentation in 3D Head and Neck CT Images.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Collaborative Multi-View Convolutions With Gating For Accurate And Fast Volumetric Medical Image Segmentation.
Proceedings of the 18th IEEE International Symposium on Biomedical Imaging, 2021

Domain Generalization with MixStyle.
Proceedings of the 9th International Conference on Learning Representations, 2021

CT-Net: Channel Tensorization Network for Video Classification.
Proceedings of the 9th International Conference on Learning Representations, 2021

Digging into Uncertainty in Self-supervised Multi-view Stereo.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Tripartite Information Mining and Integration for Image Matting.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

A New Journey from SDRTV to HDRTV.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Temporal Context Aggregation Network for Temporal Action Proposal Refinement.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Detecting Human-Object Interaction via Fabricated Compositional Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Affordance Transfer Learning for Human-Object Interaction Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

NTIRE 2021 Challenge on Perceptual Image Quality Assessment.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

HDRUNet: Single Image HDR Reconstruction With Denoising and Dequantization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Toward Interactive Modulation for Photo-Realistic Image Restoration.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Investigate Indistinguishable Points in Semantic Segmentation of 3D Point Cloud.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

MP-Mono: Monocular 3D Detection Using Multiple Priors for Autonomous Driving.
Proceedings of the International Conference on 3D Vision, 2021

2020
FeatherCNN: Fast Inference Computation with TensorGEMM on ARM Architectures.
IEEE Trans. Parallel Distributed Syst., 2020

Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition.
IEEE Trans. Image Process., 2020

Progressive Object Transfer Detection.
IEEE Trans. Image Process., 2020

DID: Disentangling-Imprinting-Distilling for Continuous Low-Shot Detection.
IEEE Trans. Image Process., 2020

Learning label correlations for multi-label image recognition with graph networks.
Pattern Recognit. Lett., 2020

Development and clinical deployment of a smartphone-based visual field deep learning system for glaucoma detection.
npj Digit. Medicine, 2020

Finding hard faces with better proposals and classifier.
Mach. Vis. Appl., 2020

Cascade multi-head attention networks for action recognition.
Comput. Vis. Image Underst., 2020

Product image recognition with guidance learning and noisy supervision.
Comput. Vis. Image Underst., 2020

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition.
CoRR, 2020

Exploring Multi-Scale Feature Propagation and Communication for Image Super Resolution.
CoRR, 2020

Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units.
CoRR, 2020

A Comprehensive Study on Temporal Modeling for Online Action Detection.
CoRR, 2020

Multi-scale Information Assembly for Image Matting.
Comput. Graph. Forum, 2020

SIAT-3DFE: A High-Resolution 3D Facial Expression Dataset.
IEEE Access, 2020

Dense Correlation Network for Automated Multi-Label Ocular Disease Detection with Paired Color Fundus Photographs.
Proceedings of the 17th IEEE International Symposium on Biomedical Imaging, 2020

Classification of Ocular Diseases Employing Attention-Based Unilateral and Bilateral Feature Weighting and Fusion.
Proceedings of the 17th IEEE International Symposium on Biomedical Imaging, 2020

Learning Discriminative Representation For Facial Expression Recognition From Uncertainties.
Proceedings of the IEEE International Conference on Image Processing, 2020

Efficient Image Super-Resolution Using Pixel Attention.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax.
Proceedings of the Computer Vision - ECCV 2020, 2020

Attention-Driven Dynamic Graph Convolutional Network for Multi-label Image Recognition.
Proceedings of the Computer Vision - ECCV 2020, 2020

MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation.
Proceedings of the Computer Vision - ECCV 2020, 2020

AIM 2020 Challenge on Video Temporal Super-Resolution.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Suppressing Mislabeled Data via Grouping and Self-attention.
Proceedings of the Computer Vision - ECCV 2020, 2020

Enhanced Quadratic Video Interpolation.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Learning to Predict Context-Adaptive Convolution for Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Visual Compositional Learning for Human-Object Interaction Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

Conditional Sequential Modulation for Efficient Global Image Retouching.
Proceedings of the Computer Vision - ECCV 2020, 2020

Interactive Multi-dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration.
Proceedings of the Computer Vision - ECCV 2020, 2020

Mining Inter-Video Proposal Relations for Video Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Suppressing Uncertainties for Large-Scale Facial Expression Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Fast Texture Synthesis via Pseudo Optimizer.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Attention-Guided Hierarchical Structure Aggregation for Image Matting.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

SmallBigNet: Integrating Core and Contextual Views for Video Classification.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Multiple Transfer Learning and Multi-label Balanced Training Strategies for Facial AU Detection In the Wild.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Adaptive Dilated Network With Self-Correction Supervision for Counting.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

aDMSCN: A Novel Perspective for User Intent Prediction in Customer Service Bots.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

Learning Attentive Pairwise Interaction for Fine-Grained Classification.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Context-Transformer: Tackling Object Confusion for Few-Shot Detection.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Geometry Sharing Network for 3D Point Cloud Classification and Segmentation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Pose-Assisted Multi-Camera Collaboration for Active Object Tracking.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Dynamic Sampling Network for Semantic Segmentation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

FD-GAN: Generative Adversarial Networks with Fusion-Discriminator for Single Image Dehazing.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Mutual Component Convolutional Neural Networks for Heterogeneous Face Recognition.
IEEE Trans. Image Process., 2019

A Literature Review: Geometric Methods and Their Applications in Human-Related Analysis.
Sensors, 2019

Dual-supervised attention network for deep cross-modal hashing.
Pattern Recognit. Lett., 2019

Temporal Segment Networks for Action Recognition in Videos.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

DeepDeblur: text image recovery from blur to sharp.
Multim. Tools Appl., 2019

Pedestrian detection with unsupervised multispectral feature learning using deep neural networks.
Inf. Fusion, 2019

A Comprehensive Study on Center Loss for Deep Face Recognition.
Int. J. Comput. Vis., 2019

Multi-Dimension Modulation for Image Restoration with Dynamic Controllable Residual Learning.
CoRR, 2019

Learning Category Correlations for Multi-label Image Recognition with Graph Networks.
CoRR, 2019

Product Image Recognition with Guidance Learning and Noisy Supervision.
CoRR, 2019

Correction to: Automatic differentiation of Glaucoma visual field from non-glaucoma visual field using deep convolutional neural network.
BMC Medical Imaging, 2019

Robust Text Line Detection in Equipment Nameplate Images.
Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics, 2019

The Equipment Nameplate Dataset for Scene Text Detection and Recognition.
Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics, 2019

Orientation Robust Scene Text Recognition in Natural Scene.
Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics, 2019

AnoPCN: Video Anomaly Detection via Deep Predictive Coding Network.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Intelligent Glaucoma Diagnosis Via Active Learning And Adversarial Data Augmentation.
Proceedings of the 16th IEEE International Symposium on Biomedical Imaging, 2019

Prostate Segmentation using 2D Bridged U-net.
Proceedings of the International Joint Conference on Neural Networks, 2019

Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition.
Proceedings of the International Conference on Multimodal Interaction, 2019

Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression.
Proceedings of the International Conference on Multimodal Interaction, 2019

Exploring Regularizations with Face, Body and Image Cues for Group Cohesion Prediction.
Proceedings of the International Conference on Multimodal Interaction, 2019

Visual-Textual Sentiment Analysis in Product Reviews.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Frame Attention Networks for Facial Expression Recognition in Videos.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

RankSRGAN: Generative Adversarial Networks With Ranker for Image Super-Resolution.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Dynamic Multi-Scale Filters for Semantic Segmentation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

P2SGrad: Refined Gradients for Optimizing Deep Face Models.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

PA3D: Pose-Action 3D Machine for Video Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Adaptive Pyramid Context Network for Semantic Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Modulating Image Restoration With Continual Levels via Adaptive Feature Modification Layers.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Suppressing Model Overfitting for Image Super-Resolution Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Residual Compensation Networks for Heterogeneous Face Recognition.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Real-Time Action Recognition With Deeply Transferred Motion Vector CNNs.
IEEE Trans. Image Process., 2018

Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos.
IEEE Trans. Image Process., 2018

Deep embedding convolutional neural network for synthesizing CT image from T1-Weighted MR image.
Medical Image Anal., 2018

Transferring Deep Object and Scene Representations for Event Recognition in Still Images.
Int. J. Comput. Vis., 2018

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.
CoRR, 2018

W-net: Bridged U-net for 2D Medical Image Segmentation.
CoRR, 2018

Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering.
CoRR, 2018

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward.
CoRR, 2018

Automatic differentiation of Glaucoma visual field from non-glaucoma visual filed using deep convolutional neural network.
BMC Medical Imaging, 2018

Structured Triplet Learning with POS-Tag Guided Attention for Visual Question Answering.
Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

WildFish: A Large Benchmark for Fish Recognition in the Wild.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

StripNet: Towards Topology Consistent Strip Structure Segmentation.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Visual Field Based Automatic Diagnosis of Glaucoma Using Deep Convolutional Neural Network.
Proceedings of the Computational Pathology and Ophthalmic Medical Image Analysis, 2018

A Multi-task Learning Approach for Image Captioning.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Deep Recurrent Multi-instance Learning with Spatio-temporal Features for Engagement Intensity Prediction.
Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018

Cascade Attention Networks For Group Emotion Recognition with Face, Body and Image Cues.
Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018

Super-Identity Convolutional Neural Network for Face Hallucination.
Proceedings of the Computer Vision - ECCV 2018, 2018

SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters.
Proceedings of the Computer Vision - ECCV 2018, 2018

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Find and Focus: Retrieve and Localize Video Events with Natural Language Queries.
Proceedings of the Computer Vision - ECCV 2018, 2018


Temporal Hallucinating for Action Recognition With Few Still Images.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

FOTS: Fast Oriented Text Spotting With a Unified Network.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

An End-to-End TextSpotter With Explicit Alignment and Attention.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

RDS-Denoiser: a Detail-preserving Convolutional Neural Network for Image Denoising.
Proceedings of the IEEE International Conference on Cyborg and Bionic Systems, 2018

Boosting up Scene Text Detectors with Guided CNN.
Proceedings of the British Machine Vision Conference 2018, 2018

Deep Reinforcement Learning for Unsupervised Video Summarization With Diversity-Representativeness Reward.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

LSTD: A Low-Shot Transfer Detector for Object Detection.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition.
IEEE Trans. Image Process., 2017

Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs.
IEEE Trans. Image Process., 2017

Locally Supervised Deep Hybrid Model for Scene Recognition.
IEEE Trans. Image Process., 2017

Improving scale invariant feature transform with local color contrastive descriptor for image classification.
J. Electronic Imaging, 2017

A robust coherent point drift approach based on rotation invariant shape context.
Neurocomputing, 2017

Deep auto-context convolutional neural networks for standard-dose PET image estimation from low-dose PET/MRI.
Neurocomputing, 2017

Learning multiple local binary descriptors for image matching.
Neurocomputing, 2017

Deep Embedding Convolutional Neural Network for Synthesizing CT Image from T1-Weighted MR Image.
CoRR, 2017

Group emotion recognition with individual facial emotion CNNs and global image based CNNs.
Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017

Depth driven people counting using deep region proposal network.
Proceedings of the IEEE International Conference on Information and Automation, 2017

Detecting Faces Using Inside Cascaded Contextual CNN.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Range Loss for Deep Face Recognition with Long-Tailed Training Data.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Single Shot Text Detector with Regional Attention.
Proceedings of the IEEE International Conference on Computer Vision, 2017

RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos.
Proceedings of the IEEE International Conference on Computer Vision, 2017


Marine Animal Detection and Recognition with Advanced Deep Learning Models.
Proceedings of the Working Notes of CLEF 2017, 2017

Dual Learning for Cross-domain Image Captioning.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

Orientation-Aware Text Proposals Network for Scene Text Detection.
Proceedings of the Biometric Recognition - 12th Chinese Conference, 2017

Sparse Deep Transfer Learning for Convolutional Neural Network.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Bridging Music and Image via Cross-Modal Ranking Analysis.
IEEE Trans. Multim., 2016

Text-Attentional Convolutional Neural Network for Scene Text Detection.
IEEE Trans. Image Process., 2016

Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.
IEEE Signal Process. Lett., 2016

Adaptive Part-Level Model Knowledge Transfer for Gender Classification.
IEEE Signal Process. Lett., 2016

MoFAP: A Multi-level Representation for Action Recognition.
Int. J. Comput. Vis., 2016

Reference-omitted affine soft correspondence algorithm.
IET Image Process., 2016

Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice.
Comput. Vis. Image Underst., 2016

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks.
CoRR, 2016

Range Loss for Deep Face Recognition with Long-tail.
CoRR, 2016

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016.
CoRR, 2016

Transferring Object-Scene Convolutional Neural Networks for Event Recognition in Still Images.
CoRR, 2016

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network.
CoRR, 2016

Locally-Supervised Deep Hybrid Model for Scene Recognition.
CoRR, 2016

Shenzhen Institutes of Advanced Technology, CAS, China at TRECVID INS 2016.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Deep rehabilitation gait learning for modeling knee joints of lower-limb exoskeleton.
Proceedings of the 2016 IEEE International Conference on Robotics and Biomimetics, 2016

Deep face attributes recognition using spatial transformer network.
Proceedings of the IEEE International Conference on Information and Automation, 2016

DeepWriter: A Multi-stream Deep CNN for Text-Independent Writer Identification.
Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016

Codebook enhancement of vlad representation for visual recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Human action recognition with DeepAction Kernel Gaussian Process.
Proceedings of the 2016 International Conference on Advanced Robotics and Mechatronics, 2016

A Discriminative Feature Learning Approach for Deep Face Recognition.
Proceedings of the Computer Vision - ECCV 2016, 2016

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition.
Proceedings of the Computer Vision - ECCV 2016, 2016

Detecting Text in Natural Image with Connectionist Text Proposal Network.
Proceedings of the Computer Vision - ECCV 2016, 2016

A Key Volume Mining Deep Framework for Action Recognition.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Real-Time Action Recognition with Enhanced Motion Vector CNNs.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Gender and Smile Classification Using Deep Convolutional Neural Networks.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016

Latent Factor Guided Convolutional Neural Networks for Age-Invariant Face Recognition.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Actionness Estimation Using Hybrid Fully Convolutional Networks.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Reading Scene Text in Deep Convolutional Sequences.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Local Multi-Grouped Binary Descriptor With Ring-Based Pooling Configuration and Optimization.
IEEE Trans. Image Process., 2015

On feature-specific parameter learning in conditional random field-based approach for interactive object segmentation.
J. Electronic Imaging, 2015

Towards Good Practices for Very Deep Two-Stream ConvNets.
CoRR, 2015

Object-Scene Convolutional Neural Networks for Event Recognition in Images.
CoRR, 2015

Places205-VGGNet Models for Scene Recognition.
CoRR, 2015

Text-Attentional Convolutional Neural Networks for Scene Text Detection.
CoRR, 2015

Local Color Contrastive Descriptor for Image Classification.
CoRR, 2015

Boosting Optical Character Recognition: A Super-Resolution Approach.
CoRR, 2015

Deep classification of vehicle makers and models: The effectiveness of pre-training and data enhancement.
Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics, 2015

Road segmentation via iterative deep analysis.
Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics, 2015

Fast single image dehazing through Edge-Guided Interpolated Filter.
Proceedings of the 14th IAPR International Conference on Machine Vision Applications, 2015

MIL: Music Exploration and Visualization via Lyric and Image.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Better Exploiting OS-CNNs for Better Event Recognition in Images.
Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop, 2015

Object-Scene Convolutional Neural Networks for event recognition in images.
Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015

Exploring Fisher vector and deep networks for action spotting.
Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015

Action recognition with trajectory-pooled deep-convolutional descriptors.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Latent Hierarchical Model of Temporal Structure for Complex Activity Classification.
IEEE Trans. Image Process., 2014

Common Feature Discriminant Analysis for Matching Infrared Face Images to Optical Face Images.
IEEE Trans. Image Process., 2014

Large Margin Dimensionality Reduction for Action Similarity Labeling.
IEEE Signal Process. Lett., 2014

Bayesian salient object detection based on saliency driven clustering.
Signal Process. Image Commun., 2014

Pairwise Rotation Invariant Co-Occurrence Local Binary Pattern.
IEEE Trans. Pattern Anal. Mach. Intell., 2014

Motion boundary based sampling and 3D co-occurrence descriptors for action recognition.
Image Vis. Comput., 2014

Robust visual tracking based on local kernelized representation.
Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics, 2014

A Joint Evaluation of Dictionary Learning and Feature Encoding for Action Recognition.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Saliency detection via foreground rendering and background exclusion.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Saliency driven clustering for salient object detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Saliency detection based on extended boundary prior with foci of attention.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Video Action Detection with Relational Dynamic-Poselets.
Proceedings of the Computer Vision - ECCV 2014, 2014

Action Recognition with Stacked Fisher Vectors.
Proceedings of the Computer Vision - ECCV 2014, 2014

Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics.
Proceedings of the Computer Vision - ECCV 2014, 2014

Action and Gesture Temporal Spotting with Super Vector Representation.
Proceedings of the Computer Vision - ECCV 2014 Workshops, 2014

Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees.
Proceedings of the Computer Vision - ECCV 2014, 2014

Multi-view Super Vector for Action Recognition.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013
One-class support vector machine-assisted robust tracking.
J. Electronic Imaging, 2013

Unsupervised optimal phoneme segmentation: theory and experimental evaluation.
IET Signal Process., 2013

A Study on Unsupervised Dictionary Learning and Feature Encoding for Action Classification.
CoRR, 2013

Multi-feature canonical correlation analysis for face photo-sketch image retrieval.
Proceedings of the ACM Multimedia Conference, 2013

Salient Object Segmentation Based on Automatic Labeling.
Proceedings of the Neural Information Processing - 20th International Conference, 2013

An active contour model based on multiple boundary measures.
Proceedings of the IEEE International Conference on Image Processing, 2013

Affine SoftAssign with bidirectional distance for point matching.
Proceedings of the IEEE International Conference on Image Processing, 2013

A semantic model for video based face recognition.
Proceedings of the IEEE International Conference on Information and Automation, 2013

LTD: Local Ternary Descriptor for image matching.
Proceedings of the IEEE International Conference on Information and Automation, 2013

Exploring dense trajectory feature and encoding methods for human interaction recognition.
Proceedings of the International Conference on Internet Multimedia Computing and Service, 2013

Mining Motion Atoms and Phrases for Complex Action Recognition.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Motionlets: Mid-level 3D Parts for Human Motion Recognition.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Exploring Cross-Channel Texture Correlation for Color Texture Classification.
Proceedings of the British Machine Vision Conference, 2013

Multi-scale Joint Encoding of Local Binary Patterns for Texture and Material Classification.
Proceedings of the British Machine Vision Conference, 2013

Exploring Motion Boundary based Sampling and Spatial-Temporal Context Descriptors for Action Recognition.
Proceedings of the British Machine Vision Conference, 2013

2012
Automatic music video generation: cross matching of music and image.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Cross matching of music and image.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Voice conversion using Bayesian mixture of Probabilistic Linear Regressions and dynamic kernel features.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Bayesian Mixture of Probabilistic Linear Regressions for Voice Conversion.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Learning geodesic CRF model for image segmentation.
Proceedings of the 19th IEEE International Conference on Image Processing, 2012

Person re-identification across multi-camera system based on local descriptors.
Proceedings of the Sixth International Conference on Distributed Smart Cameras, 2012

One-Class SVM assisted accurate tracking.
Proceedings of the Sixth International Conference on Distributed Smart Cameras, 2012

A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition.
Proceedings of the Computer Vision - ACCV 2012, 2012

2011
Regularized Maximum Likelihood Linear Regression Adaptation for Computer-Assisted Language Learning Systems.
IEICE Trans. Inf. Syst., 2011

A Study on Bag of Gaussian Model with Application to Voice Conversion.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Gesture Design of Hand-to-Speech Converter Derived from Speech-to-Hand Converter Based on Probabilistic Integration Model.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Knowledge-Based Segmentation of Spine and Ribs from Bone Scintigraphy.
Proceedings of the Neural Information Processing - 18th International Conference, 2011

Adaptive Region Growing Based on Boundary Measures.
Proceedings of the Neural Information Processing - 18th International Conference, 2011

Adaptive Detection of Hotspots in Thoracic Spine from Bone Scintigraphy.
Proceedings of the Neural Information Processing - 18th International Conference, 2011


Structure-constrained distribution matching using quadratic programming and its application to pronunciation evaluation.
Proceedings of the First Asian Conference on Pattern Recognition, 2011

2010
A study on invariance of f-divergence and its application to speech recognition.
IEEE Trans. Signal Process., 2010

Speech Structure and Its Application to Robust Speech Processing.
New Gener. Comput., 2010

Face recognition based on gradient gabor feature and Efficient Kernel Fisher analysis.
Neural Comput. Appl., 2010

Dialect-based speaker classification using speaker-invariant dialect features.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Integration of multilayer regression analysis with structure-based pronunciation assessment.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Regularized-MLLR speaker adaptation for computer-assisted language learning system.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

HMM-based sequence-to-frame mapping for voice conversion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

2009
A Theory of Phase Singularities for Image Representation and its Applications to Object Tracking and Image Matching.
IEEE Trans. Image Process., 2009

Optimal event search using a structural cost function - improvement of structure to speech conversion.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

On invariant structural representation for speech recognition: theoretical validation and experimental improvement.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Structural analysis of dialects, sub-dialects and sub-sub-dialects of Chinese.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Analysis and utilization of MLLR speaker adaptation technique for learners' pronunciation evaluation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Speech generation from hand gestures based on space mapping.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Affine invariant features and their application to speech recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

Mixture of Probabilistic Linear Regressions: A unified view of GMM-based mapping techniques.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

Free hand sketch understanding using SVMs-chain modeling for spatial and temporal patterns.
Proceedings of the first ACM/SIGEVO Summit on Genetic and Evolutionary Computation, 2009

A study on Hidden Structural Model and its application to labeling sequences.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
f-divergence is a generalized invariant measure between distributions.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Metric learning for unsupervised phoneme segmentation.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Face recognition based on Gradient Gabor feature.
Proceedings of the International Conference on Image Processing, 2008

Phase singularities for image representation and matching.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

2007
Optimal Euler Circuit of Maximum Contiguous Cost.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2007

Offline Signature Verification Using Online Handwriting Registration.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

Random discriminant structure analysis for automatic recognition of connected vowels.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
A Framework Toward Restoration of Writing Order from Single-Stroked Handwriting Image.
IEEE Trans. Pattern Anal. Mach. Intell., 2006

Recover Writing Trajectory from Multiple Stroked Image Using Bidirectional Dynamic Search.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Affine Invariant Dynamic Time Warping and its Application to Online Rotated Handwriting Recognition.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Recovering Drawing Order from Offline Handwritten Image Using Direction Context and Optimal Euler Path.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
A Novel Approach to Recover Writing Order From Single Stroke Offline Handwritten Images.
Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR 2005), 29 August, 2005

2004
Recovering dynamic information from static handwritten images.
Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition, 2004

2003
Vehicle Detection on Highway Based on Direction-Fractal Dimension.
Proceedings of the Wavelet Analysis and Its Applications, 2003

