Hongsheng Li

Orcid: 0000-0002-2664-7975

Affiliations:
  • Chinese University of Hong Kong, Department of Electrical Engineering, CUHK-SenseTime Joint Laboratory, Hong Kong
  • Lehigh University, Department of Computer Science and Engineering, PA, USA (former)


According to our database, Hongsheng Li authored at least 378 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
FeatAug-DETR: Enriching One-to-Many Matching for DETRs With Feature Augmentation.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

Unified 3D and 4D Panoptic Segmentation via Dynamic Shifting Networks.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
Int. J. Comput. Vis., May, 2024

CGOF++: Controllable 3D Face Synthesis With Conditional Generative Occupancy Fields.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
Int. J. Comput. Vis., February, 2024

Structured Domain Adaptation With Online Relation Regularization for Unsupervised Person Re-ID.
IEEE Trans. Neural Networks Learn. Syst., January, 2024

LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation.
IEEE Trans. Multim., 2024

Pyramid Fusion Transformer for Semantic Segmentation.
IEEE Trans. Multim., 2024

Enhancing Vision-Language Model with Unmasked Token Alignment.
Trans. Mach. Learn. Res., 2024

RNNPose: 6-DoF Object Pose Estimation via Recurrent Correspondence Field Estimation and Pose Optimization.
IEEE Trans. Pattern Anal. Mach. Intell., 2024

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding.
CoRR, 2024

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation.
CoRR, 2024

A foundation model for generalizable disease diagnosis in chest X-ray images.
CoRR, 2024

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction.
CoRR, 2024

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection.
CoRR, 2024

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow.
CoRR, 2024

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow.
CoRR, 2024

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology.
CoRR, 2024

MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More.
CoRR, 2024

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models.
CoRR, 2024

MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation.
CoRR, 2024

SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation.
CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.
CoRR, 2024

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.
CoRR, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation.
CoRR, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.
CoRR, 2024

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.
CoRR, 2024

MAVIS: Mathematical Visual Instruction Tuning.
CoRR, 2024

Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
CoRR, 2024

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models.
CoRR, 2024

UniZero: Generalized and Efficient Planning with Scalable Latent World Models.
CoRR, 2024

A3VLM: Actionable Articulation-Aware Vision Language Model.
CoRR, 2024

Learning 1D Causal Visual Representation with De-focus Attention Networks.
CoRR, 2024

Phased Consistency Model.
CoRR, 2024

ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation.
CoRR, 2024

TerDiT: Ternary Diffusion Models with Transformers.
CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.
CoRR, 2024

MoVA: Adapting Mixture of Vision Experts to Multimodal Context.
CoRR, 2024

Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior.
CoRR, 2024

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching.
CoRR, 2024

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want.
CoRR, 2024

ECNet: Effective Controllable Text-to-Image Diffusion Models.
CoRR, 2024

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models.
CoRR, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
CoRR, 2024

ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models.
CoRR, 2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures.
CoRR, 2024

MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs.
CoRR, 2024

Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset.
CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024

AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning.
CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.
CoRR, 2024

AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data.
Proceedings of the SIGGRAPH Asia 2024 Technical Communications, 2024

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

VeloVox: A Low-Cost and Accurate 4D Object Detector with Single-Frame Point Cloud of Livox LiDAR.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Personalize Segment Anything Model with One Shot.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks.
Proceedings of the Computer Vision - ECCV 2024, 2024

MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Proceedings of the Computer Vision - ECCV 2024, 2024

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Be-Your-Outpainter: Mastering Video Outpainting Through Input-Specific Adaptation.
Proceedings of the Computer Vision - ECCV 2024, 2024

GiT: Towards Generalist Vision Transformer Through Universal Language Interface.
Proceedings of the Computer Vision - ECCV 2024, 2024

ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model.
Proceedings of the Computer Vision - ECCV 2024, 2024

Any2Point: Empowering Any-Modality Large Models for Efficient 3D Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos.
Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation Using RGB Frames and Events.
Proceedings of the Computer Vision - ECCV 2024, 2024

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis.
Proceedings of the Computer Vision - ECCV 2024, 2024

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LMDrive: Closed-Loop End-to-End Driving with Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GLID: Pre-training a Generalist Encoder-Decoder Vision Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DiffInDScene: Diffusion-Based High-Quality 3D Indoor Scene Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Empowering Character-level Text Infilling by Eliminating Sub-Tokens.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Teach-DETR: Better Training DETR With Teachers.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Hippocampus segmentation after brain tumor resection via postoperative region synthesis.
BMC Medical Imaging, December, 2023

Predicting cancer outcomes from whole slide images via hybrid supervision learning.
Neurocomputing, November, 2023

A Holistically-Guided Decoder for Deep Representation Learning With Applications to Semantic Segmentation and Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

3D Object Detection for Autonomous Driving: A Comprehensive Survey.
Int. J. Comput. Vis., August, 2023

ST3D++: Denoised Self-Training for Unsupervised Domain Adaptation on 3D Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

Refined probability distribution module for fine-grained visual categorization.
Neurocomputing, 2023

PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection.
Int. J. Comput. Vis., 2023

Ponymation: Learning 3D Animal Motions from Unlabeled Online Videos.
CoRR, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.
CoRR, 2023

LMDrive: Closed-Loop End-to-End Driving with Large Language Models.
CoRR, 2023

InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation.
CoRR, 2023

ViLaM: A Vision-Language Model with Enhanced Visual Grounding and Generalization Capability.
CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023

Learning A Multi-Task Transformer Via Unified And Customized Instruction Tuning For Chest Radiograph Interpretation.
CoRR, 2023

Towards Large-scale Masked Face Recognition.
CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.
CoRR, 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.
CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.
CoRR, 2023

Meta-Transformer: A Unified Framework for Multimodal Learning.
CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
CoRR, 2023

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis.
CoRR, 2023

FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow.
CoRR, 2023

Context-TAP: Tracking Any Point Demands Spatial Context Features.
CoRR, 2023

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling.
CoRR, 2023

DiffRoom: Diffusion-based High-Quality 3D Room Reconstruction and Generation with Occupancy Prior.
CoRR, 2023

Voxel2Hemodynamics: An End-to-end Deep Learning Method for Predicting Coronary Artery Hemodynamics.
CoRR, 2023

Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising.
CoRR, 2023

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model.
CoRR, 2023

Segmentation and Vascular Vectorization for Coronary Artery by Geometry-based Cascaded Neural Network.
CoRR, 2023

Personalize Segment Anything Model with One Shot.
CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.
CoRR, 2023

Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles.
CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
CoRR, 2023

Better Aligning Text-to-Image Models with Human Preference.
CoRR, 2023

Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding.
CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
CoRR, 2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
CoRR, 2023

Geometry-Based End-to-End Segmentation of Coronary Artery in Computed Tomography Angiography.
Proceedings of the Trustworthy Machine Learning for Healthcare, 2023

Voxel2Hemodynamics: An End-to-End Deep Learning Method for Predicting Coronary Artery Hemodynamics.
Proceedings of the Statistical Atlases and Computational Models of the Heart. Regular and CMRxRecon Challenge Papers, 2023

A Unified Conditional Framework for Diffusion-based Image Restoration.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Context-PIPs: Persistent Independent Particles Demands Context Features.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

BlinkFlow: A Dataset to Push the Limits of Event-Based Optical Flow Estimation.
IROS, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseMAE: Sparse Training Meets Masked Autoencoders.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Human Preference Score: Better Aligning Text-to-image Models with Human Preference.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Urban Radiance Field Representation with Deformable Neural Mesh Primitives.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Simulating Fluids in Real-World Still Images.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ConQueR: Query Contrast Voxel-DETR for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ReasonNet: End-to-End Driving with Temporal and Global Reasoning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PATS: Patch Area Transportation with Subdivision for Local Feature Matching.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

A Simple Baseline for Video Restoration with Grouped Spatial-Temporal Shift.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Adaptive Zone-aware Hierarchical Planner for Vision-Language Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
NeuralMarker: A Framework for Learning General Marker Correspondence.
ACM Trans. Graph., 2022

Multi-Modality Self-Distillation for Weakly Supervised Temporal Action Localization.
IEEE Trans. Image Process., 2022

Robust Self-Supervised LiDAR Odometry Via Representative Structure Discovery and 3D Inherent Error Modeling.
IEEE Robotics Autom. Lett., 2022

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-Based Perception.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

SymReg-GAN: Symmetric Image Registration With Generative Adversarial Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

DigestPath: A benchmark dataset with challenge review for the pathological detection and segmentation of digestive-system.
Medical Image Anal., 2022

Efficient Burst Raw Denoising with Variance Stabilization and Multi-frequency Denoising Network.
Int. J. Comput. Vis., 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.
CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.
CoRR, 2022

ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning for Action Recognition.
CoRR, 2022

No Attention is Needed: Grouped Spatial-temporal Shift for Simple and Efficient Video Restorers.
CoRR, 2022

3D Object Detection for Autonomous Driving: A Review and New Outlooks.
CoRR, 2022

MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning.
CoRR, 2022

ConvMAE: Masked Convolution Meets Masked Autoencoders.
CoRR, 2022

Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis.
CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.
CoRR, 2022

LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network.
CoRR, 2022

Meta Knowledge Distillation.
CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.
CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.
CoRR, 2022

Automatic segmentation of the clinical target volume and organs at risk for rectal cancer radiotherapy using structure-contextual representations based on 3D high-resolution network.
Biomed. Signal Process. Control., 2022

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MCMAE: Masked Convolution Meets Masked Autoencoders.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Robust Face Recognition with Comprehensive Search.
Proceedings of the Computer Vision - ECCV 2022, 2022

EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP.
Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.
Proceedings of the Computer Vision - ECCV 2022, 2022

Learning Degradation Representations for Image Deblurring.
Proceedings of the Computer Vision - ECCV 2022, 2022

FlowFormer: A Transformer Architecture for Optical Flow.
Proceedings of the Computer Vision - ECCV 2022, 2022

MPPNet: Multi-frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

IDR: Self-Supervised Image Denoising via Iterative Data Refinement.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

PointCLIP: Point Cloud Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RBGNet: Ray-based Grouping for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning a Structured Latent Space for Unsupervised Point Cloud Completion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer.
Proceedings of the Conference on Robot Learning, 2022

Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Task Generalizable Spatial and Texture Aware Image Downsizing Network.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
From Points to Parts: 3D Object Detection From Point Cloud With Part-Aware and Part-Aggregation Network.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Person Re-Identification With Deep Kronecker-Product Matching and Group-Shuffling Random Walk.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

FocusNetv2: Imbalanced large and small organ segmentation with adversarial shape constraint for head and neck CT images.
Medical Image Anal., 2021

Guest editorial: Deep learning for medical image analysis.
Neurocomputing, 2021

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.
CoRR, 2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model.
CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.
CoRR, 2021

Mixed Supervision Learning for Whole Slide Image Classification.
CoRR, 2021

Scalable Transformers for Neural Machine Translation.
CoRR, 2021

Container: Context Aggregation Network.
CoRR, 2021

FNAS: Uncertainty-Aware Fast Neural Architecture Search.
CoRR, 2021

Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification.
CoRR, 2021

Decoupled Spatial-Temporal Transformer for Video Inpainting.
CoRR, 2021

LIFE: Lighting Invariant Flow Estimation.
CoRR, 2021

Fixing the Teacher-Student Knowledge Discrepancy in Distillation.
CoRR, 2021

Consensus-Guided Correspondence Denoising.
CoRR, 2021

Efficient Attention: Attention with Linear Complexities.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Container: Context Aggregation Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Categorical Relation-Preserving Contrastive Knowledge Distillation for Medical Image Classification.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Hybrid Supervision Learning for Pathology Whole Slide Image Classification.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch.
Proceedings of the 9th International Conference on Learning Representations, 2021

Progressive Correspondence Pruning by Consensus Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Rethinking Noise Synthesis and Modeling in Raw Denoising.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Fast Convergence of DETR with Spatially Modulated Co-Attention.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

ST3D: Self-Training for Unsupervised Domain Adaptation on 3D Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Inverting Generative Adversarial Renderer for Face Reconstruction.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

VS-Net: Voting With Segmentation for Visual Localization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

LiDAR-Based Panoptic Segmentation via Dynamic Shifting Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Semantic Scene Completion via Integrating Instances and Scene In-the-Loop.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

REFINE: Prediction Fusion Network for Panoptic Segmentation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

A Unified Multi-Scenario Attacking Network for Visual Object Tracking.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
HMS-Net: Hierarchical Multi-Scale Sparsity-Invariant Network for Sparse Depth Completion.
IEEE Trans. Image Process., 2020

Guest Editorial: Generative Adversarial Networks for Computer Vision.
Int. J. Comput. Vis., 2020

Towards Overcoming False Positives in Visual Relationship Detection.
CoRR, 2020

A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection.
CoRR, 2020

End-to-End Object Detection with Adaptive Clustering Transformer.
CoRR, 2020

PV-RCNN: The Top-Performing LiDAR-only Solutions for 3D Detection / 3D Tracking / Domain Adaptation of Waymo Open Dataset Challenges.
CoRR, 2020

Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation.
CoRR, 2020

Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020.
CoRR, 2020

1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020.
CoRR, 2020

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization.
CoRR, 2020

Structured Domain Adaptation for Unsupervised Person Re-identification.
CoRR, 2020

MagnifierNet: Towards Semantic Regularization and Fusion for Person Re-identification.
CoRR, 2020

Balanced Meta-Softmax for Long-Tailed Visual Recognition.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Multi-organ Segmentation via Co-training Weight-Averaged Models from Few-Organ Datasets.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2020, 2020

Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification.
Proceedings of the 8th International Conference on Learning Representations, 2020

RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax.
Proceedings of the Computer Vision - ECCV 2020, 2020

Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions.
Proceedings of the Computer Vision - ECCV 2020, 2020

EfficientFCN: Holistically-Guided Decoding for Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning to Predict Context-Adaptive Convolution for Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Self-supervising Fine-Grained Region Similarities for Large-Scale Image Localization.
Proceedings of the Computer Vision - ECCV 2020, 2020

Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2020, 2020

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

StereoGAN: Bridging Synthetic-to-Real Domain Gap by Joint Optimization of Domain Translation and Stereo Matching.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Robust Superpixel-Guided Attentional Adversarial Attack.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

3D Sketch-Aware Semantic Scene Completion via Semi-Supervised Structure Prior.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks.
Proceedings of the 4th Conference on Robot Learning, 2020

MagnifierNet: Towards Semantic Adversary and Fusion for Person Re-identification.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Deep Continuous Conditional Random Fields With Asymmetric Inter-Object Constraints for Online Multi-Object Tracking.
IEEE Trans. Circuits Syst. Video Technol., 2019

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Part-A^2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud.
CoRR, 2019

A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes.
CoRR, 2019

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

FocusNet: Imbalanced Large and Small Organ Segmentation with an End-to-End Deep Neural Network for Head and Neck CT Images.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, 2019

Signet Ring Cell Detection with a Semi-supervised Learning Framework.
Proceedings of the Information Processing in Medical Imaging, 2019

Generalizing Monocular 3D Human Pose Estimation in the Wild.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Depth Completion From Sparse LiDAR Data With Depth-Normal Constraints.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Semi-Supervised Monocular 3D Face Reconstruction With End-to-End Shape-Preserved Domain Transfer.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Interpolated Convolutional Networks for 3D Point Cloud Understanding.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Multi-Modality Latent Interaction Network for Visual Question Answering.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

P2SGrad: Refined Gradients for Optimizing Deep Face Models.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Conditional Adversarial Generative Flow for Controllable Image Synthesis.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Group-Wise Correlation Stereo Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

A2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Unsupervised Cross-Spectral Stereo Matching by Learning to Synthesize.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
T-CNN: Tubelets With Convolutional Neural Networks for Object Detection From Videos.
IEEE Trans. Circuits Syst. Video Technol., 2018

Crafting GBD-Net for Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Jointly Learning Deep Features, Deformable Parts, Occlusion and Classification for Pedestrian Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Fast iteratively reweighted least squares algorithms for analysis-based sparse reconstruction.
Medical Image Anal., 2018

HMS-Net: Hierarchical Multi-scale Sparsity-invariant Network for Sparse Depth Completion.
CoRR, 2018

Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association.
CoRR, 2018

FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Person Re-identification with Deep Similarity-Guided Graph Neural Network.
Proceedings of the Computer Vision - ECCV 2018, 2018

Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data.
Proceedings of the Computer Vision - ECCV 2018, 2018

Learning Monocular Depth by Distilling Cross-Domain Stereo Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

Question-Guided Hybrid Convolution for Visual Question Answering.
Proceedings of the Computer Vision - ECCV 2018, 2018

Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association.
Proceedings of the Computer Vision - ECCV 2018, 2018

3D Human Pose Estimation in the Wild by Adversarial Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Eliminating Background-Bias for Robust Person Re-Identification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

End-to-End Deep Kronecker-Product Matching for Person Re-Identification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Deep Group-Shuffling Random Walk for Person Re-Identification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Single View Stereo Matching.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Video Person Re-Identification With Competitive Snippet-Similarity Aggregation and Co-Attentive Snippet Embedding.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Group Consistent Similarity Learning via Deep CRF for Person Re-Identification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Co-Attending Free-Form Regions and Detections With Multi-Modal Multiplicative Feature Embedding for Visual Question Answering.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Generative Adversarial Frontal View to Bird View Synthesis.
Proceedings of the 2018 International Conference on 3D Vision, 2018

2017
L0 Regularized Stationary-Time Estimation for Crowd Analysis.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Statistical Evaluation of No-Reference Image Quality Assessment Metrics for Remote Sensing Images.
ISPRS Int. J. Geo Inf., 2017

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision.
CoRR, 2017

Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2017, 2017

StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Feature Pyramids for Human Pose Estimation.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identification.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-Temporal Path Proposals.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Identity-Aware Textual-Visual Matching with Latent Co-attention.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Spatial Regularization with Image-Level Supervisions for Multi-label Image Classification.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Person Search with Natural Language Description.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Object Detection in Videos with Tubelet Proposal Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Data-Driven Crowd Understanding: A Baseline for a Large-Scale Crowd Dataset.
IEEE Trans. Multim., 2016

Pedestrian Behavior Modeling From Stationary Crowds With Applications to Intelligent Surveillance.
IEEE Trans. Image Process., 2016

Magnetic Resonance Fingerprinting with compressed sensing and distance metric learning.
Neurocomputing, 2016

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks.
CoRR, 2016

CRF-CNN: Modeling Structured Information in Human Pose Estimation.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Crossing-Line Crowd Counting with Two-Phase Deep Neural Networks.
Proceedings of the Computer Vision - ECCV 2016, 2016

Pedestrian Behavior Understanding and Prediction with Deep Neural Networks.
Proceedings of the Computer Vision - ECCV 2016, 2016

Learnable Histogram: Statistical Context Features for Deep Neural Networks.
Proceedings of the Computer Vision - ECCV 2016, 2016

End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Object Detection from Video Tubelets with Convolutional Neural Networks.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Structured Feature Learning for Pose Estimation.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Silhouette Analysis for Human Action Recognition Based on Supervised Temporal t-SNE and Incremental Learning.
IEEE Trans. Image Process., 2015

Computer-Aided Diagnosis of Mammographic Masses Using Scalable Image Retrieval.
IEEE Trans. Biomed. Eng., 2015

Pedestrian Travel Time Estimation in Crowded Scenes.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Saliency detection by multi-context deep learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Cross-scene crowd counting via deep convolutional neural networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Understanding pedestrian behaviors from stationary crowd groups.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

DeepID-Net: Deformable deep convolutional neural networks for object detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Solving a Special Type of Jigsaw Puzzles: Banknote Reconstruction From a Large Number of Fragments.
IEEE Trans. Multim., 2014

Feature Matching with Affine-Function Transformation Models.
IEEE Trans. Pattern Anal. Mach. Intell., 2014

Silhouette analysis for human action recognition based on maximum spatio-temporal dissimilarity embedding.
Mach. Vis. Appl., 2014

Landmark matching based retinal image alignment by enforcing sparsity in correspondence matrix.
Medical Image Anal., 2014

SAR target recognition based on improved joint sparse representation.
EURASIP J. Adv. Signal Process., 2014

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection.
CoRR, 2014

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification.
CoRR, 2014

Fast Iteratively Reweighted Least Squares Algorithms for Analysis-Based Sparsity Reconstruction.
CoRR, 2014

Preconditioning for Accelerated Iteratively Reweighted Least Squares in Structured Sparsity Reconstruction.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013
Object Matching Using a Locally Affine Invariant and Linear Programming Techniques.
IEEE Trans. Pattern Anal. Mach. Intell., 2013

2012
Automatic Image Annotation and Retrieval Using Group Sparsity.
IEEE Trans. Syst. Man Cybern. Part B, 2012

A hierarchical image clustering cosegmentation framework.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011
Active Volume Models for Medical Image Segmentation.
IEEE Trans. Medical Imaging, 2011

Approximately Global Optimization for Robust Alignment of Generalized Shapes.
IEEE Trans. Pattern Anal. Mach. Intell., 2011

Composite splitting algorithms for convex optimization.
Comput. Vis. Image Underst., 2011

Extraction and analysis of actin networks based on Open Active Contour models.
Proceedings of the 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2011

Actin Filament Segmentation Using Dynamic Programming.
Proceedings of the Information Processing in Medical Imaging, 2011

A 3D Laplacian-driven parametric deformable model.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Optimal object matching via convexification and composition.
Proceedings of the IEEE International Conference on Computer Vision, 2011

2010
Actin Filament Segmentation Using Spatiotemporal Active-Surface and Active-Contour Models.
Proceedings of the Medical Image Computing and Computer-Assisted Intervention, 2010

Automatic image annotation using group sparsity.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

Object matching with a locally affine-invariant constraint.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009
Actin Filament Tracking Based on Particle Filters and Stretching Open Active Contour Models.
Proceedings of the Medical Image Computing and Computer-Assisted Intervention, 2009

Automated Actin Filament Segmentation, Tracking and TIP Elongation Measurements Based on Open Active Contour Models.
Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, June 28, 2009

Active volume models for 3D medical image segmentation.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Global optimization for alignment of generalized shapes.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

