Wanli Ouyang

ORCID: 0000-0002-9163-2761

Affiliations:
  • Chinese University of Hong Kong, Electronic Engineering, Hong Kong


According to our database, Wanli Ouyang authored at least 445 papers between 2009 and 2024.

Bibliography

2024
TCFormer: Visual Recognition via Token Clustering Transformer.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Multidimensional Pruning and Its Extension: A Unified Framework for Model Compression.
IEEE Trans. Neural Networks Learn. Syst., September, 2024

Unsupervised contrastive learning with simple transformation for 3D point cloud data.
Vis. Comput., August, 2024

HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation.
Int. J. Comput. Vis., July, 2024

Content-Aware Rectified Activation for Zero-Shot Fine-Grained Image Retrieval.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2024

DeNKD: Decoupled Non-Target Knowledge Distillation for Complementing Transformer-Based Unsupervised Domain Adaptation.
IEEE Trans. Circuits Syst. Video Technol., May, 2024

3D Object Detection From Images for Autonomous Driving: A Survey.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Towards Frame Rate Agnostic Multi-object Tracking.
Int. J. Comput. Vis., May, 2024

Editorial: Learning With Fewer Labels in Computer Vision.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2024

Self-Supervised Feature Learning for Appliance Recognition in Nonintrusive Load Monitoring.
IEEE Trans. Ind. Informatics, February, 2024

Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective.
Int. J. Comput. Vis., February, 2024

Accurate Fine-Grained Object Recognition with Structure-Driven Relation Graph Networks.
Int. J. Comput. Vis., January, 2024

Similarity- and Quality-Guided Relation Learning for Joint Detection and Tracking.
IEEE Trans. Multim., 2024

Online Handwritten Chinese Character Recognition Based on 1-D Convolution and Two-Streams Transformers.
IEEE Trans. Multim., 2024

RS-Mamba for Large Remote Sensing Image Dense Prediction.
IEEE Trans. Geosci. Remote. Sens., 2024

MambaDS: Near-Surface Meteorological Field Downscaling With Topography Constrained Selective State-Space Modeling.
IEEE Trans. Geosci. Remote. Sens., 2024

Deriving Accurate Surface Meteorological States at Arbitrary Locations via Observation-Guided Continuous Neural Field Modeling.
IEEE Trans. Geosci. Remote. Sens., 2024

Adaptive pessimism via target Q-value for offline reinforcement learning.
Neural Networks, 2024

Geometry-enhanced pretraining on interatomic potentials.
Nat. Mac. Intell., 2024

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion.
CoRR, 2024

MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding.
CoRR, 2024

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction.
CoRR, 2024

WorldSimBench: Towards Video Generation Models as World Simulators.
CoRR, 2024

FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model.
CoRR, 2024

CrystalX: Ultra-Precision Crystal Structure Resolution and Error Correction Using Deep Learning.
CoRR, 2024

A CLIP-Powered Framework for Robust and Generalizable Data Selection.
CoRR, 2024

Depth Any Video with Scalable Synthetic Data.
CoRR, 2024

Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation.
CoRR, 2024

Diffusion Models Need Visual Priors for Image Generation.
CoRR, 2024

MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses.
CoRR, 2024

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction.
CoRR, 2024

PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling.
CoRR, 2024

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning.
CoRR, 2024

GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction.
CoRR, 2024

GenAgent: Build Collaborative AI Systems with Automated Workflow Generation - Case Studies on ComfyUI.
CoRR, 2024

NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction.
CoRR, 2024

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area.
CoRR, 2024

Fast Information Streaming Handler (FisH): A Unified Seismic Neural Network for Single Station Real-Time Earthquake Early Warning.
CoRR, 2024

Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation.
CoRR, 2024

VegeDiff: Latent Diffusion Model for Geospatial Vegetation Forecasting.
CoRR, 2024

TCFormer: Visual Recognition via Token Clustering Transformer.
CoRR, 2024

VEnhancer: Generative Space-Time Enhancement for Video Generation.
CoRR, 2024

Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation.
CoRR, 2024

AFBench: A Large-scale Benchmark for Airfoil Design.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
CoRR, 2024

Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level.
CoRR, 2024

Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space.
CoRR, 2024

BEACON: Benchmark for Comprehensive RNA Tasks and Language Models.
CoRR, 2024

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B.
CoRR, 2024

Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision.
CoRR, 2024

FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation.
CoRR, 2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series.
CoRR, 2024

Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification.
CoRR, 2024

EMR-Merging: Tuning-Free High-Performance Model Merging.
CoRR, 2024

ORCA: A Global Ocean Emulator for Multi-year to Decadal Predictions.
CoRR, 2024

Dense Connector for MLLMs.
CoRR, 2024

Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling.
CoRR, 2024

DocReLM: Mastering Document Retrieval with Language Model.
CoRR, 2024

Physical formula enhanced multi-task learning for pharmacokinetics prediction.
CoRR, 2024

Taming Stable Diffusion for Text to 360° Panorama Image Generation.
CoRR, 2024

How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks.
CoRR, 2024

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents.
CoRR, 2024

Agent3D-Zero: An Agent for Zero-shot 3D Understanding.
CoRR, 2024

PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest.
CoRR, 2024

LOCR: Location-Guided Transformer for Optical Character Recognition.
CoRR, 2024

Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation.
CoRR, 2024

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models.
CoRR, 2024

NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection.
CoRR, 2024

Self-consistent Validation for Machine Learning Electronic Structure.
CoRR, 2024

Revealing Decurve Flows for Generalized Graph Propagation.
CoRR, 2024

ChemLLM: A Chemical Large Language Model.
CoRR, 2024

Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models.
CoRR, 2024

Integration of cognitive tasks into artificial general intelligence test for large models.
CoRR, 2024

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning.
CoRR, 2024

ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast.
CoRR, 2024

A Comprehensive Survey on 3D Content Generation.
CoRR, 2024

FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting.
CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024

Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method.
CoRR, 2024

Bilateral Reference for High-Resolution Dichotomous Image Segmentation.
CoRR, 2024

Improving multiple sclerosis lesion segmentation across clinical sites: A federated learning approach with noise-resilient training.
Artif. Intell. Medicine, 2024

Surpassing Sycamore: Achieving Energetic Superiority Through System-Level Circuit Simulation.
Proceedings of the International Conference for High Performance Computing, 2024

An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Towards a Self-contained Data-driven Global Weather Forecasting Framework.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

FiT: Flexible Vision Transformer for Diffusion Model.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LOCR: Location-Guided Transformer for Optical Character Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM.
Proceedings of the Computer Vision - ECCV 2024, 2024

PredBench: Benchmarking Spatio-Temporal Prediction Across Diverse Disciplines.
Proceedings of the Computer Vision - ECCV 2024, 2024

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior.
Proceedings of the Computer Vision - ECCV 2024, 2024

GVGEN: Text-to-3D Generation with Volumetric Representation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Point Cloud Pre-Training with Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Taming Stable Diffusion for Text to 360° Panorama Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

UniPAD: A Universal Pre-Training Paradigm for Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Instruct-ReID: A Multi-Purpose Person Re-Identification Task with Instructions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Point Transformer V3: Simpler, Faster, Stronger.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Semi-supervised 3D Object Detection with PatchTeacher and PillarMix.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Boosting Residual Networks with Group Knowledge.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Frozen CLIP Transformer Is an Efficient Point Cloud Encoder.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Weakly Supervised Semantic Segmentation via Box-Driven Masking and Filling Rate Shifting.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Semantic-Guided Information Alignment Network for Fine-Grained Image Recognition.
IEEE Trans. Circuits Syst. Video Technol., November, 2023

The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

TransVG++: End-to-End Visual Grounding With Language Conditioned Vision Transformer.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Automatic Loss Function Search for Adversarial Unsupervised Domain Adaptation.
IEEE Trans. Circuits Syst. Video Technol., October, 2023

Towards Trajectory Forecasting From Detection.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Automatically Predicting Material Properties with Microscopic Images: Polymer Miscibility as an Example.
J. Chem. Inf. Model., October, 2023

Learning class-agnostic masks with cross-task refinement for weakly supervised semantic segmentation.
Neural Comput. Appl., September, 2023

MLST-Former: Multi-Level Spatial-Temporal Transformer for Group Activity Recognition.
IEEE Trans. Circuits Syst. Video Technol., July, 2023

Adversarial learning based intermediate feature refinement for semantic segmentation.
Appl. Intell., June, 2023

SiamSampler: Video-Guided Sampling for Siamese Visual Tracking.
IEEE Trans. Circuits Syst. Video Technol., April, 2023

ZoomNAS: Searching for Whole-Body Human Pose Estimation in the Wild.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2023

Deep Instance Segmentation With Automotive Radar Detection Points.
IEEE Trans. Intell. Veh., January, 2023

OTP-NMS: Toward Optimal Threshold Prediction of NMS for Crowded Pedestrian Detection.
IEEE Trans. Image Process., 2023

A Strip Dilated Convolutional Network for Semantic Segmentation.
Neural Process. Lett., 2023

Merging Vision Transformers from Different Tasks and Domains.
CoRR, 2023

Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers.
CoRR, 2023

Efficient Architecture Search via Bi-level Data Pruning.
CoRR, 2023

Towards an End-to-End Artificial Intelligence Driven Global Weather Forecasting System.
CoRR, 2023

FengWu-4DVar: Coupling the Data-driven Weather Forecasting Model with 4D Variational Assimilation.
CoRR, 2023

ResoNet: Robust and Explainable ENSO Forecasts with Hybrid Convolution and Transformer Networks.
CoRR, 2023

Hulk: A Universal Knowledge Translator for Human-Centric Tasks.
CoRR, 2023

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
CoRR, 2023

Octavius: Mitigating Task Interference in MLLMs via MoE.
CoRR, 2023

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection.
CoRR, 2023

I²MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation.
CoRR, 2023

Masked Pretraining for Multi-Agent Decision Making.
CoRR, 2023

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm.
CoRR, 2023

Rethinking the BERT-like Pretraining for DNA Sequences.
CoRR, 2023

Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization for Language Models.
CoRR, 2023

Understanding Masked Autoencoders From a Local Contrastive Perspective.
CoRR, 2023

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models.
CoRR, 2023

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior.
CoRR, 2023

Experts Weights Averaging: A New General Training Scheme for Vision Transformers.
CoRR, 2023

Meta-Transformer: A Unified Framework for Multimodal Learning.
CoRR, 2023

UniG3D: A Unified 3D Object Generation Dataset.
CoRR, 2023

Retrieve Anyone: A General-purpose Person Re-identification Task with Instructions.
CoRR, 2023

Clothes-Invariant Feature Learning by Causal Intervention for Clothes-Changing Person Re-identification.
CoRR, 2023

Stimulative Training++: Go Beyond The Performance Limits of Residual Networks.
CoRR, 2023

Seeing is not always believing: A Quantitative Study on Human Perception of AI-Generated Images.
CoRR, 2023

FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead.
CoRR, 2023

Automatically Predict Material Properties with Microscopic Image Example Polymer Compatibility.
CoRR, 2023

Saliency Guided Contrastive Learning on Scene Images.
CoRR, 2023

β-DARTS++: Bi-level Regularization for Proxy-robust Differentiable Architecture Search.
CoRR, 2023

UATVR: Uncertainty-Adaptive Text-Video Retrieval.
CoRR, 2023

Ponder: Point Cloud Pre-training via Neural Rendering.
CoRR, 2023

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Exploiting Visual Context Semantics for Sound Source Localization.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CluB: Cluster Meets BEV for LiDAR-Based 3D Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning to Parameterize Visual Attributes for Open-set Fine-grained Retrieval.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Fed-CoT: Co-teachers for Federated Semi-supervised MS Lesion Segmentation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023 Workshops, 2023

Denser is Better: cost distribution super-resolution network for more accurate sub-pixel disparity.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Cycle-consistent Masked AutoEncoder for Unsupervised Domain Generalization.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

What Can Simple Arithmetic Operations Do for Temporal Modeling?
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Masked Motion Predictors are Strong 3D Action Representation Learners.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Semi-Supervised Semantic Segmentation under Label Noise via Diverse Learning Groups.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Ponder: Point Cloud Pre-training via Neural Rendering.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

HumanBench: Towards General Human-Centric Perception with Projector Assisted Pretraining.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Crossing the Gap: Domain Generalization for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

UniHCP: A Unified Model for Human-Centric Perceptions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Adaptive Hierarchical SpatioTemporal Network for Traffic Forecasting.
Proceedings of the 19th IEEE International Conference on Automation Science and Engineering, 2023

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Multi-Scale Control Signal-Aware Transformer for Motion Synthesis without Phase.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Fine-Grained Retrieval Prompt Tuning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

ACE: Cooperative Multi-Agent Q-learning with Bidirectional Action-Dependency.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Action Recognition With Motion Diversification and Dynamic Selection.
IEEE Trans. Image Process., 2022

The Farther the Better: Balanced Stereo Matching via Depth-Based Sampling and Adaptive Feature Refinement.
IEEE Trans. Circuits Syst. Video Technol., 2022

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection for Autonomous Driving.
IEEE Trans. Circuits Syst. Video Technol., 2022

Social-Aware Pedestrian Trajectory Prediction via States Refinement LSTM.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Learning 3D Human Shape and Pose From Dense Body Parts.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Probabilistic Graph Attention Network With Conditional Kernels for Pixel-Wise Prediction.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Two-Branch Relational Prototypical Network for Weakly Supervised Temporal Action Localization.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Attribute assisted teacher-critical training strategies for image captioning.
Neurocomputing, 2022

Efficient Joint-Dimensional Search with Solution Space Regularization for Real-Time Semantic Segmentation.
Int. J. Comput. Vis., 2022

Deep-learning-based solution for data deficient satellite image segmentation.
Expert Syst. Appl., 2022

3D Point Cloud Pre-training with Knowledge Distillation from 2D Images.
CoRR, 2022

Frozen CLIP Model is An Efficient Point Cloud Backbone.
CoRR, 2022

3D-QueryIS: A Query-based Framework for 3D Instance Segmentation.
CoRR, 2022

Boosting Semi-Supervised 3D Object Detection with Semi-Sampling.
CoRR, 2022

The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition.
CoRR, 2022

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild.
CoRR, 2022

An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection.
CoRR, 2022

Pose for Everything: Towards Category-Agnostic Pose Estimation.
CoRR, 2022

Transferring Textual Knowledge for Visual Recognition.
CoRR, 2022

Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation.
CoRR, 2022

MS Lesion Segmentation: Revisiting Weighting Mechanisms for Federated Learning.
CoRR, 2022

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance.
CoRR, 2022

Trajectory Forecasting from Detection with Uncertainty-Aware Motion Encoding.
CoRR, 2022

Reconstructing Hand-Held Objects from Monocular Video.
Proceedings of the SIGGRAPH Asia 2022 Conference Papers, 2022

Stimulative Training of Residual Networks: A Social Psychology Perspective of Loafing.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Unsupervised Object Detection Pretraining with Joint Object Priors Generation and Detector Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-training.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm.
Proceedings of the Tenth International Conference on Learning Representations, 2022

MonoDistill: Learning Spatial Features for Monocular 3D Object Detection.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains.
Proceedings of the Computer Vision - ECCV 2022, 2022

Pose for Everything: Towards Category-Agnostic Pose Estimation.
Proceedings of the Computer Vision - ECCV 2022, 2022

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022

Relative Contrastive Loss for Unsupervised Representation Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective.
Proceedings of the Computer Vision - ECCV 2022, 2022

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal.
Proceedings of the Computer Vision - ECCV 2022, 2022

Fast-MoCo: Boost Momentum-Based Contrastive Learning with Combinatorial Patches.
Proceedings of the Computer Vision - ECCV 2022, 2022

Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking.
Proceedings of the Computer Vision - ECCV 2022, 2022

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Revisiting the Transferability of Supervised Pretraining: an MLP Perspective.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unsupervised Learning of Accurate Siamese Tracking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DR.VIC: Decomposition and Reasoning for Video Individual Counting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Category-Specific Nuance Exploration Network for Fine-Grained Object Retrieval.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Dense Video Captioning Using Graph-Based Sentence Summarization.
IEEE Trans. Multim., 2021

Progressive Modality Cooperation for Multi-Modality Domain Adaptation.
IEEE Trans. Image Process., 2021

AutoPedestrian: An Automatic Data Augmentation and Loss Function Search Scheme for Pedestrian Detection.
IEEE Trans. Image Process., 2021

Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-Resolution Information in Temporal Domain.
IEEE Trans. Image Process., 2021

PCG-TAL: Progressive Cross-Granularity Cooperation for Temporal Action Localization.
IEEE Trans. Image Process., 2021

Block Proposal Neural Architecture Search.
IEEE Trans. Image Process., 2021

Few-Shot Human-Object Interaction Recognition With Semantic-Guided Attentive Prototypes Network.
IEEE Trans. Image Process., 2021

Modeling Sub-Actions for Weakly Supervised Temporal Action Localization.
IEEE Trans. Image Process., 2021

Model Compression Using Progressive Channel Pruning.
IEEE Trans. Circuits Syst. Video Technol., 2021

Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Progressive Cross-Stream Cooperation in Spatial and Temporal Domain for Action Localization.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

An End-to-End Learning Framework for Video Compression.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

The theoretical research of generative adversarial networks: an overview.
Neurocomputing, 2021

Towards Balanced Learning for Instance Recognition.
Int. J. Comput. Vis., 2021

A Shape Transformation-based Dataset Augmentation Framework for Pedestrian Detection.
Int. J. Comput. Vis., 2021

Unsupervised Representation Learning for 3D Point Cloud Data.
CoRR, 2021

PSViT: Better Vision Transformer via Token Pooling and Attention Sharing.
CoRR, 2021

Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection.
CoRR, 2021

3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop.
CoRR, 2021

Real-Time Visual Object Tracking via Few-Shot Learning.
CoRR, 2021

Higher Performance Visual Tracking with Dual-Modal Localization.
CoRR, 2021

A Continuous Mapping For Augmentation Design.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

AutoSampling: Search for Effective Data Sampling Schedules.
Proceedings of the 38th International Conference on Machine Learning, 2021

PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Aggregation with Feature Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Geometry Uncertainty Projection Network for Monocular 3D Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Evolving Search Space for Neural Architecture Search.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

GLiT: Neural Architecture Search for Global and Local Image Transformer.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

BN-NAS: Neural Architecture Search with Batch Normalization.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Distributed Signal Strength Prediction using Satellite Map empowered by Deep Vision Transformer.
Proceedings of the IEEE Globecom 2021 Workshops, Madrid, Spain, December 7-11, 2021, 2021

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Layerwise Optimization by Gradient Decomposition for Continual Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Mutual CRF-GNN for Few-Shot Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Delving Into Localization Errors for Monocular 3D Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Inception Convolution With Efficient Dilation Search.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dynamic Position-aware Network for Fine-grained Image Recognition.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Gradient Regularized Contrastive Learning for Continual Domain Adaptation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Improving Description-Based Person Re-Identification by Multi-Granularity Image-Text Alignments.
IEEE Trans. Image Process., 2020

Deep Non-Local Kalman Network for Video Compression Artifact Reduction.
IEEE Trans. Image Process., 2020

Image Captioning With End-to-End Attribute Detection and Subsequent Attributes Prediction.
IEEE Trans. Image Process., 2020

Person Search by Separated Modeling and A Mask-Guided Two-Stream CNN Model.
IEEE Trans. Image Process., 2020

Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization.
IEEE Trans. Circuits Syst. Video Technol., 2020

Efficient Visual Recognition.
Int. J. Comput. Vis., 2020

Deep Learning for Generic Object Detection: A Survey.
Int. J. Comput. Vis., 2020

Light field reconstruction using hierarchical features fusion.
Expert Syst. Appl., 2020

DETR for Pedestrian Detection.
CoRR, 2020

Full Matching on Low Resolution for Disparity Estimation.
CoRR, 2020

Direct Depth Learning Network for Stereo Matching.
CoRR, 2020

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving.
CoRR, 2020

Adaptive Gradient Method with Resilience and Momentum.
CoRR, 2020

Once Quantized for All: Progressively Searching for Quantized Efficient Models.
CoRR, 2020

SAMOT: Switcher-Aware Multi-Object Tracking and Still Another MOT Measure.
CoRR, 2020

Exploring the Hierarchy in Relation Labels for Scene Graph Generation.
CoRR, 2020

Scope Head for Accurate Localization in Object Detection.
CoRR, 2020

Location-Aware Feature Selection for Scene Text Detection.
CoRR, 2020

3D Hand Pose Estimation with Disentangled Cross-Modal Latent Space.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Improving Auto-Augment via Augmentation-Wise Weight Sharing.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Category-specific Semantic Coherency Learning for Fine-grained Image Recognition.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Navigation Command Matching for Vision-based Autonomous Driving.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Computation Reallocation for Object Detection.
Proceedings of the 8th International Conference on Learning Representations, 2020

Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

Rethinking Pseudo-LiDAR Representation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Content Adaptive and Error Propagation Aware Deep Video Compression.
Proceedings of the Computer Vision - ECCV 2020, 2020

Whole-Body Human Pose Estimation in the Wild.
Proceedings of the Computer Vision - ECCV 2020, 2020

Improving Deep Video Compression by Resolution-Adaptive Flow Coding.
Proceedings of the Computer Vision - ECCV 2020, 2020

Differentiable Hierarchical Graph Grouping for Multi-person Pose Estimation.
Proceedings of the Computer Vision - ECCV 2020, 2020

EcoNAS: Finding Proxies for Economical Neural Architecture Search.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

3D Human Mesh Regression With Dense Correspondence.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Equalization Loss for Long-Tailed Object Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Improving One-Shot NAS by Suppressing the Posterior Fading.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Multi-Dimensional Pruning: A Unified Framework for Model Compression.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

BriNet: Towards Bridging the Intra-class and Inter-class Gaps in One-Shot Segmentation.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

Relational Prototypical Network for Weakly Supervised Temporal Action Localization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Part-Level Graph Convolutional Network for Skeleton-Based Action Recognition.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Channel Pruning Guided by Classification Loss and Feature Importance.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

DASOT: A Unified Framework Integrating Data Association and Single Object Tracking for Online Multi-Object Tracking.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Hierarchical Online Instance Matching for Person Search.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Using Age Information as a Soft Biometric Trait for Face Image Analysis.
Proceedings of the Deep Biometrics, 2020

2019
Contextualized Spatial-Temporal Network for Taxi Origin-Destination Demand Prediction.
IEEE Trans. Intell. Transp. Syst., 2019

Deep Continuous Conditional Random Fields With Asymmetric Inter-Object Constraints for Online Multi-Object Tracking.
IEEE Trans. Circuits Syst. Video Technol., 2019

Part-aligned pose-guided recurrent network for action recognition.
Pattern Recognit., 2019

Monocular Depth Estimation Using Multi-Scale Continuous CRFs as Sequential Deep Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

A spatiotemporal attention-based ResC3D model for large-scale gesture recognition.
Mach. Vis. Appl., 2019

Improved generative adversarial networks with reconstruction loss.
Neurocomputing, 2019

Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection.
Int. J. Comput. Vis., 2019

MMDetection: Open MMLab Detection Toolbox and Benchmark.
CoRR, 2019

AM-LFS: AutoML for Loss Function Search.
CoRR, 2019

Online Hyper-parameter Learning for Auto-Augmentation Strategy.
CoRR, 2019

Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving.
CoRR, 2019

WIDER Face and Pedestrian Challenge 2018: Methods and Results.
CoRR, 2019

Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification.
CoRR, 2019

Perceptual Image Enhancement by Relativistic Discriminant Learning With Cross-Scale Aggregated Representation.
IEEE Access, 2019

Referring Expression Comprehension with Semantic Visual Relationship and Word Mapping.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

DaNet: Decompose-and-aggregate Network for 3D Human Shape and Pose Estimation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

IntersectGAN: Learning Domain Intersection for Generating Images with Multiple Attributes.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

High-Performance Light Field Reconstruction with Channel-wise and SAI-wise Attention.
Proceedings of the Neural Information Processing - 26th International Conference, 2019

Feature Intertwiner for Object Detection.
Proceedings of the 7th International Conference on Learning Representations, 2019

Hierarchical Graph Convolutional Network for Skeleton-Based Action Recognition.
Proceedings of the Image and Graphics - 10th International Conference, 2019

Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Crowd Counting With Deep Structured Scale Integration Network.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Online Hyper-Parameter Learning for Auto-Augmentation Strategy.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

AM-LFS: AutoML for Loss Function Search.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

LAP-Net: Level-Aware Progressive Network for Image Dehazing.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

GradNet: Gradient-Guided Network for Visual Object Tracking.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

TRB: A Novel Triplet Representation for Understanding 2D Human Body.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

SR-LSTM: State Refinement for LSTM Towards Pedestrian Trajectory Prediction.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Improving Action Localization by Progressive Cross-Stream Cooperation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Box-Driven Class-Wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Libra R-CNN: Towards Balanced Learning for Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

DVC: An End-To-End Deep Video Compression Framework.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Multi-Person Articulated Tracking With Spatial and Temporal Embeddings.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Hybrid Task Cascade for Instance Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Fast Full-Search-Equivalent Pattern Matching Using Asymmetric Haar Wavelet Packets.
IEEE Trans. Circuits Syst. Video Technol., 2018

T-CNN: Tubelets With Convolutional Neural Networks for Object Detection From Videos.
IEEE Trans. Circuits Syst. Video Technol., 2018

Crafting GBD-Net for Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Jointly Learning Deep Features, Deformable Parts, Occlusion and Classification for Pedestrian Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation.
CoRR, 2018

Improved Boundary Equilibrium Generative Adversarial Networks.
IEEE Access, 2018

FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Crowd Counting using Deep Recurrent Spatial-Aware Network.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Quantization Mimic: Towards Very Tiny CNN for Object Detection.
Proceedings of the Computer Vision - ECCV 2018, 2018

Dividing and Aggregating Network for Multi-view Action Recognition.
Proceedings of the Computer Vision - ECCV 2018, 2018

Deep Kalman Filtering Network for Video Compression Artifact Reduction.
Proceedings of the Computer Vision - ECCV 2018, 2018

Factorizable Net: An Efficient Subgraph-Based Framework for Scene Graph Generation.
Proceedings of the Computer Vision - ECCV 2018, 2018

Neural Network Encapsulation.
Proceedings of the Computer Vision - ECCV 2018, 2018

Person Search via a Mask-Guided Two-Stream CNN Model.
Proceedings of the Computer Vision - ECCV 2018, 2018

Collaborative and Adversarial Network for Unsupervised Domain Adaptation.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

3D Human Pose Estimation in the Wild by Adversarial Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Attention-Aware Compositional Network for Person Re-Identification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Mask-Guided Contrastive Attention Model for Person Re-Identification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Visual Question Generation as Dual Task of Visual Question Answering.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Style Aggregated Network for Facial Landmark Detection.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Person Re-Identification by Saliency Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Visual Question Generation as Dual Task of Visual Question Answering.
CoRR, 2017

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision.
CoRR, 2017

Learning Chained Deep Features and Classifiers for Cascade in Object Detection.
CoRR, 2017

Scene Graph Generation from Objects, Phrases and Caption Regions.
CoRR, 2017

ViP-CNN: A Visual Phrase Reasoning Convolutional Neural Network for Visual Relationship Detection.
CoRR, 2017

Zoom Out-and-In Network with Recursive Training for Object Proposal.
CoRR, 2017

Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Multimodal Gesture Recognition Based on the ResC3D Network.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Learning Feature Pyramids for Human Pose Estimation.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Chained Cascade Network for Object Detection.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Scene Graph Generation from Objects, Phrases and Region Captions.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Spatial Regularization with Image-Level Supervisions for Multi-label Image Classification.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Multi-scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Learning Cross-Modal Deep Representations for Robust Pedestrian Detection.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Quality Aware Network for Set to Set Recognition.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

ViP-CNN: Visual Phrase Guided Convolutional Neural Network.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Object Detection in Videos with Tubelet Proposal Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Multi-context Attention for Human Pose Estimation.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Partial Occlusion Handling in Pedestrian Detection With a Deep Model.
IEEE Trans. Circuits Syst. Video Technol., 2016

Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model.
Int. J. Comput. Vis., 2016

Factors in Finetuning Deep Model for object detection.
CoRR, 2016

CRF-CNN: Modeling Structured Information in Human Pose Estimation.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Multi-Bias Non-linear Activation in Deep Neural Networks.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Gated Bi-directional CNN for Object Detection.
Proceedings of the Computer Vision - ECCV 2016, 2016

Learnable Histogram: Statistical Context Features for Deep Neural Networks.
Proceedings of the Computer Vision - ECCV 2016, 2016

End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

STCT: Sequentially Training Convolutional Networks for Visual Tracking.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Factors in Finetuning Deep Model for Object Detection with Long-Tail Distribution.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Object Detection from Video Tubelets with Convolutional Neural Networks.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Structured Feature Learning for Pose Estimation.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Single-Pedestrian Detection Aided by Two-Pedestrian Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

Window-Object Relationship Guided Representation Learning for Generic Object Detections.
CoRR, 2015

Visual Tracking with Fully Convolutional Networks.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Learning Deep Representation with Large-Scale Attributes.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Multi-task Recurrent Neural Network for Immediacy Prediction.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Saliency detection by multi-context deep learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

DeepID-Net: Deformable deep convolutional neural networks for object detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection.
CoRR, 2014

Deep Learning of Scene-Specific Classifier for Pedestrian Detection.
Proceedings of the Computer Vision - ECCV 2014, 2014

Learning Mid-level Filters for Person Re-identification.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Multi-source Deep Learning for Human Pose Estimation.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Simplifying HOG arithmetic for speedy hardware realization.
Proceedings of the 2014 IEEE Asia Pacific Conference on Circuits and Systems, 2014

2013
Segmented Gray-Code Kernels for Fast Pattern Matching.
IEEE Trans. Image Process., 2013

Person Re-identification by Salience Matching.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Multi-stage Contextual Deep Learning for Pedestrian Detection.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Joint Deep Learning for Pedestrian Detection.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Unsupervised Salience Learning for Person Re-identification.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Modeling Mutual Visibility Relationship in Pedestrian Detection.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Single-Pedestrian Detection Aided by Multi-pedestrian Detection.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012
Performance Evaluation of Full Search Equivalent Pattern Matching Algorithms.
IEEE Trans. Pattern Anal. Mach. Intell., 2012

A discriminative deep model for pedestrian detection with occlusion handling.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011
Fast pattern matching and its applications.
PhD thesis, 2011

Adaptive Low Resolution Pruning for fast Full Search-equivalent pattern matching.
Pattern Recognit. Lett., 2011

Image postprocessing by Non-local Kuan's filter.
J. Vis. Commun. Image Represent., 2011

2010
Fast Algorithm for Walsh Hadamard Transform on Sliding Windows.
IEEE Trans. Pattern Anal. Mach. Intell., 2010

Fast pattern matching using orthogonal Haar transform.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009
Image multi-scale edge detection using 3-D Hidden Markov Model based on the non-decimated wavelet.
Proceedings of the International Conference on Image Processing, 2009

Image deblocking using dual adaptive FIR Wiener filter in the DCT transform domain.
Proceedings of the IEEE International Conference on Acoustics, 2009

