Limin Wang

Int. J. Comput. Vis., June, 2024

Learning Optical Flow and Scene Flow With Bidirectional Camera-LiDAR Fusion.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding.

IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

Dual Masked Modeling for Weakly-Supervised Temporal Boundary Discovery.

IEEE Trans. Multim., 2024

Sparse Action Tube Detection.

IEEE Trans. Image Process., 2024

Dual Graph Networks for Pose Estimation in Crowded Scenes.

Jun Tu

Int. J. Comput. Vis., 2024

End-to-end dense video grounding via parallel regression.

Fengyuan Shi

Weilin Huang

Comput. Vis. Image Underst., 2024

FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution.

CoRR, 2024

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.

CoRR, 2024

Dynamic and Compressive Adaptation of Transformers From Images to Videos.

CoRR, 2024

Efficient Test-Time Prompt Tuning for Vision-Language Models.

CoRR, 2024

CycleHOI: Improving Human-Object Interaction Detection with Cycle Consistency of Detection and Generation.

Yisen Wang

CoRR, 2024

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model.

CoRR, 2024

VFIMamba: Video Frame Interpolation with State Space Models.

CoRR, 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

Open-Vocabulary Spatio-Temporal Action Detection.

CoRR, 2024

Multiple Object Tracking as ID Prediction.

Ruopeng Gao

Yijun Zhang

CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.

CoRR, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding.

CoRR, 2024

Spatiotemporal Predictive Pre-training for Robotic Motor Control.

CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.

CoRR, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Accelerating Image Generation with Sub-path Linear Approximation Model.

Proceedings of the Computer Vision - ECCV 2024, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

Fully Sparse 3D Occupancy Prediction.

Proceedings of the Computer Vision - ECCV 2024, 2024

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video.

Xinhao Li

Yuhan Zhu

Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

StableDrag: Stable Dragging for Point-Based Image Editing.

Proceedings of the Computer Vision - ECCV 2024, 2024

Dual DETRs for Multi-Label Temporal Action Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Asymmetric Masked Distillation for Pre-Training Small Foundation Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Sparse Global Matching for Video Frame Interpolation with Large Motion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Recovering 3D Human Mesh From Monocular Images: A Survey.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Webly-supervised semantic segmentation via curriculum learning.

Zuxian Huang

Comput. Vis. Image Underst., November, 2023

Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

BasicTAD: An astounding RGB-Only baseline for temporal action detection.

Comput. Vis. Image Underst., July, 2023

APP-Net: Auxiliary-Point-Based Push and Pull Operations for Efficient Point Cloud Recognition.

IEEE Trans. Image Process., 2023

LIP: Local Importance-Based Pooling.

Ziteng Gao

Int. J. Comput. Vis., 2023

MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding.

CoRR, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.

CoRR, 2023

Bridging The Gaps Between Token Pruning and Full Pre-training via Masked Fine-tuning.

Fengyuan Shi

Norbert Scherer-Negenborn

CoRR, 2023

Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation.

CoRR, 2023

DPL: Decoupled Prompt Learning for Vision-Language Models.

CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

CoRR, 2023

Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots.

CoRR, 2023

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation.

CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.

CoRR, 2023

VideoChat: Chat-Centric Video Understanding.

CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.

CoRR, 2023

Progressive Visual Prompt Learning with Contrastive Feature Re-formation.

CoRR, 2023

CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection.

CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MixFormerV2: Efficient Fully Transformer Tracking.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning Discriminative Feature Representation for Open Set Action Recognition.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

RefineTAD: Learning Proposal-free Refinement for Temporal Action Detection.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Filter-Recovery Network for Multi-Speaker Audio-Visual Speech Separation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.

Kannappan Palaniappan

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Deep Equilibrium Object Detection.

Shuai Wang

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Memory-and-Anticipation Transformer for Online Action Understanding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

StageInteractor: Query-based Object Detector with Cross-stage Interaction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MGMAE: Motion Guided Masking for Video Masked Autoencoding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking.

Ruopeng Gao

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Efficient Video Action Detection with Token Dropout and Context Refinement.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Graph Routes From Local and Global Entrances.

Miao Cheng

Proceedings of the 6th International Conference on Big Data Technologies, 2023

Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PDPP: Projected Diffusion for Procedure Planning in Instructional Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LinK: Linear Kernel for LiDAR-based 3D Perception.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization.

IEEE Trans. Image Process., 2022

Cross-Domain Gated Learning for Domain Generalization.

Int. J. Comput. Vis., 2022

Fully convolutional online tracking.

Comput. Vis. Image Underst., 2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.

CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.

CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.

CoRR, 2022

Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022.

CoRR, 2022

Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach.

CoRR, 2022

APP-Net: Auxiliary-point-based Push and Pull Operations for Efficient Point Cloud Classification.

CoRR, 2022

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

SpotFormer: A Transformer-based Framework for Precise Soccer Action Spotting.

Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022

The Tenth Visual Object Tracking VOT2022 Challenge Results.

Joni-Kristian Kämäräinen

Alireza Memarmoghadam

Christian Micheloni

Payman Moallem

Le Thanh Nguyen-Meidine

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing.

Proceedings of the Computer Vision - ECCV 2022, 2022

Task-specific Inconsistency Alignment for Domain Adaptive Object Detection.

Liang Zhao

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Structured Sparse R-CNN for Direct Scene Graph Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

OCSampler: Compressing Videos to One Clip with Single-step Sampling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AdaMixer: A Fast-Converging Query-Based Object Detector.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross-Architecture Self-supervised Video Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DCAN: Improving Temporal Action Detection via Dual Context Aggregation.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction.

IEEE Trans. Image Process., 2021

Cross-Modal Pyramid Translation for RGB-D Scene Recognition.

Int. J. Comput. Vis., 2021

FineAction: A Fined Video Dataset for Temporal Action Localization.

CoRR, 2021

Target Transformed Regression for Accurate Tracking.

CoRR, 2021

3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop.

CoRR, 2021

NJU MCG - Sensetime Team Submission to Pre-training for Video Understanding Challenge Track II.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Cross-modal Pretraining and Matching for Video Understanding.

Proceedings of the MMPT@ICMR2021: Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia Understanding, 2021

The Ninth Visual Object Tracking VOT2021 Challenge Results.

Joni-Kristian Kämäräinen

Mohamed H. Abdelpakey

Alireza Memarmoghadam

Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

MGSampler: An Explainable Sampling Strategy for Video Action Recognition.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Target Adaptive Context Aggregation for Video Scene Graph Generation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Relaxed Transformer Decoders for Direct Action Proposal Generation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

TAM: Temporal Adaptive Module for Video Recognition.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Self Supervision to Distillation for Long-Tailed Visual Recognition.

Tianhao Li

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Mutual Supervision for Dense Object Detection.

Ziteng Gao

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation.

Tao Lu

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

TDN: Temporal Difference Networks for Efficient Action Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Dynamic Sampling Networks for Efficient Action Recognition in Videos.

IEEE Trans. Image Process., 2020

Temporal Action Detection with Structured Segment Networks.

Int. J. Comput. Vis., 2020

Learning Spatiotemporal Features via Video and Text Pair Discrimination.

Tianhao Li

CoRR, 2020

V4D: 4D Convolutional Neural Networks for Video-level Representation Learning.

Proceedings of the 8th International Conference on Learning Representations, 2020

Context-Aware RCNN: A Baseline for Action Detection in Videos.

Proceedings of the Computer Vision - ECCV 2020, 2020

Boundary-Aware Cascade Networks for Temporal Action Segmentation.

Proceedings of the Computer Vision - ECCV 2020, 2020

Actions as Moving Points.

Proceedings of the Computer Vision - ECCV 2020, 2020

TEA: Temporal Excitation and Aggregation for Action Recognition.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

SketchyCOCO: Image Generation From Freehand Scene Sketches.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Knowledge Integration Networks for Action Recognition.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

TEINet: Towards an Efficient Architecture for Video Recognition.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Finding Action Tubes with a Sparse-to-Dense Framework.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Temporal Segment Networks for Action Recognition in Videos.

IEEE Trans. Pattern Anal. Mach. Intell., 2019

Dynamically Visual Disambiguation of Keyword-based Image Search.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Learning Actor Relation Graphs for Group Activity Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Cross-Stream Selective Networks for Action Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Translate-to-Recognize Networks for RGB-D Scene Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Real-Time Action Recognition With Deeply Transferred Motion Vector CNNs.

IEEE Trans. Image Process., 2018

Transferring Deep Object and Scene Representations for Event Recognition in Still Images.

Int. J. Comput. Vis., 2018

Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering.

CoRR, 2018

Structured Triplet Learning with POS-Tag Guided Attention for Visual Question Answering.

Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

Single Image Highlight Removal with a Sparse and Low-Rank Reflection Model.

Jie Guo

Zuojian Zhou

Proceedings of the Computer Vision - ECCV 2018, 2018

Appearance-and-Relation Networks for Video Classification.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition.

IEEE Trans. Image Process., 2017

Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs.

IEEE Trans. Image Process., 2017

Locally Supervised Deep Hybrid Model for Scene Recognition.

IEEE Trans. Image Process., 2017

WebVision Database: Visual Learning and Understanding from Web Data.

CoRR, 2017

A Pursuit of Temporal Accuracy in General Activity Detection.

CoRR, 2017

WebVision Challenge: Visual Learning and Understanding With Web Data.

CoRR, 2017

UntrimmedNets for Weakly Supervised Action Recognition and Detection.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

MoFAP: A Multi-level Representation for Action Recognition.

Shivakumara Palaiahnakote

Int. J. Comput. Vis., 2016

Modeling spatial layout for scene image understanding via a novel multiscale sum-product network.

Chew Lim Tan

Expert Syst. Appl., 2016

Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice.

Comput. Vis. Image Underst., 2016

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016.

CoRR, 2016

Transferring Object-Scene Convolutional Neural Networks for Event Recognition in Still Images.

CoRR, 2016

Codebook enhancement of VLAD representation for visual recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition.

Proceedings of the Computer Vision - ECCV 2016, 2016

Real-Time Action Recognition with Enhanced Motion Vector CNNs.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Actionness Estimation Using Hybrid Fully Convolutional Networks.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Two-Stream SR-CNNs for Action Recognition in Videos.

Proceedings of the British Machine Vision Conference 2016, 2016

2015

Towards Good Practices for Very Deep Two-Stream ConvNets.

CoRR, 2015

Object-Scene Convolutional Neural Networks for Event Recognition in Images.

CoRR, 2015

Places205-VGGNet Models for Scene Recognition.

CoRR, 2015

Better Exploiting OS-CNNs for Better Event Recognition in Images.

Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop, 2015

Object-Scene Convolutional Neural Networks for event recognition in images.

Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015

Exploring Fisher vector and deep networks for action spotting.

Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015

Action recognition with trajectory-pooled deep-convolutional descriptors.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014

Latent Hierarchical Model of Temporal Structure for Complex Activity Classification.

IEEE Trans. Image Process., 2014

A Joint Evaluation of Dictionary Learning and Feature Encoding for Action Recognition.

Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Video Action Detection with Relational Dynamic-Poselets.

Proceedings of the Computer Vision - ECCV 2014, 2014

Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics.

Proceedings of the Computer Vision - ECCV 2014, 2014

Action and Gesture Temporal Spotting with Super Vector Representation.

Proceedings of the Computer Vision - ECCV 2014 Workshops, 2014

Multi-view Super Vector for Action Recognition.

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013

Mining Motion Atoms and Phrases for Complex Action Recognition.

Proceedings of the IEEE International Conference on Computer Vision, 2013

Motionlets: Mid-level 3D Parts for Human Motion Recognition.

Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012

A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition.

Xingxing Wang