2025
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs.
CoRR, January, 2025
2024
On the Effectiveness of Dataset Alignment for Fake Image Detection.
CoRR, 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models.
CoRR, 2024
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos.
CoRR, 2024
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner.
CoRR, 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy.
CoRR, 2024
Matryoshka Multimodal Models.
CoRR, 2024
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models.
CoRR, 2024
LLM Inference Unveiled: Survey and Roofline Model Insights.
CoRR, 2024
Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-based Autonomous Driving.
CoRR, 2024
Computer Vision on the Edge: Individual Cattle Identification in Real-time with ReadMyCow System.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024
Testing Learning-Enabled Cyber-Physical Systems with Large-Language Models: A Formal Approach.
Proceedings of the Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 2024
Interfacing Foundation Models' Embeddings.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Yo'LLaVA: Your Personalized Language and Vision Assistant.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
MATE: Meet At The Embedding - Connecting Images with Long Texts.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment.
Proceedings of the Computer Vision - ECCV 2024, 2024
Edit One for All: Interactive Batch Image Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Improved Baselines with Visual Instruction Tuning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
2023
Delving Deeper into Anti-Aliasing in ConvNets.
Int. J. Comput. Vis., 2023
Interfacing Foundation Models' Embeddings.
CoRR, 2023
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images.
CoRR, 2023
Making Large Multimodal Models Understand Arbitrary Visual Prompts.
CoRR, 2023
Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach.
CoRR, 2023
Investigating the Catastrophic Forgetting in Multimodal Large Language Models.
CoRR, 2023
Visual Instruction Inversion: Image Editing via Visual Prompting.
CoRR, 2023
Benchmarking and Analyzing Generative Data for Visual Recognition.
CoRR, 2023
Generate Anything Anywhere in Any Scene.
CoRR, 2023
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding.
CoRR, 2023
Segment Everything Everywhere All at Once.
CoRR, 2023
Segment Everything Everywhere All at Once.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
What Knowledge Gets Distilled in Knowledge Distillation?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Visual Instruction Inversion: Image Editing via Image Prompting.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Visual Instruction Tuning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Exploring the Capabilities of a General-Purpose Robotic Arm in Chess Gameplay.
Proceedings of the 22nd IEEE-RAS International Conference on Humanoid Robots, 2023
Generalized Decoding for Pixel, Image, and Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Towards Universal Fake Image Detectors that Generalize Across Generative Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Learning Customized Visual Models with Retrieval-Augmented Knowledge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
YOLACT++ Better Real-Time Instance Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., 2022
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding.
CoRR, 2022
EnergyMatch: Energy-based Pseudo-Labeling for Semi-Supervised Learning.
CoRR, 2022
What Knowledge Gets Distilled in Knowledge Distillation?
CoRR, 2022
The Two Dimensions of Worst-case Training and the Integrated Effect for Out-of-domain Generalization.
CoRR, 2022
End-to-End Instance Edge Detection.
CoRR, 2022
Equine Pain Behavior Classification via Self-Supervised Disentangled Pose Representation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022
Toward learning human-aligned cross-domain robust models by countering misaligned features.
Proceedings of the Uncertainty in Artificial Intelligence, 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Masked Discrimination for Self-supervised Learning on Point Clouds.
Proceedings of the Computer Vision - ECCV 2022, 2022
Contrastive Learning for Diverse Disentangled Foreground Generation.
Proceedings of the Computer Vision - ECCV 2022, 2022
GIRAFFE HD: A High-Resolution 3D-aware Generative Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
The Two Dimensions of Worst-case Training and Their Integrated Effect for Out-of-domain Generalization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
Generating Furry Cars: Disentangling Object Shape & Appearance across Multiple Domains.
CoRR, 2021
SinGAN-GIF: Learning a Generative Video Model from a Single GIF.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021
YolactEdge: Real-time Instance Segmentation on the Edge.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021
Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains.
Proceedings of the 9th International Conference on Learning Representations, 2021
Seeing the Unseen: Predicting the First-Person Camera Wearer's Location and Pose in Third-Person Scenes.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021
Collaging Class-specific GANs for Semantic Image Synthesis.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Progressive Temporal Feature Alignment Network for Video Inpainting.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Few-Shot Image Generation via Cross-Domain Correspondence.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
PartGAN: Unsupervised Part Decomposition for Image Generation and Segmentation.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021
2020
YolactEdge: Real-time Instance Segmentation on the Edge (Jetson AGX Xavier: 30 FPS, RTX 2080 Ti: 170 FPS).
CoRR, 2020
Audiovisual SlowFast Networks for Video Recognition.
CoRR, 2020
Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020
Boxer: Preventing fraud by scanning credit cards.
Proceedings of the 29th USENIX Security Symposium, 2020
Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Password-Conditioned Anonymization and Deanonymization with Face Identity Transformers.
Proceedings of the Computer Vision - ECCV 2020, 2020
Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Delving Deeper into Anti-aliasing in ConvNets.
Proceedings of the 31st British Machine Vision Conference 2020, 2020
2019
A 16-Gb, 18-Gb/s/pin GDDR6 DRAM With Per-Bit Trainable Single-Ended DFE and PLL-Less Clocking.
IEEE J. Solid State Circuits, 2019
Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Imbalanced Data.
CoRR, 2019
Identity From Here, Pose From There: Self-Supervised Disentanglement and Generation of Objects Using Unlabeled Videos.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
YOLACT: Real-Time Instance Segmentation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2018
Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond.
CoRR, 2018
Transferring Common-Sense Knowledge for Object Detection.
CoRR, 2018
Who Will Share My Image?: Predicting the Content Diffusion Path in Online Social Networks.
Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018
A 16Gb 18Gb/S/pin GDDR6 DRAM with per-bit trainable single-ended DFE and PLL-less clocking.
Proceedings of the 2018 IEEE International Solid-State Circuits Conference, 2018
A Visual Attention Grounding Neural Model for Multimodal Machine Translation.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018
Video Object Detection with an Aligned Spatial-Temporal Memory.
Proceedings of the Computer Vision - ECCV 2018, 2018
DOCK: Detecting Objects by Transferring Common-Sense Knowledge.
Proceedings of the Computer Vision - ECCV 2018, 2018
Learning to Anonymize Faces for Privacy Preserving Action Detection.
Proceedings of the Computer Vision - ECCV 2018, 2018
Cross-Domain Self-Supervised Multi-Task Feature Learning Using Synthetic Imagery.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
2017
Analyzing the Adoption and Cascading Process of OSN-Based Gifting Applications: An Empirical Study.
ACM Trans. Web, 2017
Spatial-Temporal Memory Networks for Video Object Detection.
CoRR, 2017
Who Moved My Cheese? Automatic Annotation of Rodent Behaviors with Convolutional Neural Networks.
Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision, 2017
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization.
Proceedings of the IEEE International Conference on Computer Vision, 2017
Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Interspecies Knowledge Transfer for Facial Keypoint Detection.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Identifying First-Person Camera Wearers in Third-Person Videos.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
2016
Discovering Mid-level Visual Connections in Space and Time.
Proceedings of the Deep Learning and Convolutional Neural Networks for Medical Image Computing, 2016
End-to-End Localization and Ranking for Relative Attributes.
Proceedings of the Computer Vision - ECCV 2016, 2016
Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016
Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016
2015
Predicting Important Objects for Egocentric Video Summarization.
Int. J. Comput. Vis., 2015
Discovering the Spatial Extent of Relative Attributes.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015
FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015
2014
AverageExplorer: interactive exploration and alignment of visual data collections.
ACM Trans. Graph., 2014
Development of a Monitoring System for Multichannel Cables Using TDR.
IEEE Trans. Instrum. Meas., 2014
Weakly-supervised Discovery of Visual Pattern Configurations.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014
An Introduction to the 3rd Workshop on Egocentric (First-Person) Vision.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014
2013
Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time.
Proceedings of the IEEE International Conference on Computer Vision, 2013
2012
Object-Graphs for Context-Aware Visual Category Discovery.
IEEE Trans. Pattern Anal. Mach. Intell., 2012
Discovering important people and objects for egocentric video summarization.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012
2011
ShadowDraw: real-time user guidance for freehand drawing.
ACM Trans. Graph., 2011
Face Tracking for Augmented Reality Game Interface and Brand Placement.
Proceedings of the Ubiquitous Computing and Multimedia Applications, 2011
Key-segments for video object segmentation.
Proceedings of the IEEE International Conference on Computer Vision, 2011
Learning the easy things first: Self-paced visual category discovery.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011
Face Discovery with Social Context.
Proceedings of the British Machine Vision Conference, 2011
2010
Simple, extensible and flexible random key predistribution schemes for wireless sensor networks using reusable key pools.
J. Intell. Manuf., 2010
Interface of Augmented Reality Game Using Face Tracking and Its Application to Advertising.
Proceedings of the Security-Enriched Urban Computing and Smart Grid, 2010
Collect-cut: Segmentation with top-down cues discovered in multi-object images.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010
Object-graphs for context-aware category discovery.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010
2009
Foreground Focus: Unsupervised Learning from Partially Matching Images.
Int. J. Comput. Vis., 2009
Shape discovery from unlabeled image collections.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009
2008
Ray-based Color Image Segmentation.
Proceedings of the Fifth Canadian Conference on Computer and Robot Vision, 2008
Foreground Focus: Finding Meaningful Features in Unlabeled Images.
Proceedings of the British Machine Vision Conference 2008, Leeds, UK, September 2008, 2008
2007
The Analysis of PPL Attention Effects in the Screen of Multimedia Contents.
Proceedings of the Future Generation Communication and Networking, 2007