Josef Sivic

ORCID: 0000-0002-2554-5301

According to our database, Josef Sivic authored at least 153 papers between 2003 and 2024.

Bibliography

2024
GJK++: Leveraging Acceleration Methods for Faster Collision Detection.
IEEE Trans. Robotics, 2024

Multi-Task Learning of Object States and State-Modifying Actions From Web Videos.
IEEE Trans. Pattern Anal. Mach. Intell., 2024

MassSpecGym: A benchmark for the discovery and identification of molecules.
CoRR, 2024

Revealing data leakage in protein interaction benchmarks.
CoRR, 2024

Learning to design protein-protein interactions with enhanced generalization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
YouTube8M-MusicTextClips.
Dataset, June, 2023

Imitrob: Imitation Learning Dataset for Training and Evaluating 6D Object Pose Estimators.
IEEE Robotics Autom. Lett., May, 2023

Customizing Motion in Text-to-Video Diffusion Models.
CoRR, 2023

FocalPose++: Focal Length and Object Pose Estimation via Render and Compare.
CoRR, 2023

Visually Guided Model Predictive Robot Control via 6D Object Pose Localization and Tracking.
CoRR, 2023

VidChapters-7M: Video Chapters at Scale.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Multi-Contact Task and Motion Planning Guided by Video Demonstration.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Differentiable Collision Detection: a Randomized Smoothing Approach.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Meta-Personalizing Vision-Language Models to Find Named Instances in Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Language-Guided Music Recommendation for Video via Prompt Analogies.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Learning to Manipulate Tools by Aligning Simulation to Video Demonstration.
IEEE Robotics Autom. Lett., 2022

Long-Term Visual Localization Revisited.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

NCNet: Neighbourhood Consensus Networks for Estimating Image Correspondences.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos.
Int. J. Comput. Vis., 2022

Multi-Task Learning of Object State Changes from Uncurated Videos.
CoRR, 2022

Learning to Answer Visual Questions from Web Videos.
CoRR, 2022

Collision Detection Accelerated: An Optimization Perspective.
Proceedings of the Robotics: Science and Systems XVIII, New York City, NY, USA, June 27, 2022

Zero-Shot Video Question Answering via Frozen Bidirectional Language Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Object Manipulation Skills from Video via Approximate Differentiable Physics.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation.
Proceedings of the Computer Vision - ECCV 2022, 2022

TubeDETR: Spatio-Temporal Video Grounding with Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Focal Length and Object Pose Estimation via Render and Compare.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare.
Proceedings of the Conference on Robot Learning, 2022

Benchmarking Learning Efficiency in Deep Reservoir Computing.
Proceedings of the Conference on Lifelong Learning Agents, 2022

2021
Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?
IEEE Trans. Pattern Anal. Mach. Intell., 2021

InLoc: Indoor Visual Localization with Dense Matching and View Synthesis.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Bilinear Image Translation for Temporal Analysis of Photo Collections.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Reconstructing and grounding narrated instructional videos in 3D.
CoRR, 2021

Learning to Solve Geometric Construction Problems from Images.
Proceedings of the Intelligent Computer Mathematics - 14th International Conference, 2021

Just Ask: Learning to Answer Questions from Millions of Narrated Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Single-View Robot Pose and Joint Angle Estimation via Render & Compare.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Artificial Dummies for Urban Dataset Augmentation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning.
IEEE Robotics Autom. Lett., 2020

RareAct: A video dataset of unusual interactions.
CoRR, 2020

Occlusion resistant learning of intuitive physics from videos.
CoRR, 2020

Visualizing computation in large-scale cellular automata.
Proceedings of the 2020 Conference on Artificial Life, 2020

Learning to combine primitive skills: A step towards versatile robotic manipulation.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Learning Actionness via Long-Range Temporal Order Verification.
Proceedings of the Computer Vision - ECCV 2020, 2020

Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions.
Proceedings of the Computer Vision - ECCV 2020, 2020

CosyPose: Consistent Multi-view Multi-object 6D Pose Estimation.
Proceedings of the Computer Vision - ECCV 2020, 2020

End-to-End Learning of Visual Representations From Uncurated Instructional Videos.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Object Manipulation Skills via Approximate State Estimation from Real Videos.
Proceedings of the 4th Conference on Robot Learning, 2020

2019
Convolutional Neural Network Architecture for Geometric Matching.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Combining learned skills and reinforcement learning for robotic manipulations.
CoRR, 2019

Temporal Localization of Moments in Video Collections with Natural Language.
CoRR, 2019

D2-Net: A Trainable CNN for Joint Detection and Description of Local Features.
CoRR, 2019

Teaching robots to imitate a human with no on-teacher sensors. What are the key challenges?
CoRR, 2019

Evolving Structures in Complex Systems.
Proceedings of the IEEE Symposium Series on Computational Intelligence, 2019

Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Detecting Unseen Visual Relations Using Analogies.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Cross-Task Weakly Supervised Learning From Instructional Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Leveraging the Present to Anticipate the Future in Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

D2-Net: A Trainable CNN for Joint Description and Detection of Local Features.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
24/7 Place Recognition by View Synthesis.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

NetVLAD: CNN Architecture for Weakly Supervised Place Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Learning from Narrated Instruction Videos.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Detecting rare visual relations using analogies.
CoRR, 2018

Learning a Text-Video Embedding from Incomplete and Heterogeneous Data.
CoRR, 2018

Neighbourhood Consensus Networks.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Localizing Moments in Video with Temporal Language.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

End-to-End Weakly-Supervised Semantic Alignment.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Guest Editorial: Best Papers from ICCV 2015.
Int. J. Comput. Vis., 2017

Benchmarking 6DOF Urban Visual Localization in Changing Conditions.
CoRR, 2017

Learnable pooling with Context Gating for video classification.
CoRR, 2017

Joint Discovery of Object States and Manipulating Actions.
CoRR, 2017

Weakly-Supervised Learning of Visual Relations.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning from Video and Text via Large-Scale Discriminative Clustering.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Localizing Moments in Video with Natural Language.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Joint Discovery of Object States and Manipulation Actions.
Proceedings of the IEEE International Conference on Computer Vision, 2017

ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

The Analysis of High Density Crowds in Videos.
Proceedings of the Group and Crowd Behavior for Computer Vision, 1st Edition, 2017

2016
Visual Geo-localization of Non-photographic Depictions via 2D-3D Alignment.
Proceedings of the Deep Learning and Convolutional Neural Networks for Medical Image Computing, 2016

Guest Editorial: Video Recognition.
Int. J. Comput. Vis., 2016

Guest Editorial: Large Scale Visual Media Geo-Localization.
Int. J. Comput. Vis., 2016

Learning and Calibrating Per-Location Classifiers for Visual Place Recognition.
Int. J. Comput. Vis., 2016

Unsupervised Learning from Narrated Instruction Videos.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Visual Place Recognition with Repetitive Structures.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

Pose Estimation and Segmentation of Multiple People in Stereoscopic Movies.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

What makes Paris look like Paris?
Commun. ACM, 2015

Linking Past to Present: Discovering Style in Two Centuries of Architecture.
Proceedings of the 2015 IEEE International Conference on Computational Photography, 2015

Is object localization for free? - Weakly-supervised learning with convolutional neural networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

On pairwise costs for network flow multi-object tracking.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Painting-to-3D model alignment via discriminative visual elements.
ACM Trans. Graph., 2014

Efficient Localization of Panoramic Images Using Tiled Image Descriptors.
IPSJ Trans. Comput. Vis. Appl., 2014

Deblurring Shaken and Partially Saturated Images.
Int. J. Comput. Vis., 2014

People Watching: Human Actions as a Cue for Single View Geometry.
Int. J. Comput. Vis., 2014

Urban-Scale Quantitative Visual Analysis.
ERCIM News, 2014

On Pairwise Cost for Multi-Object Network Flow Tracking.
CoRR, 2014

Predicting Actions from Static Scenes.
Proceedings of the Computer Vision - ECCV 2014, 2014

Weakly Supervised Action Labeling in Videos under Ordering Constraints.
Proceedings of the Computer Vision - ECCV 2014, 2014

Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Visual search and recognition of objects, scenes and people.
, 2014

Efficient, blind, spatially-variant deblurring for shaken images.
Proceedings of the Motion Deblurring: Algorithms and Systems, 2014

2013
Finding Actors and Actions in Movies.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Pose Estimation and Segmentation of People in 3D Movies.
Proceedings of the IEEE International Conference on Computer Vision, 2013

2012
In Memoriam: Mark Everingham.
IEEE Trans. Pattern Anal. Mach. Intell., 2012

Non-uniform Deblurring for Shaken Images.
Int. J. Comput. Vis., 2012

Scene Semantics from Long-Term Observation of People.
Proceedings of the Computer Vision - ECCV 2012, 2012

2011
Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets.
Int. J. Comput. Vis., 2011

Learning person-object interactions for action recognition in still images.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Visual localization by linear combination of image descriptors.
Proceedings of the IEEE International Conference on Computer Vision Workshops, 2011

Automatic alignment of paintings and photographs depicting a 3D scene.
Proceedings of the IEEE International Conference on Computer Vision Workshops, 2011

Data-driven crowd analysis in videos.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Density-aware person detection and tracking in crowds.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Track to the future: Spatio-temporal video segmentation with long-range motion cues.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

2010
Infinite Images: Creating and Exploring a Large Photorealistic Virtual Space.
Proc. IEEE, 2010

Descriptor Learning for Efficient Retrieval.
Proceedings of the Computer Vision, 2010

Avoiding Confusing Features in Place Recognition.
Proceedings of the Computer Vision, 2010

Semi-supervised Learning of Facial Attributes in Video.
Proceedings of the Trends and Topics in Computer Vision, 2010

Recognizing human actions in still images: a study of bag-of-features and part-based representations.
Proceedings of the British Machine Vision Conference, 2010

2009
Efficient Visual Search of Videos Cast as Text Retrieval.
IEEE Trans. Pattern Anal. Mach. Intell., 2009

Taking the bite out of automated naming of characters in TV video.
Image Vis. Comput., 2009

Segmenting Scenes by Matching Image Composites.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Automatic annotation of human actions in video.
Proceedings of the IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27, 2009

"Who are you?" - Learning person specific classifiers from video.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Get Out of my Picture! Internet-based Inpainting.
Proceedings of the British Machine Vision Conference, 2009

2008
Efficient Visual Search for Objects in Videos.
Proc. IEEE, 2008

SIFT Flow: Dense Correspondence across Different Scenes.
Proceedings of the Computer Vision, 2008

Unsupervised discovery of visual object class hierarchies.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

Creating and exploring a large photorealistic virtual space.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008

Lost in quantization: Improving particular object retrieval in large scale image databases.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

Geometric LDA: A Generative Model for Particular Object Discovery.
Proceedings of the British Machine Vision Conference 2008, Leeds, UK, September 2008, 2008

2007
Oxford TRECVid 2007 – Notebook paper.
Proceedings of the TRECVID 2007 workshop participants notebook papers, 2007

Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval.
Proceedings of the IEEE 11th International Conference on Computer Vision, 2007

Object retrieval with large vocabularies and fast spatial matching.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

2006
Efficient visual search of images and videos.
PhD thesis, 2006

Object Level Grouping for Video Shots.
Int. J. Comput. Vis., 2006

Oxford TRECVID 2006 - Notebook paper.
Proceedings of the 2006 TREC Video Retrieval Evaluation, 2006

Using Multiple Segmentations to Discover Objects and their Extent in Image Collections.
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006

Video Google: Efficient Visual Search of Videos.
Proceedings of the Toward Category-Level Object Recognition, 2006

Finding People in Repeated Shots of the Same Scene.
Proceedings of the British Machine Vision Conference 2006, 2006

"Hello! My name is... Buffy" - Automatic Naming of Characters in TV Video.
Proceedings of the British Machine Vision Conference 2006, 2006

2005
Discovering Objects and their Localization in Images.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

Person Spotting: Video Shot Retrieval for Face Sets.
Proceedings of the Image and Video Retrieval, 4th International Conference, 2005

2004
Efficient Visual Content Retrieval and Mining in Videos.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

Efficient object retrieval from videos.
Proceedings of the 2004 12th European Signal Processing Conference, 2004

Video Data Mining Using Configurations of Viewpoint Invariant Regions.
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), with CD-ROM, 27 June, 2004

2003
Video Google: A Text Retrieval Approach to Object Matching in Videos.
Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), 2003

