Juan Carlos Niebles

Orcid: 0000-0001-8225-9793

According to our database, Juan Carlos Niebles authored at least 119 papers between 2007 and 2024.

Timeline (chart omitted): publications per year, by type (book, in proceedings, article, PhD thesis, dataset, other).

Bibliography

2024
PRACT: Optimizing Principled Reasoning and Acting of LLM Agent.
CoRR, 2024

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs.
CoRR, 2024

xLAM: A Family of Large Action Models to Empower AI Agent Systems.
CoRR, 2024

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations.
CoRR, 2024

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
CoRR, 2024

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets.
CoRR, 2024

Artificial Intelligence Index Report 2024.
CoRR, 2024

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning.
CoRR, 2024

Editing Arbitrary Propositions in LLMs without Subject Labels.
CoRR, 2024

Hierarchical Point Attention for Indoor 3D Object Detection.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer.
Proceedings of the Computer Vision - ECCV 2024, 2024

X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-Modal Reasoning.
Proceedings of the Computer Vision - ECCV 2024, 2024

ULIP-2: Towards Scalable Multimodal Pre-Training for 3D Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Causal Layering via Conditional Entropy.
Proceedings of the Causal Learning and Reasoning, 2024

2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning.
CoRR, 2023

Artificial Intelligence Index Report 2023.
CoRR, 2023

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents.
CoRR, 2023

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization.
CoRR, 2023

REX: Rapid Exploration and eXploitation for AI Agents.
CoRR, 2023

HomE: Homography-Equivariant Video Representation Learning.
CoRR, 2023

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding.
CoRR, 2023

On the Unlikelihood of D-Separation.
CoRR, 2023

Salesforce CausalAI Library: A Fast and Scalable Framework for Causal Analysis of Time Series and Tabular Data.
CoRR, 2023

Model-Agnostic Hierarchical Attention for 3D Object Detection.
CoRR, 2023

PreViTS: Contrastive Pretraining with Video Tracking Supervision.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Temporally Disentangled Representation Learning under Unknown Nonstationarity.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Procedure-Aware Pretraining for Instructional Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mask-Free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding.
CoRR, 2022

The AI Index 2022 Annual Report.
CoRR, 2022

MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens.
Proceedings of the Computer Vision - ECCV 2022, 2022

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels.
Proceedings of the Computer Vision - ECCV 2022, 2022

Align and Prompt: Video-and-Language Pre-training with Entity Prompts.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Revisiting the "Video" in Video-Language Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Quantifying Parkinson's disease motor severity under uncertainty using MDS-UPDRS videos.
Medical Image Anal., 2021

Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes.
CoRR, 2021

The AI Index 2021 Annual Report.
CoRR, 2021

Representation Learning with Statistical Independence to Mitigate Bias.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

MOMA: Multi-Object Multi-Actor Activity Parsing.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Detecting Human-Object Relationships in Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Privacy-preserving Optics for Human Pose Estimation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Home Action Genome: Cooperative Compositional Action Understanding.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

CoCon: Cooperative-Contrastive Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Metadata Normalization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction.
IEEE Robotics Autom. Lett., 2020

Segmenting the Future.
IEEE Robotics Autom. Lett., 2020

Socially and Contextually Aware Human Motion and Pose Forecasting.
IEEE Robotics Autom. Lett., 2020

Explaining VQA predictions using visual grounding and a knowledge base.
Image Vis. Comput., 2020

Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Vision-Based Estimation of MDS-UPDRS Gait Scores for Assessing Parkinson's Disease Motor Severity.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2020, 2020

Motion Reasoning for Goal-Based Imitation Learning.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition.
Proceedings of the Computer Vision - ECCV 2020, 2020

Procedure Planning in Instructional Videos.
Proceedings of the Computer Vision - ECCV 2020, 2020

Spatio-Temporal Graph for Video Captioning With Knowledge Distillation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Few-Shot Video Classification via Temporal Alignment.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Adversarial Cross-Domain Action Recognition with Co-Attention.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs.
CoRR, 2019

Bias-Resilient Neural Network.
CoRR, 2019

D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation.
CoRR, 2019

Interpretable Visual Question Answering by Visual Grounding From Attention Supervision Mining.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019

Action-Agnostic Human Pose Forecasting.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019

Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning.
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

Imitation Learning for Human Pose Prediction.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Learning Temporal Action Proposals With Fewer Labels.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Peeking Into the Future: Predicting Future Person Activities and Locations in Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary.
CoRR, 2018

Learning to Decompose and Disentangle Representations for Video Prediction.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

A Deep Learning Based Behavioral Approach to Indoor Autonomous Navigation.
Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

Behavioral Indoor Navigation With Natural Language Directions.
Proceedings of the Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, 2018

Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Liquid Pouring Monitoring via Rich Sensory Inputs.
Proceedings of the Computer Vision - ECCV 2018, 2018

Graph Distillation for Action Detection with Privileged Modalities.
Proceedings of the Computer Vision - ECCV 2018, 2018

Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos.
Proceedings of the Computer Vision - ECCV 2018, 2018

End-to-End Joint Semantic Segmentation of Actors and Actions in Video.
Proceedings of the Computer Vision - ECCV 2018, 2018

What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Corrigendum to "Sparse Composition of Body Poses and Atomic Actions for Human Activity Recognition in RGB-D Videos" [Image Vis. Comput. 59 (2017) 63-75].
Image Vis. Comput., 2017

Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos.
Image Vis. Comput., 2017

Graph Distillation for Action Detection with Privileged Information.
CoRR, 2017

ActivityNet Challenge 2017 Summary.
CoRR, 2017

Risky Region Localization with Point Supervision.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Visual Forecasting by Imitating Dynamics in Natural Sequences.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Dense-Captioning Events in Videos.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

SST: Single-Stream Temporal Action Proposals.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos.
Proceedings of the British Machine Vision Conference 2017, 2017

Leveraging Video Descriptions to Learn Video Question Answering.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Title Generation for User Generated Videos.
Proceedings of the Computer Vision - ECCV 2016, 2016

Connectionist Temporal Modeling for Weakly Supervised Action Labeling.
Proceedings of the Computer Vision - ECCV 2016, 2016

DAPs: Deep Action Proposals for Action Understanding.
Proceedings of the Computer Vision - ECCV 2016, 2016

A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
ActivityNet: A large-scale video benchmark for human activity understanding.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Robust Manhattan Frame estimation from a single RGB-D image.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

On the relationship between visual attributes and convolutional networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Collecting and Annotating Human Activities in Web Videos.
Proceedings of the International Conference on Multimedia Retrieval, 2014

Discriminative Hierarchical Modeling of Spatio-temporally Composable Human Activities.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Camera Motion and Surrounding Scene Appearance as Context for Action Recognition.
Proceedings of the Computer Vision - ACCV 2014, 2014

2013
Vision-based action recognition of earthmoving equipment using spatio-temporal features and support vector machine classifiers.
Adv. Eng. Informatics, 2013

Spatio-temporal Human-Object Interactions for Action Recognition in Videos.
Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, 2013

2010
Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification.
Proceedings of the Computer Vision - ECCV 2010, 2010

Efficient extraction of human motion volumes by tracking.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009
Mining discriminative adjectives and prepositions for natural scene recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009

2008
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words.
Int. J. Comput. Vis., 2008

Extracting Moving People from Internet Videos.
Proceedings of the Computer Vision - ECCV 2008, 2008

2007
A Hierarchical Model of Shape and Appearance for Human Action Classification.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

OPTIMOL: A Framework for Online Picture Collection via Incremental Model Learning.
Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, 2007

