Ranjay Krishna

Orcid: 0000-0001-8784-2531

According to our database, Ranjay Krishna authored at least 103 papers between 2015 and 2024.

Bibliography

2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples.
CoRR, 2024

Language Model Preference Evaluation with Multiple Weak Evaluators.
CoRR, 2024

ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition.
CoRR, 2024

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation.
CoRR, 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models.
CoRR, 2024

Self-Enhancing Video Data Management System for Compositional Events with Large Language Models [Technical Report].
CoRR, 2024

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model.
CoRR, 2024

Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions.
CoRR, 2024

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models.
CoRR, 2024

Task Me Anything.
CoRR, 2024

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics.
CoRR, 2024

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models.
CoRR, 2024

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better.
CoRR, 2024

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass.
CoRR, 2024

Multilingual Diversity Improves Vision-Language Representations.
CoRR, 2024

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision.
CoRR, 2024

EVE: Enabling Anyone to Train Robot using Augmented Reality.
CoRR, 2024

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion.
CoRR, 2024

Training Language Model Agents without Modifying Language Models.
CoRR, 2024

THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation.
CoRR, 2024

Scaling Up LLM Reviews for Google Ads Content Moderation.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024

EVE: Enabling Anyone to Train Robots using Augmented Reality.
Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 2024

Offline Training of Language Model Agents with Functions as Learnable Weights.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Selective Visual Representations Improve Convergence and Generalization for Embodied AI.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ImageInWords: Unlocking Hyper-Detailed Image Descriptions.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks.
Proceedings of the Computer Vision - ECCV 2024, 2024

Efficient Inference of Vision Instruction-Following Models with Elastic Cache.
Proceedings of the Computer Vision - ECCV 2024, 2024

BLINK: Multimodal Large Language Models Can See but Not Perceive.
Proceedings of the Computer Vision - ECCV 2024, 2024

Iterated Learning Improves Compositionality in Large Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Holodeck: Language Guided Generation of 3D Embodied AI Environments.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Modeling Collaborator: Enabling Subjective Vision Classification with Minimal Human Effort via LLM Tool-Use.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MIMIC: Masked Image Modeling with Image Correspondences.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Found in the middle: Calibrating Positional Attention Bias Improves Long Context Utilization.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Guest Editorial: Introduction to the Special Section on Graphs in Vision and Pattern Analysis.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Explanations Can Reduce Overreliance on AI Systems During Decision-Making.
Proc. ACM Hum. Comput. Interact., April, 2023

EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions.
Proc. VLDB Endow., 2023

EQUI-VOCAL Demonstration: Synthesizing Video Queries from User Interactions.
Proc. VLDB Endow., 2023

VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building.
Proc. VLDB Endow., 2023

Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows.
CoRR, 2023

Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World.
CoRR, 2023

Lasagna: Layered Score Distillation for Disentangled Object Relighting.
CoRR, 2023

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback.
CoRR, 2023

Improving Interpersonal Communication by Simulating Audiences with Language Models.
CoRR, 2023

Cultural and Linguistic Diversity Improves Visual Representations.
CoRR, 2023

EcoAssistant: Using LLM Assistant More Affordably and Accurately.
CoRR, 2023

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models.
CoRR, 2023

MIMIC: Masked Image Modeling with Image Correspondences.
CoRR, 2023

COLA: How to adapt vision-language models to Compose Objects Localized with Attributes?
CoRR, 2023

EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions [Technical Report].
CoRR, 2023

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cola: A Benchmark for Compositional Text-to-image Retrieval.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

OBJECT 3DIT: Language-guided 3D-aware Image Editing.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Quilt-1M: One Million Image-Text Pairs for Histopathology.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

AR2-D2: Training a Robot Without a Robot.
Proceedings of the Conference on Robot Learning, 2023

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning.
CoRR, 2022

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Measuring Compositional Consistency for Video Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

VOCAL: Video Organization and Interactive Compositional AnaLytics.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022

2021
Visual intelligence through human learning.
PhD thesis, 2021

Visual Intelligence through Human Interaction.
CoRR, 2021

On the Opportunities and Risks of Foundation Models.
CoRR, 2021

AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Conceptual Metaphors Impact Perceptions of Human-AI Collaboration.
Proc. ACM Hum. Comput. Interact., 2020

Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning.
Proceedings of the Sixth Workshop on Noisy User-generated Text, 2020

2019
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs.
CoRR, 2019

Deep Bayesian Active Learning for Multiple Correct Outputs.
CoRR, 2019

HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

HYPE: Human-eYe Perceptual Evaluation of Generative Models.
Proceedings of the Deep Generative Models for Highly Structured Data, 2019

Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Scene Graph Prediction with Limited Labels.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

AI-Based Request Augmentation to Increase Crowdsourcing Participation.
Proceedings of the Seventh AAAI Conference on Human Computation and Crowdsourcing, 2019

Information Maximizing Visual Question Generation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Eevee: Transforming Images by Bridging High-level Goals and Low-level Edit Operations.
Proceedings of the Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019

2018
The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary.
CoRR, 2018

Engagement Learning: Expanding Visual Knowledge by Engaging Online Participants.
Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology Adjunct Proceedings, 2018

Referring Relationships.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations.
Int. J. Comput. Vis., 2017

ActivityNet Challenge 2017 Summary.
CoRR, 2017

Crowd Research: Open and Scalable University Laboratories.
Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology, 2017

Dense-Captioning Events in Videos.
Proceedings of the IEEE International Conference on Computer Vision, 2017

A Hierarchical Approach for Generating Descriptive Image Paragraphs.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality.
Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, 2017

2016
A Glimpse Far into the Future: Understanding Long-term Crowd Worker Accuracy.
CoRR, 2016

Visual Relationship Detection with Language Priors.
Proceedings of the Computer Vision - ECCV 2016, 2016

Embracing Error to Enable Rapid Crowdsourcing.
Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016

2015
SentenceRacer: A Game with a Purpose for Image Sentence Annotation.
CoRR, 2015

Image retrieval using scene graphs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval.
Proceedings of the Fourth Workshop on Vision and Language, 2015
