Jacob Steinhardt

ORCID: 0000-0002-0257-3860

According to our database, Jacob Steinhardt authored at least 99 papers between 2009 and 2024.

Bibliography

2024
Language Models Learn to Mislead Humans via RLHF.
CoRR, 2024

Explaining Datasets in Words: Statistical Models with Natural Language Parameters.
CoRR, 2024

Safety vs. Performance: How Multi-Objective Learning Reduces Barriers to Market Entry.
CoRR, 2024

Monitoring Latent World States in Language Models with Propositional Probes.
CoRR, 2024

Adversaries Can Misuse Combinations of Safe Models.
CoRR, 2024

Interpreting the Second-Order Effects of Neurons in CLIP.
CoRR, 2024

Approaching Human-Level Forecasting with Language Models.
CoRR, 2024

Feedback Loops With Language Models Drive In-Context Reward Hacking.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Overthinking the Truth: Understanding how Language Models Process False Demonstrations.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Interpreting CLIP's Image Representation via Text-Based Decomposition.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

How do Language Models Bind Entities in Context?
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Describing Differences in Image Sets with Natural Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Learning Equilibria in Matching Markets with Bandit Feedback.
J. ACM, June, 2023

Incentivizing High-Quality Content in Online Recommender Systems.
CoRR, 2023

Eliciting Latent Predictions from Transformers with the Tuned Lens.
CoRR, 2023

Goal Driven Discovery of Distributional Differences via Language Descriptions.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Mass-Producing Failures of Multimodal Systems with Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Supply-Side Equilibria in Recommender Systems.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Jailbroken: How Does LLM Safety Training Fail?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations.
Proceedings of the International Conference on Machine Learning, 2023

Automatically Auditing Large Language Models via Discrete Optimization.
Proceedings of the International Conference on Machine Learning, 2023

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Progress measures for grokking via mechanistic interpretability.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Discovering Latent Knowledge in Language Models Without Supervision.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022
Stronger data poisoning attacks break data sanitization defenses.
Mach. Learn., 2022

Auditing Visualizations: Transparency Methods Struggle to Detect Anomalous Behavior.
CoRR, 2022

Summarizing Differences between Text Distributions with Natural Language.
CoRR, 2022

Forecasting Future World Events With Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Capturing Failures of Large Language Models via Human Cognitive Biases.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Describing Differences between Text Distributions with Natural Language.
Proceedings of the International Conference on Machine Learning, 2022

Predicting Out-of-Distribution Error with the Projection Norm.
Proceedings of the International Conference on Machine Learning, 2022

Scaling Out-of-Distribution Detection for Real-World Settings.
Proceedings of the International Conference on Machine Learning, 2022

More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize.
Proceedings of the International Conference on Machine Learning, 2022

The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models.
Proceedings of the Tenth International Conference on Learning Representations, 2022

A3D: Studying Pretrained Representations with Programmable Datasets.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
The Effect of Model Size on Worst-Group Generalization.
CoRR, 2021

Unsolved Problems in ML Safety.
CoRR, 2021

Grounding Representation Similarity with Statistical Testing.
CoRR, 2021

Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition.
CoRR, 2021

Approximating How Single Head Attention Learns.
CoRR, 2021

Technical perspective: Robust statistics tackle new problems.
Commun. ACM, 2021

Learning Equilibria in Matching Markets from Bandit Feedback.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

What Would Jiminy Cricket Do? Towards Agents That Behave Morally.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Coding Challenge Competence With APPS.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Mathematical Problem Solving With the MATH Dataset.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Grounding Representation Similarity Through Statistical Testing.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Agnostic Learning with Unknown Utilities.
Proceedings of the 12th Innovations in Theoretical Computer Science Conference, 2021

Measuring Massive Multitask Language Understanding.
Proceedings of the 9th International Conference on Learning Representations, 2021

Aligning AI With Shared Human Values.
Proceedings of the 9th International Conference on Learning Representations, 2021

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Natural Adversarial Examples.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Limitations of Post-Hoc Feature Alignment for Robustness.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
Robust estimation via generalized quasi-gradients.
CoRR, 2020

Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

When does the Tukey Median work?
Proceedings of the IEEE International Symposium on Information Theory, 2020

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks.
Proceedings of the 37th International Conference on Machine Learning, 2020

Identifying Statistical Bias in Dataset Replication.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
Troubling Trends in Machine Learning Scholarship.
ACM Queue, 2019

FrAngel: component-based synthesis with control structures.
Proc. ACM Program. Lang., 2019

A Benchmark for Anomaly Segmentation.
CoRR, 2019

Generalized Resilience and Robust Statistics.
CoRR, 2019

Testing Robustness Against Unforeseen Adversaries.
CoRR, 2019

Transfer of Adversarial Robustness Between Perturbation Types.
CoRR, 2019

Research for practice: troubling trends in machine-learning scholarship.
Commun. ACM, 2019

Sever: A Robust Meta-Algorithm for Stochastic Optimization.
Proceedings of the 36th International Conference on Machine Learning, 2019

2018
Robust learning: information theory and algorithms.
PhD thesis, 2018

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation.
CoRR, 2018

Robust moment estimation and improved clustering via sum of squares.
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018

Semidefinite relaxations for certifying robustness to adversarial examples.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers.
Proceedings of the 9th Innovations in Theoretical Computer Science Conference, 2018

Certified Defenses against Adversarial Examples.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Does robustness imply tractability? A lower bound for planted clique in the semi-random model.
Electron. Colloquium Comput. Complex., 2017

Better Agnostic Clustering Via Relaxed Tensor Norms.
CoRR, 2017

Learning from untrusted data.
Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017

Certified Defenses for Data Poisoning Attacks.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

2016
Concrete Problems in AI Safety.
CoRR, 2016

Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Unsupervised Risk Estimation Using Only Conditional Independence Structure.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2015
Memory, Communication, and Statistical Queries.
Electron. Colloquium Comput. Complex., 2015

Learning with Relaxed Supervision.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Learning Fast-Mixing Models for Structured Prediction.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Reified Context Models.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Minimax rates for memory-bounded sparse linear regression.
Proceedings of The 28th Conference on Learning Theory, 2015

Learning Where to Sample in Structured Prediction.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2014
The Statistics of Streaming Sparse Regression.
CoRR, 2014

Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm.
Proceedings of the 31st International Conference on Machine Learning, 2014

Filtering with Abstract Particles.
Proceedings of the 31st International Conference on Machine Learning, 2014

2012
Flexible Martingale Priors for Deep Hierarchies.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Finite-time regional verification of stochastic non-linear systems.
Int. J. Robotics Res., 2012

2011
Finite-Time Regional Verification of Stochastic Nonlinear Systems.
Proceedings of the Robotics: Science and Systems VII, 2011

2010
Permutations with Ascending and Descending Blocks.
Electron. J. Comb., 2010

2009
On Coloring the Odd-Distance Graph.
Electron. J. Comb., 2009

