David Lindner

Orcid: 0000-0001-7051-7433

According to our database, David Lindner authored at least 21 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
On scalable oversight with weak LLMs judging strong LLMs.
CoRR, 2024

Evaluating Frontier Models for Dangerous Capabilities.
CoRR, 2024

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Learning Safety Constraints from Demonstrations with Unknown Rewards.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
GoSafeOpt: Scalable safe exploration for global optimization of dynamical systems.
Artif. Intell., July, 2023

Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback.
PhD thesis, 2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
Trans. Mach. Learn. Res., 2023

RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback.
CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.
CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Red-Teaming the Stable Diffusion Safety Filter.
CoRR, 2022

Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning.
CoRR, 2022

Scalable Safe Exploration for Global Optimization of Dynamical Systems.
CoRR, 2022

Active Exploration for Inverse Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Interactively Learning Preference Constraints in Linear Bandits.
Proceedings of the International Conference on Machine Learning, 2022

2021
Information Directed Reward Learning for Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Addressing the Long-term Impact of ML Decisions via Policy Regret.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Learning What To Do by Simulating the Past.
Proceedings of the 9th International Conference on Learning Representations, 2021

Challenges for Using Impact Regularizers to Avoid Negative Side Effects.
Proceedings of the Workshop on Artificial Intelligence Safety 2021 (SafeAI 2021) co-located with the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), 2021

2019
Sensing Social Media Signals for Cryptocurrency News.
Proceedings of the Companion of The 2019 World Wide Web Conference, 2019

Detecting Spiky Corruption in Markov Decision Processes.
Proceedings of the Workshop on Artificial Intelligence Safety 2019 co-located with the 28th International Joint Conference on Artificial Intelligence, 2019
