Rohin Shah
Orcid: 0000-0002-0656-2800
According to our database1,
Rohin Shah
authored at least 33 papers
between 2014 and 2025.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2025
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking.
CoRR, January, 2025
2024
CoRR, 2024
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla.
CoRR, 2023
Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition.
CoRR, 2023
BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, 2023
2022
CoRR, 2022
CoRR, 2022
2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021
Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition.
Proceedings of the NeurIPS 2022 Competition Track, 2021
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021
Proceedings of the 9th International Conference on Learning Representations, 2021
Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021
2020
PhD thesis, 2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Proceedings of the Workshop on Artificial Intelligence Safety 2020 co-located with the 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI 2020), 2020
2019
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference.
Proceedings of the 36th International Conference on Machine Learning, 2019
Proceedings of the 7th International Conference on Learning Representations, 2019
2018
2016
2014
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014