Adrià Garriga-Alonso
According to our database, Adrià Garriga-Alonso authored at least 20 papers between 2019 and 2025.
Bibliography
2025
2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
2022
Proceedings of the Uncertainty in Artificial Intelligence, 2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
2021
BNNpriors: A library for Bayesian neural network inference with different prior distributions.
Softw. Impacts, 2021
BNNpriors: A library for Bayesian neural network inference with different prior distributions.
CoRR, 2021
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021
2020
2019
Proceedings of the 7th International Conference on Learning Representations, 2019