Adrià Garriga-Alonso

According to our database, Adrià Garriga-Alonso authored at least 20 papers between 2019 and 2025.

Bibliography

2025
Open Problems in Mechanistic Interpretability.
CoRR, January, 2025

2024
Planning behavior in a recurrent neural network that plays Sokoban.
CoRR, 2024

Adversarial Circuit Evaluation.
CoRR, 2024

Investigating the Indirect Object Identification circuit in Mamba.
CoRR, 2024

Analyzing the Generalization and Reliability of Steering Vectors.
CoRR, 2024

Analysing the Generalisation and Reliability of Steering Vectors.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Hypothesis Testing the Circuit Hypothesis in LLMs.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

Towards Automated Circuit Discovery for Mechanistic Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Data augmentation in Bayesian neural networks and the cold posterior effect.
Proceedings of the Uncertainty in Artificial Intelligence, 2022

Bayesian Neural Network Priors Revisited.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
BNNpriors: A library for Bayesian neural network inference with different prior distributions.
Softw. Impacts, 2021

BNNpriors: A library for Bayesian neural network inference with different prior distributions.
CoRR, 2021

Bayesian Neural Network Priors Revisited.
CoRR, 2021

Exact Langevin Dynamics with Stochastic Gradients.
CoRR, 2021

Correlated weights in infinite limits of deep convolutional neural networks.
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

2020
Understanding Variational Inference in Function-Space.
CoRR, 2020

2019
Deep Convolutional Networks as shallow Gaussian Processes.
Proceedings of the 7th International Conference on Learning Representations, 2019

