Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition.
CoRR, April, 2025
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Correction to: MOSS: An Open Conversational Large Language Model.
Mach. Intell. Res., December, 2024
MOSS: An Open Conversational Large Language Model.
Mach. Intell. Res., October, 2024
Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders.
CoRR, 2024
Automatically Identifying Local and Global Circuits with Linear Computation Graphs.
CoRR, 2024
Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT.
CoRR, 2024
Can AI Assistants Know What They Don't Know?
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Multitask Pre-training of Modular Prompt for Chinese Few-Shot Learning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models.
CoRR, 2022
Multi-Task Pre-Training of Modular Prompt for Few-Shot Learning.
CoRR, 2022
BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning.
CoRR, 2022
BBTv2: Towards a Gradient-Free Future with Large Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022