2025
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition.
CoRR, April, 2025

Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Correction to: MOSS: An Open Conversational Large Language Model.
Mach. Intell. Res., December, 2024

MOSS: An Open Conversational Large Language Model.
Mach. Intell. Res., October, 2024

Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders.
CoRR, 2024

Automatically Identifying Local and Global Circuits with Linear Computation Graphs.
CoRR, 2024

Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT.
CoRR, 2024

Can AI Assistants Know What They Don't Know?
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
Multitask Pre-training of Modular Prompt for Chinese Few-Shot Learning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models.
CoRR, 2022

Multi-Task Pre-Training of Modular Prompt for Few-Shot Learning.
CoRR, 2022

BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning.
CoRR, 2022

BBTv2: Towards a Gradient-Free Future with Large Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022