2025

Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition.

Zhengfu He, Junxuan Wang

CoRR, April, 2025

Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Correction to: MOSS: An Open Conversational Large Language Model.

Mach. Intell. Res., December, 2024

MOSS: An Open Conversational Large Language Model.

Mach. Intell. Res., October, 2024

Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders.

CoRR, 2024

Automatically Identifying Local and Global Circuits with Linear Computation Graphs.

CoRR, 2024

Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT.

CoRR, 2024

Can AI Assistants Know What They Don't Know?

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Multitask Pre-training of Modular Prompt for Chinese Few-Shot Learning.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models.

CoRR, 2022

Multi-Task Pre-Training of Modular Prompt for Few-Shot Learning.

CoRR, 2022

BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning.

CoRR, 2022

BBTv2: Towards a Gradient-Free Future with Large Language Models.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022