2023
Scaling TransNormer to 175 Billion Parameters.
CoRR, 2023

2022
cosFormer: Rethinking Softmax In Attention.
Proceedings of the Tenth International Conference on Learning Representations, 2022