2023
Scaling TransNormer to 175 Billion Parameters.
Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Fei Yuan, Xiao Luo, Yu Qiao, Yiran Zhong
CoRR, 2023
2022
cosFormer: Rethinking Softmax In Attention.
Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
Proceedings of the Tenth International Conference on Learning Representations, 2022