2024
Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation.
Shiming Xie, Hong Chen, Fred Yu, Zeye Sun, Xiuyu Wu
CoRR, 2024
Minor DPO reject penalty to increase training robustness.
Shiming Xie, Hong Chen, Fred Yu, Zeye Sun, Xiuyu Wu, Yingfan Hu
CoRR, 2024