2024
A minor SFT loss added during LLM fine-tuning to improve performance and reduce model deviation.
CoRR, 2024
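The idea above, mixing a small supervised (SFT) term into the preference objective to keep the policy anchored to its fine-tuning data, can be sketched numerically. This is a minimal illustration assuming a DPO-style preference loss; the function name, the additive mixing scheme, and the `sft_weight` coefficient are assumptions for illustration, not the paper's exact formulation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_with_sft_loss(logp_chosen, logp_rejected,
                      ref_logp_chosen, ref_logp_rejected,
                      beta=0.1, sft_weight=0.05):
    """DPO preference loss plus a small SFT (NLL) term on the chosen response.

    The auxiliary SFT term anchors the policy to the chosen responses,
    limiting drift ("model deviation") away from the fine-tuned model.
    All parameter names and the weighting here are illustrative assumptions.
    """
    # Standard DPO log-ratio margin between chosen and rejected responses
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    dpo_loss = -math.log(sigmoid(margin))
    # Negative log-likelihood of the chosen response (the SFT term)
    sft_loss = -logp_chosen
    return dpo_loss + sft_weight * sft_loss
```

With `sft_weight=0` this reduces to the plain DPO loss; a small positive weight adds a gentle pull toward reproducing the chosen responses.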

A minor DPO rejected-response penalty to improve training robustness.
CoRR, 2024
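The penalty on rejected responses described above can likewise be sketched. Here it is assumed to be a hinge term that fires only when the policy assigns a rejected response higher likelihood than the reference model did; the hinge form, function name, and `penalty_weight` coefficient are illustrative assumptions, not necessarily the paper's exact design.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_with_reject_penalty(logp_chosen, logp_rejected,
                            ref_logp_chosen, ref_logp_rejected,
                            beta=0.1, penalty_weight=0.5):
    """DPO loss plus a hinge penalty on the rejected response.

    The penalty activates only when the policy's log-likelihood of the
    rejected response rises above the reference model's, discouraging
    degenerate solutions where both responses become more likely.
    The hinge form and weighting are illustrative assumptions.
    """
    # Standard DPO log-ratio margin between chosen and rejected responses
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    dpo_loss = -math.log(sigmoid(margin))
    # Hinge penalty: zero while the rejected response stays below reference
    reject_penalty = max(0.0, logp_rejected - ref_logp_rejected)
    return dpo_loss + penalty_weight * reject_penalty
```

When the rejected log-likelihood stays at or below the reference value, the penalty is zero and the loss is identical to plain DPO.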