Weight Ensembling Improves Reasoning in Language Models.
CoRR, April, 2025
Overtrained Language Models Are Harder to Fine-Tune.
CoRR, March, 2025
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images.
CoRR, February, 2025
Task Generalization With AutoRegressive Compositional Structure: Can Learning From D Tasks Generalize to D^T Tasks?
CoRR, February, 2025
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models.
CoRR, January, 2025
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective.
CoRR, 2024
Practically Solving LPN in High Noise Regimes Faster Using Neural Networks.
IACR Cryptol. ePrint Arch., 2023
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
How Sharpness-Aware Minimization Minimizes Sharpness?
Proceedings of the Eleventh International Conference on Learning Representations, 2023
How Does Sharpness-Aware Minimization Minimize Sharpness?
CoRR, 2022
Realistic Deep Learning May Not Fit Benignly.
CoRR, 2022
On Transferability of Prompt Tuning for Natural Language Processing.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
Finding Skill Neurons in Pre-trained Transformer-based Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022