2025
Weight Ensembling Improves Reasoning in Language Models.
CoRR, April 2025

Overtrained Language Models Are Harder to Fine-Tune.
CoRR, March 2025

Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images.
CoRR, February 2025

Task Generalization With AutoRegressive Compositional Structure: Can Learning From D Tasks Generalize to D^T Tasks?
CoRR, February 2025

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models.
CoRR, January 2025

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective.
CoRR, 2024

2023
Practically Solving LPN in High Noise Regimes Faster Using Neural Networks.
IACR Cryptol. ePrint Arch., 2023

Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems (NeurIPS), 2023

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems (NeurIPS), 2023

Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

How Does Sharpness-Aware Minimization Minimize Sharpness?
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
How Does Sharpness-Aware Minimization Minimize Sharpness?
CoRR, 2022

Realistic Deep Learning May Not Fit Benignly.
CoRR, 2022

On Transferability of Prompt Tuning for Natural Language Processing.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Finding Skill Neurons in Pre-trained Transformer-based Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022