SWE-smith: Scaling Data for Software Engineering Agents.
CoRR, 2025
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges.
CoRR, 2024
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering.
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024
SciCode: A Research Coding Benchmark Curated by Scientists.
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024
CiteME: Can Language Models Accurately Cite Scientific Claims?
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024
How Language Model Hallucinations Can Snowball.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Proceedings of the Twelfth International Conference on Learning Representations, 2024
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Complementing Scale: Novel Guidance Methods for Improving Language Models.
PhD thesis, 2023
Measuring and Narrowing the Compositionality Gap in Language Models.
Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.
Proceedings of the Tenth International Conference on Learning Representations, 2022
What Language Model to Train if You Have One Million GPU Hours?
Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Transformer Language Models without Positional Encodings Still Learn Positional Information.
Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Shortformer: Better Language Modeling using Shorter Inputs.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
Improving Transformer Models by Reordering their Sublayers.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Partially Shuffling the Training Data to Improve Language Models.
CoRR, 2019
You May Not Need Attention.
CoRR, 2018
Language Generation with Recurrent Generative Adversarial Networks without Pre-training.
CoRR, 2017
Using the Output Embedding to Improve Language Models.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017