SWE-smith: Scaling Data for Software Engineering Agents.
CoRR, 2025
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges.
CoRR, 2024
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering.
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024
SciCode: A Research Coding Benchmark Curated by Scientists.
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024
CiteME: Can Language Models Accurately Cite Scientific Claims?
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024
How Language Model Hallucinations Can Snowball.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Proceedings of the Twelfth International Conference on Learning Representations, 2024
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Complementing Scale: Novel Guidance Methods for Improving Language Models.
PhD thesis, 2023
Measuring and Narrowing the Compositionality Gap in Language Models.
Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.
Proceedings of the Tenth International Conference on Learning Representations, 2022
What Language Model to Train if You Have One Million GPU Hours?
Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Transformer Language Models without Positional Encodings Still Learn Positional Information.
Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Shortformer: Better Language Modeling using Shorter Inputs.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
Improving Transformer Models by Reordering their Sublayers.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Partially Shuffling the Training Data to Improve Language Models.
CoRR, 2019
You May Not Need Attention.
CoRR, 2018
Language Generation with Recurrent Generative Adversarial Networks without Pre-training.
CoRR, 2017
Using the Output Embedding to Improve Language Models.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017