2024

Scaling Instruction-Finetuned Language Models.

J. Mach. Learn. Res., 2024

NATURAL PLAN: Benchmarking LLMs on Natural Language Planning.

Huaixiu Steven Zheng

,

,

,

,

,

,

,

,

,

,

CoRR, 2024

CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments.

,

,

,

William A. Johnson

,

,

,

,

,

,

CoRR, 2024

Best Practices and Lessons Learned on Synthetic Data for Language Models.

,

,

,

,

,

,

,

,

,

,

CoRR, 2024

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems.

,

,

,

CoRR, 2024

Transformers Can Achieve Length Generalization But Not Robustly.

,

,

,

,

Rishabh Agarwal

,

CoRR, 2024

SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures.

,

,

,

,

,

,

,

,

,

Huaixiu Steven Zheng

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Chain-of-Thought Reasoning Without Prompting.

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity.

,

,

,

,

,

,

,

,

,

,

Daphne Ippolito

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Premise Order Matters in Reasoning with Large Language Models.

,

,

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models.

Huaixiu Steven Zheng

,

,

,

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Large Language Models as Analogical Reasoners.

Michihiro Yasunaga

,

,

,

Panupong Pasupat

,

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Large Language Models as Optimizers.

,

,

,

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models.

,

,

,

,

,

,

Hyung Won Chung

,

,

,

,

,

,

,

,

,

Vincent Y. Zhao

,

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Teaching Large Language Models to Self-Debug.

,

,

Nathanael Schärli

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Large Language Models as Tool Makers.

,

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Large Language Models Cannot Self-Correct Reasoning Yet.

,

,

,

Huaixiu Steven Zheng

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems.

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation.

,

,

,

,

,

,

,

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

PaLM: Scaling Language Modeling with Pathways.

Aakanksha Chowdhery

,

,

,

,

,

,

,

Hyung Won Chung

,

,

Sebastian Gehrmann

,

,

,

Sasha Tsvyashchenko

,

,

,

,

,

,

Vinodkumar Prabhakaran

,

,

,

,

,

,

,

,

,

,

,

Anselm Levskaya

,

Sanjay Ghemawat

,

,

Henryk Michalewski

,

,

,

,

,

,

Daphne Ippolito

,

,

,

,

Alexander Spiridonov

,

,

,

Shivani Agrawal

,

,

,

Thanumalayan Sankaranarayana Pillai

,

,

Aitor Lewkowycz

,

,

,

Oleksandr Polozov

,

,

,

,

,

,

,

Michele Catasta

,

,

Kathy Meier-Hellstern

,

,

,

,

J. Mach. Learn. Res., 2023

Universal Self-Consistency for Large Language Model Generation.

,

,

,

,

,

,

Sushant Prakash

,

,

,

CoRR, 2023

Instruction-Following Evaluation for Large Language Models.

,

,

,

Siddhartha Brahma

,

,

,

,

CoRR, 2023

Large Language Models can Learn Rules.

,

,

,

,

,

Dale Schuurmans

,

CoRR, 2023

Simple synthetic data reduces sycophancy in large language models.

,

,

,

,

CoRR, 2023

Training Socially Aligned Language Models in Simulated Human Society.

,

,

,

,

,

,

,

Soroush Vosoughi

CoRR, 2023

Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts.

,

,

,

,

,

,

Hyung Won Chung

,

,

,

,

,

,

,

,

,

Vincent Y. Zhao

,

,

,

,

CoRR, 2023

Larger language models do in-context learning differently.

,

,

,

,

,

,

,

,

,

,

CoRR, 2023

Large Language Models Can Be Easily Distracted by Irrelevant Context.

,

,

,

,

,

,

Nathanael Schärli

,

Proceedings of the International Conference on Machine Learning, 2023

Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization.

,

,

,

,

,

Proceedings of the International Conference on Machine Learning, 2023

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning.

,

,

,

,

Hyung Won Chung

,

,

,

,

,

,

Proceedings of the International Conference on Machine Learning, 2023

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.

,

Nathanael Schärli

,

,

,

,

,

Dale Schuurmans

,

,

Olivier Bousquet

,

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

TEMPERA: Test-Time Prompt Editing via Reinforcement Learning.

,

,

,

Dale Schuurmans

,

Joseph E. Gonzalez

Proceedings of the Eleventh International Conference on Learning Representations, 2023

UL2: Unifying Language Learning Paradigms.

,

Mostafa Dehghani

,

,

,

,

,

Hyung Won Chung

,

,

,

Huaixiu Steven Zheng

,

,

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Recitation-Augmented Language Models.

,

,

,

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Language models are multilingual chain-of-thought reasoners.

,

,

,

,

,

Soroush Vosoughi

,

Hyung Won Chung

,

,

Sebastian Ruder

,

,

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Mind's Eye: Grounded Language Model Reasoning through Simulation.

,

,

Shixiang Shane Gu

,

,

Soroush Vosoughi

,

,

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Compositional Semantic Parsing with Large Language Models.

,

Nathanael Schärli

,

,

,

,

,

Olivier Bousquet

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

What learning algorithm is in-context learning? Investigations with linear models.

,

Dale Schuurmans

,

,

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Self-Consistency Improves Chain of Thought Reasoning in Language Models.

,

,

Dale Schuurmans

,

,

,

,

Aakanksha Chowdhery

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Symbol tuning improves in-context learning in language models.

,

,

Andrew K. Lampinen

,

,

,

,

,

,

,

,

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Transcending Scaling Laws with 0.1% Extra Compute.

,

,

Hyung Won Chung

,

,

,

,

,

Huaixiu Steven Zheng

,

,

Aakanksha Chowdhery

,

,

,

,

,

,

Mostafa Dehghani

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.

,

,

Nathanael Schärli

,

Sebastian Gehrmann

,

,

Hyung Won Chung

,

Aakanksha Chowdhery

,

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Emergent Abilities of Large Language Models.

,

,

Rishi Bommasani

,

,

,

Sebastian Borgeaud

,

,

,

,

,

,

Tatsunori Hashimoto

,

,

,

,

Trans. Mach. Learn. Res., 2022

TEMPERA: Test-Time Prompting via Reinforcement Learning.

,

,

,

Dale Schuurmans

,

Joseph E. Gonzalez

CoRR, 2022

Scaling Instruction-Finetuned Language Models.

CoRR, 2022

Rationale-Augmented Ensembles in Language Models.

,

,

Dale Schuurmans

,

,

,

CoRR, 2022

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.

,

Nathanael Schärli

,

,

,

,

,

Dale Schuurmans

,

Olivier Bousquet

,

,

CoRR, 2022

Self-Consistency Improves Chain of Thought Reasoning in Language Models.

,

,

Dale Schuurmans

,

,

,

CoRR, 2022

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection.

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, 2022

Chain of Thought Prompting Elicits Reasoning in Large Language Models.

,

,

Dale Schuurmans

,

,

,

,

CoRR, 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.

,

,

Dale Schuurmans

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation.

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs.

,

,

,

,

,

,

Dale Schuurmans

Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14-18, 2022

Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance.

,

,

,

,

,

,

Proceedings of the International Conference on Machine Learning, 2022

Auto-scaling Vision Transformers without Training.

,

,

,

,

,

Proceedings of the Tenth International Conference on Learning Representations, 2022

A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation.

,

,

,

,

,

,

,

,

,

,

Proceedings of the Computer Vision - ECCV 2022, 2022

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection.

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Token Dropping for Efficient BERT Pretraining.

,

Richard Yuanzhe Pang

,

,

,

,

,

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation.

,

,

,

,

,

,

,

,

,

,

CoRR, 2021

Speeding up Deep Model Training by Sharing Weights and Then Unsharing.

,

,

,

,

CoRR, 2021

LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs.

,

,

,

,

Michihiro Yasunaga

,

,

Dale Schuurmans

,

,

Proceedings of the 38th International Conference on Machine Learning, 2021

SpreadsheetCoder: Formula Prediction from Semi-structured Context.

,

Petros Maniatis

,

,

,

,

,

Proceedings of the 38th International Conference on Machine Learning, 2021

Fast WordPiece Tokenization.

,

,

,

,

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Extremely Small BERT Models from Mixed-Vocabulary Training.

,

,

,

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020

Linear-Time WordPiece Tokenization.

,

,

,

,

CoRR, 2020

Compositional Generalization via Neural-Symbolic Stack Machines.

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Deep State-Space Generative Model For Correlated Time-to-Event Predictions.

,

,

,

,

,

,

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks.

,

,

,

,

,

,

,

,

Dale Schuurmans

Proceedings of the 37th International Conference on Machine Learning, 2020

Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection.

,

,

,

,

Adam R. Klivans

,

Proceedings of the 37th International Conference on Machine Learning, 2020

Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning.

,

,

,

Proceedings of the 8th International Conference on Learning Representations, 2020

Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension.

,

,

,

,

,

Proceedings of the 8th International Conference on Learning Representations, 2020

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices.

,

,

,

,

,

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

Deep Physiological State Space Model for Clinical Forecasting.

,

,

,

,

,

,

CoRR, 2019

Extreme Language Model Compression with Optimal Subwords and Shared Projections.

,

,

,

CoRR, 2019

Doubly Sparse: Sparse Mixture of Sparse Experts for Efficient Softmax Inference.

,

,

,

,

CoRR, 2019

Neural Logic Machines.

,

,

,

,

,

Proceedings of the 7th International Conference on Learning Representations, 2019

2015

Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing.

Nihar Bhadresh Shah

,

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015