2024
Scaling Instruction-Finetuned Language Models.
J. Mach. Learn. Res., 2024
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning.
CoRR, 2024
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments.
CoRR, 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models.
CoRR, 2024
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems.
CoRR, 2024
Transformers Can Achieve Length Generalization But Not Robustly.
CoRR, 2024
SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Chain-of-Thought Reasoning Without Prompting.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Premise Order Matters in Reasoning with Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Large Language Models as Analogical Reasoners.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Large Language Models as Optimizers.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Teaching Large Language Models to Self-Debug.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Large Language Models as Tool Makers.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Large Language Models Cannot Self-Correct Reasoning Yet.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
2023
PaLM: Scaling Language Modeling with Pathways.
J. Mach. Learn. Res., 2023
Universal Self-Consistency for Large Language Model Generation.
CoRR, 2023
Instruction-Following Evaluation for Large Language Models.
CoRR, 2023
Large Language Models can Learn Rules.
CoRR, 2023
Simple synthetic data reduces sycophancy in large language models.
CoRR, 2023
Training Socially Aligned Language Models in Simulated Human Society.
CoRR, 2023
Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts.
CoRR, 2023
Larger language models do in-context learning differently.
CoRR, 2023
Large Language Models Can Be Easily Distracted by Irrelevant Context.
Proceedings of the International Conference on Machine Learning, 2023
Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization.
Proceedings of the International Conference on Machine Learning, 2023
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning.
Proceedings of the International Conference on Machine Learning, 2023
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
TEMPERA: Test-Time Prompt Editing via Reinforcement Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
UL2: Unifying Language Learning Paradigms.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Recitation-Augmented Language Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Language models are multilingual chain-of-thought reasoners.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Mind's Eye: Grounded Language Model Reasoning through Simulation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Compositional Semantic Parsing with Large Language Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
What learning algorithm is in-context learning? Investigations with linear models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Self-Consistency Improves Chain of Thought Reasoning in Language Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Symbol tuning improves in-context learning in language models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Transcending Scaling Laws with 0.1% Extra Compute.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
Emergent Abilities of Large Language Models.
Trans. Mach. Learn. Res., 2022
TEMPERA: Test-Time Prompting via Reinforcement Learning.
CoRR, 2022
Scaling Instruction-Finetuned Language Models.
CoRR, 2022
Rationale-Augmented Ensembles in Language Models.
CoRR, 2022
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.
CoRR, 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models.
CoRR, 2022
DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection.
CoRR, 2022
Chain of Thought Prompting Elicits Reasoning in Large Language Models.
CoRR, 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022
Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance.
Proceedings of the International Conference on Machine Learning, 2022
Auto-scaling Vision Transformers without Training.
Proceedings of the Tenth International Conference on Learning Representations, 2022
A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022
DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Token Dropping for Efficient BERT Pretraining.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
2021
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation.
CoRR, 2021
Speeding up Deep Model Training by Sharing Weights and Then Unsharing.
CoRR, 2021
LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs.
Proceedings of the 38th International Conference on Machine Learning, 2021
SpreadsheetCoder: Formula Prediction from Semi-structured Context.
Proceedings of the 38th International Conference on Machine Learning, 2021
Fast WordPiece Tokenization.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
Extremely Small BERT Models from Mixed-Vocabulary Training.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
2020
Linear-Time WordPiece Tokenization.
CoRR, 2020
Compositional Generalization via Neural-Symbolic Stack Machines.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Deep State-Space Generative Model For Correlated Time-to-Event Predictions.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020
Go Wide, Then Narrow: Efficient Training of Deep Thin Networks.
Proceedings of the 37th International Conference on Machine Learning, 2020
Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection.
Proceedings of the 37th International Conference on Machine Learning, 2020
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020
Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension.
Proceedings of the 8th International Conference on Learning Representations, 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
2019
Deep Physiological State Space Model for Clinical Forecasting.
CoRR, 2019
Extreme Language Model Compression with Optimal Subwords and Shared Projections.
CoRR, 2019
Doubly Sparse: Sparse Mixture of Sparse Experts for Efficient Softmax Inference.
CoRR, 2019
Proceedings of the 7th International Conference on Learning Representations, 2019
2015
Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015