Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters.
CoRR, 2024
Small-scale proxies for large-scale Transformer training instabilities.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Multi-objective Optimization Model for Momentum Change Based on Genetic Algorithm.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models.
CoRR, 2023
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
CoRR, 2023
Dexterous Manipulation from Images: Autonomous Real-World RL via Substep Guidance.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023
Towards Adaptive, Continual Embodied Agents.
PhD thesis, 2022
Autonomous Reinforcement Learning: Formalism and Benchmarking.
Proceedings of the Tenth International Conference on Learning Representations, 2022
Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021
Continual Learning of Control Primitives: Skill Discovery via Reset-Games.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples.
Proceedings of the 8th International Conference on Learning Representations, 2020
Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples.
CoRR, 2019
Privacy-Preserving Fall Detection with Deep Learning on mmWave Radar Signal.
Proceedings of the 2019 IEEE Visual Communications and Image Processing, 2019
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning.
Proceedings of the 36th International Conference on Machine Learning, 2019
Probabilistic Model-Agnostic Meta-Learning.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control.
Proceedings of the 6th International Conference on Learning Representations, 2018
On integrating a language model into neural machine translation.
Comput. Speech Lang., 2017
Bridging the Gap Between Value and Policy Based Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
Unsupervised Perceptual Rewards for Imitation Learning.
Proceedings of the 5th International Conference on Learning Representations, 2017
An Actor-Critic Algorithm for Sequence Prediction.
Proceedings of the 5th International Conference on Learning Representations, 2017
A Controller Recognizer Framework: How necessary is recognition for control?
CoRR, 2015
On Using Monolingual Corpora in Neural Machine Translation.
CoRR, 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.
Proceedings of the 32nd International Conference on Machine Learning, 2015