Rafael Rafailov

According to our database, Rafael Rafailov authored at least 31 papers between 2021 and 2024.

Bibliography

2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents.
CoRR, 2024

PERSONA: A Reproducible Testbed for Pluralistic Alignment.
CoRR, 2024

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
CoRR, 2024

OpenVLA: An Open-Source Vision-Language-Action Model.
CoRR, 2024

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms.
CoRR, 2024

Scalable Ensembling For Mitigating Reward Overoptimisation.
CoRR, 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment.
CoRR, 2024

Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels.
CoRR, 2024

From r to Q*: Your Language Model is Secretly a Q-Function.
CoRR, 2024

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data.
CoRR, 2024

Aligning Modalities in Vision Large Language Models via Preference Fine-tuning.
CoRR, 2024

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning.
Proceedings of the 1st Reinforcement Learning Conference, 2024

Efficient imitation learning with conservative world models.
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 2024

Open X-Embodiment: Robotic Learning Datasets and RT-X Models : Open X-Embodiment Collaboration.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Language Model Detectors Are Easily Optimized Against.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

An Emulator for Fine-tuning Large Language Models using Small Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Contrastive Preference Learning: Learning from Human Feedback without Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Diffusion Model Alignment Using Direct Preference Optimization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Disentangling Length from Quality in Direct Preference Optimization.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Contrastive Preference Learning: Learning from Human Feedback without RL.
CoRR, 2023

Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias.
CoRR, 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Contrastive Example-Based Control.
Proceedings of the Learning for Dynamics and Control Conference, 2023

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning.
Proceedings of the Conference on Robot Learning, 2023

2022
Vision-Based Manipulators Need to Also See from Their Hands.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
COMBO: Conservative Offline Model-Based Policy Optimization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Visual Adversarial Imitation Learning using Variational Models.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Offline Reinforcement Learning from Images with Latent Space Models.
Proceedings of the 3rd Annual Conference on Learning for Dynamics and Control, 2021

Offline Meta-Reinforcement Learning with Advantage Weighting.
Proceedings of the 38th International Conference on Machine Learning, 2021

