Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
POTEC: Off-Policy Contextual Bandits for Large Action Spaces via Policy Decomposition.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
The Art of Refusal: A Survey of Abstention in Large Language Models.
CoRR, 2024
Developing a Framework for Auditing Large Language Models Using Human-in-the-Loop.
CoRR, 2024
POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition.
CoRR, 2024