Training Language Models to Win Debates with Self-Play Improves Judge Accuracy.
CoRR, 2024
GPQA: A Graduate-Level Google-Proof Q&A Benchmark.
CoRR, 2023
Debate Helps Supervise Unreliable Experts.
CoRR, 2023
Drusen segmentation with sparse volumetric SD-OCT sampling.
Proceedings of the Medical Imaging 2021: Image Processing, Online, February 15-19, 2021
Classification with Strategically Withheld Data.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
Detecting age-related macular degeneration (AMD) biomarker images using MFCC and texture features.
Proceedings of the Medical Imaging 2020: Computer-Aided Diagnosis, 2020