Mantas Mazeika

According to our database, Mantas Mazeika authored at least 24 papers between 2018 and 2024.

Bibliography

2024
Tamper-Resistant Safeguards for Open-Weight LLMs.
CoRR, 2024

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
CoRR, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.
CoRR, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.
Proceedings of the Forty-first International Conference on Machine Learning, 2024


2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

Representation Engineering: A Top-Down Approach to AI Transparency.
CoRR, 2023

An Overview of Catastrophic AI Risks.
CoRR, 2023

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
X-Risk Analysis for AI Research.
CoRR, 2022

Forecasting Future World Events With Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection.
Proceedings of the International Conference on Machine Learning, 2022

Scaling Out-of-Distribution Detection for Real-World Settings.
Proceedings of the International Conference on Machine Learning, 2022

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
What Would Jiminy Cricket Do? Towards Agents That Behave Morally.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Coding Challenge Competence With APPS.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Massive Multitask Language Understanding.
Proceedings of the 9th International Conference on Learning Representations, 2021

2019
A Benchmark for Anomaly Segmentation.
CoRR, 2019

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Using Pre-Training Can Improve Model Robustness and Uncertainty.
Proceedings of the 36th International Conference on Machine Learning, 2019

Deep Anomaly Detection with Outlier Exposure.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

