Javier Rando

ORCID: 0000-0002-2723-7660

According to our database, Javier Rando authored at least 15 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Gradient-based Jailbreak Images for Multimodal Fusion Models.
CoRR, 2024

Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI.
CoRR, 2024

Attributions toward Artificial Agents in a modified Moral Turing Test.
CoRR, 2024

Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition.
CoRR, 2024

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs.
CoRR, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models.
CoRR, 2024

Universal Jailbreak Backdoors from Poisoned Human Feedback.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Personas as a Way to Model Truthfulness in Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
Trans. Mach. Learn. Res., 2023

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation.
CoRR, 2023

PassGPT: Password Modeling and (Guided) Generation with Large Language Models.
Computer Security - ESORICS 2023 (Proceedings), 2023

2022
Red-Teaming the Stable Diffusion Safety Filter.
CoRR, 2022

Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO.
CoRR, 2022

2020
Uneven Coverage of Natural Disasters in Wikipedia: the Case of Flood.
CoRR, 2020

Uneven Coverage of Natural Disasters in Wikipedia: The Case of Floods.
Proceedings of the 17th International Conference on Information Systems for Crisis Response and Management, 2020
