Javier Rando

Orcid: 0000-0002-2723-7660

According to our database, Javier Rando authored at least 11 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs.
CoRR, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models.
CoRR, 2024

2023
Universal Jailbreak Backdoors from Poisoned Human Feedback.
CoRR, 2023

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation.
CoRR, 2023

Personas as a Way to Model Truthfulness in Language Models.
CoRR, 2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
CoRR, 2023

PassGPT: Password Modeling and (Guided) Generation with Large Language Models.
Proceedings of the Computer Security - ESORICS 2023, 2023

2022
Red-Teaming the Stable Diffusion Safety Filter.
CoRR, 2022

Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO.
CoRR, 2022

2020
Uneven Coverage of Natural Disasters in Wikipedia: the Case of Flood.
CoRR, 2020

Uneven Coverage of Natural Disasters in Wikipedia: The Case of Floods.
Proceedings of the 17th International Conference on Information Systems for Crisis Response and Management, 2020

