Phillip Guo

According to our database1, Phillip Guo authored at least 5 papers between 2022 and 2024.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2024
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024

Eight Methods to Evaluate Robust Unlearning in LLMs.
CoRR, 2024

2023
Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching.
CoRR, 2023

Representation Engineering: A Top-Down Approach to AI Transparency.
CoRR, 2023

2022
Bandit-Based Multi-Start Strategies for Global Continuous Optimization.
Proceedings of the Winter Simulation Conference, 2022


  Loading...