Tinghao Xie

According to our database, Tinghao Xie authored at least 11 papers between 2022 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors.
CoRR, 2024

Fantastic Copyrighted Beasts and How (Not) to Generate Them.
CoRR, 2024

AI Risk Management Should Incorporate Both Safety and Security.
CoRR, 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Towards A Proactive ML Approach for Detecting Backdoor Poison Samples.
Proceedings of the 32nd USENIX Security Symposium, 2023

Revisiting the Assumption of Latent Separability for Backdoor Defenses.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
Fight Poison with Poison: Detecting Backdoor Poison Samples via Decoupling Benign Correlations.
CoRR, 2022

Circumventing Backdoor Defenses That Are Based on Latent Separability.
CoRR, 2022

Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
