Zifan Wang

Affiliations:
  • Carnegie Mellon University, PA, USA


According to our database, Zifan Wang authored at least 34 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet.
CoRR, 2024

Mechanistically Interpreting a Transformer-based 2-SAT Solver: An Axiomatic Approach.
CoRR, 2024

Sales Whisperer: A Human-Inconspicuous Attack on LLM Brand Recommendations.
CoRR, 2024

VeriSplit: Secure and Practical Offloading of Machine Learning Inferences across IoT Devices.
CoRR, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.
CoRR, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.
Proceedings of the Forty-first International Conference on Machine Learning, 2024


A Recipe for Improved Certifiable Robustness.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Transfer Attacks and Defenses for Large Language Models on Coding Tasks.
CoRR, 2023

Can LLMs Follow Simple Rules?
CoRR, 2023

Is Certifying ℓ_p Robustness Still Worthwhile?
CoRR, 2023

A Recipe for Improved Certifiable Robustness: Capacity and Data.
CoRR, 2023

Representation Engineering: A Top-Down Approach to AI Transparency.
CoRR, 2023

Universal and Transferable Adversarial Attacks on Aligned Language Models.
CoRR, 2023

Scaling in Depth: Unlocking Robustness Certification on ImageNet.
CoRR, 2023

Learning Modulo Theories.
CoRR, 2023

Grounding Neural Inference with Satisfiability Modulo Theories.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Unlocking Deterministic Robustness Certification on ImageNet.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On the Perils of Cascading Robust Classifiers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Faithful Explanations for Deep Graph Models.
CoRR, 2022

Robust Models Are More Interpretable Because Attributions Look Normal.
Proceedings of the International Conference on Machine Learning, 2022

Consistent Counterfactuals for Deep Models.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Consistent Counterfactuals for Deep Models.
CoRR, 2021

Boundary Attributions Provide Normal (Vector) Explanations.
CoRR, 2021

Influence Patterns for Explaining Information Flow in BERT.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Exploring Conceptual Soundness with TruLens.
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

Machine Learning Explainability and Robustness: Connected at the Hip.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Globally-Robust Neural Networks.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
Abstracting Influence Paths for Explaining (Contextualization of) BERT Models.
CoRR, 2020

Towards Behavior-Level Explanation for Deep Reinforcement Learning.
CoRR, 2020

Smoothed Geometry for Robust Attribution.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Interpreting Interpretations: Organizing Attribution Methods by Criteria.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

