Zifan Wang

Affiliations:

Carnegie Mellon University, PA, USA

According to our database, Zifan Wang authored at least 36 papers between 2020 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents.

CoRR, 2024

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet.

CoRR, 2024

Mechanistically Interpreting a Transformer-based 2-SAT Solver: An Axiomatic Approach.

CoRR, 2024

Sales Whisperer: A Human-Inconspicuous Attack on LLM Brand Recommendations.

CoRR, 2024

VeriSplit: Secure and Practical Offloading of Machine Learning Inferences across IoT Devices.

CoRR, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.

Ann-Kathrin Dombrowski, Justin Tienken-Harder, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru

CoRR, 2024

Attacks and Defenses for Large Language Models on Coding Tasks.

Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

A Recipe for Improved Certifiable Robustness.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Transfer Attacks and Defenses for Large Language Models on Coding Tasks.

CoRR, 2023

Can LLMs Follow Simple Rules?

CoRR, 2023

Is Certifying ℓp Robustness Still Worthwhile?

CoRR, 2023

A Recipe for Improved Certifiable Robustness: Capacity and Data.

CoRR, 2023

Representation Engineering: A Top-Down Approach to AI Transparency.

CoRR, 2023

Universal and Transferable Adversarial Attacks on Aligned Language Models.

CoRR, 2023

Scaling in Depth: Unlocking Robustness Certification on ImageNet.

CoRR, 2023

Learning Modulo Theories.

CoRR, 2023

Grounding Neural Inference with Satisfiability Modulo Theories.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Unlocking Deterministic Robustness Certification on ImageNet.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On the Perils of Cascading Robust Classifiers.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Faithful Explanations for Deep Graph Models.

CoRR, 2022

Robust Models Are More Interpretable Because Attributions Look Normal.

Zifan Wang, Matt Fredrikson, Anupam Datta

Proceedings of the International Conference on Machine Learning, 2022

Consistent Counterfactuals for Deep Models.

Emily Black, Zifan Wang, Matt Fredrikson

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Consistent Counterfactuals for Deep Models.

CoRR, 2021

Boundary Attributions Provide Normal (Vector) Explanations.

Zifan Wang, Matt Fredrikson, Anupam Datta

CoRR, 2021

Influence Patterns for Explaining Information Flow in BERT.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Exploring Conceptual Soundness with TruLens.

Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

Machine Learning Explainability and Robustness: Connected at the Hip.

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Globally-Robust Neural Networks.

Klas Leino, Zifan Wang, Matt Fredrikson

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Abstracting Influence Paths for Explaining (Contextualization of) BERT Models.

CoRR, 2020

Towards Behavior-Level Explanation for Deep Reinforcement Learning.

CoRR, 2020

Smoothed Geometry for Robust Attribution.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Interpreting Interpretations: Organizing Attribution Methods by Criteria.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020