Tony T. Wang

Affiliations:

MIT, CSAIL, Cambridge, MA, USA

According to our database¹, Tony T. Wang authored at least 8 papers between 2022 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2024

Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach.

CoRR, 2024

Can Go AIs be adversarially robust?

CoRR, 2024

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.

Trans. Mach. Learn. Res., 2023

Forbidden Facts: An Investigation of Competing Objectives in Llama-2.

CoRR, 2023

Cliff-Learning.

Tony T. Wang, Igor Zablotchi, Nir Shavit, Jonathan S. Rosenfeld

CoRR, 2023

Adversarial Policies Beat Superhuman Go AIs.

Proceedings of the International Conference on Machine Learning, 2023

2022

Adversarial Policies Beat Professional-Level Go AIs.

CoRR, 2022
