2024
ARCTURUS: Full Coverage Binary Similarity Analysis with Reachability-guided Emulation.
ACM Trans. Softw. Eng. Methodol., May, 2024

CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking.
Proc. ACM Softw. Eng., 2024

ParDiff: Practical Static Differential Analysis of Network Protocol Parsers.
Proc. ACM Program. Lang., 2024

ProSec: Fortifying Code LLMs with Proactive Security Alignment.
CoRR, 2024

When Dataflow Analysis Meets Large Language Models.
CoRR, 2024

OdScan: Backdoor Scanning for Object Detection Models.
Proceedings of the IEEE Symposium on Security and Privacy, 2024

LLMDFA: Analyzing Dataflow in Code with Large Language Models.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ROCAS: Root Cause Analysis of Autonomous Driving Accidents via Cyber-Physical Co-mutation.
Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024

Sanitizing Large Language Models in Bug Detection with Data-Flow.
Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Lotus: Evasive and Resilient Backdoor Attacks through Sub-Partitioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ReSym: Harnessing LLMs to Recover Variable and Data Structure Symbols from Stripped Binaries.
Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, 2024

2023
Nova+: Generative Language Models for Binaries.
CoRR, 2023

LmPa: Improving Decompilation by Synergy of Large Language Model and Program Analysis.
CoRR, 2023

Extracting Protocol Format as State Machine via Controlled Static Loop Analysis.
Proceedings of the 32nd USENIX Security Symposium, 2023

PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model.
Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023

BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense.
Proceedings of the 30th Annual Network and Distributed System Security Symposium, 2023

Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis.
Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023

Detecting Backdoors in Pre-trained Encoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards a Framework for Developing Verified Assemblers for the ELF Format.
Proceedings of the 21st Asian Symposium on Programming Languages and Systems, 2023

2022
Checkpointing and deterministic training for deep learning.
Proceedings of the 1st International Conference on AI Engineering: Software Engineering for AI, 2022

2021
Automatic Generation and Validation of Instruction Encoders and Decoders.
Proceedings of the 33rd International Conference on Computer Aided Verification, 2021

2020
CompCertELF: verified separate compilation of C programs into ELF object files.
Proc. ACM Program. Lang., 2020

The Classification and Propagation of Program Comments.
Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2020

CPC: automatically classifying and propagating natural language comments via program analysis.
Proceedings of the 42nd International Conference on Software Engineering (ICSE '20), Seoul, South Korea, 27 June, 2020