2025
DS-TPU: Dynamical System for on-Device Lifelong Graph Learning with Nonlinear Node Interaction.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

DS-LLM: Leveraging Dynamical Systems to Enhance Both Training and Inference of Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

InstaTrain: Adaptive Training via Ultra-Fast Natural Annealing within Dynamical Systems.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Nature-GL: A Revolutionary Learning Paradigm Unleashing Nature's Power in Real-World Spatial-Temporal Graph Learning.
Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

2024
FPGA-Accelerated Range-Limited Molecular Dynamics.
IEEE Trans. Computers, June, 2024

Diff-PIC: Revolutionizing Particle-In-Cell Simulation for Advancing Nuclear Fusion with Diffusion Models.
CoRR, 2024

Inertial Confinement Fusion Forecasting via LLMs.
CoRR, 2024

Visual Fourier Prompt Tuning.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Bridging the Gap Between LLMs and LNS with Dynamic Data Format and Architecture Codesign.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

DS-GL: Advancing Graph Learning via Harnessing Nature's Power within Scalable Dynamical Systems.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Extending Power of Nature from Binary to Real-Valued Graph Learning in Real World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
FASDA: An FPGA-Aided, Scalable and Distributed Accelerator for Range-Limited Molecular Dynamics.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023

FLASH: FPGA-Accelerated Smart Switches with GCN Case Study.
Proceedings of the 37th International Conference on Supercomputing, 2023

Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training.
Proceedings of the 37th International Conference on Supercomputing, 2023

2022
Optimized Mappings for Symmetric Range-Limited Molecular Force Calculations on FPGAs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

A Framework for Neural Network Inference on FPGA-Centric SmartNICs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

FCsN: A FPGA-Centric SmartNIC Framework for Neural Networks.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

2021
O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference.
IEEE Trans. Parallel Distributed Syst., 2021

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization.
Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

System-Level Modeling of GPU/FPGA Clusters for Molecular Dynamics Simulations.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Upgrade of FPGA Range-Limited Molecular Dynamics to Handle Hundreds of Processors.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

2020
AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

A Communication-Efficient Multi-Chip Design for Range-Limited Molecular Dynamics.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

CQNN: a CGRA-based QNN Framework.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

2019
UWB-GCN: Hardware Acceleration of Graph-Convolution-Network through Runtime Workload Rebalancing.
CoRR, 2019

Fully integrated FPGA molecular dynamics simulations.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019

O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning.
Proceedings of the ACM International Conference on Supercomputing, 2019

LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2019