2025
DS-TPU: Dynamical System for on-Device Lifelong Graph Learning with Nonlinear Node Interaction.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
DS-LLM: Leveraging Dynamical Systems to Enhance Both Training and Inference of Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models.
,
,
,
,
,
,
,
,
,
,
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
InstaTrain: Adaptive Training via Ultra-Fast Natural Annealing within Dynamical Systems.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Nature-GL: A Revolutionary Learning Paradigm Unleashing Nature's Power in Real-World Spatial-Temporal Graph Learning.
Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025
2024
FPGA-Accelerated Range-Limited Molecular Dynamics.
IEEE Trans. Computers, June, 2024
Diff-PIC: Revolutionizing Particle-In-Cell Simulation for Advancing Nuclear Fusion with Diffusion Models.
,
,
,
,
,
,
,
,
,
,
CoRR, 2024
Inertial Confinement Fusion Forecasting via LLMs.
,
,
,
,
,
,
,
,
,
,
,
CoRR, 2024
Visual Fourier Prompt Tuning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Bridging the Gap Between LLMs and LNS with Dynamic Data Format and Architecture Codesign.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
DS-GL: Advancing Graph Learning via Harnessing Nature's Power within Scalable Dynamical Systems.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
Extending Power of Nature from Binary to Real-Valued Graph Learning in Real World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
2023
FASDA: An FPGA-Aided, Scalable and Distributed Accelerator for Range-Limited Molecular Dynamics.
Proceedings of the International Conference for High Performance Computing, 2023
FLASH: FPGA-Accelerated Smart Switches with GCN Case Study.
,
,
,
,
,
,
,
,
,
,
,
Proceedings of the 37th International Conference on Supercomputing, 2023
Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training.
Proceedings of the 37th International Conference on Supercomputing, 2023
2022
Optimized Mappings for Symmetric Range-Limited Molecular Force Calculations on FPGAs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022
A Framework for Neural Network Inference on FPGA-Centric SmartNICs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022
FCsN: A FPGA-Centric SmartNIC Framework for Neural Networks.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022
2021
O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference.
IEEE Trans. Parallel Distributed Syst., 2021
I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
System-Level Modeling of GPU/FPGA Clusters for Molecular Dynamics Simulations.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
Upgrade of FPGA Range-Limited Molecular Dynamics to Handle Hundreds of Processors.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021
2020
AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing.
,
,
,
,
,
,
,
,
,
,
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
A Communication-Efficient Multi-Chip Design for Range-Limited Molecular Dynamics.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
CQNN: a CGRA-based QNN Framework.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
2019
UWB-GCN: Hardware Acceleration of Graph-Convolution-Network through Runtime Workload Rebalancing.
CoRR, 2019
Fully integrated FPGA molecular dynamics simulations.
,
,
,
,
,
,
,
,
,
,
,
Proceedings of the International Conference for High Performance Computing, 2019
O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning.
Proceedings of the ACM International Conference on Supercomputing, 2019
LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019