2025
Efficient LLM Inference with Activation Checkpointing and Hybrid Caching.
CoRR, January 2025

16.2 RNGD: A 5nm Tensor-Contraction Processor for Power-Efficient Inference on Large Language Models.
Proceedings of the IEEE International Solid-State Circuits Conference, 2025

2024
TCP: A Tensor Contraction Processor for AI Workloads (Industrial Product).
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024