Efficient LLM Inference with Activation Checkpointing and Hybrid Caching.
CoRR, January 2025
16.2 RNGD: A 5nm Tensor-Contraction Processor for Power-Efficient Inference on Large Language Models.
Proceedings of the IEEE International Solid-State Circuits Conference, 2025
TCP: A Tensor Contraction Processor for AI Workloads Industrial Product.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024