2025
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation.
CoRR, May 2025

Leveraging Multimodal Diffusion Models to Accelerate Imaging with Side Information.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

2024
Towards Low-bit Communication for Tensor Parallel LLM Inference.
CoRR, 2024

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.
CoRR, 2024

Prompt-prompted Mixture of Experts for Efficient LLM Generation.
CoRR, 2024

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
A Lightweight Transformer for Faster and Robust EBSD Data Collection.
CoRR, 2023

Deep Unfolded Tensor Robust PCA With Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022
Fast and Provable Tensor Robust Principal Component Analysis via Scaled Gradient Descent.
CoRR, 2022