Scalable LLM Math Reasoning Acceleration with Low-rank Distillation.
CoRR, 2025
Leveraging Multimodal Diffusion Models to Accelerate Imaging with Side Information.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
Towards Low-bit Communication for Tensor Parallel LLM Inference.
CoRR, 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.
CoRR, 2024
Prompt-prompted Mixture of Experts for Efficient LLM Generation.
CoRR, 2024
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
A Lightweight Transformer for Faster and Robust EBSD Data Collection.
CoRR, 2023
Deep Unfolded Tensor Robust PCA With Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
Fast and Provable Tensor Robust Principal Component Analysis via Scaled Gradient Descent.
CoRR, 2022