Hardware-Efficient Attention for Fast Decoding.
CoRR, May 2025
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
High Probability Bounds for Stochastic Continuous Submodular Maximization.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023