2025
Unnatural Languages Are Not Bugs but Features for LLMs.
CoRR, March, 2025
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations.
CoRR, February, 2025
Training-Free Activation Sparsity in Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
2024
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models.
CoRR, 2024
Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention.
CoRR, 2024
JetMoE: Reaching Llama2 Performance with 0.1M Dollars.
CoRR, 2024
Accelerating Greedy Coordinate Gradient via Probe Sampling.
CoRR, 2024
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models.
CoRR, 2024
BitDelta: Your Fine-Tune May Only Be Worth One Bit.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
SnapKV: LLM Knows What You are Looking for Before Generation.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
REST: Retrieval-Based Speculative Decoding.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Large Language Models as Tool Makers.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
FlexAttention for Efficient High-Resolution Vision-Language Models.
Proceedings of the European Conference on Computer Vision (ECCV 2024), 2024
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2023
Scaling In-Context Demonstrations with Structured Attention.
CoRR, 2023
Reward Collapse in Aligning Large Language Models.
CoRR, 2023
What Makes Convolutional Models Great on Long Sequence Modeling?
Proceedings of the Eleventh International Conference on Learning Representations, 2023
2022
Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond.
CoRR, 2022
2021
First Place Solution of KDD Cup 2021 & OGB Large-Scale Challenge Graph Prediction Track.
CoRR, 2021
Do Transformers Really Perform Bad for Graph Representation?
CoRR, 2021
Towards Certifying ℓ∞ Robustness using Neural Networks with ℓ∞-dist Neurons.
CoRR, 2021
Do Transformers Really Perform Badly for Graph Representation?
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Towards a Theoretical Framework of Out-of-Distribution Generalization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons.
Proceedings of the 38th International Conference on Machine Learning, 2021
GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training.
Proceedings of the 38th International Conference on Machine Learning, 2021
A Theory of Label Propagation for Subpopulation Shift.
Proceedings of the 38th International Conference on Machine Learning, 2021
2020
RANDOM MASK: Towards Robust Convolutional Neural Networks.
CoRR, 2020
Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Locally Differentially Private (Contextual) Bandits Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
2019
Defective Convolutional Layers Learn Robust CNNs.
CoRR, 2019
Convergence of Adversarial Training in Overparametrized Networks.
CoRR, 2019
Adversarially Robust Generalization Just Requires More Unlabeled Data.
CoRR, 2019
A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems.
CoRR, 2019
Convergence of Adversarial Training in Overparametrized Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019