Sashank J. Reddi

Affiliations:
  • Carnegie Mellon University, Machine Learning Department


According to our database, Sashank J. Reddi authored at least 73 papers between 2010 and 2024.

Bibliography

2024
On the Inductive Bias of Stacking Towards Improving Reasoning.
CoRR, 2024

Efficient Document Ranking with Learnable Late Interactions.
CoRR, 2024

Landscape-Aware Growing: The Power of a Little LAG.
CoRR, 2024

Efficient Stagewise Pretraining via Progressive Subnetworks.
CoRR, 2024

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Simplicity Bias via Global Convergence of Sharpness Minimization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
The Inductive Bias of Flatness Regularization for Deep Matrix Factorization.
CoRR, 2023

Depth Dependence of μP Learning Rates in ReLU MLPs.
CoRR, 2023

What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Efficient Training of Language Models using Few-Shot Learning.
Proceedings of the International Conference on Machine Learning, 2023

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Differentially Private Adaptive Optimization with Delayed Preconditioners.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
On the Algorithmic Stability and Generalization of Adaptive Optimization Methods.
CoRR, 2022

Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers.
CoRR, 2022

FedLite: A Scalable Approach for Federated Learning on Resource-constrained Clients.
CoRR, 2022

In defense of dual-encoders for neural ranking.
Proceedings of the International Conference on Machine Learning, 2022

Private Adaptive Optimization with Side information.
Proceedings of the International Conference on Machine Learning, 2022

Robust Training of Neural Networks Using Scale Invariant Architectures.
Proceedings of the International Conference on Machine Learning, 2022

2021
A Field Guide to Federated Optimization.
CoRR, 2021

Distilling Double Descent.
CoRR, 2021

Efficient Training of Retrieval Models using Negative Cache.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Breaking the centralized barrier for cross-device federated learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Federated Composite Optimization.
Proceedings of the 38th International Conference on Machine Learning, 2021

Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces.
Proceedings of the 38th International Conference on Machine Learning, 2021

A statistical perspective on distillation.
Proceedings of the 38th International Conference on Machine Learning, 2021

Adaptive Federated Optimization.
Proceedings of the 9th International Conference on Learning Representations, 2021

RankDistil: Knowledge Distillation for Ranking.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning.
CoRR, 2020

Why distillation helps: a statistical perspective.
CoRR, 2020

Doubly-stochastic mining for heterogeneous retrieval.
CoRR, 2020

Adaptive Sampling Distributed Stochastic Variance Reduced Gradient for Heterogeneous Distributed Datasets.
CoRR, 2020

Why are Adaptive Methods Good for Attention Models?
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Low-Rank Bottleneck in Multi-head Attention Models.
Proceedings of the 37th International Conference on Machine Learning, 2020

Are Transformers universal approximators of sequence-to-sequence functions?
Proceedings of the 8th International Conference on Learning Representations, 2020

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.
Proceedings of the 8th International Conference on Learning Representations, 2020

Learning to Learn by Zeroth-Order Oracle.
Proceedings of the 8th International Conference on Learning Representations, 2020

Can gradient clipping mitigate label noise?
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Why ADAM Beats SGD for Attention Models.
CoRR, 2019

SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning.
CoRR, 2019

AdaCliP: Adaptive Clipping for Private SGD.
CoRR, 2019

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Escaping Saddle Points with Adaptive Gradient Methods.
Proceedings of the 36th International Conference on Machine Learning, 2019

Stochastic Negative Mining for Learning with Large Output Spaces.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Adaptive Methods for Nonconvex Optimization.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

On the Convergence of Adam and Beyond.
Proceedings of the 6th International Conference on Learning Representations, 2018

A Generic Approach for Escaping Saddle points.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
New Optimization Methods for Modern Machine Learning.
PhD thesis, 2017

2016
Fast stochastic optimization on Riemannian manifolds.
CoRR, 2016

Fast Stochastic Methods for Nonsmooth Nonconvex Optimization.
CoRR, 2016

Fast Incremental Method for Nonconvex Optimization.
CoRR, 2016

AIDE: Fast and Communication Efficient Distributed Optimization.
CoRR, 2016

Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Variance Reduction in Stochastic Gradient Langevin Dynamics.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Stochastic Variance Reduction for Nonconvex Optimization.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Fast incremental method for smooth nonconvex optimization.
Proceedings of the 55th IEEE Conference on Decision and Control, 2016

Stochastic Frank-Wolfe methods for nonconvex optimization.
Proceedings of the 54th Annual Allerton Conference on Communication, Control, and Computing, 2016

2015
Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing.
CoRR, 2015

Communication Efficient Coresets for Empirical Loss Minimization.
Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015

Large-scale randomized-coordinate descent methods with non-separable linear constraints.
Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Doubly Robust Covariate Shift Correction.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

On the Decreasing Power of Kernel and Distance Based Nonparametric Hypothesis Tests in High Dimensions.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Kernel MMD, the Median Heuristic and Distance Correlation in High Dimensions.
CoRR, 2014

On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives.
CoRR, 2014

k-NN Regression on Functional Data with Incomplete Observations.
Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 2014

2013
Scale Invariant Conditional Dependence Measures.
Proceedings of the 30th International Conference on Machine Learning, 2013

2012
Incentive Decision Processes.
Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012

A Maximum Likelihood Approach For Selecting Sets of Alternatives.
Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012

2010
MAP estimation in Binary MRFs via Bipartite Multi-cuts.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, 2010

