Suvrit Sra

ORCID: 0000-0001-8516-4925

Affiliations:
  • Massachusetts Institute of Technology (MIT), Laboratory for Information and Decision Systems, Cambridge, MA, USA
  • Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  • University of Texas at Austin, Department of Computer Sciences, Austin, TX, USA


According to our database, Suvrit Sra authored at least 171 papers between 2003 and 2024.

Bibliography

2024
Graph Transformers Dream of Electric Flow.
CoRR, 2024

Memory-augmented Transformers can implement Linear First-Order Optimization Methods.
CoRR, 2024

First-Order Methods for Linearly Constrained Bilevel Optimization.
CoRR, 2024

Riemannian Bilevel Optimization.
CoRR, 2024

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

How to Escape Sharp Minima with Random Perturbations.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Linear attention is (maybe) all you need (to understand Transformer optimization).
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Sion's Minimax Theorem in Geodesic Metric Spaces and a Riemannian Extragradient Algorithm.
SIAM J. Optim., December, 2023

Riemannian Optimization via Frank-Wolfe Methods.
Math. Program., May, 2023

Invex Programs: First Order Algorithms and Their Convergence.
CoRR, 2023

How to escape sharp minima.
CoRR, 2023

Transformers learn to implement preconditioned gradient descent for in-context learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Crucial Role of Normalization in Sharpness-Aware Minimization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?
Proceedings of the Learning for Dynamics and Control Conference, 2023

On the Training Instability of Shuffling SGD with Batch Normalization.
Proceedings of the International Conference on Machine Learning, 2023

Global optimality for Euclidean CCCP under Riemannian convexity.
Proceedings of the International Conference on Machine Learning, 2023

Sign and Basis Invariant Networks for Spectral Graph Representation Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Toward Understanding State Representation Learning in MuZero: A Case Study in Linear Quadratic Gaussian Control.
Proceedings of the 62nd IEEE Conference on Decision and Control, 2023

2022
Computing Brascamp-Lieb Constants through the lens of Thompson Geometry.
CoRR, 2022

On a class of geodesically convex optimization problems solved via Euclidean MM methods.
CoRR, 2022

Minimax in Geodesic Metric Spaces: Sion's Theorem and Algorithms.
CoRR, 2022

Understanding Nesterov's Acceleration via Proximal Point Method.
Proceedings of the 5th Symposium on Simplicity in Algorithms, 2022

CCCP is Frank-Wolfe in disguise.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Sampling on Riemannian Manifolds via Langevin MCMC.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Time Varying Regression with Hidden Linear Dynamics.
Proceedings of the Learning for Dynamics and Control Conference, 2022

Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective.
Proceedings of the International Conference on Machine Learning, 2022

Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity.
Proceedings of the International Conference on Machine Learning, 2022

Understanding the unstable convergence of gradient descent.
Proceedings of the International Conference on Machine Learning, 2022

Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Understanding Riemannian Acceleration via a Proximal Extragradient Framework.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Max-Margin Contrastive Learning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
A Riemannian Accelerated Proximal Extragradient Framework and its Implications.
CoRR, 2021

On Convergence of Training Loss Without Reaching Stationary Points.
CoRR, 2021

Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?
CoRR, 2021

Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Can contrastive learning avoid shortcut solutions?
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Three Operator Splitting with a Nonconvex Loss Function.
Proceedings of the 38th International Conference on Machine Learning, 2021

Provably Efficient Algorithms for Multi-Objective Competitive RL.
Proceedings of the 38th International Conference on Machine Learning, 2021

Online Learning in Unknown Markov Games.
Proceedings of the 38th International Conference on Machine Learning, 2021

Coping with Label Shift via Distributionally Robust Optimisation.
Proceedings of the 9th International Conference on Learning Representations, 2021

Contrastive Learning with Hard Negative Samples.
Proceedings of the 9th International Conference on Learning Representations, 2021

Open Problem: Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?
Proceedings of the Conference on Learning Theory, 2021

2020
An alternative to EM for Gaussian mixture models: batch and stochastic Riemannian optimization.
Math. Program., 2020

An Interpretable Predictive Model of Vaccine Utilization for Tanzania.
Frontiers Artif. Intell., 2020

Why do classifier accuracies show linear trends under distribution shift?
CoRR, 2020

Provably Efficient Online Agnostic Learning in Markov Games.
CoRR, 2020

Stochastic Optimization with Non-stationary Noise.
CoRR, 2020

On Tight Convergence Rates of Without-replacement SGD.
CoRR, 2020

On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions.
CoRR, 2020

Why are Adaptive Methods Good for Attention Models?
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

SGD with shuffling: optimal rates without component convexity and large epoch requirements.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions.
Proceedings of the 37th International Conference on Machine Learning, 2020

Strength from Weakness: Fast Learning Using Weak Supervision.
Proceedings of the 37th International Conference on Machine Learning, 2020

Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition.
Proceedings of the 37th International Conference on Machine Learning, 2020

Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity.
Proceedings of the 8th International Conference on Learning Representations, 2020

From Nesterov's Estimate Sequence to Riemannian Acceleration.
Proceedings of the Conference on Learning Theory, 2020

Geodesically-convex optimization for averaging partially observed covariance matrices.
Proceedings of The 12th Asian Conference on Machine Learning, 2020

2019
Why ADAM Beats SGD for Attention Models.
CoRR, 2019

Metrics Induced by Quantum Jensen-Shannon-Rényi and Related Divergences.
CoRR, 2019

Nonconvex stochastic optimization on manifolds via Riemannian Frank-Wolfe methods.
CoRR, 2019

Analysis of Gradient Clipping and Adaptive Scaling with a Relaxed Smoothness Condition.
CoRR, 2019

Are deep ResNets provably better than linear predictors?
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Flexible Modeling of Diversity with Strongly Log-Concave Distributions.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator.
Proceedings of the 36th International Conference on Machine Learning, 2019

Escaping Saddle Points with Adaptive Gradient Methods.
Proceedings of the 36th International Conference on Machine Learning, 2019

Random Shuffling Beats SGD after Finite Epochs.
Proceedings of the 36th International Conference on Machine Learning, 2019

Small nonlinearities in activation functions create bad local minima in neural networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

Efficiently testing local optimality and escaping saddles for ReLU networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

Acceleration in First Order Quasi-strongly Convex Optimization by ODE Discretization.
Proceedings of the 58th IEEE Conference on Decision and Control, 2019

Learning Determinantal Point Processes by Corrective Negative Sampling.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Modular Proximal Optimization for Multidimensional Total-Variation Regularization.
J. Mach. Learn. Res., 2018

Deep-RBF Networks Revisited: Robust Classification with Rejection.
CoRR, 2018

R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate.
CoRR, 2018

Finite sample expressive power of small-width ReLU networks.
CoRR, 2018

Towards Riemannian Accelerated Gradient Methods.
CoRR, 2018

Learning Determinantal Point Processes by Sampling Inferred Negatives.
CoRR, 2018

A Critical View of Global Optimality in Deep Learning.
CoRR, 2018

Direct Runge-Kutta Discretization Achieves Acceleration.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Exponentiated Strongly Rayleigh Distributions.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Global Optimality Conditions for Deep Neural Networks.
Proceedings of the 6th International Conference on Learning Representations, 2018

Distributional Adversarial Networks.
Proceedings of the 6th International Conference on Learning Representations, 2018

Non-Linear Temporal Subspace Representations for Activity Recognition.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

An Estimate Sequence for Geodesically Convex Optimization.
Proceedings of the Conference On Learning Theory, 2018

On Geodesically Convex Formulations for the Brascamp-Lieb Constant.
Proceedings of the Approximation, 2018

A Generic Approach for Escaping Saddle points.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
Riemannian Dictionary Learning and Sparse Coding for Positive Definite Matrices.
IEEE Trans. Neural Networks Learn. Syst., 2017

Frank-Wolfe methods for geodesically convex optimization with application to the matrix geometric mean.
CoRR, 2017

Unsupervised robust nonparametric learning of hidden community properties.
CoRR, 2017

Sequence Summarization Using Order-constrained Kernelized Feature Subspaces.
CoRR, 2017

Elementary Symmetric Polynomials for Optimal Experimental Design.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Polynomial time algorithms for dual volume sampling.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Combinatorial Topic Models using Small-Variance Asymptotics.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016
Entropic metric alignment for correspondence problems.
ACM Trans. Graph., 2016

On inequalities for normalized Schur functions.
Eur. J. Comb., 2016

Inference and mixture modeling with the Elliptical Gamma Distribution.
Comput. Stat. Data Anal., 2016

Fast stochastic optimization on Riemannian manifolds.
CoRR, 2016

Fast Stochastic Methods for Nonsmooth Nonconvex Optimization.
CoRR, 2016

Fast Incremental Method for Nonconvex Optimization.
CoRR, 2016

Diversity Networks.
Proceedings of the 4th International Conference on Learning Representations, 2016

Fast Sampling for Strongly Rayleigh Measures with Application to Determinantal Point Processes.
CoRR, 2016

Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Kronecker Determinantal Point Processes.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Geometric Mean Metric Learning.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Stochastic Variance Reduction for Nonconvex Optimization.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Gaussian quadrature for matrix inverse forms with applications.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Fast DPP Sampling for Nystrom with Application to Kernel Methods.
Proceedings of the 33rd International Conference on Machine Learning, 2016

First-order Methods for Geodesically Convex Optimization.
Proceedings of the 29th Conference on Learning Theory, 2016

Fast incremental method for smooth nonconvex optimization.
Proceedings of the 55th IEEE Conference on Decision and Control, 2016

Stochastic Frank-Wolfe methods for nonconvex optimization.
Proceedings of the 54th Annual Allerton Conference on Communication, 2016

AdaDelay: Delay Adaptive Distributed Stochastic Optimization.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

Efficient Sampling for k-Determinantal Point Processes.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015
Conic Geometric Optimization on the Manifold of Positive Definite Matrices.
SIAM J. Optim., 2015

AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization.
CoRR, 2015

Fixed-point algorithms for determinantal point processes.
CoRR, 2015

Bounds on bilinear inverse forms via Gaussian quadrature with applications.
CoRR, 2015

Convex Optimization for Parallel Energy Minimization.
CoRR, 2015

Manifold Optimization for Gaussian Mixture Models.
CoRR, 2015

Large-scale randomized-coordinate descent methods with non-separable linear constraints.
Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Matrix Manifold Optimization for Gaussian Mixtures.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Fixed-point algorithms for learning determinantal point processes.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Data modeling with the elliptical gamma distribution.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2014
Efficient Nearest Neighbors via Robust Sparse Hashing.
IEEE Trans. Image Process., 2014

Fast Newton methods for the group fused lasso.
Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 2014

Efficient Structured Matrix Rank Minimization.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Randomized Nonlinear Component Analysis.
Proceedings of the 31st International Conference on Machine Learning, 2014

Towards an optimal stochastic alternating direction method of multipliers.
Proceedings of the 31st International Conference on Machine Learning, 2014

Riemannian Sparse Coding for Positive Definite Matrices.
Proceedings of the Computer Vision - ECCV 2014, 2014

Tractable Optimization in Machine Learning.
Proceedings of the Tractability: Practical Approaches to Hard Problems, 2014

2013
Jensen-Bregman LogDet Divergence with Application to Efficient Similarity Search for Covariance Matrices.
IEEE Trans. Pattern Anal. Mach. Intell., 2013

A non-monotonic method for large-scale non-negative least squares.
Optim. Methods Softw., 2013

The multivariate Watson distribution: Maximum-likelihood estimation and other aspects.
J. Multivar. Anal., 2013

Statistical estimation for optimization problems on graphs.
CoRR, 2013

Geometric optimisation on positive definite matrices for elliptically contoured distributions.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Reflection methods for user-friendly submodular optimization.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

2012
Fast projections onto mixed-norm balls with applications.
Data Min. Knowl. Discov., 2012

A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of I_s(x).
Comput. Stat., 2012

Scalable nonconvex inexact proximal splitting.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

A new metric on the manifold of kernel matrices with application to matrix geometric means.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

2011
Generalized Dictionary Learning for Symmetric Positive Definite Matrices with Application to Nearest Neighbor Retrieval.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2011

Fast Projections onto ℓ1,q-Norm Balls for Grouped Feature Selection.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2011

Fast Newton-type Methods for Total Variation Regularization.
Proceedings of the 28th International Conference on Machine Learning, 2011

Efficient similarity search for covariance matrices via the Jensen-Bregman LogDet Divergence.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Denoising sparse noise via online dictionary learning.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Tackling Box-Constrained Optimization via a New Projected Quasi-Newton Approach.
SIAM J. Sci. Comput., 2010

A scalable trust-region algorithm with application to mixed-norm regression.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Multiframe blind deconvolution, super-resolution, and saturation correction via incremental EM.
Proceedings of the International Conference on Image Processing, 2010

Efficient filter flow for space-variant multiframe blind deconvolution.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009
Convex Perturbations for Scalable Semidefinite Programming.
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, 2009

A Trivial Observation related to Sparse Recovery.
CoRR, 2009

Workshop summary: Numerical mathematics in machine learning.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Approximation Algorithms for Tensor Clustering.
Proceedings of the Algorithmic Learning Theory, 20th International Conference, 2009

2008
The Metric Nearness Problem.
SIAM J. Matrix Anal. Appl., 2008

Fast Projection-Based Methods for the Least Squares Nonnegative Matrix Approximation Problem.
Stat. Anal. Data Min., 2008

Approximation Algorithms for Bregman Co-clustering and Tensor Clustering.
CoRR, 2008

Block-Iterative Algorithms for Non-negative Matrix Approximation.
Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), 2008

2007
Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem.
Proceedings of the Seventh SIAM International Conference on Data Mining, 2007

Information-theoretic metric learning.
Proceedings of the Machine Learning, 2007

2006
Incremental Aspect Models for Mining Document Streams.
Proceedings of the Knowledge Discovery in Databases: PKDD 2006, 2006

Row-Action Methods for Compressed Sensing.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Efficient Large Scale Linear Programming Support Vector Machines.
Proceedings of the Machine Learning: ECML 2006, 2006

2005
Clustering on the Unit Hypersphere using von Mises-Fisher Distributions.
J. Mach. Learn. Res., 2005

Generalized Nonnegative Matrix Approximations with Bregman Divergences.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

2004
Minimum Sum-Squared Residue Co-Clustering of Gene Expression Data.
Proceedings of the Fourth SIAM International Conference on Data Mining, 2004

Triangle Fixing Algorithms for the Metric Nearness Problem.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

2003
Generative model-based clustering of directional data.
Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24, 2003
