Martin Jaggi

ORCID: 0000-0003-1579-5558

Affiliations:
  • EPFL, School of Computer and Communication Sciences, Lausanne, Switzerland


According to our database, Martin Jaggi authored at least 191 papers between 2009 and 2024.

Bibliography

2024
MyThisYourThat for interpretable identification of systematic bias in federated learning for biomedical images.
npj Digit. Medicine, 2024

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training.
CoRR, 2024

Improving Stochastic Cubic Newton with Momentum.
CoRR, 2024

HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation.
CoRR, 2024

On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists.
CoRR, 2024

CoBo: Collaborative Learning via Bilevel Optimization.
CoRR, 2024

A New First-Order Meta-Learning Algorithm with Convergence Guarantees.
CoRR, 2024

Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants.
CoRR, 2024

Effective Interplay between Sparsity and Quantization: From Theory to Practice.
CoRR, 2024

Deep Grokking: Would Deep Neural Networks Generalize Better?
CoRR, 2024

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations.
CoRR, 2024

Personalized Collaborative Fine-Tuning for On-Device Large Language Models.
CoRR, 2024

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs.
CoRR, 2024

Towards an empirical understanding of MoE design choices.
CoRR, 2024

Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains.
CoRR, 2024

InterpretCC: Conditional Computation for Inherently Interpretable Neural Networks.
CoRR, 2024

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging.
CoRR, 2024

LASER: Linear Compression in Wireless Distributed Optimization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

On Convergence of Incremental Gradient for Non-convex Smooth Functions.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

DOGE: Domain Reweighting with Generalization Estimation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

The Privacy Power of Correlated Noise in Decentralized Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Layer-wise linear mode connectivity.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Ghost Noise for Regularizing Deep Neural Networks.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Provably Personalized and Robust Federated Learning.
Trans. Mach. Learn. Res., 2023

DeepBreath - automated detection of respiratory pathology from lung auscultation in 572 pediatric outpatients across 5 countries.
npj Digit. Medicine, 2023

Beyond Spectral Gap: The Role of the Topology in Decentralized Learning.
J. Mach. Learn. Res., 2023

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models.
CoRR, 2023

Controllable Topic-Focused Abstractive Summarization.
CoRR, 2023

Irreducible Curriculum for Language Model Pretraining.
CoRR, 2023

CoTFormer: More Tokens With Attention Make Up For Less Depth.
CoRR, 2023

Layerwise Linear Mode Connectivity.
CoRR, 2023

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention.
CoRR, 2023

Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders.
CoRR, 2023

Rotational Optimizers: Simple & Robust DNN Training.
CoRR, 2023

Hardware-Efficient Transformer Training via Piecewise Affine Operations.
CoRR, 2023

Landmark Attention: Random-Access Infinite Context Length for Transformers.
CoRR, 2023

Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods.
CoRR, 2023

Beyond spectral gap (extended): The role of the topology in decentralized learning.
CoRR, 2023

MultiMoDN - Multimodal, Multi-Task, Interpretable Modular Networks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Random-Access Infinite Context Length for Transformers.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Multiplication-Free Transformer Training via Piecewise Affine Operations.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Collaborative Learning via Prediction Consensus.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Special Properties of Gradient Descent with Large Learning Rates.
Proceedings of the International Conference on Machine Learning, 2023

Second-Order Optimization with Lazy Hessians.
Proceedings of the International Conference on Machine Learning, 2023

Agree to Disagree: Diversity through Disagreement for Better Transferability.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Linearization Algorithms for Fully Composite Optimization.
Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

SIMSUM: Document-level Text Simplification via Simultaneous Summarization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Scalable Collaborative Learning via Representation Sharing.
CoRR, 2022

Accuracy Boosters: Epoch-Driven Mixed-Mantissa Block Floating-Point for DNN Training.
CoRR, 2022

Modular Clinical Decision Support Networks (MoDN) - Updatable, Interpretable, and Portable Predictions for Evolving Clinical Environments.
CoRR, 2022

On Avoiding Local Minima Using Gradient Descent With Large Learning Rates.
CoRR, 2022

Data-heterogeneity-aware Mixing for Decentralized Learning.
CoRR, 2022

Improving Generalization via Uncertainty Driven Perturbations.
CoRR, 2022

Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods.
CoRR, 2022

Byzantine-Robust Decentralized Learning via Self-Centered Clipping.
CoRR, 2022

FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

SKILL: Structured Knowledge Infusion for Large Language Models.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Masked Training of Neural Networks with Partial Gradients.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Implicit Gradient Alignment in Distributed and Federated Learning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
PSYCHE-D: predicting change in depression severity using person-generated health data (DATASET).
Dataset, July, 2021

An accelerated communication-efficient primal-dual optimization framework for structured machine learning.
Optim. Methods Softw., 2021

Advances and Open Problems in Federated Learning.
Found. Trends Mach. Learn., 2021

Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation.
CoRR, 2021

Interpreting Language Models Through Knowledge Graph Extraction.
CoRR, 2021

Linear Speedup in Personalized Collaborative Learning.
CoRR, 2021

Optimal Model Averaging: Towards Personalized Collaborative Learning.
CoRR, 2021

WAFFLE: Weighted Averaging for Personalized Federated Learning.
CoRR, 2021

On Second-order Optimization Methods for Federated Learning.
CoRR, 2021

A Field Guide to Federated Optimization.
CoRR, 2021

IFedAvg: Interpretable Data-Interoperability for Federated Learning.
CoRR, 2021

Simultaneous Training of Partially Masked Neural Networks.
CoRR, 2021

Federated Learning for Malware Detection in IoT Devices.
CoRR, 2021

RelaySum for Decentralized Deep Learning on Heterogeneous Data.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Breaking the centralized barrier for cross-device federated learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Prediction of self-reported depression scores using person-generated health data from a virtual 1-year mental health observational study.
Proceedings of the DigiBiom@MobiSys '21: Proceedings of the 2021 Workshop on Future of Digital Biomarkers, 2021

Equinox: Training (for Free) on a Custom Inference Accelerator.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Learning from History for Byzantine Robust Optimization.
Proceedings of the 38th International Conference on Machine Learning, 2021

Exact Optimization of Conformal Predictors via Incremental and Decremental Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Quasi-global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data.
Proceedings of the 38th International Conference on Machine Learning, 2021

Consensus Control for Decentralized Deep Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Understanding the effects of data parallelism and sparsity on neural network training.
Proceedings of the 9th International Conference on Learning Representations, 2021

Taming GANs with Lookahead-Minmax.
Proceedings of the 9th International Conference on Learning Representations, 2021

Semantic Perturbations with Normalizing Flows for Improved Generalization.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Faster Parallel Training of Word Embeddings.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

Self-Supervised Neural Topic Modeling.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

LENA: Communication-Efficient Distributed Learning with Self-Triggered Gradient Uploads.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Lightweight Cross-Lingual Sentence Representation Learning.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Obtaining Better Static Word Embeddings Using Contextual Embedding Models.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Sparse Communication for Training Deep Networks.
CoRR, 2020

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning.
CoRR, 2020

PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning.
CoRR, 2020

Multi-Head Attention: Collaborate Instead of Concatenate.
CoRR, 2020

Taming GANs with Lookahead.
CoRR, 2020

Byzantine-Robust Learning on Heterogeneous Datasets via Resampling.
CoRR, 2020

Secure Byzantine-Robust Machine Learning.
CoRR, 2020

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models.
CoRR, 2020

Data Parallelism in Training Sparse Neural Networks.
CoRR, 2020

Practical Low-Rank Communication Compression in Decentralized Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Model Fusion via Optimal Transport.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Ensemble Distillation for Robust Model Fusion in Federated Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Weight Erosion: An Update Aggregation Scheme for Personalized Collaborative Machine Learning.
Proceedings of the Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning, 2020

Optimizer Benchmarking Needs to Account for Hyperparameter Tuning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Extrapolation for Large-batch Training in Deep Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates.
Proceedings of the 37th International Conference on Machine Learning, 2020

Evaluating The Search Phase of Neural Architecture Search.
Proceedings of the 8th International Conference on Learning Representations, 2020

Don't Use Large Mini-batches, Use Local SGD.
Proceedings of the 8th International Conference on Learning Representations, 2020

Dynamic Model Pruning with Feedback.
Proceedings of the 8th International Conference on Learning Representations, 2020

Decentralized Deep Learning with Arbitrary Communication Compression.
Proceedings of the 8th International Conference on Learning Representations, 2020

On the Relationship between Self-Attention and Convolutional Layers.
Proceedings of the 8th International Conference on Learning Representations, 2020

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Linearly Convergent Frank-Wolfe without Line-Search.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Unsupervised robust nonparametric learning of hidden community properties.
Math. Found. Comput., 2019

Robust Cross-lingual Embeddings from Parallel Sentences.
CoRR, 2019

Advances and Open Problems in Federated Learning.
CoRR, 2019

On the Tunability of Optimizers in Deep Learning.
CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019

Structure Tree-LSTM: Structure-aware Attentional Document Encoders.
CoRR, 2019

Forecasting intracranial hypertension using multi-scale waveform metrics.
CoRR, 2019

Crosslingual Document Embedding as Reduced-Rank Ridge Regression.
Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019

Correlating Twitter Language with Community-Level Health Outcomes.
Proceedings of the Fourth Social Media Mining for Health Application Workshop & Shared Task, 2019

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Unsupervised Scalable Representation Learning for Multivariate Time Series.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Better Word Embeddings by Disentangling Contextual n-Gram Information.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Open-Vocabulary Keyword Spotting with Audio and Text Embeddings.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication.
Proceedings of the 36th International Conference on Machine Learning, 2019

Error Feedback Fixes SignSGD and other Gradient Compression Schemes.
Proceedings of the 36th International Conference on Machine Learning, 2019

Overcoming Multi-model Forgetting.
Proceedings of the 36th International Conference on Machine Learning, 2019

On Linear Learning with Manycore Processors.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Efficient Greedy Coordinate Descent for Composite Problems.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Optimal Affine-Invariant Smooth Minimization Algorithms.
SIAM J. Optim., 2018

Wasserstein is all you need.
CoRR, 2018

Don't Use Large Mini-Batches, Use Local SGD.
CoRR, 2018

COLA: Communication-Efficient Decentralized Linear Learning.
CoRR, 2018

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients.
CoRR, 2018

End-to-End DNN Training with Block Floating Point Arithmetic.
CoRR, 2018

Revisiting First-Order Convex Optimization Over Linear Spaces.
CoRR, 2018

EmbedRank: Unsupervised Keyphrase Extraction using Sentence Embeddings.
CoRR, 2018

Sparsified SGD with Memory.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

COLA: Decentralized Linear Learning.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Training DNNs with Hybrid Block Floating Point.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

On Matching Pursuit and Coordinate Descent.
Proceedings of the 35th International Conference on Machine Learning, 2018

A Distributed Second-Order Algorithm You Can Trust.
Proceedings of the 35th International Conference on Machine Learning, 2018

Simple Unsupervised Keyphrase Extraction using Sentence Embeddings.
Proceedings of the 22nd Conference on Computational Natural Language Learning, 2018

Adaptive balancing of gradient and update computation times using global geometry and approximate subproblems.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
Learning Aerial Image Segmentation From Online Maps.
IEEE Trans. Geosci. Remote. Sens., 2017

Distributed optimization with arbitrary local solvers.
Optim. Methods Softw., 2017

CoCoA: A General Framework for Communication-Efficient Distributed Optimization.
J. Mach. Learn. Res., 2017

Efficient Use of Limited-Memory Resources to Accelerate Linear Learning.
CoRR, 2017

Unsupervised robust nonparametric learning of hidden community properties.
CoRR, 2017

Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification.
Proceedings of the 26th International Conference on World Wide Web, 2017

Safe Adaptive Importance Sampling.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Approximate Steepest Coordinate Descent.
Proceedings of the 34th International Conference on Machine Learning, 2017

Faster Coordinate Descent via Adaptive Importance Sampling.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

Generating Steganographic Text with LSTMs.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Screening Rules for Convex Problems.
CoRR, 2016

Pursuits in Structured Non-Convex Matrix Factorizations.
CoRR, 2016

SwissCheese at SemEval-2016 Task 4: Sentiment Classification Using an Ensemble of Convolutional Neural Networks with Distant Supervision.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

Primal-Dual Rates and Certificates.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Audio Based Bird Species Identification using Deep Learning Techniques.
Proceedings of the Working Notes of CLEF 2016, 2016

2015
L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework.
CoRR, 2015

Swiss-Chocolate: Combining Flipout Regularization and Random Forests with Artificially Built Subsystems to Boost Text-Classification for Sentiment.
Proceedings of the 9th International Workshop on Semantic Evaluation, 2015

On the Global Linear Convergence of Frank-Wolfe Optimization Variants.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Adding vs. Averaging in Distributed Primal-Dual Optimization.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2014
Swiss-Chocolate: Sentiment Detection using Sparse SVMs and Part-Of-Speech n-Grams.
Proceedings of the 8th International Workshop on Semantic Evaluation, 2014

Communication-Efficient Distributed Dual Coordinate Ascent.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

2013
An Equivalence between the Lasso and Support Vector Machines.
CoRR, 2013

Block-Coordinate Frank-Wolfe Optimization for Structural SVMs.
Proceedings of the 30th International Conference on Machine Learning, 2013

Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization.
Proceedings of the 30th International Conference on Machine Learning, 2013

2012
Approximating parameterized convex optimization problems.
ACM Trans. Algorithms, 2012

An Exponential Lower Bound on the Complexity of Regularization Paths.
J. Comput. Geom., 2012

Regularization Paths with Guarantees for Convex Semidefinite Optimization.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Stochastic Block-Coordinate Frank-Wolfe Optimization for Structural SVMs.
CoRR, 2012

Optimizing over the Growing Spectrahedron.
Proceedings of the Algorithms - ESA 2012, 2012

2011
Sparse Convex Optimization Methods for Machine Learning.
PhD thesis, 2011

Convex Optimization without Projection Steps.
CoRR, 2011

2010
A Simple Algorithm for Nuclear Norm Regularized Problems.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

2009
A Combinatorial Algorithm to Compute Regularization Paths.
CoRR, 2009

An Exponential Lower Bound on the Complexity of Regularization Paths.
CoRR, 2009

Coresets for polytope distance.
Proceedings of the 25th ACM Symposium on Computational Geometry, 2009

