Léon Bottou

Orcid: 0000-0002-9894-8128

According to our database, Léon Bottou authored at least 122 papers between 1989 and 2024.

Bibliography

2024
MagicPIG: LSH Sampling for Efficient LLM Generation.
CoRR, 2024

Memory Mosaics.
CoRR, 2024

Fine-tuning with Very Large Dropout.
CoRR, 2024

2023
Borges and AI.
CoRR, 2023

Birth of a Transformer: A Memory Viewpoint.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning useful representations for shifting tasks and distributions.
Proceedings of the International Conference on Machine Learning, 2023

Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization.
Proceedings of the International Conference on Machine Learning, 2023

Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
A Simple Convergence Proof of Adam and Adagrad.
Trans. Mach. Learn. Res., 2022

A scaling calculus for the design and initialization of ReLU networks.
Neural Comput. Appl., 2022

Recycling diverse models for out-of-distribution generalization.
CoRR, 2022

The Effects of Regularization and Data Augmentation are Class Dependent.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Rich Feature Construction for the Optimization-Generalization Dilemma.
Proceedings of the International Conference on Machine Learning, 2022

On Distributionally Robust Optimization and Data Rebalancing.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

On the Relation between Distributionally Robust Optimization and Data Curation (Student Abstract).
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
An Attract-Repel Decomposition of Undirected Networks.
CoRR, 2021

Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation.
CoRR, 2021

Linear unit-tests for invariance discovery.
CoRR, 2021

2020
On the Convergence of Adam and Adagrad.
CoRR, 2020

Symplectic Recurrent Neural Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

Learning Representations Using Causal Invariance.
Proceedings of the Extraction et Gestion des Connaissances, 2020

2019
Music Source Separation in the Waveform Domain.
CoRR, 2019

Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed.
CoRR, 2019

Invariant Risk Minimization.
CoRR, 2019

Scaling Laws for the Principled Design, Initialization and Preconditioning of ReLU Networks.
CoRR, 2019

Cold Case: The Lost MNIST Digits.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

AdaGrad stepsizes: sharp convergence over nonconvex landscapes.
Proceedings of the 36th International Conference on Machine Learning, 2019

First-Order Adversarial Vulnerability of Neural Networks and Input Dimension.
Proceedings of the 36th International Conference on Machine Learning, 2019

2018
Optimization Methods for Large-Scale Machine Learning.
SIAM Rev., 2018

An efficient distributed learning algorithm based on effective local functional approximations.
J. Mach. Learn. Res., 2018

Controlling Covariate Shift using Equilibrium Normalization of Weights.
CoRR, 2018

AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization.
CoRR, 2018

WNGrad: Learn the Learning Rate in Gradient Descent.
CoRR, 2018

Adversarial Vulnerability of Neural Networks Increases With Input Dimension.
CoRR, 2018

SING: Symbol-to-Instrument Neural Generator.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Empirical Analysis of the Hessian of Over-Parametrized Neural Networks.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Diagonal Rescaling For Neural Networks.
CoRR, 2017

Wasserstein GAN.
CoRR, 2017

Wasserstein Generative Adversarial Networks.
Proceedings of the 34th International Conference on Machine Learning, 2017

Towards Principled Methods for Training Generative Adversarial Networks.
Proceedings of the 5th International Conference on Learning Representations, 2017

Discovering Causal Signals in Images.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Geometrical Insights for Implicit Generative Modeling.
Proceedings of the Braverman Readings in Machine Learning. Key Ideas from Inception to Current State, 2017

2016
Singularity of the Hessian in Deep Learning.
CoRR, 2016

Unifying distillation and privileged information.
Proceedings of the 4th International Conference on Learning Representations, 2016

No Regret Bound for Extreme Bandits.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015
A Lower Bound for the Optimization of Finite Sums.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Is object localization for free? - Weakly-supervised learning with convolutional neural networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

How big data changes statistical machine learning.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014
From machine learning to machine reasoning - An essay.
Mach. Learn., 2014

Introduction to the special issue on learning semantics.
Mach. Learn., 2014

ICE: Enabling Non-Experts to Build Models Interactively for Large-Scale Lopsided Problems.
CoRR, 2014

Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013
Counterfactual reasoning and learning systems: the example of computational advertising.
J. Mach. Learn. Res., 2013

A Parallel SGD method with Strong Convergence.
CoRR, 2013

A Functional Approximation Based Distributed Learning Algorithm.
CoRR, 2013

Para-active learning.
CoRR, 2013

In Hindsight: Doklady Akademii Nauk SSSR, 181(4), 1968.
Proceedings of the Empirical Inference - Festschrift in Honor of Vladimir N. Vapnik, 2013

2012
Efficient BackProp.
Proceedings of the Neural Networks: Tricks of the Trade - Second Edition, 2012

Stochastic Gradient Descent Tricks.
Proceedings of the Neural Networks: Tricks of the Trade - Second Edition, 2012

Counterfactual Reasoning and Learning Systems.
CoRR, 2012

2011
Batch and online learning algorithms for nonconvex neyman-pearson classification.
ACM Trans. Intell. Syst. Technol., 2011

Nonconvex Online Support Vector Machines.
IEEE Trans. Pattern Anal. Mach. Intell., 2011

Natural Language Processing (Almost) from Scratch.
J. Mach. Learn. Res., 2011

From Machine Learning to Machine Reasoning.
CoRR, 2011

2010
L'apprentissage statistique à grande échelle.
Monde des Util. Anal. Données, 2010

Guarantees for Approximate Incremental SVMs.
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Erratum: SGDQN is Less Careful than Expected.
J. Mach. Learn. Res., 2010

Large-Scale Machine Learning with Stochastic Gradient Descent.
Proceedings of the 19th International Conference on Computational Statistics, 2010

2009
SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent.
J. Mach. Learn. Res., 2009

2008
Sequence Labelling SVMs Trained in One Pass.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2008

2007
The Need for Open Source Software in Machine Learning.
J. Mach. Learn. Res., 2007

The Tradeoffs of Large Scale Learning.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Learning using Large Datasets.
Proceedings of the Mining Massive Data Sets for Security, 2007

Solving multiclass support vector machines with LaRank.
Proceedings of the Machine Learning, 2007

Learning on the border: active learning in imbalanced data classification.
Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, 2007

2006
Large Scale Transductive SVMs.
J. Mach. Learn. Res., 2006

Inference with the Universum.
Proceedings of the Machine Learning, 2006

Trading convexity for scalability.
Proceedings of the Machine Learning, 2006

2005
Toward Automatic Phenotyping of Developing Embryos From Videos.
IEEE Trans. Image Process., 2005

Fast Kernel Classifiers with Online and Active Learning.
J. Mach. Learn. Res., 2005

The Huller: A Simple and Efficient Online SVM.
Proceedings of the Machine Learning: ECML 2005, 2005

Online (and Offline) on an Even Tighter Budget.
Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005

2004
Parallel Support Vector Machines: The Cascade SVM.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Breaking SVM Complexity with Cross-Training.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting.
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), with CD-ROM, 27 June, 2004

2003
Scalable video coding with managed drift.
IEEE Trans. Circuits Syst. Video Technol., 2003

Geometric Clustering Using the Information Bottleneck Method.
Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Large Scale Online Learning.
Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Stochastic Learning.
Proceedings of the Advanced Lectures on Machine Learning, 2003

2002
Electronic Document Publishing Using DjVu.
Proceedings of the Document Analysis Systems V, 5th International Workshop, 2002

2001
DCT-based scalable video coding with drift.
Proceedings of the 2001 International Conference on Image Processing, 2001

Efficient Conversion of Digital Documents to Multilayer Raster Formats.
Proceedings of the 6th International Conference on Document Analysis and Recognition (ICDAR 2001), 2001

Managing Drift in DCT-Based Scalable Video Coding.
Proceedings of the Data Compression Conference, 2001

Masked Wavelets: Applications to Image Compression.
Proceedings of the Data Compression Conference, 2001

2000
Vicinal Risk Minimization.
Proceedings of the Advances in Neural Information Processing Systems 13, 2000

1999
Object Recognition with Gradient-Based Learning.
Proceedings of the Shape, Contour and Grouping in Computer Vision, 1999

Color Documents on the Web with DJVU.
Proceedings of the 1999 International Conference on Image Processing, 1999

DjVu: Analyzing and Compressing Scanned Documents for Internet Distribution.
Proceedings of the Fifth International Conference on Document Analysis and Recognition, 1999

1998
Image and video coding-emerging standards and beyond.
IEEE Trans. Circuits Syst. Video Technol., 1998

Gradient-based learning applied to document recognition.
Proc. IEEE, 1998

High quality document image compression with "DjVu".
J. Electronic Imaging, 1998

Boxlets: A Fast Convolution Algorithm for Signal Processing and Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 11, [NIPS Conference, Denver, Colorado, USA, November 30, 1998

DjVu: a Compression Method for Distributing Scanned Documents in Color over the Internet.
Proceedings of the 6th Color and Imaging Conference, 1998

Lossy Compression of Partially Masked Still Images.
Proceedings of the Data Compression Conference, 1998

The Z-Coder Adaptive Binary Coder.
Proceedings of the Data Compression Conference, 1998

Browsing through High Quality Document Images with DjVu.
Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries, 1998

1997
Reading checks with multilayer graph transformer networks.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

Global Training of Document Processing Systems Using Graph Transformer Networks.
Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97), 1997

1996
Efficient BackProp.
Proceedings of the Neural Networks: Tricks of the Trade, 1996

1994
Convergence Properties of the K-Means Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 7, 1994

Comparison of classifier methods: a case study in handwritten digit recognition.
Proceedings of the 12th IAPR International Conference on Pattern Recognition, 1994

1993
Local Algorithms for Pattern Recognition and Dependencies Estimation.
Neural Comput., 1993

Signature Verification Using A "Siamese" Time Delay Neural Network.
Int. J. Pattern Recognit. Artif. Intell., 1993

1992
Local Learning Algorithms.
Neural Comput., 1992

Computer aided cleaning of large databases for character recognition.
Proceedings of the 11th IAPR International Conference on Pattern Recognition, 1992

Capacity control in linear classifiers for pattern recognition.
Proceedings of the 11th IAPR International Conference on Pattern Recognition, 1992

1991
Structural Risk Minimization for Character Recognition.
Proceedings of the Advances in Neural Information Processing Systems 4, 1991

1990
Speaker-independent isolated digit recognition: Multilayer perceptrons vs. Dynamic time warping.
Neural Networks, 1990

A Framework for the Cooperation of Learning Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 3, 1990

1989
Experiments with time delay networks and dynamic time warping for speaker independent isolated digits recognition.
Proceedings of the First European Conference on Speech Communication and Technology, 1989

