Taiji Suzuki
ORCID: 0000-0003-3459-1016
Affiliations:
- Tokyo Institute of Technology, Department of Mathematical and Computing Sciences
According to our database, Taiji Suzuki authored at least 164 papers between 2005 and 2024.
Bibliography
2024
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent.
CoRR, 2024
Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization.
CoRR, 2024
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit.
CoRR, 2024
State Space Models are Comparable to Transformers in Estimating Functions with Dynamic Smoothness.
CoRR, 2024
CoRR, 2024
Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2024
Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Optimal criterion for feature learning of two-layer linear neural network in high dimensional interpolation regime.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Minimax optimality of convolutional neural networks for infinite dimensional input-output problems and separation from kernel methods.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations.
Proceedings of the Thirty Seventh Annual Conference on Learning Theory, June 30, 2024
2023
Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective.
CoRR, 2023
Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction.
CoRR, 2023
Koopman-Based Bound for Generalization: New Aspect of Neural Networks Regarding Nonlinear Noise Filtering.
CoRR, 2023
Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the International Joint Conference on Neural Networks, 2023
Scalable Federated Learning for Clients with Different Input Image Sizes and Numbers of Output Categories.
Proceedings of the International Conference on Machine Learning and Applications, 2023
Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input.
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the International Conference on Machine Learning, 2023
DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning.
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
2022
Graph Polynomial Convolution Models for Node Classification of Non-Homophilous Graphs.
CoRR, 2022
Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning.
CoRR, 2022
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2022
Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the International Joint Conference on Neural Networks, 2022
Data-Parallel Momentum Diagonal Empirical Fisher (DP-MDEF): Adaptive Gradient Method is Affected by Hessian Approximation and Multi-Class Data.
Proceedings of the 21st IEEE International Conference on Machine Learning and Applications, 2022
Learnability of convolutional neural networks for infinite dimensional input via mixed and anisotropic smoothness.
Proceedings of the Tenth International Conference on Learning Representations, 2022
Particle Stochastic Dual Coordinate Ascent: Exponential convergent algorithm for mean field neural network optimization.
Proceedings of the Tenth International Conference on Learning Representations, 2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022
Proceedings of the Asian Conference on Machine Learning, 2022
2021
Sharp characterization of optimal minibatch size for stochastic finite sum convex optimization.
Knowl. Inf. Syst., 2021
CoRR, 2021
CoRR, 2021
Proceedings of the IEEE Symposium Series on Computational Intelligence, 2021
Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021
Proceedings of the 38th International Conference on Machine Learning, 2021
Proceedings of the 38th International Conference on Machine Learning, 2021
On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting.
Proceedings of the 38th International Conference on Machine Learning, 2021
Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods.
Proceedings of the 9th International Conference on Learning Representations, 2021
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime.
Proceedings of the 9th International Conference on Learning Representations, 2021
Proceedings of the 9th International Conference on Learning Representations, 2021
Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021
2020
On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces.
Neural Networks, 2020
Neural Comput., 2020
A reproducing kernel Hilbert space approach to high dimensional partially varying coefficient model.
Comput. Stat. Data Anal., 2020
Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis.
CoRR, 2020
Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space.
CoRR, 2020
Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error.
CoRR, 2020
Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Spectral Pruning: Compressing Deep Neural Networks via Spectral Analysis and its Generalization Error.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020
Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network.
Proceedings of the 8th International Conference on Learning Representations, 2020
Proceedings of the 8th International Conference on Learning Representations, 2020
Proceedings of the 8th International Conference on Learning Representations, 2020
Proceedings of the 31st British Machine Vision Conference 2020, 2020
Functional Gradient Boosting for Learning Residual-like Networks with Statistical Guarantees.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020
2019
Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network.
CoRR, 2019
Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD.
CoRR, 2019
Refined Generalization Analysis of Gradient Descent for Over-parameterized Two-layer Neural Networks with Smooth Activations on Classification Problems.
CoRR, 2019
Approximation and non-parametric estimation of ResNet-type convolutional neural networks.
Proceedings of the 36th International Conference on Machine Learning, 2019
Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality.
Proceedings of the 7th International Conference on Learning Representations, 2019
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019
Proceedings of the Advances in Information Retrieval, 2019
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019
2018
Generalized ridge estimator and model selection criteria in multivariate linear regression.
J. Multivar. Anal., 2018
Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Proceedings of the 17th IEEE International Conference on Machine Learning and Applications, 2018
Proceedings of the 35th International Conference on Machine Learning, 2018
Short-term local weather forecast using dense weather station by deep neural network.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018
Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018
Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018
2017
Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
Stochastic Difference of Convex Algorithm and its Application to Training Deep Boltzmann Machines.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017
2016
System identification and parameter estimation in mathematical medicine: examples demonstrated for prostate cancer.
Quant. Biol., 2016
Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems.
CoRR, 2016
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016
Proceedings of the 33rd International Conference on Machine Learning, 2016
Proceedings of the 33rd International Conference on Machine Learning, 2016
2015
Proceedings of the 32nd International Conference on Machine Learning, 2015
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015
2014
Neural Comput., 2014
Convergence rate of Bayesian tensor estimator: Optimal rate without restricted strong convexity.
CoRR, 2014
Proceedings of the 31st International Conference on Machine Learning, 2014
2013
Neural Comput., 2013
Neural Comput., 2013
Computational complexity of kernel-based density-ratio estimation: a condition number analysis.
Mach. Learn., 2013
JSIAM Lett., 2013
Conjugate relation between loss functions and uncertainty sets in classification problems.
J. Mach. Learn. Res., 2013
Direct Divergence Approximation between Probability Distributions and Its Applications in Machine Learning.
J. Comput. Sci. Eng., 2013
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013
Dual Averaging and Proximal Gradient Descent for Online Alternating Direction Multiplier Method.
Proceedings of the 30th International Conference on Machine Learning, 2013
2012
f-Divergence Estimation and Two-Sample Homogeneity Test Under Semiparametric Density-Ratio Models.
IEEE Trans. Inf. Theory, 2012
Mach. Learn., 2012
Fast Learning Rate of Multiple Kernel Learning: Trade-Off between Sparsity and Smoothness.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012
A Conjugate Property between Loss Functions and Uncertainty Sets in Classification Problems.
Proceedings of the COLT 2012, 2012
PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive Model.
Proceedings of the COLT 2012, 2012
Cambridge University Press, ISBN: 978-0-521-19017-6, 2012
2011
Direct density-ratio estimation with dimensionality reduction via least-squares hetero-distributional subspace search.
Neural Networks, 2011
Mach. Learn., 2011
Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparsity Regularized Estimation.
J. Mach. Learn. Res., 2011
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011
2010
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2010
Proceedings of the SIAM International Conference on Data Mining, 2010
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010
2009
IPSJ Trans. Comput. Vis. Appl., 2009
Mutual information estimation reveals global associations between stimuli and biological processes.
BMC Bioinform., 2009
Proceedings of the IEEE International Symposium on Information Theory, 2009
Proceedings of the Independent Component Analysis and Signal Separation, 2009
2008
Proceedings of the Third Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery, 2008
2005
Proceedings of the Fifth International Conference on Intelligent Systems Design and Applications (ISDA 2005), 2005