Taiji Suzuki

Orcid: 0000-0003-3459-1016

Affiliations:
  • Tokyo Institute of Technology, Department of Mathematical and Computing Sciences


According to our database, Taiji Suzuki authored at least 165 papers between 2005 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Transformers Provably Solve Parity Efficiently with Chain of Thought.
CoRR, 2024

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent.
CoRR, 2024

Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization.
CoRR, 2024

Transformers are Minimax Optimal Nonparametric In-Context Learners.
CoRR, 2024

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit.
CoRR, 2024

Flow matching achieves minimax optimal convergence.
CoRR, 2024

State Space Models are Comparable to Transformers in Estimating Functions with Dynamic Smoothness.
CoRR, 2024

Understanding Multimodal Contrastive Learning Through Pointwise Mutual Information.
CoRR, 2024

Dimensionality-Induced Information Loss of Outliers in Deep Neural Networks.
Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2024

Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

How do Transformers Perform In-Context Autoregressive Learning?
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Mechanistic Design and Scaling of Hybrid Architectures.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

State-Free Inference of State-Space Models: The Transfer Function Approach.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SILVER: Single-loop variance reduction and application to federated learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Optimal criterion for feature learning of two-layer linear neural network in high dimensional interpolation regime.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Minimax optimality of convolutional neural networks for infinite dimensional input-output problems and separation from kernel methods.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Koopman-based generalization bound: New aspect for full-rank weights.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations.
Proceedings of the Thirty Seventh Annual Conference on Learning Theory, June 30, 2024

2023
Learning Green's Function Efficiently Using Low-Rank Approximations.
CoRR, 2023

Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective.
CoRR, 2023

Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction.
CoRR, 2023

Koopman-Based Bound for Generalization: New Aspect of Neural Networks Regarding Nonlinear Noise Filtering.
CoRR, 2023

Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Gradient-Based Feature Learning under Structured Data.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Neural Network Module Decomposition and Recomposition with Superimposed Masks.
Proceedings of the International Joint Conference on Neural Networks, 2023

Scalable Federated Learning for Clients with Different Input Image Sizes and Numbers of Output Categories.
Proceedings of the International Conference on Machine Learning and Applications, 2023

Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input.
Proceedings of the International Conference on Machine Learning, 2023

Tight and fast generalization error bound of graph embedding in metric space.
Proceedings of the International Conference on Machine Learning, 2023

Diffusion Models are Minimax Optimal Distribution Estimators.
Proceedings of the International Conference on Machine Learning, 2023

Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems.
Proceedings of the International Conference on Machine Learning, 2023

DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning.
Proceedings of the International Conference on Machine Learning, 2023

Uniform-in-time propagation of chaos for the mean-field gradient Langevin dynamics.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
Deep two-way matrix reordering for relational data analysis.
Neural Networks, 2022

Graph Polynomial Convolution Models for Node Classification of Non-Homophilous Graphs.
CoRR, 2022

Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning.
CoRR, 2022

A Scaling Law for Syn2real Transfer: How Much Is Your Pre-training Effective?
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2022

Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MSR-DARTS: Minimum Stable Rank of Differentiable Architecture Search.
Proceedings of the International Joint Conference on Neural Networks, 2022

Data-Parallel Momentum Diagonal Empirical Fisher (DP-MDEF): Adaptive Gradient Method is Affected by Hessian Approximation and Multi-Class Data.
Proceedings of the 21st IEEE International Conference on Machine Learning and Applications, 2022

Learnability of convolutional neural networks for infinite dimensional input via mixed and anisotropic smoothness.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Particle Stochastic Dual Coordinate Ascent: Exponential convergent algorithm for mean field neural network optimization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Understanding the Variance Collapse of SVGD in High Dimensions.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Dimension-free convergence rates for gradient Langevin dynamics in RKHS.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Convex Analysis of the Mean Field Langevin Dynamics.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Layer-wise Adaptive Graph Convolution Networks Using Generalized Pagerank.
Proceedings of the Asian Conference on Machine Learning, 2022

2021
Sharp characterization of optimal minibatch size for stochastic finite sum convex optimization.
Knowl. Inf. Syst., 2021

Goodness-of-fit test for latent block models.
Comput. Stat. Data Anal., 2021

Neural Network Module Decomposition and Recomposition.
CoRR, 2021

A Scaling Law for Synthetic-to-Real Transfer: A Measure of Pre-Training.
CoRR, 2021

Adaptive and Interpretable Graph Convolution Networks Using Generalized Pagerank.
CoRR, 2021

Goodness-of-fit Test on the Number of Biclusters in Relational Data Matrix.
CoRR, 2021

AutoLL: Automatic Linear Layout of Graphs based on Deep Neural Network.
Proceedings of the IEEE Symposium Series on Computational Intelligence, 2021

Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Differentiable Multiple Shooting Layers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Decomposable-Net: Scalable Low-Rank Compression for Neural Networks.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding.
Proceedings of the 38th International Conference on Machine Learning, 2021

Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting.
Proceedings of the 38th International Conference on Machine Learning, 2021

Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods.
Proceedings of the 9th International Conference on Learning Representations, 2021

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime.
Proceedings of the 9th International Conference on Learning Representations, 2021

When does preconditioning help or hurt generalization?
Proceedings of the 9th International Conference on Learning Representations, 2021

Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Gradient Descent in RKHS with Importance Labeling.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces.
Neural Networks, 2020

Independently Interpretable Lasso for Generalized Linear Models.
Neural Comput., 2020

A reproducing kernel Hilbert space approach to high dimensional partially varying coefficient model.
Comput. Stat. Data Anal., 2020

Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis.
CoRR, 2020

Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space.
CoRR, 2020

Neural Architecture Search Using Stable Rank of Convolutional Layers.
CoRR, 2020

Selective Inference for Latent Block Models.
CoRR, 2020

Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error.
CoRR, 2020

Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Spectral Pruning: Compressing Deep Neural Networks via Spectral Analysis and its Generalization Error.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network.
Proceedings of the 8th International Conference on Learning Representations, 2020

Graph Neural Networks Exponentially Lose Expressive Power for Node Classification.
Proceedings of the 8th International Conference on Learning Representations, 2020

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint.
Proceedings of the 8th International Conference on Learning Representations, 2020

Domain Adaptation Regularization for Spectral Pruning.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

Functional Gradient Boosting for Learning Residual-like Networks with Statistical Guarantees.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Understanding Generalization in Deep Learning via Tensor Methods.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Scalable Deep Neural Networks via Low-Rank Matrix Factorization.
CoRR, 2019

Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network.
CoRR, 2019

Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD.
CoRR, 2019

Accelerated Sparsified SGD with Error Feedback.
CoRR, 2019

On Asymptotic Behaviors of Graph CNNs from Dynamical Systems Perspective.
CoRR, 2019

Refined Generalization Analysis of Gradient Descent for Over-parameterized Two-layer Neural Networks with Smooth Activations on Classification Problems.
CoRR, 2019

Approximation and non-parametric estimation of ResNet-type convolutional neural networks.
Proceedings of the 36th International Conference on Machine Learning, 2019

Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality.
Proceedings of the 7th International Conference on Learning Representations, 2019

Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Cross-Domain Recommendation via Deep Domain Adaptation.
Proceedings of the Advances in Information Retrieval, 2019

Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Generalized ridge estimator and model selection criteria in multivariate linear regression.
J. Multivar. Anal., 2018

Spectral-Pruning: Compressing deep neural network via spectral analysis.
CoRR, 2018

Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks.
Proceedings of the 17th IEEE International Conference on Machine Learning and Applications, 2018

Functional Gradient Boosting based on Residual Network Perception.
Proceedings of the 35th International Conference on Machine Learning, 2018

Short-term local weather forecast using dense weather station by deep neural network.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Fast generalization error bound of deep learning from a kernel perspective.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
Stochastic Particle Gradient Descent for Infinite Ensembles.
CoRR, 2017

Fast learning rate of deep learning via a kernel perspective.
CoRR, 2017

Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Trimmed Density Ratio Estimation.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Stochastic Difference of Convex Algorithm and its Application to Training Deep Boltzmann Machines.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016
System identification and parameter estimation in mathematical medicine: examples demonstrated for prostate cancer.
Quant. Biol., 2016

Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems.
CoRR, 2016

Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Structure Learning of Partitioned Markov Networks.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Gaussian process nonparametric tensor estimator and its minimax optimality.
Proceedings of the 33rd International Conference on Machine Learning, 2016

2015
Convergence rate of Bayesian tensor estimator and its minimax optimality.
Proceedings of the 32nd International Conference on Machine Learning, 2015

A Consistent Method for Graph Based Anomaly Localization.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Support Consistency of Direct Sparse-Change Learning in Markov Networks.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Direct Learning of Sparse Changes in Markov Networks by Density Ratio Estimation.
Neural Comput., 2014

Convergence rate of Bayesian tensor estimator: Optimal rate without restricted strong convexity.
CoRR, 2014

Stochastic Dual Coordinate Ascent with Alternating Direction Method of Multipliers.
Proceedings of the 31st International Conference on Machine Learning, 2014

2013
Relative Density-Ratio Estimation for Robust Distribution Comparison.
Neural Comput., 2013

Sufficient Dimension Reduction via Squared-Loss Mutual Information Estimation.
Neural Comput., 2013

Density-Difference Estimation.
Neural Comput., 2013

Computational complexity of kernel-based density-ratio estimation: a condition number analysis.
Mach. Learn., 2013

Improvement of multiple kernel learning using adaptively weighted regularization.
JSIAM Lett., 2013

Conjugate relation between loss functions and uncertainty sets in classification problems.
J. Mach. Learn. Res., 2013

Direct Divergence Approximation between Probability Distributions and Its Applications in Machine Learning.
J. Comput. Sci. Eng., 2013

Convex Tensor Decomposition via Structured Schatten Norm Regularization.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Dual Averaging and Proximal Gradient Descent for Online Alternating Direction Multiplier Method.
Proceedings of the 30th International Conference on Machine Learning, 2013

2012
f-Divergence Estimation and Two-Sample Homogeneity Test Under Semiparametric Density-Ratio Models.
IEEE Trans. Inf. Theory, 2012

Statistical analysis of kernel-based least-squares density-ratio estimation.
Mach. Learn., 2012

Fast Learning Rate of Multiple Kernel Learning: Trade-Off between Sparsity and Smoothness.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

A Conjugate Property between Loss Functions and Uncertainty Sets in Classification Problems.
Proceedings of the COLT 2012, 2012

PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive Model.
Proceedings of the COLT 2012, 2012

Density Ratio Estimation in Machine Learning.
Cambridge University Press, ISBN: 978-0-521-19017-6, 2012

2011
Direct density-ratio estimation with dimensionality reduction via least-squares hetero-distributional subspace search.
Neural Networks, 2011

Least-squares two-sample test.
Neural Networks, 2011

Least-Squares Independent Component Analysis.
Neural Comput., 2011

SpicyMKL: a fast algorithm for Multiple Kernel Learning with thousands of kernels.
Mach. Learn., 2011

Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparsity Regularized Estimation.
J. Mach. Learn. Res., 2011

Least-Squares Independence Test.
IEICE Trans. Inf. Syst., 2011

Statistical Performance of Convex Tensor Decomposition.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

2010
Conditional Density Estimation via Least-Squares Density Ratio Estimation.
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Least-Squares Conditional Density Estimation.
IEICE Trans. Inf. Syst., 2010

Theoretical Analysis of Density Ratio Estimation.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2010

Regularization Strategies and Empirical Bayesian Learning for MKL.
CoRR, 2010

Direct Density Ratio Estimation with Dimensionality Reduction.
Proceedings of the SIAM International Conference on Data Mining, 2010

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

2009
A Density-ratio Framework for Statistical Data Processing.
IPSJ Trans. Comput. Vis. Appl., 2009

Mutual information estimation reveals global associations between stimuli and biological processes.
BMC Bioinform., 2009

Mutual information approximation via maximum likelihood estimation of density ratio.
Proceedings of the IEEE International Symposium on Information Theory, 2009

Estimating Squared-Loss Mutual Information for Independent Component Analysis.
Proceedings of the Independent Component Analysis and Signal Separation, 2009

2008
Approximating Mutual Information by Maximum Likelihood Density Ratio Estimation.
Proceedings of the Third Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery, 2008

2005
Learning to estimate user interest utilizing the variational Bayes estimator.
Proceedings of the Fifth International Conference on Intelligent Systems Design and Applications (ISDA 2005), 2005

