 18 December 2025, 12:00
MOX Seminar

Edge of Stochastic Stability: SGD does not train neural networks as you expect

 Pierfrancesco Beneventano, Center for Biological and Computational Learning, MIT
 Aula Saleri
Abstract

Recent findings demonstrate that when training neural networks using full-batch gradient descent with step size eta, the largest eigenvalue lambda of the full-batch Hessian consistently stabilizes around 2/eta. These results have significant implications for convergence and generalization. This, however, is not the case for mini-batch optimization algorithms, limiting the broader applicability of these findings. We show that mini-batch Stochastic Gradient Descent (SGD) trains in a different regime, which we term Edge of Stochastic Stability (EoSS). In this regime, what stabilizes at 2/eta is Batch Sharpness: the expected directional curvature of mini-batch Hessians along their corresponding stochastic gradients. As a consequence, lambda, which is generally smaller than Batch Sharpness, is suppressed, aligning with the long-standing empirical observation that smaller batches and larger step sizes favor flatter minima. We further discuss implications for mathematical modeling of SGD trajectories.
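The sketch below is not part of the seminar material; it illustrates one way the Batch Sharpness quantity described in the abstract (the expected directional curvature of mini-batch Hessians along the corresponding stochastic gradients) could be estimated in PyTorch via Hessian-vector products. The `model`, `loss_fn`, and `loader` arguments are illustrative placeholders.

```python
# Illustrative sketch: estimate Batch Sharpness, i.e. an average over mini-batches
# of g_B^T H_B g_B / ||g_B||^2, where g_B and H_B are the mini-batch gradient and
# Hessian of the loss. Names of model/loss_fn/loader are assumptions, not from the talk.
import torch

def batch_sharpness(model, loss_fn, loader, n_batches=10, device="cpu"):
    params = [p for p in model.parameters() if p.requires_grad]
    vals = []
    for i, (x, y) in enumerate(loader):
        if i >= n_batches:
            break
        x, y = x.to(device), y.to(device)
        loss = loss_fn(model(x), y)
        # mini-batch gradient g_B (keep the graph for the Hessian-vector product)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        g = torch.cat([gr.reshape(-1) for gr in grads])
        # Hessian-vector product H_B g_B via a second differentiation
        hv = torch.autograd.grad(grads, params, grad_outputs=grads)
        hvp = torch.cat([h.reshape(-1) for h in hv])
        # directional curvature of the mini-batch loss along its own gradient
        vals.append((g @ hvp / (g @ g)).item())
    return sum(vals) / len(vals)

# Usage (illustrative): at Edge of Stochastic Stability this estimate is expected
# to hover around 2/eta for the step size eta in use, e.g.
#   sharp = batch_sharpness(model, torch.nn.CrossEntropyLoss(), train_loader)
#   print(sharp, 2 / eta)
```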

Contact:
paolo.zunino@polimi.it