Conference Agenda
Overview and details of the sessions of this conference. Please select a date or location to show only the sessions on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).
Agenda Overview

Session
Theory of Machine Learning: Insights from Women Researchers
Presentations
Effects of Depth in Deep Learning: Independence vs Recurrence
LMU Munich, Germany

Depth plays a central role in modern deep learning, yet its probabilistic effects are subtle and not fully captured by classical theories, which focus primarily on the infinite-width limit. This talk explores how jointly scaling depth and width shapes the signal-propagation statistics of wide neural networks under two contrasting regimes: fully connected feedforward networks with independent weights across layers, and recurrent networks with shared weights. In feedforward networks, standard infinite-width analyses make it possible to stabilize the forward and backward variance, ensuring well-behaved initialization. However, finite-width fluctuations accumulate with depth, breaking convergence to the Neural Tangent Kernel (NTK) regime. In linear recurrent networks, by contrast, finite-width effects already destabilize the forward-propagation variance, rendering conventional initialization schemes inadequate for long input sequences. Together, these results show that depth affects feedforward and recurrent architectures in qualitatively distinct ways that cannot be captured by infinite-width approximations.

Theoretical guarantees for diffusion models — beyond log-concavity
University of Hamburg, Germany

Score-based generative modeling, implemented through probability flow ODEs, has shown impressive results in numerous practical settings. However, most convergence guarantees rely on restrictive regularity assumptions on the target distribution, such as strong log-concavity or bounded support. This work establishes non-asymptotic convergence bounds in the 2-Wasserstein distance for a general class of probability flow ODEs under considerably weaker assumptions: weak log-concavity and Lipschitz continuity of the score function. Our framework accommodates non-log-concave distributions, such as Gaussian mixtures, and explicitly accounts for initialization errors, score-approximation errors, and the effects of discretization via an exponential integrator scheme. By addressing a key theoretical challenge in diffusion-based generative modeling, our results extend convergence theory to more realistic data distributions and practical ODE solvers. We provide concrete guarantees for the efficiency and correctness of the sampling algorithm, complementing the empirical success of diffusion models with rigorous theory. From a practical perspective, our explicit rates may also help in choosing hyperparameters, such as the step size of the discretization.

Random Quadratic Form on a Sphere: Synchronization by Common Noise
University of Amsterdam, The Netherlands

We introduce the Random Quadratic Form (RQF): a stochastic differential equation that formally corresponds to the gradient flow of a random quadratic functional on a sphere. While the one-point motion of the system is a Brownian motion on the sphere and thus has no preferred direction, the two-point motion exhibits nontrivial synchronizing behaviour. In this work we study synchronization of the RQF: we give both distributional and pathwise characterizations of the solutions by studying invariant measures and random attractors of the system.
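The contrast drawn in "Effects of Depth in Deep Learning" can be reproduced in a few lines. Below is a minimal sketch, assuming a linear toy model with variance-preserving 1/sqrt(width) initialization; the width, depth, and seed are illustrative choices, not taken from the talk:

import numpy as np

rng = np.random.default_rng(0)
width, depth = 64, 200  # assumed toy values; the interesting regime is depth comparable to width

def feedforward_variance(width, depth, rng):
    # Independent weights: a fresh variance-preserving random matrix at every layer.
    x = rng.standard_normal(width)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        x = W @ x
    return np.mean(x**2)

def recurrent_variance(width, depth, rng):
    # Shared weights: one random matrix applied at every time step.
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    x = rng.standard_normal(width)
    for _ in range(depth):
        x = W @ x
    return np.mean(x**2)

# The feedforward variance equals 1 in expectation but fluctuates increasingly
# with the depth-to-width ratio; the shared-weight recursion typically blows up
# or dies out, since it is governed by the spectrum of a single random matrix.
print("feedforward:", [feedforward_variance(width, depth, rng) for _ in range(5)])
print("recurrent:  ", [recurrent_variance(width, depth, rng) for _ in range(5)])

For "Theoretical guarantees for diffusion models", the objects named in the abstract have standard textbook forms; as a hedged reference point (the talk's exact setup may differ), the probability flow ODE associated with a forward noising SDE $\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t$ is
$$\frac{\mathrm{d}x_t}{\mathrm{d}t} = f(x_t,t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x_t),$$
and, for the Ornstein-Uhlenbeck choice $f(x,t) = -\tfrac{1}{2}x$, $g \equiv 1$, an exponential integrator run in reverse time solves the linear drift exactly while freezing the learned score $s_\theta$ over each step of length $h$:
$$x_{k+1} = e^{h/2}\,x_k + \bigl(e^{h/2}-1\bigr)\,s_\theta(x_k, t_k), \qquad t_{k+1} = t_k - h.$$

For "Random Quadratic Form on a Sphere", a plausible discretization (an illustrative guess at the dynamics, not the paper's exact SDE) drives two particles on the unit sphere by the same fresh random quadratic form at every step and tracks their distance:

import numpy as np

rng = np.random.default_rng(1)
d, dt, steps = 3, 1e-3, 20000  # assumed toy values

def sphere_step(x, A, dt):
    # Tangential gradient of the quadratic form x^T A x at x, i.e. (I - x x^T) A x,
    # followed by projection back onto the sphere.
    drift = A @ x - (x @ A @ x) * x
    y = x + np.sqrt(dt) * drift
    return y / np.linalg.norm(y)

x = rng.standard_normal(d); x /= np.linalg.norm(x)
y = rng.standard_normal(d); y /= np.linalg.norm(y)

for k in range(steps):
    G = rng.standard_normal((d, d))
    A = (G + G.T) / np.sqrt(2)  # common symmetric Gaussian noise, shared by both particles
    x, y = sphere_step(x, A, dt), sphere_step(y, A, dt)
    if k % 5000 == 0:
        # The form is even in x, so trajectories may also lock onto antipodal points.
        print(f"step {k:6d}  |x - y| = {np.linalg.norm(x - y):.4f}")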
Minimax rate of distribution regression
Hong Kong University of Science and Technology, Hong Kong S.A.R. (China)

Distribution regression seeks to estimate the conditional distribution of a multivariate response given a continuous covariate, offering a more complete characterization of dependence than traditional regression methods. Classical nonparametric techniques often assume that the conditional distribution has a well-defined density, an assumption that fails in many real-world settings, including data with discrete components and data lying on complex low-dimensional structures within high-dimensional spaces. In this work, we establish minimax convergence rates for distribution regression under nonparametric assumptions, focusing on scenarios where both covariates and responses lie on low-dimensional manifolds. We derive lower bounds that capture the inherent difficulty of the problem and propose a new hybrid estimator that combines adversarial learning with simultaneous least squares to attain matching upper bounds. Our results reveal how the smoothness of the conditional distribution and the geometry of the underlying manifolds together determine the estimation accuracy.
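The minimax framing in this last abstract has a standard shape; as a generic reference point (the notation below is assumed, not taken from the talk), the minimax risk over n samples is
$$\inf_{\hat{P}_n}\ \sup_{P \in \mathcal{F}}\ \mathbb{E}\, d\bigl(\hat{P}_n(\cdot \mid X),\, P(\cdot \mid X)\bigr),$$
where the infimum runs over all estimators of the conditional distribution, $\mathcal{F}$ is a nonparametric class encoding the smoothness of the conditional law and the manifold structure of covariates and responses, and $d$ is a distributional loss. A lower bound on this quantity certifies that no estimator can beat the rate uniformly over the class, while a matching upper bound for a specific estimator shows the rate is actually attained.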

