Conference Agenda
Overview and details of the sessions of this conference. Please select a date or location to show only the sessions on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).
Agenda Overview
| Date: Wednesday, 18/Mar/2026 | |
| 8:50am - 9:00am | Opening Location: 0.004 |
| 9:00am - 10:00am | Plenary Lecture 1 Location: 0.004 |
A unified theory of order flow, market impact and volatility Ecole Polytechnique, France We propose a microstructural model for the order flow in financial markets that distinguishes between core orders and reaction flow, both modeled as Hawkes processes. This model has a natural scaling limit that reconciles a number of salient empirical properties: persistent signed order flow, rough trading volume and volatility, and power-law market impact. In our framework, all these quantities are pinned down by a single statistic H_0, which measures the persistence of the core flow. Specifically, the signed flow converges to the sum of a fractional process with Hurst index H_0 and a martingale, while the limiting traded volume is a rough process with Hurst index H_0-1/2. No-arbitrage constraints imply that volatility is rough, with Hurst parameter 2H_0-3/2, and that the price impact of trades follows a power law with exponent 2-2H_0. The analysis of signed order flow data yields an estimate H_0 close to 3/4. This is not only consistent with the square-root law of market impact, but also turns out to match estimates for the roughness of traded volumes and volatilities remarkably well. |
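As an illustrative aside (not code from the talk): a univariate Hawkes process with exponential kernel, the basic building block of order-flow models like the one above, can be simulated with Ogata's thinning algorithm. The parameter values below are arbitrary.

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate a univariate Hawkes process with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    via Ogata's thinning algorithm."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < horizon:
        # With an exponential kernel the intensity decreases between events,
        # so the intensity at the current time bounds it until the next event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:  # accept candidate with prob lam_t/lam_bar
            events.append(t)
    return events

events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=100.0)
```

Here the branching ratio alpha/beta = 0.5 keeps the process subcritical; persistence of the flow, measured by H_0 in the talk, grows as this ratio approaches one.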
| 10:00am - 10:40am | Coffee break 1 |
| 10:40am - 12:10pm | Statistics in natural sciences and technology Location: 0.001 Session Chair: Gaby Schneider Session Chair: Ansgar Steland |
Self-Normalization for CUSUM-based Change Detection in Locally Stationary Time Series FH Aachen, Germany
A novel self-normalization procedure for CUSUM-based change detection in the mean of a locally stationary time series is introduced. Classical self-normalization relies on the factorization of a constant long-run variance and a stochastic factor. In this case, the CUSUM statistic can be divided by another statistic proportional to the long-run variance, so that the latter cancels. Thereby, a tedious estimation of the long-run variance can be avoided.
Under local stationarity, the partial sum process converges to $\int_0^t \sigma(x)\,dB_x$ and no such factorization is possible. To overcome this obstacle, a self-normalized test statistic is constructed from a carefully designed bivariate partial-sum process. Weak convergence of the process implies that the resulting self-normalized test attains asymptotic level α under the null hypothesis of no change, while being consistent against a broad class of alternatives. Extensive simulations demonstrate better finite-sample properties compared to existing methods. Applications to real data illustrate the method’s practical effectiveness.
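For intuition, a minimal sketch of the classical (stationary-case) self-normalized CUSUM statistic mentioned above; the bivariate construction of the talk is not reproduced here, and the exact normalization convention is our assumption. The key property, that the long-run variance cancels between numerator and normalizer, shows up as scale invariance.

```python
import numpy as np

def self_normalized_cusum(x):
    """Classical self-normalized CUSUM for a change in the mean: both the
    CUSUM numerator and the normalizer are proportional to the same
    long-run variance, which therefore cancels in the ratio."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.cumsum(x)
    k = np.arange(1, n + 1)
    bridge = s - (k / n) * s[-1]              # partial-sum bridge S_k - (k/n) S_n
    num = np.max(np.abs(bridge)) / np.sqrt(n)
    v2 = np.sum(bridge ** 2) / n ** 2         # self-normalizer, no kernel estimation
    return float(num / np.sqrt(v2))
```

Because both numerator and denominator scale with the same factor, the statistic is invariant under affine transformations of the data, which is exactly why long-run variance estimation is avoided.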
Prior shift estimation for positive unlabeled data through the lens of kernel embedding 1Warsaw University of Technology, Poland; 2Institute of Computer Science; 3Nicolaus Copernicus University We study estimation of a class prior for unlabeled target samples which possibly differs from that of the source population. Moreover, it is assumed that the source data is partially observable: only samples from the positive class and from the whole population are available (PU learning scenario). We introduce a novel direct estimator of the class prior which avoids estimation of posterior probabilities in both populations and has a simple geometric interpretation. It is based on a distribution matching technique together with kernel embedding in a Reproducing Kernel Hilbert Space and is obtained as an explicit solution to an optimisation task. We establish its asymptotic consistency as well as an explicit non-asymptotic bound on its deviation from the unknown prior, which is calculable in practice. We study finite sample behaviour for synthetic and real data and show that the proposal performs consistently on par with or better than its competitors. Asymptotic studies of adapted threshold detectors based on density processes RWTH Aachen University, Germany Control statistics are widely used to monitor the quality of processes in various fields such as industry, healthcare, and machine learning. These statistics give an alarm when observed data exceed a threshold, traditionally set as a constant value to maintain a desired false alarm rate. We focus on a new setting: when monitoring a sequence of observations, there may be additional information that potentially affects the law of the observations, and we would like to change the design by using adapted thresholds, which are functions of the additional information. |
| 10:40am - 12:10pm | Discrete time series Location: 0.002 Session Chair: Christian H. Weiß |
Overview of the STINARMA Class of Models and its STINAR and STINMA Subclasses 1Institute of Electronics and Informatics Engineering of Aveiro (IEETA) and Department of Electronics, Telecommunications and Informatics (DETI), University of Aveiro, Aveiro, Portugal; Intelligent Systems Associate Laboratory (LASI), University of Aveiro, Portugal; 2Center for Computational and Stochastic Mathematics (CEMAT), Department of Mathematics, IST, University of Lisbon, Lisbon, Portugal; 3Department of Mathematics and Statistics, Helmut Schmidt University, Hamburg, Germany Spatio-temporal count data arise in many applied fields, where observations are collected over time across multiple spatial units. In these settings, it is crucial to jointly capture temporal and spatial dynamics. The spatio-temporal integer-valued autoregressive and moving average (STINARMA) class of models provides a flexible framework to address these challenges within the class of integer-valued processes. This work presents an overview of the STINARMA class of models, together with its main subclasses, those of the STINAR and STINMA models. The STINARMA models can be viewed as the natural spatio-temporal extension of univariate INARMA models. Moreover, they are the integer counterpart of the continuous STARMA models, which is achieved by replacing the multiplication operator with the matrix binomial thinning operator and by considering component-wise independent discrete innovations. The general class of STINARMA models is introduced, followed by a discussion of its autoregressive and moving average subclasses. Key probabilistic properties are briefly presented through first- and second-order moments. Estimation approaches based on the method of moments, conditional least squares and conditional maximum likelihood are also outlined.
The practical relevance of the STINARMA class is illustrated using spatio-temporal health data from Portugal and Germany, and its performance is compared with multivariate models that do not explicitly account for spatial dependence. References: Martins, A., Scotto, M. G., Weiß, C. H., Gouveia, S., Space-time integer-valued ARMA modelling for time series of counts, Electronic Journal of Statistics, 17 (2), (2023), 3472-3511. Franke, J., Subba Rao, T., Multivariate First-Order Integer-Valued Autoregressions, Technical Report, University of Kaiserslautern, (1993). Pfeifer, P. E., Deutsch, S. J., A Three-Stage Iterative Procedure for Space-Time Modeling, Technometrics, 22 (1), (1980), 35-47. Steutel, F. W., Van Harn, K., Discrete Analogues of Self-Decomposability and Stability, The Annals of Probability, 7 (5), (1979), 893-899. Integer-valued random field models Helmut-Schmidt-Universität, Germany Ghodsi et al. (2012) have introduced the first-order integer-valued autoregressive model for count random fields as a planar analogue of the classical INAR(1) model, designed for count data observed on a regular lattice. We extend this framework to higher-order dependence structures and derive key stochastic properties of the resulting models. Building on this approach, we further propose two additional count random field models: the CINAR random field model and the INMA random field model. For each model, we investigate fundamental properties and provide a comparative analysis highlighting their respective strengths and limitations. Ghodsi, A., Shitan, M., & Bakouch, H. S. (2012). A first-order spatial integer-valued autoregressive SINAR (1, 1) model. Communications in Statistics-Theory and Methods, 41(15), 2773-2787.
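To make the binomial thinning operator concrete, here is a minimal simulation of a univariate INAR(1) process with Poisson innovations (a toy sketch, not the spatio-temporal STINARMA model itself; the parameter values are arbitrary):

```python
import math
import random

def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t, where '∘' is binomial
    thinning (alpha ∘ X ~ Binomial(X, alpha)) and eps_t ~ Poisson(lam).
    The stationary mean is lam / (1 - alpha)."""
    rng = random.Random(seed)

    def poisson(mu):
        # Knuth's inversion method; adequate for small mu.
        limit, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    x = poisson(lam / (1 - alpha))  # start near the stationary mean
    path = [x]
    for _ in range(n - 1):
        thinned = sum(rng.random() < alpha for _ in range(x))  # binomial thinning
        x = thinned + poisson(lam)
        path.append(x)
    return path
```

Thinning keeps the state integer-valued, which is exactly what replaces scalar multiplication when moving from ARMA to INARMA-type models.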
Influence network reconstruction from discrete time-series of count data modelled by multidimensional Hawkes processes University of Surrey, United Kingdom Identifying key influencers from time series data without a known prior network structure is a challenging problem in various applications, from crime analysis to social media. While much work has focused on event-based time series (timestamp) data, fewer methods address count data, where event counts are recorded in fixed intervals. We develop network inference methods for both batched and sequential count data. Here, strong network connections represent the key influences among the nodes. We introduce an ensemble-based algorithm, rooted in the expectation-maximization (EM) framework, and demonstrate its utility in identifying node dynamics and connections through a discrete-time Cox or Hawkes process. For the linear multidimensional Hawkes model, we employ a majorization-minimization (MM) approach, allowing for parallelized inference of networks. For sequential inference, we use a second-order approximation of the Bayesian inference problem. Under certain assumptions, a rank-1 update for the covariance matrix reduces computational costs. We validate our methods on synthetic data and real-world datasets, including email communications within European academic communities. Our approach effectively reconstructs underlying networks, accounting for both excitation and diffusion influences. This work advances network reconstruction from count data in real-world scenarios. |
| 10:40am - 12:10pm | Multivariate Statistics and Copulas Location: 0.004 Session Chair: Sebastian Fuchs |
Measures and Models of Non-Monotonic Dependence 1University of York, United Kingdom; 2McGill University, Montreal, Canada; 3University College Dublin, Ireland We propose a margin-free measure of bivariate association generalizing Spearman’s rho to the case of non-monotonic dependence that is defined in terms of two square integrable functions on the unit interval. We investigate properties of generalized Spearman correlation when the functions are piecewise continuous and strictly monotonic, with particular focus on the special cases where the functions are drawn from orthonormal bases defined by Legendre polynomials and cosine functions. For continuous random variables, generalized Spearman correlation is treated as a copula-based measure and shown to depend on a pair of uniform-distribution-preserving (udp) transformations determined by the underlying functions. We derive bounds for generalized Spearman correlation and we use a novel technique that we refer to as stochastic inversion of udp transformations to construct singular copulas that attain the bounds and parametric copulas with densities that interpolate between the bounds and model different degrees of non-monotonic dependence. We also propose sample analogues of generalized Spearman correlation and investigate their asymptotic and small-sample properties. Potential applications of the theory are demonstrated including: exploratory analyses of the dependence structures of datasets and their symmetries; elicitation of functions maximizing generalized Spearman correlation via expansions in orthonormal basis functions; and construction of tractable probability densities to model a wide variety of non-monotonic dependencies.
Multivariate tail dependence: further insights with an application to the Spanish banking sector 1Università del Salento, Italy; 2Universidad de Valladolid, Spain Extending bivariate dependence concepts to higher dimensions is a challenging but essential task for a comprehensive understanding of multivariate dependence. Moreover, measuring overall dependence based on averages across the full domain of the joint distribution may fail to discern changes in dependence across different segments of the distribution, especially in the tails. In order to incorporate these features, we present the multivariate tail concentration function (TCF) as a graphical tool to assess both global and tail dependence. We show that this tool makes it possible to represent multivariate dependence in a 2D plot regardless of the number of dimensions, that it quantifies both lower and upper tail dependence at a finite scale, and that it relates to multivariate Blomqvist’s beta. We propose to estimate the TCF non-parametrically using two methods and we compare their finite sample performance through a simulation study. To illustrate its practical application, we use the TCF to evaluate co-movements among the six Spanish banks included in the IBEX35 stock index. Multivariate Kendall regression coefficients University of Applied Sciences Merseburg, Germany In multivariate regression analysis, the multiple linear correlation coefficient is a commonly used association measure. This measure focuses on a linear relationship between a response variable and predictor variables. When moving away from the linearity of the functional relationship, we arrive at Kendall's tau and its multivariate versions, among others. In an earlier paper by the author (2021), the Kendall regression coefficient was introduced. Here, we extend the coefficient to vector responses Y and discuss its properties.
The coefficient we introduce describes to what degree the response variable Y can be approximated by a monotone function of the regressors. These regressors are combined in a random vector. One advantage of this approach is that the association measure is based only on the copula (it does not depend on the marginal distributions) and is hence robust against outliers. |
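As a baseline for the rank-based measures in this session, Kendall's tau (the tau-a version, assuming no ties) reduces to counting concordant and discordant pairs; a minimal sketch:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / C(n, 2).
    It depends only on the ranks, hence only on the copula for
    continuous random variables."""
    pairs = list(combinations(range(len(x)), 2))
    conc = sum((x[i] - x[j]) * (y[i] - y[j]) > 0 for i, j in pairs)
    disc = sum((x[i] - x[j]) * (y[i] - y[j]) < 0 for i, j in pairs)
    return (conc - disc) / len(pairs)
```

Because only ranks enter, the value is unchanged under strictly increasing transformations of either margin, which is the robustness property stressed in the abstract above.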
| 10:40am - 12:10pm | Data Science Perspectives from Industry Location: 1.002 Session Chair: Rainer Göb |
Deploying Deep Learning for Real-Time Optical Sorting: A Case Study in Hazelnut Quality Control 1prognostica GmbH; 2IFSYS Integrated Feeding Systems GmbH Optical sorting is widely used in industrial quality control, yet conventional rule-based vision systems often struggle when quality cues are subtle, heterogeneous, or hard to formalize. We present an industry data science case study on deploying deep learning for real-time optical sorting of hazelnuts, driven by the practical need to grade product quality from fine-grained appearance characteristics under strict throughput and latency constraints. The talk traces the path from an early prototype to an industrialized system that has been transferred into a market-ready product and is operated in practice. We summarize the end-to-end solution: multi-camera image acquisition, a supervised learning pipeline built on a representative labeled dataset, domain-specific preprocessing and targeted data augmentation, and a neural image classifier designed for on-premise inference. We emphasize industrial aspects that proved central for making the system operational: formalizing expert grading into maintainable classes, managing imbalance and borderline cases during data preparation, data labeling and training, and setting decision thresholds based on acceptance criteria. We then cover deployment realities for industrial environments, e.g. latency, throughput, robustness, and the interface between the ML component and machine control. Finally, we describe how the solution was productized and extended beyond hazelnuts to additional crops, enabling new application scenarios and market opportunities for the customer. We conclude with practical considerations for lifecycle management and periodic re-calibration. 
Bridging the Gap: Operational Realities and Emerging Trends in Supply Chain Forecasting prognostica GmbH, Germany While forecasting remains a cornerstone of strategic decision-making, its industrial application involves challenges that extend beyond model accuracy. In the context of supply chain management, a forecast must not only be precise but also interpretable and actionable within specific operational constraints. This talk provides insights into how practitioners bridge the gap between theoretical models and business requirements, focusing on the following key areas:
The presentation demonstrates that the value of Generative AI in forecasting lies not only in potential accuracy gains but also in its capacity to handle unstructured context and significantly improve interactability with the forecasts. By highlighting these real-world requirements and current technical frontiers, the talk seeks to provide practical impulses and identify open questions for further academic research in the field of applied AI and time series analysis. |
| 10:40am - 12:10pm | High-dimensional statistics and learning Location: 1.012 Session Chair: Martin Wahl |
Supervised classification for Ornstein-Uhlenbeck diffusions with separation condition Humboldt University of Berlin, Germany We study binary supervised classification based on repeated independent observations of continuous sample paths. Our focus is a diffusion classification model in which the features follow an Ornstein-Uhlenbeck process with class-dependent drifts. We consider plug-in classifiers constructed from drift estimators and analyze their performance via the excess risk. Under a separation condition on the drift parameters, we establish upper bounds on the excess risk, which are explicitly parametrized by the separation distance quantifying the difficulty of the problem. Specifically, when the drift distance is bounded away from zero, the plug-in classifiers achieve a fast convergence rate of order $n^{-1}$ (up to logarithmic factors) in the constant drift scenario. Furthermore, we discuss extensions of this framework to time-inhomogeneous drift functions. The theoretical approach utilizes the Wiener chaos representation and spectral theory to characterize the log-likelihood ratio as a quadratic form of Gaussian random variables, enabling a precise analysis of margin properties and concentration results. This extends the fast-rate results from classification problems with linear and Gaussian white noise models to dynamical diffusion systems with Gaussian structure under separation conditions. Asymptotic Bounds and Online Algorithms for Average-Case Matrix Discrepancy 1Johns Hopkins University, USA; 2FAU Erlangen-Nürnberg, Germany; 3Yale University, USA
We study the matrix discrepancy problem in the average-case setting. Given a sequence of $m \times m$ symmetric matrices $A_1,\ldots,A_n$, its discrepancy is defined as the minimal spectral norm over all signed sums $\sum_{i=1}^n x_i A_i$ with $x_1,\ldots,x_n \in \{\pm 1\}$. Our contributions are twofold. First, we study the asymptotic discrepancy of random matrices. When the matrices belong to the Gaussian orthogonal ensemble, we provide a sharp characterization of the asymptotic discrepancy and show that the limiting distribution is concentrated around $\Theta(\sqrt{nm}\,4^{-(1+o(1))n/m^2})$, under the assumption $m^2 \ll n/\log n$. We observe that the trivial bound $O(\sqrt{nm})$ cannot be improved when $n \ll m^2$ and show that this phenomenon occurs for a broad class of random matrices. In the case $n = \Omega(m^2)$, we provide a matching upper bound. Second, we analyse the matrix hyperbolic cosine algorithm, an online algorithm for matrix discrepancy minimization due to Zouzias (2011), in the average-case setting. We show that the algorithm achieves with high probability a discrepancy of $O(m \log m)$ for a broad class of random matrices, including Wigner matrices with entries satisfying a hypercontractive inequality and Gaussian Wishart matrices.
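For very small instances, the discrepancy defined above can be evaluated by exhaustive search over the $2^n$ sign vectors (purely illustrative; the algorithms discussed in the talk are far more efficient):

```python
import numpy as np
from itertools import product

def discrepancy(mats):
    """Minimal spectral norm of sum_i x_i * A_i over all sign vectors
    x in {-1, +1}^n. Exhaustive search, feasible only for small n."""
    best = float("inf")
    for signs in product((-1.0, 1.0), repeat=len(mats)):
        signed_sum = sum(x * A for x, A in zip(signs, mats))
        best = min(best, np.linalg.norm(signed_sum, 2))  # spectral norm
    return best
```

For example, two identical matrices can always be cancelled exactly, giving discrepancy zero, while a single matrix has discrepancy equal to its own spectral norm.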
Asymptotic confidence bands for centered purely random forests Karlsruhe Institute of Technology, Germany In this talk we study asymptotic uniform confidence bands for centered purely random forests in a multivariate nonparametric regression setting. The most popular member of this class, the uniformly centered purely random forest, is well known to suffer from suboptimal rates. We therefore propose a new type of purely random forest, the Ehrenfest centered purely random forest, which achieves minimax optimal rates. Our main confidence band theorem applies to both random forests. The proof is based on an interpretation of random forests as generalized U-statistics together with a Gaussian approximation of the supremum of empirical processes. |
| 12:10pm - 1:30pm | Lunch break 1 |
| 1:30pm - 3:30pm | New developments in nonparametric classification and estimation based on the nearest neighbor method Location: 0.001 Session Chair: Hajo Holzmann |
Chatterjee's graph correlation University of Washington, United States of America This talk will survey recent advances in understanding Chatterjee's nearest neighbor graph-based correlation coefficient. I will introduce, for the first time, a comprehensive theoretical framework for statistical inference based on this coefficient. The framework involves results on asymptotic normality, bias correction, and the (in)consistency of bootstrap methods. Nearest Neighbor Estimates for Dependent Data University of Manitoba, Canada This paper considers the nonparametric estimation problem for a class of nonlinear time series. Nearest Neighbor matching: from Average Treatment Effects to Transfer Learning ENSAI-CREST, France Estimating some mathematical expectations from partially observed data, and in particular missing outcomes, is a central problem encountered in numerous fields such as transfer learning, counterfactual analysis or causal inference. Matching estimators, i.e. estimators based on k-nearest neighbors, are widely used in this context. Under suitable regularity conditions, one can show that the variance of such estimators can converge to zero at a parametric rate. However, their bias can have a slower rate when the dimension of the covariates is larger than 2. This makes analysis of this bias particularly important. In this paper, we provide higher-order properties of the bias. In contrast to the existing literature on this topic, we do not assume that the support of the target distribution of the covariates is strictly included in that of the source, and we discuss two geometric conditions on the support that prevent boundary bias issues. We show that these conditions are much more general than the usual convex support assumption, leading to an improvement of existing results.
Furthermore, we show that the matching estimator studied by Abadie and Imbens (2006) for the average treatment effect can be asymptotically efficient when the dimension of the covariates is less than 4, a result previously known only in dimension 1. Multivariate Root-N-Consistent Smoothing Parameter Free Matching Estimators and Estimators of Inverse Density Weighted Expectations 1Universität Rostock, Germany; 2Philipps-Universität Marburg, Germany Expected values weighted by the inverse of a multivariate density or, equivalently, Lebesgue integrals of regression functions with multivariate regressors occur in various areas of application, including the estimation of average treatment effects, nonparametric estimation in random coefficient regression models, and deconvolution in Berkson errors-in-variables models. The frequently used nearest-neighbor and matching estimators suffer from bias problems in multiple dimensions. By using polynomial least squares fits on each cell of the Kth-order Voronoi tessellation for sufficiently large K, we develop novel modifications of nearest-neighbor and matching estimators which again converge at the parametric root-n rate under mild smoothness assumptions on the unknown regression function and without any smoothness conditions on the unknown density of the covariates. We stress that, in contrast to competing methods for correcting the bias of matching estimators, our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters. We complement the upper bounds with appropriate lower bounds derived from information-theoretic arguments, which show that some smoothness of the regression function is indeed required to achieve the parametric rate. Simulations illustrate the practical feasibility of the proposed methods. |
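A minimal one-dimensional sketch of the k-nearest-neighbor matching idea discussed in this session (illustrative only; the function name and the simple absolute-distance metric are our assumptions): each target covariate value is imputed with the average outcome of its k nearest source points, and the imputations are averaged.

```python
import numpy as np

def nn_matching_mean(x_source, y_source, x_target, k=1):
    """k-NN matching estimate of E[Y] under the target covariate
    distribution when outcomes Y are observed only in the source."""
    xs = np.asarray(x_source, dtype=float)
    ys = np.asarray(y_source, dtype=float)
    imputed = []
    for x in np.asarray(x_target, dtype=float):
        dist = np.abs(xs - x)                 # 1-D distance to all source points
        nearest = np.argsort(dist)[:k]        # indices of the k nearest matches
        imputed.append(ys[nearest].mean())
    return float(np.mean(imputed))
```

In one dimension the matching bias is negligible, which is consistent with the dimension thresholds discussed in the abstracts above; in higher dimensions the nearest match is systematically farther away, and the bias corrections of the talks become necessary.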
| 1:30pm - 3:30pm | Discrete time series Location: 0.002 Session Chair: Christian H. Weiß |
Asymptotic Inference for Rank Correlations 1Karlsruhe Institute of Technology; 2Heidelberg Institute for Theoretical Studies; 3Goethe University Frankfurt; 4Helmut-Schmidt-University Kendall's tau and Spearman's rho are widely used tools for measuring dependence. Surprisingly, when it comes to asymptotic inference for these rank correlations, some fundamental results and methods have not yet been developed, in particular for discrete random variables and in the time series case, and concerning variance estimation in general. Consequently, asymptotic confidence intervals are not available. We provide a comprehensive treatment of asymptotic inference for classical rank correlations, including Kendall's tau, Spearman's rho, Goodman-Kruskal's gamma, Kendall's tau-b, and grade correlation. We derive asymptotic distributions for both iid and time series data, resorting to asymptotic results for U-statistics, and introduce consistent variance estimators. This enables the construction of confidence intervals and tests, generalizes classical results for continuous random variables and leads to corrected versions of widely used tests of independence. We analyze the finite-sample performance of our variance estimators, confidence intervals, and tests in simulations and illustrate their use in case studies. Inference for INAR Models with Structural Breaks: Classical and Bayesian Approaches 1Universidade de Aveiro; CIDMA, Portugal; 2ESTGA, Universidade de Aveiro; CIDMA, Portugal; 3Universidade de Aveiro, Portugal Integer-valued autoregressive (INAR) models provide a flexible framework for modeling count time series through thinning operators that emulate autoregressive dynamics while respecting the discrete nature of the data. These models naturally accommodate both equidispersion and overdispersion, features commonly observed in count-valued processes. 
This paper investigates INAR models with structural breaks, with particular emphasis on the detection and estimation of parameter changes over time—an issue of critical importance in dynamic settings such as epidemics, policy interventions, and other regime-shifting phenomena. We consider both classical and Bayesian inferential approaches for identifying change points and estimating model parameters. The classical framework is based on maximum likelihood estimation, where structural changes are detected using a CUSUM-based procedure, followed by a focused grid search within a window centered around the candidate breakpoint. The Bayesian approach employs advanced Markov Chain Monte Carlo (MCMC) techniques, incorporating hidden Markov chains to model latent regimes and infer structural shifts probabilistically. A comprehensive simulation study is conducted under a variety of scenarios, including differing regime lengths, sample size proportions, and distributional characteristics. Finally, the proposed methodologies are illustrated through an application to real-world health indicator data, demonstrating their practical effectiveness in capturing complex dynamics and structural changes in count time series. Model diagnostics and semi-parametric inference for count time series 1TU Dortmund University, Germany; 2TU Dortmund University, Germany; 3Helmut-Schmidt-University Hamburg, Germany; 4Cyprus Academy of Sciences, Letters, and Arts, Cyprus For modeling the serial dependence in discrete-valued time series, various approaches have been proposed in the literature. In particular, models based on a recursive, autoregressive-type structure such as the integer-valued autoregressive (INAR) models for count time series are very popular in practice.
While their estimation typically relies on purely parametric approaches that impose restrictive assumptions on the innovation distribution, we consider semi-parametric estimation techniques that jointly estimate the autoregressive coefficients and the innovation distribution without requiring parametric specification. Building on this, we propose a general semi-parametric bootstrap procedure for INAR models and prove its consistency for general classes of statistics that are functions of the estimated model coefficients and the estimated innovation distribution. This semi-parametric bootstrap approach can be leveraged for various statistical tasks such as goodness-of-fit testing, predictive inference, and dispersion analysis. Additionally, we introduce novel semi-parametric goodness-of-fit tests tailored for the INAR model class. Relying on the INAR-specific shape of the joint probability generating function, our approach allows for model validation of INAR models without specifying the parametric family of the innovation distribution. We derive the limiting null distribution of our proposed test statistics, prove consistency under fixed alternatives and discuss their asymptotic behavior under local alternatives. Moreover, when it comes to predictive inference for discrete-valued time series, this task cannot be implemented through the construction of prediction intervals, as these are generally not able to retain a desired coverage level either in finite samples or asymptotically. To address this problem, we propose to reverse the construction principle by considering preselected sets of interest and estimating the corresponding predictive probability. The accuracy of this prediction is then evaluated by quantifying the uncertainty associated with the estimation of these predictive probabilities.
In this context, we consider parametric and non-parametric approaches and derive asymptotic as well as bootstrap theory, which also covers the practically important case of model misspecification. Nonparametric symmetry tests for integer-valued time series Friedrich-Schiller-Universität Jena, Germany During the last years, there have been many proposals for modelling integer-valued time series. We propose tests of hypotheses related to certain symmetry and antisymmetry properties. For example, we consider the hypotheses that the conditional mean is an odd function or that the conditional variance is an even function. The proposed test statistics are nonparametric and have non-standard limit distributions. We show that the wild bootstrap offers a simple method of generating asymptotically correct critical values. The talk is based on joint work with Paul Doukhan and Christian Weiß. |
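For concreteness, a minimal conditional least squares (CLS) sketch for an INAR(1) model, one standard estimation route for the model class in this session. This is an illustration under our own conventions, not the semi-parametric procedures of the talks; it exploits only the linear conditional mean and so requires no parametric innovation assumption.

```python
import numpy as np

def inar1_cls(counts):
    """Conditional least squares for INAR(1): E[X_t | X_{t-1}] is linear,
    alpha * X_{t-1} + lam, so alpha and lam follow from a simple
    regression of X_t on X_{t-1}; no innovation distribution is assumed."""
    x = np.asarray(counts, dtype=float)
    y, z = x[1:], x[:-1]                       # pairs (X_t, X_{t-1})
    alpha = np.cov(y, z, bias=True)[0, 1] / np.var(z)
    lam = y.mean() - alpha * z.mean()
    return float(alpha), float(lam)
```

On a long simulated Poisson-INAR(1) path the estimates recover the true thinning probability and innovation mean to within sampling error.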
| 1:30pm - 3:30pm | Multivariate Statistics and Copulas Location: 0.004 Session Chair: Eckhard Liebscher |
Characterization of multi-way binary tables with uniform margins and fixed correlations 1Politecnico di Torino, Italy; 2Eindhoven University of Technology, the Netherlands; 3Università di Genova, Italy In many applications involving binary variables, only pairwise dependence measures, such as correlations, are available. However, for multi-way tables involving more than two variables, these quantities do not uniquely determine the joint distribution, but instead define a family of admissible distributions that share the same pairwise dependence while potentially differing in higher-order interactions. In this talk, we introduce a geometric framework to describe the entire feasible set of such joint distributions with uniform margins. We show that this admissible set forms a convex polytope, analyze its symmetry properties, and characterize its extreme rays. These extremal distributions provide fundamental insights into how higher-order dependence structures may vary while preserving the prescribed pairwise information. Unlike traditional methods for table generation, which return a single table, our framework makes it possible to explore and understand the full admissible space of dependence structures, enabling more flexible choices for modeling and simulation. We illustrate the usefulness of our theoretical results through examples and a real case study on rater agreement. Copula robustness in quantitative risk management Saarland University, Germany Characteristics of d-variate risks, such as downside risk measures of aggregate positions or optimal portfolio values, play a central role in financial and actuarial applications. This talk addresses the question of when such characteristics are robust to (small) misspecifications in the copula. 
Directional Footrule Coefficients University of Almería, Spain Measures of association based on ranks, such as Spearman’s footrule [1], play a central role in multivariate statistics due to their robustness and invariance properties. However, classical versions of these coefficients are often unable to capture directional dependence structures that arise in high-dimensional settings. Motivated by this limitation and by recently introduced coefficients [2, 3], we introduce a novel family of directional Spearman’s footrule coefficients designed to quantify multivariate dependence along prescribed directions in the unit d-dimensional hypercube. The proposed coefficients are formulated within the framework of copula theory, which allows for a clear separation between marginal behavior and the underlying dependence structure. Our construction extends the classical Spearman’s footrule by incorporating directional information, enabling the detection of dependence patterns that remain undetected by standard measures. We establish a general definition for arbitrary dimensions and directions and investigate the main theoretical properties of the proposed coefficients. In particular, we analyze their behavior under independence and maximal positive dependence, their relation to stochastic orders, as well as their relationship with marginal distributions and lower-dimensional structures. These properties are shown to be consistent with those of the classical footrule coefficient. To facilitate practical implementation, we also introduce nonparametric estimators based on ranks. These estimators are easy to compute and suitable for multivariate data. Their asymptotic behavior is discussed, highlighting consistency and stability properties analogous to those of existing rank-based dependence measures. Several illustrative examples are provided to demonstrate the usefulness of the proposed coefficients. 
Explicit expressions are derived for well-known families of d-copulas, including the Farlie–Gumbel–Morgenstern and Cuadras–Augé families, allowing for a detailed analysis of how directional dependence varies with model parameters. These examples show that the proposed coefficients are able to distinguish different directional dependence patterns even when classical global measures coincide. Overall, this work provides a new tool for directional dependence analysis in multivariate settings, complementing existing rank-based measures and offering a finer understanding of complex dependence structures with applications in finance, reliability, and multivariate risk analysis. [1] Spearman, C. (1906). ‘Footrule’ for measuring correlation. British Journal of Psychology, 2, 89-108. [2] Úbeda-Flores, M. (2004). Multivariate versions of Blomqvist’s beta and Spearman’s footrule. Ann. Inst. Statist. Math., 57(4), 781-788. [3] Decancq, K., Pérez, A., Prieto-Alaiz, M. (2025). Multivariate Dependence Based on Diagonal Sections: Spearman’s Footrule and Related Measures. In: Steland, A., Rafajłowicz, E., Parolya, N. (eds) Stochastic Models, Statistics and Their Applications. SMSA 2024. Springer Proceedings in Mathematics & Statistics, vol 499. Springer, Cham. Estimating Portfolio Risk with Product Copulas: A GARCH-EVT Approach Applied to Financial Data Hochschule Merseburg, Germany This talk introduces a sophisticated GARCH-EVT-Copula framework designed […] A key innovation presented is the application of product copulas to model the […] Our empirical analysis demonstrates the superior performance of the product […] |
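As background for the directional extension above, the classical bivariate Spearman footrule of [1] is straightforward to compute from ranks. A minimal sketch, assuming no ties and using a hypothetical function name:

```python
import numpy as np

def spearman_footrule(x, y):
    """Classical bivariate Spearman footrule:
    phi = 1 - 3 * sum_i |R_i - S_i| / (n^2 - 1),
    where R_i, S_i are the ranks of x_i and y_i (no ties assumed)."""
    x, y = np.asarray(x), np.asarray(y)
    rx = np.argsort(np.argsort(x)) + 1   # ranks of x
    ry = np.argsort(np.argsort(y)) + 1   # ranks of y
    n = len(x)
    return 1.0 - 3.0 * np.sum(np.abs(rx - ry)) / (n * n - 1)
```

Identical orderings give the maximal value 1, while reversed orderings give a negative value; the directional coefficients of the talk generalize this statistic along directions of the unit hypercube.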
| 1:30pm - 3:30pm | Statistics in sports Location: 1.002 Session Chair: Jakob Söhl |
|
|
The Best of Both Worlds: Predicting Coverage Schemes in American Football with Supervised and Unsupervised Learning 1TU Dortmund; 2WU Vienna; 3Bielefeld University Choosing between man and zone coverage is one of the most critical strategic decisions a defensive coordinator must make before each offensive play in American football. In simple terms, in man coverage each defender is assigned to guard a specific offensive player, while zone coverage requires defenders to protect designated areas of the field. This choice fundamentally shapes how the defense reacts to offensive formations and movements. Traditionally, experienced offensive coordinators and quarterbacks rely on visual cues, such as defenders’ alignment or pre-snap motion, to infer these defensive schemes. However, with the increasing availability of high-resolution player tracking data, statistical models can now uncover such tactical patterns quantitatively rather than relying solely on expert intuition. In this project, we first employ an elastic net and an XGBoost classifier to predict whether a defense is in man or zone coverage based on all players’ positions once both teams are set before the snap. The models thus capture spatial configurations that often reveal underlying defensive intentions. In a second step, we incorporate dynamic information from pre-snap player movements. Finally, in a third step, we employ features derived from a hidden Markov model (HMM). Specifically, we use an HMM to represent defenders’ movement trajectories over time. The hidden states correspond to potential offensive players being covered by each defender. From the decoded state sequences, we extract summary statistics, such as the number of state (defender) switches. Including these HMM-based features in the aforementioned models significantly enhances the models’ predictive accuracy. Beyond the pure classification performance, our approach also enables deeper tactical analyses. 
For instance, it allows us to explore how pre-snap motion helps offenses identify defensive coverages: comparing the predicted coverage probabilities before and after a motion provides insight into how well offensive movements reveal defensive strategies. Overall, this framework demonstrates how modern machine learning techniques in combination with a statistical model can provide quantitative insights into complex team sports tactics. While developed within an American football context, the methodology may generalize to other sports where spatial positioning and interaction dynamics play similarly crucial roles. Modelling momentum in tennis: A latent-state approach to point outcomes and rally lengths 1Bielefeld University, Germany; 2TU Dortmund, Germany Tennis matches are often characterised by momentum shifts – i.e., changes in match dynamics over time – marked by transitions between phases where either player 1 or player 2 dominates. While dominance is clearly reflected in a player’s point wins, rally lengths provide additional valuable information for modelling momentum; short rallies suggest strong momentum, whereas long rallies and point losses indicate pressure. To model momentum shifts effectively, we therefore propose considering both the outcomes of the points and the rally lengths. These sequentially observed outcomes reflect the current dynamics of the match (i.e., the level of pressure a player exerts on their opponent), which we regard as an unobserved state process. Thus, we employ a latent-state approach to investigate these momentum shifts. Specifically, we model the outcomes of server wins and rally lengths jointly using Markov-modulated marked Poisson processes (MMMPPs). This flexible framework allows us to relate the events (server wins or loses the point) and the event times (rally length) to an underlying latent state process, modelled as a continuous-time Markov chain. 
Its states determine the distribution of the outcomes and can be interpreted as proxies for the players’ momentum. For data from all Grand Slam tournaments from 2016 to 2024, we identify momentum shifts within tennis matches using MMMPPs with two latent states, accounting for player- and match-specific effects such as player rankings and court surfaces. The Accuracy–Complexity Trade-Off in the Expected Threat model for Football 1TU Delft, The Netherlands; 2AFC Ajax, The Netherlands The Expected Threat model is a possession value model in football (soccer) with a Markov chain structure that allows for interpretation and visualization. To create a Markov chain, the pitch is discretized into different Markov states. However, selecting the right discretization of the pitch is still a challenging design choice. A model with more game states can better distinguish between different scenarios, but has fewer samples per state when estimating the Markov chain. This creates a trade-off between the model complexity in terms of the number of Markov states and the accuracy of the probability estimates. Theoretical analysis of the model gives error bounds, but interpretation of the results indicates that these might be on the conservative side. Simulations provide a more accurate characterization of the model’s error, which is indeed more optimistic than the theoretical bound. Finally, these insights are converted into a practical rule of thumb to help practitioners choose the right balance between the number of Markov states and accuracy of the probability estimates of the Expected Threat model. |
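The Markov-chain structure of the Expected Threat model can be illustrated on a toy discretization. All zone probabilities below are made up for illustration; a real model would estimate them from event data on a much finer grid.

```python
import numpy as np

# Toy Expected Threat (xT): the pitch is discretized into a handful of
# zones (Markov states). In each zone the team either shoots (scoring with
# a zone-specific probability) or moves the ball to another zone. The xT
# value of each zone solves the fixed point v = p_shot*p_goal + p_move*(T @ v).
p_shot = np.array([0.01, 0.05, 0.15, 0.40])   # hypothetical shot probability per zone
p_goal = np.array([0.02, 0.05, 0.10, 0.30])   # hypothetical conversion rate given a shot
p_move = 1.0 - p_shot                         # otherwise the ball is moved

# Hypothetical transition matrix between zones given a move (rows sum to 1).
T = np.array([[0.60, 0.30, 0.10, 0.00],
              [0.20, 0.50, 0.25, 0.05],
              [0.05, 0.25, 0.50, 0.20],
              [0.00, 0.10, 0.40, 0.50]])

v = np.zeros(4)
for _ in range(500):                          # fixed-point iteration (a contraction)
    v = p_shot * p_goal + p_move * (T @ v)
```

The trade-off discussed in the talk is visible here: more zones mean a larger `T` whose entries must each be estimated from fewer observed transitions.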
| 1:30pm - 3:30pm | Computational Biostatistics Location: 1.012 Session Chair: Dennis Dobler |
|
|
Computational and Biostatistical Challenges in Polygenic Score Modelling and Gene–Environment Integration 1IUF - Leibniz Research Institute for Environmental Medicine; 2TU Dortmund University Polygenic scores (PGS) quantify genetic predisposition to complex traits and clinical outcomes based on genotype data. This talk addresses recent computational and biostatistical challenges in PGS modelling, including their integration with environmental risk factors. First, training PGS models on high-dimensional and large-scale genotype data with hundreds of thousands of genetic variants and individuals requires scalable yet interpretable statistical learning methods. Second, the transferability of PGS models to diverse populations with different ancestries remains limited, as models are typically trained on cohorts predominantly of European ancestry. Third, the evaluation of predictive performance is complicated by different and sometimes conflicting definitions of the commonly used R-squared measure on test data. To address these challenges, scalable statistical learning approaches for PGS modelling based on individual-level genotype data are presented, including boosting and anchor regression. Finally, open problems and directions for future research are highlighted, with the aim of improving robustness, interpretability and gene–environment integration in personalized medicine. Robust Feature Selection for High-Dimensional Mixtures of Cox Models University of Augsburg, Germany Time-to-event analysis is fundamental for studying patient survival in modern biomedical research, particularly in the presence of high-dimensional covariate information. When survival data are collected over long time horizons, population heterogeneity naturally arises due to evolving clinical practices and patient characteristics. Mixtures of Cox proportional hazards models offer an effective way to account for such heterogeneity by modeling latent subpopulations with distinct risk profiles. 
In high-dimensional settings, feature selection is crucial for improving model interpretability and predictive performance. This talk presents a robust feature selection approach for mixtures of Cox models based on a combined ℓ1–ℓ2 penalty, which encourages sparsity while stabilizing estimation across mixture components. The resulting optimization problem is non-smooth and challenging to solve within mixture models. We address this challenge by developing an efficient Expectation–Maximization (EM) algorithm that effectively handles the non-smooth penalty structure. Empirical results demonstrate that the proposed method improves patient-specific survival time prediction across heterogeneous populations while achieving stable and interpretable feature selection. A regularized Cox model for selecting interactions and time-varying covariate effects 1Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University of Bonn; 2Department of Mathematics, Informatics and Technology, Koblenz University of Applied Sciences, RheinAhrCampus Remagen, The Cox proportional hazards model is a widely used method for analyzing clinical time-to-event data. In its standard form, the Cox model assumes the covariate effects on the hazard function to be constant over time. However, in many clinical settings, covariate effects may vary with time, and covariate interactions may significantly influence survival. Selecting interactions and time-varying effects within the Cox model framework may be challenging and often requires manual pre-screening followed by model selection steps. These selection steps are often carried out through automated stepwise procedures, which, however, can be unstable or even infeasible—particularly if a large number of potential effects is considered. We introduce a linked-shrinkage adaptive elastic net procedure for selecting two-way interactions and time-varying effects in Cox regression models. 
The proposed approach integrates an adaptive elastic net with penalty weights derived from an initial ridge regression that includes main effects only. Time-varying effects are modeled as piecewise constant functions. Penalty weights for interactions and time-varying terms are specified using a linked-shrinkage strategy based on the pre-estimated main effects, such that these effects are penalized more strongly than the main effects. We assessed the proposed modeling approach through a simulation study based on Weibull-distributed survival times, incorporating various structures of time-varying covariate effects, and compared it with several established approaches, including the classical elastic net extended to the Cox regression model. Model performance was assessed in terms of the mean squared error (MSE) of the estimated survival probabilities and the accuracy of variable selection. The proposed method reliably identified true time-varying and two-way interaction effects, with true positive rates between 80% and 90% depending on the scenario. Compared to standard regularized Cox regression models, the proposed method yielded better performance in terms of MSE and selected informative main, interaction, and time-varying effects more precisely. Furthermore, we illustrate the proposed approach by analyzing real-world data from the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) program. By addressing the limitations of manual covariate selection and stepwise procedures, the proposed method extends penalized estimation techniques to Cox regression with time-varying coefficients. Further, it facilitates the simultaneous selection of relevant interaction terms and time-varying covariate effects. 
Inferring Individual-Level Cell Type-Specific Transcriptomic Profiles from Bulk RNA-Seq Using a Bayesian Hierarchical Model University of North Carolina Wilmington, United States of America The high cost of single-cell sequencing often compels large cohort studies to rely on bulk RNA-seq, which presents challenges in resolving tissue heterogeneity and understanding the roles of individual cell types. In bulk RNA-seq analysis, deconvolution is essential for extracting cell-type-specific information. Most tools focus on estimating cell type proportions, but only a few aim to infer cell-type-specific gene expression profiles (ctsGEPs). Among these, very few estimate ctsGEPs at the individual sample level. The technical challenges of this task highlight the need for more advanced approaches capable of generating accurate individual-level ctsGEP estimates. Such estimates are critical for downstream analyses, including cell-type-specific differential expression and expression quantitative trait locus studies. To address this, we developed a novel deconvolution method to estimate individual-level ctsGEPs and cell type proportions simultaneously from bulk RNA-seq data. Using a hierarchical Bayesian framework, our method captures the stochastic variation of ctsGEPs across individuals. Parameters are estimated via Markov Chain Monte Carlo (MCMC), with hyperparameters optimized for robust inference. We benchmarked our method using 48 in silico mixtures generated from single-cell RNA-seq data of human brain donors. The results demonstrated strong performance, with correlations of ~0.9 for ctsGEP estimates and >0.6 for gene expression variation across samples for ~80% of genes. Our method outperformed existing tools, reducing Root-Mean-Square Errors by ~16%. Additionally, we showcased its application in cell-type-specific differential expression analysis. 
Our method provides a powerful tool to computationally unravel cell-type-specific expression profiles in bulk RNA-seq data, enabling advances in understanding cellular heterogeneity in biological and pathological contexts. |
| 3:30pm - 4:00pm | Coffee break 2 |
| 4:00pm - 5:00pm | Plenary Lecture 2 Location: 0.004 |
|
|
Statistical Optimal Transport in Action: From Theory to Applications University of Göttingen, Germany While optimal transport has been a long-standing mathematical, physical and economic concept for more than two centuries, recent developments in statistics, optimization and machine learning suggest its use as a tool for modern data analysis. Extensions such as Gromov–Wasserstein transport respect the inner metric structure of data sets and have proven useful for image registration and object matching. In this talk we introduce some basic statistical methods related to optimal transport and illustrate these with examples from cell biology and biometric identification. |
| 5:05pm - 6:35pm | Applied Econometrics Location: 0.001 Session Chair: Yannick Hoga |
|
|
The impact of central bank backstops on sovereign risk premia: Evidence from the ECB's Transmission Protection Instrument European Central Bank, Germany We study the effects of central bank backstops on sovereign risk premia using the Eurosystem’s Transmission Protection Instrument (TPI) announced in July 2022. We develop a nonlinear non-Gaussian state-space model that decomposes euro area sovereign yields into expected short rates, a common term premium, and country-specific default, redenomination, liquidity, and convenience premia. Structural shocks are identified through heteroscedasticity and fat tails. Using euro area data from 2015 to 2025, we extract latent risk premia and assess the impact of the TPI using event-time and difference-in-differences designs. The results show that the TPI primarily increased the convenience value of sovereign bonds and reduced the volatility of a subset of shocks, while leaving other risk premia largely unchanged. Lower convenience-adjusted yields partially dampened the transmission of policy rate hikes to medium-term sovereign yields. Forecast Combination for Tail Risk: Virtues of the Harmonic Mean University of Freiburg, Germany This paper examines the properties of the loss functions used for forecasting Value-at-Risk (VaR) and Expected Shortfall (ES). We show that the weighted arithmetic average commonly used to construct a forecast combination utilises the convexity property of the loss function only in the case of Value-at-Risk. This paper introduces a novel forecasting combination approach for Expected Shortfall, which is constructed using weighted harmonic means. We show that only in this case is the insurance against model risk guaranteed. To construct combination weights consistent with this aggregation result, we propose a novel forecast combination for tail risk measures based on the Bagged Pretested Forecast Combination (BPFC) algorithm. 
The combination weights assigned to candidate models are determined by their predictive performance using the Model Confidence Set (MCS) test. Unlike many traditional combination methods, BPFC adapts to changing market conditions while simultaneously facilitating model selection and improving forecast stability. We evaluate the performance of forecasting combinations for VaR and ES within the framework of consistent loss functions, highlighting the role of convexity in performance improvements. Our results show that the advantages of combining forecasts are especially evident when there is substantial disagreement among candidate models, a situation that commonly arises during turbulent financial periods. To empirically validate our approach, we apply it to a dataset of 90 stocks spanning various market capitalizations and covering periods of severe financial stress, including the Global Financial Crisis and the COVID-19 pandemic. The results illustrate the ability of BPFC to dynamically select and combine the most effective models from a pool of over 60 candidates, continuously adjusting weights based on each model’s forecasting performance and evolving market conditions. Systemic Risk Surveillance 1Goethe University Frankfurt, Germany; 2University Duisburg-Essen, Germany Following several episodes of financial market turmoil in recent decades, changes in systemic risk have drawn growing attention. Therefore, we propose surveillance schemes for systemic risk, which make it possible to detect misspecified systemic risk forecasts in an “on-line” fashion. This enables daily monitoring of the forecasts while controlling for the accumulation of false test rejections. Such online schemes are vital in taking timely countermeasures to avoid financial distress. Our monitoring procedures allow multiple series to be monitored at once, thus increasing the likelihood and the speed at which early signs of trouble may be picked up. 
The tests hold size by construction, such that the null of correct systemic risk assessments is only rejected during the monitoring period with (at most) a pre-specified probability. Monte Carlo simulations illustrate the good finite-sample properties of our procedures. An empirical application to US banks during multiple crises demonstrates the usefulness of our surveillance schemes for both regulators and financial institutions. |
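The contrast between arithmetic and harmonic averaging of ES forecasts described in the harmonic-mean talk above can be written down directly. A minimal sketch with hypothetical function names, assuming ES forecasts are reported as positive loss magnitudes:

```python
import numpy as np

def combine_es_arithmetic(es, w):
    """Weighted arithmetic mean of Expected Shortfall forecasts."""
    return float(np.sum(w * es))

def combine_es_harmonic(es, w):
    """Weighted harmonic mean of Expected Shortfall forecasts, as advocated
    in the talk; forecasts are assumed strictly positive."""
    return float(1.0 / np.sum(w / es))
```

By the weighted AM-HM inequality the harmonic combination never exceeds the arithmetic one; for forecasts (2, 4) with equal weights the arithmetic mean is 3 while the harmonic mean is 8/3.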
| 5:05pm - 6:35pm | Statistical Inverse Problems Location: 0.002 Session Chair: Frank Werner |
|
|
Linear methods for non-linear inverse problems 1Delft University of Technology; 2Bocconi University, Italy We propose a novel Bayesian linearization approach for non-linear PDE constrained inverse problems. We split the non-linear inverse problem into a linear statistical and a non-linear analytic component. We derive optimal posterior contraction rates, reliable uncertainty quantification, data-driven tuning and scalable approximations. The general approach is applied to specific examples, including Darcy flow and the heat equation with an absorption term. Learning with Heavy Tails TU Braunschweig, Germany We examine the performance of ridge regression in reproducing kernel Hilbert spaces in the presence of noise that exhibits a finite number of higher moments. We establish excess risk bounds consisting of subgaussian and polynomial terms based on the well-known integral operator framework. The dominant subgaussian component allows us to achieve convergence rates that have previously only been derived under subexponential noise – a prevalent assumption in related work from the last two decades. These rates are optimal under standard eigenvalue decay conditions, demonstrating the asymptotic robustness of regularized least squares against heavy-tailed noise. Our derivations are based on a Fuk-Nagaev inequality for Hilbert-space valued random variables. Comparing regularisation paths of (conjugate) gradient estimators in ridge regression 1Humboldt-Universität zu Berlin, Germany; 2Aarhus Universitet, Denmark We consider standard gradient descent, gradient flow and conjugate gradients as iterative algorithms for minimising a penalised ridge criterion in linear regression. While it is well known that conjugate gradients exhibit fast numerical convergence, the statistical properties of their iterates are more difficult to assess due to inherent non-linearities and dependencies. 
On the other hand, standard gradient flow is a linear method with well-known regularising properties when stopped early. By an explicit non-standard error decomposition we are able to bound the prediction error for conjugate gradient iterates by a corresponding prediction error of gradient flow at transformed iteration indices. This way, the risk along the entire regularisation path of conjugate gradient iterations can be compared to that for regularisation paths of standard linear methods like gradient flow and ridge regression. In particular, the oracle conjugate gradient iterate shares the optimality properties of the gradient flow and ridge regression oracles up to a constant factor. Numerical examples show the similarity of the regularisation paths in practice. |
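The correspondence between early-stopped iterative methods and explicit penalisation that underlies the comparison above can be illustrated on simulated data. The sketch below uses synthetic data and plain gradient descent rather than conjugate gradients; it traces both the ridge path and the iterate path.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = X @ np.ones(p) + 0.5 * rng.normal(size=n)

def ridge(lam):
    """Ridge estimator; varying lam traces the ridge regularisation path."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Gradient descent on the least-squares objective: the iterates traced out
# while descending form a regularisation path of their own, with early
# stopping playing the role of the penalty parameter.
eta = 1.0 / np.linalg.norm(X, 2) ** 2     # step size 1 / ||X||_op^2
beta = np.zeros(p)
path = [beta.copy()]
for t in range(5000):
    beta = beta - eta * X.T @ (X @ beta - y)
    path.append(beta.copy())
```

Early iterates stay close to heavily penalised ridge solutions (small norm), while for large iteration counts the path approaches the unpenalised least-squares estimator, mirroring `ridge(lam)` as `lam` decreases to zero.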
| 5:05pm - 6:35pm | Inference in Wasserstein Spaces and Optimal Transport Location: 0.004 Session Chair: Ansgar Steland |
|
|
Statistical Aspects of Optimal Transport: Regularization, Estimation, and Applications University of Twente, The Netherlands In recent years, statistical methodology based on optimal transport (OT) has witnessed a considerable increase in practical and theoretical interest. A central reason for this trend is the ability of optimal transport to efficiently compare data in a geometrically meaningful way. This development was further amplified by computational advances spurred by the introduction of entropy regularized optimal transport (EOT). In applications, the OT or EOT cost is often estimated through an empirical plug-in approach, raising statistical questions about the performance and uncertainty of these estimators. This talk surveys recent theoretical and methodological insights into these topics and discusses future opportunities. This talk is based on joint work with Thomas Staudt, Marcel Klatt, Michel Groppe, Alberto-Gonzáles-Sanz, Gilles Mordant, Christoph Weitkamp, and Axel Munk. On the cut-offs of Optimal Transport based statistical tests University of British Columbia, Canada Tests for equality of distributions based on Optimal Transport functionals are often referred to as not being distribution-free: asymptotic laws for test statistics depend on the underlying true distributions, and this dependence seems unavoidable. Here we show that these tests are “almost” distribution-free, in the sense that there exist cut-offs independent of the true distributions that result in tests with a given level of significance. These cut-offs are easy to compute and may serve as a rule-of-thumb-type heuristic, making Optimal Transport based tests more accessible for practical applications. 
Detecting change-points of univariate time series using the empirical Wasserstein distance 1RWTH Aachen University, Germany; 2Delft University of Technology, Netherlands In this talk we are interested in detecting change-points of univariate nonstationary time series in a nonparametric setting. We introduce statistics based on the Wasserstein distance between local empirical distribution functions of the time series which are suitable to detect change-points. The one-dimensional Wasserstein distance is characterized by the sequential quantile process, and we show that this weakly converges to a Gaussian limit. Due to the nonlinearity of the quantile process, difficulties arise from the localization. A new Bahadur representation result is needed to address this, which allows us to consider the asymptotic behavior of the empirical process instead of the quantile process. The proof of this requires further study of the modulus of continuity of the empirical process. As the limit distributions of the test statistics depend on the unknown underlying distributions, a Gaussian multiplier bootstrap scheme is introduced. Lastly, a simulation study shows how well the significance level is retained under the null hypothesis of no change, and an outlook towards the power of the tests will be given. |
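The local-window statistic described in the change-point talk above is easy to prototype: in one dimension, the empirical 1-Wasserstein distance between two equal-size samples reduces to the mean absolute difference of their order statistics. A minimal sketch with hypothetical names; the actual procedure of the talk uses a sequential quantile process with multiplier-bootstrap calibration.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size samples:
    the mean absolute difference of the order statistics."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def scan_changepoint(series, window):
    """Slide two adjacent windows over the series and record the Wasserstein
    distance between their local empirical distributions; a pronounced peak
    suggests a change-point."""
    n = len(series)
    dists = np.full(n, np.nan)
    for t in range(window, n - window):
        dists[t] = wasserstein_1d(series[t - window:t], series[t:t + window])
    return dists
```

On a series with a mean shift, the scan peaks exactly at the shift, where the two windows contain purely pre- and post-change observations.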
| 5:05pm - 6:35pm | Advances in Latent Variable Models Location: 1.002 Session Chair: Daniele Tancini |
|
|
A multilevel discrete latent variable model for joint modeling of response accuracy and times 1University of Milano-Bicocca, Italy; 2University of Perugia, Italy In recent years, the widespread adoption of computer-based testing has produced large volumes of data on examinee behavior. Beyond traditional binary indicators of correct responses, these datasets now typically include item-level response times, providing a richer and more informative perspective on performance. The Bradley–Terry Stochastic Block Model University College Dublin, Ireland The Bradley-Terry model is widely used for the analysis of pairwise comparison data and, in essence, produces a ranking of the items under comparison. We embed the Bradley-Terry model within a stochastic block model, allowing items to cluster. The resulting Bradley-Terry SBM (BT-SBM) ranks clusters so that items within a cluster share the same tied rank. We develop a fully Bayesian specification in which all quantities (the number of blocks, their strengths, and item assignments) are jointly learned via a fast Gibbs sampler derived through a Thurstonian data augmentation. Despite its efficiency, the sampler yields coherent and interpretable posterior summaries for all model components. Our motivating application analyzes men's tennis results from ATP tournaments over the seasons 2000-2022. We find that the top 100 players can be broadly partitioned into three or four tiers in most seasons. Moreover, the size of the strongest tier was small from the mid-2000s to 2018 and has increased since, providing evidence that men's tennis has become more competitive in recent years. A latent space approach for jointly modelling social influence on binary outcomes in networks 1University of Cambridge, United Kingdom; 2University College Dublin, Ireland A central task in network analysis is to model social influence, that is, how individual behaviours and outcomes are shaped by their social environment. 
Classical regression models are not suitable for this purpose, as they frequently rely on independence assumptions that are violated in network data, where individuals' behaviours are inherently interdependent. Although several methods have been proposed to address this problem, existing approaches either treat the network as fixed, rely on multi-step estimation procedures, or are limited to continuous outcome variables. |
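The Bradley-Terry building block of the BT-SBM above is compact enough to state in code; in the block-model version, every item in a cluster shares the same strength parameter. A sketch with hypothetical names, not the authors' Gibbs sampler:

```python
import numpy as np

def bt_win_prob(lam_i, lam_j):
    """Bradley-Terry probability that item i beats item j,
    given positive strength parameters lam_i and lam_j."""
    return lam_i / (lam_i + lam_j)

def bt_log_likelihood(lam, wins):
    """Log-likelihood of a wins matrix, where wins[i, j] counts
    how often item i beat item j."""
    ll = 0.0
    k = len(lam)
    for i in range(k):
        for j in range(k):
            if i != j and wins[i, j] > 0:
                ll += wins[i, j] * np.log(bt_win_prob(lam[i], lam[j]))
    return ll
```

With equal strengths every comparison is a coin flip; clustering items as in the BT-SBM amounts to constraining `lam` to take only as many distinct values as there are blocks.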
| 5:05pm - 6:35pm | Contributions to Computational Biostatistics and Data Science Location: 1.012 Session Chair: Dennis Dobler |
|
|
Bootstrap-based inference in regression using jackknife pseudo-observations 1RWTH Aachen University, Germany; 2Aarhus University, Denmark The pseudo-observation regression approach provides a flexible alternative to the omnipresent proportional hazards model when modeling time-to-event outcomes. In this approach, estimands representable as expectations are fitted to regression models using covariates of interest. Exemplary estimands that fit this framework are the restricted mean time lost (in competing risks models) or the survival function at a fixed time-point (in simple survival models). Likelihood-Based Inference for Dirichlet Mixture Models via Unconstrained Parameterization 1TU Kaiserslautern, Germany; 2LMU Munich, Germany Dirichlet mixture models (DMMs) provide a flexible and interpretable framework for clustering and modeling compositional data and have found widespread application in genomics, ecology, and the social sciences. Despite their popularity, formal likelihood-based inference for DMM parameters remains underdeveloped, primarily due to the presence of simplex constraints on mixture weights and the complex dependence structure induced by latent component memberships. In this paper, we develop a unified framework for classical likelihood-based inference in Dirichlet mixture models by working on an unconstrained parameterization that combines an additive log-ratio transformation of the mixture weights with the original Dirichlet concentration parameters. Within this framework, we derive closed-form expressions for score functions and observed Fisher information matrices, including full cross-component information terms obtained via the Louis identity. These results enable the construction of Wald, score (Lagrange Multiplier), and likelihood ratio tests for a broad class of regular parametric hypotheses, including fixed-value restrictions and equality constraints across mixture components. 
We show how the proposed methods apply seamlessly to both soft and hard EM-based estimation schemes and provide a numerically stable implementation that yields consistent standard errors and confidence intervals on the original parameter scale. Through simulation experiments and a real-data application, we demonstrate that the proposed inferential procedures perform well in finite samples and provide meaningful uncertainty quantification for DMM parameters. |
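The unconstrained parameterization of the mixture weights via the additive log-ratio (ALR) transform, as described in the abstract above, can be sketched as follows (a minimal illustration of the transform only; the function names are my own, not from the paper):

```python
import numpy as np

def alr(w):
    """Additive log-ratio transform: simplex weights -> unconstrained R^{K-1},
    using the last component as the reference."""
    w = np.asarray(w, dtype=float)
    return np.log(w[:-1] / w[-1])

def alr_inv(z):
    """Inverse ALR: unconstrained R^{K-1} -> weights on the simplex."""
    z = np.asarray(z, dtype=float)
    e = np.exp(np.concatenate([z, [0.0]]))
    return e / e.sum()

w = np.array([0.2, 0.3, 0.5])
z = alr(w)                      # unconstrained coordinates
w_back = alr_inv(z)             # round-trip recovers the simplex weights
print(w_back)
```

Working on the unconstrained scale is what makes standard Wald/score/likelihood-ratio machinery applicable without simplex constraints; estimates are mapped back via `alr_inv`.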
| 6:40pm - 8:30pm | Welcome Reception |
| Date: Thursday, 19/Mar/2026 | |
| 8:50am - 9:50am | Plenary Lecture 3 Location: 0.004 |
|
|
Statistical and computational challenges in unsupervised learning: focus on ranking University of Potsdam, Germany Ranking problems are prevalent in modern statistical, machine learning, and computer science literature. This includes a variety of practical situations ranging from ranking experts/workers in crowd-sourced data, ranking players in a tournament or equivalently sorting objects based on pairwise comparisons. A main challenge in this field is to construct an estimator of the rank of the experts, based on incomplete and noisy data. |
| 9:50am - 10:20am | Coffee break 3 |
| 10:20am - 12:20pm | Statistics in natural sciences and technology Location: 0.001 Session Chair: Gaby Schneider Session Chair: Ansgar Steland |
|
|
Time-varying degree-corrected stochastic block models ISBA/LIDAM, UC Louvain, Belgium Recent interest has emerged in community detection for dynamic networks which are observed along a trajectory of points in time. In this talk, we present a time-varying degree-corrected stochastic block model to fit a dynamic network which allows evolving heterogeneity in the degrees of nodes within a community over time. Considering the influence of the varying time window on the aggregation of network information from different time points, in the parameter estimation, we propose a smoothing-based method to recover time-varying degree parameters and communities. In particular we provide rates of consistency of our smoothed estimators for degree parameters and communities using a time-localised profile-likelihood approach. We illustrate our method by some comparative simulation studies and an application to a real data set. Learning population and individual structure in dynamic networks with degree heterogeneity UCLouvain, Belgium Dynamic networks provide a powerful framework for characterizing time-varying functional connectivity in neuroimaging studies. In practice, such networks are typically collected from multiple subjects across time and exhibit both temporal dynamics and subject-specific heterogeneity. Brain functional connectivity networks also contain hub nodes, defined as highly connected regions that play critical roles in understanding brain functional connectivity. In this talk, we propose a mixed-effect dynamic stochastic block model with degree heterogeneity, which simultaneously disentangles the population connectivity structure from individual variability and recovers the trajectories of hub regions through time-varying degree parameters. We develop an efficient local approximate estimation procedure and evaluate its performance through extensive simulations and a case study of dynamic functional connectivity from the Human Connectome Project. 
How to build your latent Markov model — the role of time and space Bielefeld University, Germany Statistical models that involve latent Markovian state processes have become immensely popular tools for analysing time series and other sequential data. However, the plethora of model formulations, the inconsistent use of terminology, and the various inferential approaches and software packages can be overwhelming to practitioners, especially when they are new to this area. Here we aim to provide guidance for both statisticians and practitioners working with latent Markov models by offering a unifying view on what otherwise are often considered separate model classes, from hidden Markov models through state-space models to Markov-modulated Poisson processes. In particular, we provide a roadmap for identifying a suitable latent Markov model formulation given the data to be analysed. Furthermore, we emphasise that it is key to applied work with any of these model classes to understand how recursive techniques exploiting the models' dependence structure can be used for inference. The R package LaMa adopts this unified view and provides an easy-to-use framework for fast numerical maximum likelihood estimation, allowing users to flexibly tailor a latent Markov model to their data using a Lego-type approach. Real-data examples from ecology, medicine and finance will be used to illustrate the modelling workflow. A Simple and Robust Multi-Fidelity Data Fusion Method for Effective Modelling of Citizen-Science Air Pollution Data 1University of Glasgow, United Kingdom; 2ETH Zürich We propose a robust multi-fidelity Gaussian process for integrating sparse, high-quality reference monitors with dense but noisy citizen-science sensors. 
The approach replaces the Gaussian log-likelihood in the high-fidelity channel with a global Huber loss applied to precision-weighted residuals, yielding bounded influence on all parameters, including the cross-fidelity coupling, while retaining the flexibility of co-kriging. We establish attenuation and unbounded influence of the Gaussian maximum likelihood estimator under low-fidelity contamination and derive explicit finite bounds for the proposed estimator that clarify how whitening and mean-shift sensitivity determine robustness. Monte Carlo experiments with controlled contamination show that the robust estimator maintains stable MAE and RMSE as anomaly magnitude and frequency increase, whereas the Gaussian MLE deteriorates rapidly. In an empirical study of PM2.5 concentrations in Hamburg, combining UBA monitors with openSenseMap data, the method consistently improves cross-validated predictive accuracy and yields coherent uncertainty maps without relying on auxiliary covariates. The framework remains computationally scalable through diagonal or low-rank whitening and is fully reproducible with publicly available code. |
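The bounded-influence idea in the abstract above, replacing the Gaussian quadratic term by a global Huber loss on precision-weighted (whitened) residuals, can be sketched as follows (illustrative only; the whitening matrix `L_inv` is a hypothetical placeholder, not the authors' construction):

```python
import numpy as np

def huber(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond,
    so the influence (derivative) is bounded by delta."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def robust_objective(y, mean, L_inv, delta=1.345):
    """Global Huber loss applied to whitened residuals L_inv @ (y - mean),
    in place of the Gaussian log-likelihood's quadratic form."""
    z = L_inv @ (y - mean)
    return huber(z, delta).sum()

# a gross outlier contributes only linearly, not quadratically
r = np.array([0.5, -8.0])
print(huber(r))
```

The key property is visible directly: the contribution of the outlying residual grows linearly in its magnitude, which is what yields the bounded-influence behavior under low-fidelity contamination.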
| 10:20am - 12:20pm | High-dimensional estimation and concentration phenomena Location: 0.002 Session Chair: Marie Düker |
|
|
Copula tensor count autoregressions 1University of Rome Tor Vergata; 2Vrije Universiteit Amsterdam This paper presents a novel copula-based autoregressive framework for multi-layer arrays of integer-valued time series with tensor structure. Our framework generalizes recent advances in tensor time series models for real-valued data to a context that accounts for the unique properties of integer-valued data, such as discreteness and non-negativity. The model incorporates feedback effects for the counts’ temporal dynamics and introduces identification constraints. An asymptotic theory is developed for a Two-Stage Maximum Likelihood Estimator (2SMLE) for the model’s parameters. The estimator balances the challenges of parameter dimensionality, interdependence of the different count series, and computational stability. Together, this substantially pushes the frontier for modeling multi-dimensional, structured tensor time series of counts. An application to tensor crime counts demonstrates the practical usefulness of the proposed methodology. High-Dimensional Inference for Network Stochastic Differential Equations University of Hamburg, Germany We consider the setting where the state dynamics at each node in a network depend on interactions with its neighbors. We model this using the general framework of Network Stochastic Differential Equations (N-SDEs). The evolution at each node arises from three components: intrinsic dynamics (a momentum term), feedback from adjacent nodes (a network term), and a stochastic volatility component driven by Brownian motion. Our goals are twofold: (i) parameter estimation for N-SDE systems and (ii) recovery of the underlying graph. Based on joint works with S.M. Iacus and N. Yoshida. Testing approximate sphericity for high-dimensional covariance matrices Aarhus University, Denmark Exact testing of model assumptions is often of limited relevance, especially in high-dimensional settings. 
Structural assumptions on large-dimensional covariance matrices, such as sphericity, are rarely expected to hold exactly for real data, and practitioners are often primarily interested in whether such model assumptions are approximately satisfied. In this work, we propose a test for approximate sphericity of high-dimensional covariance matrices, where the tolerated level of deviation from sphericity can be chosen by the user. Our test statistic is based on estimators of the largest and smallest eigenvalues of the population covariance matrix in a high-dimensional regime, where the corresponding sample eigenvalues are not consistent. We derive theoretical guarantees showing that the test keeps the prescribed asymptotic level under the null hypothesis and is power consistent under the alternative. Our key theoretical contribution is a joint central limit theorem for the estimators of the extreme eigenvalues of the population covariance matrix, provided the corresponding eigenvalues exceed the critical phase transition threshold. Principal Components Analysis for Irregular Data 1ETH Zurich, Switzerland; 2EPFL, Switzerland Functional principal component analysis (FPCA) is a fundamental tool for exploring variation in samples of random curves or surfaces. We propose a new approach to FPCA for functional data observed irregularly and sparsely over their domains, based on smoothing directly at the level of the eigenfunctions. Our formulation leads to an efficient optimization-based procedure whose computational and storage costs are comparable to those of standard multivariate PCA for regularly observed data. The method is flexible with respect to domain geometry and model class, accommodates structural constraints and penalties, and facilitates uncertainty quantification via resampling and asymptotic theory. |
| 10:20am - 12:20pm | Theory of Machine Learning: Insights from Women Researchers Location: 0.004 Session Chair: Mahsa Taheri |
|
|
Effects of Depth in Deep Learning: Independence vs Recurrence LMU Munich, Germany Depth plays a central role in modern deep learning, yet its probabilistic effects are subtle and are not fully captured by classical theories that primarily focus on the infinite-width limit. This talk explores how jointly scaling depth and width shapes the signal-propagation statistics of wide neural networks under two contrasting regimes: fully connected feedforward networks with independent weights across layers, and recurrent networks with shared weights. In feedforward networks, standard infinite-width analyses make it possible to stabilize forward and backward variance, ensuring well-behaved initialization. However, finite-width fluctuations accumulate with depth, breaking convergence to the Neural Tangent Kernel (NTK) regime. In contrast, in linear recurrent networks, finite-width effects already destabilize the forward-propagation variance, rendering conventional initialization schemes inadequate for long input sequences. Together, these results show that depth affects feedforward and recurrent architectures in qualitatively distinct ways that cannot be captured by infinite-width approximations. Theoretical guarantees for diffusion models — beyond log-concavity University of Hamburg, Germany Score-based generative modeling, implemented through probability flow ODEs, has shown impressive results in numerous practical settings. However, most convergence guarantees rely on restrictive regularity assumptions on the target distribution—such as strong log-concavity or bounded support. This work establishes non-asymptotic convergence bounds in the 2-Wasserstein distance for a general class of probability flow ODEs under considerably weaker assumptions: weak log-concavity and Lipschitz continuity of the score function. 
Our framework accommodates non-log-concave distributions, such as Gaussian mixtures, and explicitly accounts for initialization errors, score approximation errors, and effects of discretization via an exponential integrator scheme. Addressing a key theoretical challenge in diffusion-based generative modeling, our results extend convergence theory to more realistic data distributions and practical ODE solvers. We provide concrete guarantees for the efficiency and correctness of the sampling algorithm, complementing the empirical success of diffusion models with rigorous theory. Moreover, from a practical perspective, our explicit rates might be helpful in choosing hyperparameters, such as the step size in the discretization. Random Quadratic Form on a Sphere: Synchronization by Common Noise University of Amsterdam, The Netherlands We introduce the Random Quadratic Form (RQF): a stochastic differential equation which formally corresponds to the gradient flow of a random quadratic functional on a sphere. While the one-point motion of the system is a Brownian motion on a sphere and thus has no preferred direction, the two-point motion exhibits nontrivial synchronizing behaviour. In this work we study synchronization of the RQF, namely we give both distributional and path-wise characterizations of the solutions by studying invariant measures and random attractors of the system. Minimax rate of distribution regression Hong Kong University of Science and Technology, Hong Kong S.A.R. (China) Distribution regression seeks to estimate the conditional distribution of a multivariate response given a continuous covariate. This approach offers a more complete characterization of dependence than traditional regression methods. Classical nonparametric techniques often assume that the conditional distribution has a well-defined density, an assumption that fails in many real-world settings. 
These include cases where data contain discrete elements or lie on complex low-dimensional structures within high-dimensional spaces. In this work, we establish minimax convergence rates for distribution regression under nonparametric assumptions, focusing on scenarios where both covariates and responses lie on low-dimensional manifolds. We derive lower bounds that capture the inherent difficulty of the problem and propose a new hybrid estimator that combines adversarial learning with simultaneous least squares to attain matching upper bounds. Our results reveal how the smoothness of the conditional distribution and the geometry of the underlying manifolds together determine the estimation accuracy. |
| 10:20am - 12:20pm | Mathematical Statistics Location: 1.012 Session Chair: Mathias Trabs |
|
|
Alternative argmin method in the non-unique case and application for gradual regression changes 1University of Hamburg, Germany; 2Charles University of Prague, Czech Republic Assume one wants to estimate the true parameter $\vartheta_0$, which is the {\it maximal} value {\it minimizing} a function $M(\vartheta)$ over $\vartheta$. Let $M_n(\vartheta)$ be a consistent estimator for $M(\vartheta)$ uniformly in $\vartheta$. Although uniform convergence holds, one cannot apply the argmin theorem in the non-unique minimum case. Using the {\it maximal} value {\it minimizing} the function $M_n(\vartheta)$ over $\vartheta$ generally does not give a consistent estimator. We consider a special case with a real-valued parameter, and define a new consistent estimator. This method is then applied to estimate the gradual (smooth) change point $\vartheta_0$ of a nonparametric regression model $Y=m(X)+\varepsilon$ with real-valued covariates and a continuous regression function $m$, where $\vartheta_0$ is the maximal point at which $m$ is zero. Flow Matching as a forecasting model 1Ruhr-Universität Bochum, Germany; 2Karlsruher Institut für Technologie Flow Matching (introduced by Lipman et al.) and associated models have recently attracted significant interest due to their simulation-free training via a straightforward least squares criterion and the extremely broad and consequently adaptable underlying ordinary differential equation framework. Despite being a generative model that aims to mimic an unknown distribution, its possible applications extend far beyond the core task of generating new samples. The cheap generation of new samples opens the door to efficient distribution estimation, an essential component of forecasting tasks such as weather prediction. In this talk, we first adapt the Flow Matching method to smooth conditional density estimation. We show that the resulting estimator is closely related to the Nadaraya-Watson estimator. 
Then, we bridge the gap between proper scoring rules, the established method of evaluating predictions, and the fundamental concept of risk in statistical learning. Building on this, we show that the Nadaraya-Watson estimator achieves a minimax optimal anisotropic rate of convergence with respect to the risk associated with the Fourier score. In the end, we transfer this result to the Flow Matching estimator and demonstrate its capability in practice. Maximum likelihood estimation of the location of a symmetric convex body 1Georgia Tech, United States; 2Universität Bielefeld, Germany Consider data points sampled independently from the uniform distribution on a known symmetric convex body in high-dimensional Euclidean space with unknown location parameter. In this setting, the set of maximum likelihood estimators (MLE set) is a convex body containing the true location parameter. The goal of this talk is to present non-asymptotic upper and lower bounds for the diameter of the MLE set. Permutation testing under local differential privacy University of Warwick, United Kingdom In this talk I will discuss recent work on two-sample testing under a local differential privacy constraint where a permutation procedure is used to calibrate the tests. While permutation testing is a classical resampling technique, popular due to its ease of implementation and uniform Type I error control, its use under local privacy constraints is complicated by the fact that access to the data is limited. In this work we design appropriate mechanisms for private data collection, both interactive and non-interactive, that allow for permutation tests. Our analysis shows that these lead to minimax optimal separation rates in both discrete and continuous settings, with interactive procedures being significantly more powerful. This is recent joint work with Alexander Kent and Yi Yu (https://arxiv.org/abs/2505.24811). |
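The Nadaraya-Watson-type conditional density estimator that the Flow Matching talk above refers to can be sketched in its generic textbook form (this is my own illustration, not the speakers' exact estimator): kernel weights localize around the covariate value $x$, and a kernel density in $y$ is averaged with those weights.

```python
import numpy as np

def gauss(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def nw_cond_density(y, x, X, Y, h=0.3, b=0.3):
    """Nadaraya-Watson-type conditional density estimate of f(y | x):
    a kernel density in y, weighted by kernel localization around x."""
    wx = gauss((x - X) / h)
    wx = wx / wx.sum()                    # normalized covariate weights
    return np.sum(wx * gauss((y - Y) / b) / b)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 2000)
Y = X + 0.1 * rng.normal(size=2000)       # Y | X=x roughly N(x, 0.1^2)
est = nw_cond_density(0.0, 0.0, X, Y)     # density estimate at y=0 given x=0
print(est)
```

By construction the estimate integrates to one in $y$ for every $x$, since it is a mixture of kernel densities; the bandwidths `h` and `b` govern the smoothing in the covariate and response directions, respectively.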
| 12:20pm - 1:30pm | Lunch break 2 |
| 1:30pm - 3:30pm | Statistics in natural sciences and technology Location: 0.001 Session Chair: Gaby Schneider Session Chair: Ansgar Steland |
|
|
MEWMA control charts for the covariance matrix -- on the validity of a certain approximation to achieve a feasible ARL integral equation 1RWTH Aachen / HSU Hamburg, Germany; 2HSU Hamburg, Germany In this talk, we consider the problem of monitoring changes in the covariance matrices of a sequence of multivariate normally distributed random vectors. To this end, we introduce a Multivariate Exponentially Weighted Moving Average (MEWMA) control chart in which, at each time step, the empirical covariance matrix is computed and vectorized. The control limit and the corresponding Average Run Length (ARL) are determined not only by Monte Carlo simulation, but also by numerically solving an integral equation for the ARL. In order to set up this integral equation, the exact transition density of the monitoring statistic is approximated by its asymptotic transition density. This approximation exploits the fact that the asymptotic transition density is invariant under rotations of the sample covariance matrix. Finally, we provide an outlook on an application of the proposed control chart to data from a bridge monitoring project. EWMA control charts for the correlation coefficient Helmut Schmidt University Hamburg, Germany Many EWMA control charts for various parameters are available. However, there is none for monitoring the linear correlation coefficient ρ. Although the explicit distribution of the estimator of ρ has been known for a long time, it seems never to have been used for setting up a control chart. Here, we build an EWMA chart utilizing this estimator, namely the Pearson correlation, and calculate the most popular performance measure, the zero-state average run length (ARL), by means of various numerical methods. Not surprisingly, the two standard methods work poorly for certain chart designs. We solve these problems by utilizing piece-wise collocation. Moreover, we examine further configuration details and provide some guidelines. 
Two applications illustrate the usefulness of monitoring the ρ level. Integrated Modelling of Age- and Sex-Structured Wildlife Population Dynamics: The Example of Hartebeest University of Hohenheim, Germany Biodiversity underpins life on Earth, yet it is declining at an accelerating pace, sharpening the need for interventions that can slow, halt, or reverse these losses. Designing such interventions requires clear insight into the processes driving population declines in particular species—and into the relative importance of those processes—insight most directly generated by population dynamics models. Yet appropriate population dynamics models for quantifying declines and guiding conservation management of wild herbivore populations remain scarce, leaving a critical gap in both evidence and practice. To address this gap, we develop an integrated Bayesian state-space population dynamics model, using the Mara-Serengeti hartebeest population as a case study. The model extends and generalizes an earlier framework we developed and illustrated for the Mara-Serengeti topi (Mukhopadhyay et al. 2024), adding multiple features designed to improve realism, inference, and management relevance. The model fuses ground demographic surveys with aerial monitoring data, explicitly representing population age–sex structure and key life-history traits and strategies. It links birth rates, age-specific survival rates, and sex ratios to meteorological covariates, prior population density, environmental seasonality, predation risk, and several environmental and anthropogenic covariates. Operating on a monthly time step, it enables fine-grained estimation of reproductive seasonality, phenology, synchrony, and birth prolificacy, as well as juvenile and adult recruitment dynamics. We evaluate performance using balanced bootstrap sampling and by comparing model predictions with empirical aerial estimates of population size. 
We perform a detailed assessment of model robustness, including checking for parameter redundancy, estimability and identifiability, performing sensitivity analyses of the priors and running multiple MCMC chains. Implemented as a hierarchical Bayesian model using MCMC methods for parameter estimation, prediction, and inference, the model reproduces several well-established features of the hartebeest population, including a steep and persistent decline, weakly seasonal births, and juvenile and adult recruitment patterns. The framework is general, flexible, and easily adaptable to other species. References Mukhopadhyay, S., Piepho, H. P., Bhattacharya, S., Dublin, H. T., & Ogutu, J. O. (2024). Hierarchical Bayesian integrated modeling of age- and sex-structured wildlife population dynamics. Journal of Agricultural, Biological and Environmental Statistics, 1-26. Joseph O. Ogutu, Hans-Peter Piepho et al. University of Hohenheim, Institute of Crop Science, Biostatistics Unit, Fruwirthstrasse 23, 70599 Stuttgart, Germany The second-order generalization of the Hájek-Le Cam asymptotic minimax theorem Nanzan University, Japan In establishing the basic results of the asymptotic theory of estimation and testing, Le Cam (1960) introduced the so-called locally asymptotically normal (LAN) family of distributions. The convolution theorem for the LAN case was obtained by Hájek (1970) and extended by Le Cam (1972) to more general situations. These results are sometimes called the Hájek-Le Cam asymptotic minimax theorem. In this talk we derive the second-order generalization of Hájek's convolution theorem. Furthermore, as an application, we obtain the second-order Hájek-Le Cam asymptotic minimax theorem, which automatically provides the conditions that second-order asymptotically efficient estimators should satisfy. |
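The generic EWMA recursion underlying the correlation-monitoring chart in this session can be sketched as follows (illustrative only: successive windowed Pearson correlations are smoothed by the standard EWMA update, whereas the talk itself works with the exact distribution of the estimator and exact ARL computations):

```python
import numpy as np

def ewma_corr_chart(x, y, window=30, lam=0.1):
    """Apply the EWMA recursion Z_t = (1-lam)*Z_{t-1} + lam*r_t to
    Pearson correlations r_t computed on non-overlapping windows.
    Returns the raw correlations and the smoothed chart statistic."""
    n = len(x) // window
    rs = [np.corrcoef(x[i*window:(i+1)*window],
                      y[i*window:(i+1)*window])[0, 1] for i in range(n)]
    z = np.empty(n)
    prev = rs[0]                 # start the chart at the first observed correlation
    for t, r in enumerate(rs):
        prev = (1 - lam) * prev + lam * r
        z[t] = prev
    return np.array(rs), z

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.5 * x + np.sqrt(1 - 0.25) * rng.normal(size=300)   # true correlation 0.5
rs, z = ewma_corr_chart(x, y)
print(z[-1])   # smoothed correlation level near 0.5
```

In an actual chart, `z` would be compared against control limits calibrated to a target in-control ARL, which is where the numerical methods discussed in the abstract come in.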
| 1:30pm - 3:30pm | Statistics for Stochastic Processes Location: 0.002 Session Chair: Fabian Mies |
|
|
A nonparametric statistic for rank changes of volatility functions of Ito semimartingales Christian-Albrechts-Universität, Germany The change of the rank of the volatility function in Ito semimartingales poses a complicated signal-detection problem. In their 2013 paper, Jacod & Podolskij derived a statistic to detect whether the rank of the volatility function is constant over the observation period. Based on their results, we develop a statistic that detects local jumps in the rank, based on random perturbations of the high-frequency observations of an Ito semimartingale. This statistic can be used to estimate the time points at which the rank jumps occur. We illustrate our results with some simulated data. Nonparametric density estimation for the small jumps of Lévy processes Université Versailles Saint Quentin, France We consider the problem of estimating the density of the process associated with the small jumps of a pure jump Lévy process, possibly of infinite variation, from discrete observations of one trajectory. The interest of such a question lies in the observation that even when the Lévy measure is known, the density of the increments of the small jumps of the process cannot be computed in closed form. We discuss results both from low- and high-frequency observations. In a low-frequency setting, assuming the Lévy density associated with the jumps larger than $\epsilon\in(0,1)$ in absolute value is known, a spectral estimator relying on the convolution structure of the problem achieves a parametric rate of convergence with respect to the integrated $L_2$ loss, up to a logarithmic factor. In a high-frequency setting, we remove the assumption on the knowledge of the Lévy measure of the large jumps and show that the rate of convergence depends both on the sampling scheme and on the behavior of the Lévy measure in a neighborhood of zero. We show that the rate we find is minimax up to a logarithmic factor. 
An adaptive penalized procedure is studied to select the cutoff parameter. These results are extended to encompass the case where a Brownian component is present in the Lévy process. Furthermore, we numerically illustrate the performances of our procedures. Fractional interacting particle system: drift parameter estimation via Malliavin calculus Universitat Pompeu Fabra, Spain We address the problem of estimating the drift parameter in a system of $N$ interacting particles driven by additive fractional Brownian motion of Hurst index $H \geq 1/2$. Considering continuous observation of the interacting particles over a fixed interval $[0, T]$, we examine the asymptotic regime as $N \to \infty$. Our main tool is a random variable reminiscent of the least squares estimator but unobservable due to its reliance on the Skorohod integral. We demonstrate that this object is consistent and asymptotically normal by establishing a quantitative propagation of chaos for Malliavin derivatives, which holds for any $H \in (0,1)$. Leveraging a connection between the divergence integral and the Young integral, we construct computable estimators of the drift parameter. These estimators are shown to be consistent and asymptotically Gaussian. Finally, a numerical study highlights the strong performance of the proposed estimators. Adaptive denoising diffusion modelling via random time reversal 1Kiel University, Germany; 2Heidelberg University, Germany; 3University of Stuttgart, Germany We introduce a new class of generative diffusion models that, unlike conventional denoising diffusion models, achieve a time-homogeneous structure for both the noising and denoising processes, allowing the number of steps to adaptively adjust based on the noise level. This is accomplished by conditioning the forward process using Doob’s h-transform, which terminates the process at a suitable sampling distribution at a random time. 
The model is particularly well suited for generating data with lower intrinsic dimensions, as the termination criterion simplifies to a first hitting rule. A key feature of the model is its adaptability to the target data, enabling a variety of downstream tasks using a pre-trained unconditional generative model. We highlight this point by demonstrating how our generative model may be used as an unsupervised learning algorithm: in high dimensions the model outputs with high probability the metric projection of a noisy observation $y$ of some latent data point $x$ onto the lower-dimensional support of the data – which we don't assume to be analytically accessible but to be only represented by the unlabeled training data set of the generative model. |
| 1:30pm - 3:30pm | Multivariate Statistics and Copulas Location: 0.004 Session Chair: Eckhard Liebscher |
|
|
Tests for independence between random vectors University of Leuven (KU Leuven), Belgium In this talk the focus is on copula-based procedures for testing whether a finite collection of continuous random vectors is mutually independent. In particular, we look into the class of meta-elliptical copulas and test the hypothesis whether the copula correlation matrix is a block diagonal matrix. The test statistic is a $\Phi$-dependence measure of a rank-based correlation matrix estimator, whose asymptotic distribution under the null is obtained for general $\Phi$ functions and general elliptical generators. In case of the Gaussian copula, we also develop asymptotics when optimal transport dependence measures are used for testing the null hypothesis of independent random vectors. Some numerical studies, including comparisons with existing methods, are reported on. Irène Gijbels, Steven De Keyser University of Leuven (KU Leuven), Belgium. Restrictions of PCBNs for integration-free computations Delft University of Technology, The Netherlands Pair-copula Bayesian networks (PCBNs) are graphical models composed of a directed acyclic graph (DAG) that represents (conditional) independence in a joint distribution. The nodes of the DAG are associated with marginal densities, and arcs are assigned bivariate (conditional) copulas following a prescribed collection of parental orders. The choice of marginal densities and copulas is unconstrained. However, the simulation and inference of a PCBN model may necessitate possibly high-dimensional integration. A nonparametric copula-based imputation method Free University of Bozen-Bolzano, Italy Missing values in multivariate dependent data are common in many applied settings and pose challenges for standard imputation methods, particularly when complex dependence structures are present. We introduce NPCoImp, a nonparametric copula-based approach for imputing multivariate missing data. 
The method relies on the empirical beta copula to estimate conditional distribution functions of missing variables given the observed ones, allowing the imputation process to account for the radial symmetry or asymmetry of the joint dependence structure. NPCoImp is highly flexible and can accommodate arbitrary missingness patterns in multivariate settings. We assess its performance through an extensive Monte Carlo simulation study, comparing it with classical imputation methods, the CoImp algorithm, and the machine-learning-based missForest approach. The results show that NPCoImp performs particularly well in preserving dependence structures across different sample sizes, missingness levels, and dependence strengths. The practical relevance of the method is illustrated through applications to real data from the agricultural sector. An ordering for the strength of functional dependence Paris Lodron Universität Salzburg, Austria We introduce a new dependence order, termed the conditional convex order, whose minimal and maximal elements characterize independence and perfect dependence. Moreover, it characterizes conditional independence, satisfies information monotonicity, and exhibits several invariance properties. Consequently, it is an ordering for the strength of functional dependence of a random variable Y on a random vector X. As we show, various recently studied dependence measures---including Chatterjee's rank correlation, Wasserstein correlations, and rearranged dependence measures---are increasing in this order and inherit their fundamental properties from it. We characterize the conditional convex order by the Schur order and by the concordance order, and we verify it in settings such as additive error models, the multivariate normal distribution, and various copula-based models. Our results offer a unified perspective on the behavior of dependence measures across statistical models. |
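The empirical beta copula on which NPCoImp relies has a simple closed form: each rank $R_{ij}$ contributes a Beta$(R_{ij},\, n+1-R_{ij})$ cdf factor. A minimal sketch of evaluating it (my own implementation for illustration, not the NPCoImp code):

```python
import numpy as np
from scipy.stats import beta, rankdata

def empirical_beta_copula(u, data):
    """Evaluate the empirical beta copula at a point u in [0,1]^d:
    C(u) = (1/n) * sum_i prod_j F_{Beta(R_ij, n+1-R_ij)}(u_j),
    where R_ij is the rank of observation i in coordinate j."""
    n, d = data.shape
    R = np.apply_along_axis(rankdata, 0, data)          # ranks per coordinate
    terms = np.ones(n)
    for j in range(d):
        terms *= beta.cdf(u[j], R[:, j], n + 1 - R[:, j])
    return terms.mean()

rng = np.random.default_rng(2)
data = rng.normal(size=(50, 2))                          # independent coordinates
c = empirical_beta_copula(np.array([0.5, 0.5]), data)
print(c)   # near 0.25 = 0.5 * 0.5 under independence
```

Unlike the raw empirical copula, this estimator is itself a genuine (smooth) copula, which is what makes it convenient for estimating the conditional distributions used in the imputation step.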
| 1:30pm - 3:30pm | Topics in functional data analysis Location: 1.012 Session Chair: Siegfried Hörmann |
|
|
Tests of symmetry for functional data Charles University, Czech Republic We present a test of symmetry of the distribution and a test of time symmetry for functional data. These tests are Cramér–von Mises-type tests based on empirical characteristic functionals. Specific variants of time symmetry, including the time symmetry of the Wiener process, are proposed. In general, the test statistics assume a relatively simple form if we use a Gaussian measure to construct the test. Then, we use bootstrap or permutation techniques to estimate the asymptotic critical values for the test statistics. Making Event Study Plots Honest: A Functional Data Approach to Causal Inference University of Bonn, Germany Event study plots are the centerpiece of Difference-in-Differences (DiD) analysis, but current plotting methods cannot provide honest causal inference when the parallel trends and/or no-anticipation assumption fails. We introduce a novel functional data approach to DiD that directly enables honest causal inference via event study plots. Our DiD estimator converges to a Gaussian process in the Banach space of continuous functions, enabling powerful simultaneous confidence bands. This theoretical contribution allows us to turn an event study plot into a rigorous, honest causal inference tool through equivalence and relevance testing: Honest reference bands can be validated using equivalence testing in the pre-treatment period, and honest causal effects can be tested using relevance testing in the post-treatment period. We demonstrate the performance of our method in simulations and two case studies. Kernel Expansions in Sobolev Spaces and Applications to Stochastic Processes TU Graz, Austria Mercer's celebrated theorem is refined and extended for (weakly) differentiable symmetric kernels by associating not the common $L^2$-integral operator but a slightly more complex operator that additionally takes into account information encoded in the (weak) derivatives of the kernel.
The natural domain for this associated operator is the Sobolev space $H^k(\Theta) = W^{k,2}(\Theta) \subset L^2(\Theta)$, where $\Theta \subset \mathbb{R}^d$ is a bounded domain and $k \in \mathbb{N}_0$ depends on the order of weak differentiability. The spectral decomposition of this operator then leads to a Mercer-type expansion of the kernel, which converges with respect to the $H^k$-norm and, if $k>d$, also uniformly \emph{without} requiring the kernel to be positive-definite. If the kernel is also positive-definite and differentiable in the strong sense, a refinement of Mercer's theorem is obtained that additionally provides uniform convergence of the term-wise derivatives of the expansion to the respective derivatives of the kernel. Uncertainty of Functional Data Reconstruction Masaryk University, Czech Republic We revisit the classic situation in functional data analysis in which data items such as curves are observed at discrete (possibly sparse and irregular) arguments with observation noise. We focus on the reconstruction of individual curves, especially on prediction intervals and prediction bands for them. The standard approach is to proceed in two steps: First, one estimates the mean and covariance function of curves and the observation noise variance function by smoothing techniques such as penalized splines. Second, under Gaussian assumptions, one derives the conditional distribution of a curve given its noisy discrete observations and constructs prediction sets with the required properties (usually employing sampling from the predictive distribution). This approach is well established, commonly used, and theoretically valid, but in practice it surprisingly fails in its key property: prediction sets constructed this way often do not have the required coverage. The actual coverage is lower than the nominal one. This has been little reported and studied in the literature. We investigate the cause of this issue and propose a remedy. |
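Mercer-type expansions as discussed in the TU Graz talk above can be explored numerically by discretizing an integral operator whose eigenpairs are known in closed form. A small sketch for the Brownian-motion covariance kernel K(s,t) = min(s,t) on [0,1] (the grid size and tolerance are illustrative choices):

```python
import numpy as np

# Nystrom discretization of the integral operator with kernel
# K(s, t) = min(s, t) on [0, 1]; its Mercer eigenpairs are known:
#   lambda_k = ((k - 1/2) * pi)**(-2),  phi_k(t) = sqrt(2) * sin((k - 1/2) * pi * t)
n = 400
t = (np.arange(n) + 0.5) / n                        # midpoint quadrature grid
K = np.minimum.outer(t, t)                          # Gram matrix of the kernel
evals = np.sort(np.linalg.eigvalsh(K / n))[::-1]    # quadrature-weighted spectrum

lam = ((np.arange(1, 4) - 0.5) * np.pi) ** (-2.0)   # first 3 Mercer eigenvalues
```

The discrete spectrum matches the continuous one up to a quadrature error of order 1/n; the truncated sum over eigenpairs then reconstructs the kernel uniformly.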
| 3:30pm - 4:00pm | Coffee break 4 |
| 4:00pm - 6:00pm | Computational Statistics Location: 0.001 Session Chair: Ostap Okhrin |
|
|
Tensor changepoint detection and eigenbootstrap Charles University, Czech Republic Tensor data consisting of multivariate outcomes over the items and across the subjects with longitudinal and cross-sectional dependence are considered. A completely distribution-free and tuning-parameter-free detection procedure for changepoints at different locations is designed, which does not require training data. A CUSUM-type test statistic is employed, and its asymptotic properties are derived for a large number of available individual profiles. The considered test is shown to be consistent. The aim is to propose an eigenbootstrap superstructure that overcomes the computational curse of dimensionality without any loss of information, while it preserves all the dependencies within and between the panels. The validity of this new and fast resampling algorithm is proved in this general setting. The empirical properties of the detection technique are investigated through a simulation study. The fully data-driven test is applied to real-world data from EEG and psychometrics. Functional-based claims reserving with ProfileLadder Charles University, Czech Republic Risk reserving is a fundamental task in non-life insurance and is performed on a regular basis. It is typically carried out using parametric estimation and prediction methods applied to aggregated data structured in so-called run-off triangles. In this talk, we present nonparametric, functional-based reserving alternatives that rely on the completion of MNAR functional segments in the underlying run-off triangles. In addition to the theoretical and methodological framework, we focus on algorithmic details implemented in the recent R package ProfileLadder. The package offers flexible and computationally efficient tools for pointwise and distributional reserve prediction and includes relevant visualization and diagnostic tools implemented via standard S3 methods.
These nonparametric approaches provide modern, transparent, and extensible alternatives to classical reserving methods used by researchers, actuarial scientists, or insurance practitioners. Proxy-identification of a structural MGARCH model for asset returns Matthias R. Fengler, Professor of Econometrics, University of St.Gallen, Switzerland We identify shocks in a structural MGARCH model of asset returns using news-based proxy instruments. Structural parameters, including an orthogonal matrix, are estimated via Riemannian optimization. We study daily returns on the S&P500, the 10-year Treasury yield, and the USD index. The proxies identify an equity valuation shock, capturing shifts in expected dividend growth and risk premia, and a bond valuation shock, reflecting fundamental shocks in safe-haven asset pricing. The dynamic impact matrix is asymmetric, and sign changes in the bond valuation shock loading drive switches between negative and positive stock–bond co-movement. A decomposition of the COVID-19 episode shows that bond valuation shocks partially offset equity market stress and explain the temporary yield surge in mid-March 2020. Estimating ``Realized'' Skewness using Convolutional Neural Network 1Technische Universität Dresden, Germany; 2University of Lausanne, Switzerland We propose a new estimator of low-frequency skewness that exploits high-frequency data through a direct functional mapping consisting of layers of convolutional neural networks followed by layers of MLPs. We show that the relevant high-frequency features converge to a continuous limit and that the latent skewness admits a continuous functional representation. This allows us to establish the unbiasedness of our NN estimator using classical universal approximation results and Rademacher complexity arguments. 
Monte Carlo experiments under stochastic volatility models, with and without jumps, show that the estimator reduces finite-sample bias relative to existing realized-skewness estimators and remains accurate under model misspecification. Empirically, our estimator exhibits temporal stability and delivers superior cross-sectional pricing performance in skewness-sorted portfolios. Another application finds no evidence that ESG-oriented firms exhibit lower crash risk. Overall, the results demonstrate how learning-based functionals can improve the estimation of nonlinear distributional characteristics from high-frequency data. |
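As background to the CUSUM-type statistic used in the first talk of this session, a minimal mean-change CUSUM sketch (plain variance normalization on a univariate series; this illustrates only the generic template, not the eigenbootstrap procedure itself):

```python
import numpy as np

def cusum_stat(x):
    """Max-type CUSUM statistic for a change in the mean.

    Returns the maximizer (candidate changepoint) and the statistic
    max_k |S_k - (k/n) * S_n| / (sigma_hat * sqrt(n)).
    """
    x = np.asarray(x, float)
    n = len(x)
    s = np.cumsum(x)
    dev = np.abs(s - (np.arange(1, n + 1) / n) * s[-1])  # bridge deviations
    k_hat = int(np.argmax(dev)) + 1                      # candidate changepoint
    return k_hat, dev.max() / (x.std(ddof=1) * np.sqrt(n))
```

Under the null the statistic behaves like the supremum of a Brownian bridge; a pronounced mean shift makes it large and locates the change at the maximizer.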
| 4:00pm - 6:00pm | Statistics for Stochastic Processes Location: 0.002 Session Chair: Fabian Mies |
|
|
Sharp adaptive nonparametric testing for a constant volatility Albert-Ludwigs-Universität Freiburg, Germany Based on discrete observations within the nonparametric Gaussian white noise model $dY_t = \sigma(t)\,dW_t$, we develop a test to infer whether the volatility function $\sigma(\cdot)$ is constant. In particular, at prescribed significance, we simultaneously identify those time intervals where a violation of the constancy hypothesis occurs without a priori knowledge of their number and size. The testing procedure is shown to be minimax-optimal and adaptive for infill asymptotics, and these results entail that a deviation from the null hypothesis of constancy is best measured in terms of $\sup_{t\in[0,1]}|\sigma(t)^2/\|\sigma\|_{L^2}^2 - 1|$. The derivation of the optimal constants requires building hypotheses with height solving $F_n(x)=0$ for given functions $F_n$ and understanding the asymptotic behavior of their solution, which is done using the implicit function theorem. Geometric ergodicity of Langevin dynamics and its discretizations Taras Shevchenko National University of Kyiv, Ukraine We study the Langevin stochastic differential equation and its discrete approximations: the Euler–Maruyama scheme, commonly referred to as the Unadjusted Langevin Algorithm (ULA), and direct sampling from the continuous-time process. We show that the ULA process is geometrically ergodic in $\mathbb{R}^d$ under suitable conditions and derive a corresponding drift condition using a Foster–Lyapunov test function. We then analyze time-inhomogeneous approximations with diminishing step sizes and establish geometric recurrence for both chains, the ULA and the directly sampled chain. Topology Matters for High-Frequency Inference: Weak Convergence of Stochastic Integrals in M1 University of Luxembourg, Luxembourg Statistical analysis of stochastic processes increasingly relies on functional limit theorems for path-dependent estimators, particularly in the presence of jumps.
Many estimators in econometrics and time series analysis, such as statistics used for cointegration testing, self-normalized inference, or high-frequency volatility estimation, can be expressed as functionals of stochastic integrals with random, data-dependent integrands, or as continuous-time limits thereof. Their asymptotic validity therefore hinges on weak convergence results that remain stable beyond the classical continuous-path regime. In particular, Skorokhod’s M1 topology becomes increasingly relevant, since it captures convergence in situations where large discontinuities are approximated by clusters of smaller jumps, a behavior that is typically not captured in the classical framework of the J1. Such phenomena arise naturally in econometrics and high-frequency data settings. This talk develops a weak limit theory for stochastic integrals on the space of càdlàg paths under Skorokhod’s M1 topology. I present a new, self-contained approach based on good decompositions of semimartingale integrators, yielding tractable conditions under which Itô integration is continuous jointly in the integrator and integrand. The results unify classical J1 continuity theorems and provide new conclusions in M1. I also show that for families of local martingales, M1-tightness implies J1-tightness under a mild localised uniform integrability condition. I conclude with a discussion of applications, including anomalous diffusion models represented as stochastic integrals with respect to continuous-time random walks. |
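The Unadjusted Langevin Algorithm analyzed in the second talk above is short to state in code. A sketch for a standard Gaussian target (step size and run length are illustrative; for this target the ULA stationary variance carries a known O(step) bias):

```python
import numpy as np

def ula(grad_log_pi, x0, step, n_iter, rng):
    """Unadjusted Langevin Algorithm (Euler-Maruyama for the Langevin SDE):
    x_{k+1} = x_k + step * grad_log_pi(x_k) + sqrt(2 * step) * xi_k.
    """
    x = np.asarray(x0, float).copy()
    out = np.empty((n_iter, x.size))
    for k in range(n_iter):
        x = x + step * grad_log_pi(x) + np.sqrt(2 * step) * rng.standard_normal(x.size)
        out[k] = x
    return out

# Target: standard Gaussian, so grad log pi(x) = -x
rng = np.random.default_rng(1)
chain = ula(lambda x: -x, np.zeros(1), step=0.05, n_iter=50000, rng=rng)
```

For the Gaussian target the ULA chain is an AR(1) process with stationary variance 1/(1 - step/2), which makes its geometric ergodicity and discretization bias explicit.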
| 4:00pm - 6:00pm | Nonparametric statistics Location: 0.004 Session Chair: Anne Leucht |
|
|
Nonparametric spectral density estimation using interactive mechanisms under local differential privacy 1CREST, ENSAE, IP PARIS, France; 2University of Kassel, Germany; 3University of Vienna, Austria We are interested in the spectral density of a centered stationary Gaussian time series under local differential privacy constraints. Specifically, we propose new interactive privacy mechanisms for three tasks: recovering a single covariance coefficient, recovering the spectral density at a fixed frequency, and recovering it globally. Our approach achieves faster rates through a two-stage process: we first apply the Laplace mechanism to the truncated values and then use this privatized sample to gain knowledge of the dependence structure of the time series. For spectral densities belonging to Hölder and Sobolev smoothness classes, we demonstrate that our algorithms improve upon the non-interactive mechanism of Kroll (2024) for small privacy parameter α, since the pointwise rates depend on nα² instead of nα⁴. Moreover, we show that the rate 1/(nα⁴) is optimal for estimating a covariance coefficient with non-interactive mechanisms. However, the L2 rate of our interactive estimator is slower than the pointwise rate. We show how to use these procedures to provide a bona fide, locally differentially private estimator of the full covariance matrix. Detecting Periodicity of a General Stationary Time Series via AR(2)-Model Fitting 1TU Braunschweig, Germany; 2University of Cyprus; 3Cyprus Academy of Sciences, Letters and Arts Estimating the periodicity of a stationary time series by fitting a second-order stationary autoregressive (AR(2)) model goes back to the seminal paper of Yule (1927). We investigate properties of this procedure when applied to general stationary processes possessing a spectral density with a dominant peak at some frequency λ0 in (0,π).
Conditionally specified graphical modeling of stationary multivariate time series 1Texas A&M University, United States of America; 2Universität Heidelberg, Germany Graphical models are ubiquitous for summarizing conditional relations in multivariate data. In many applications involving multivariate time series, it is of interest to learn an interaction graph that treats each individual time series as a node of the graph, with the presence of an edge between two nodes signifying conditional dependence given the others. Typically, the partial covariance is used as a measure of conditional dependence. However, in many applications, the outcomes may not be Gaussian and/or could be a mixture of different outcomes. For such time series, using the partial covariance as a measure of conditional dependence may be restrictive. In this article, we propose a broad class of time series models which are specifically designed to succinctly encode process-wide conditional independence in their parameters. For each univariate component in the time series, we model its conditional distribution with a distribution from the exponential family. We develop a notion of process-wide compatibility under which such conditional specifications can be stitched together to form a well-defined strictly stationary multivariate time series. We call this construction a conditionally exponential stationary graphical model (CEStGM). A central quantity underlying CEStGM is a positive kernel which we call the interaction kernel. Spectral properties of such positive kernel operators constitute a core technical foundation of this work. We establish process-wide local and global Markov properties of CEStGM exploiting a Hammersley–Clifford-type decomposition of the interaction kernel. Further, we study various probabilistic properties of CEStGM and show that it is geometrically mixing. An approximate Gibbs sampler is also developed to simulate sample paths of CEStGM. |
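Yule's AR(2) device from the periodicity talk in this session can be sketched in a few lines: fit an AR(2) model by Yule-Walker and read the implied frequency off the angle of the complex characteristic roots (the simulated example is illustrative):

```python
import numpy as np

def ar2_frequency(x):
    """Fit AR(2) via Yule-Walker; return the angle of the complex
    characteristic roots as the implied dominant frequency in (0, pi)."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    r = [x[: n - k] @ x[k:] / n for k in range(3)]   # autocovariances, lags 0..2
    phi = np.linalg.solve([[r[0], r[1]], [r[1], r[0]]], [r[1], r[2]])
    roots = np.roots([1.0, -phi[0], -phi[1]])        # z^2 - phi1 * z - phi2 = 0
    return float(np.abs(np.angle(roots)).max())

# Simulate an AR(2) with characteristic roots 0.95 * exp(+/- i * pi/4)
rng = np.random.default_rng(0)
phi1, phi2 = 2 * 0.95 * np.cos(np.pi / 4), -0.95 ** 2
x = np.zeros(20000)
eps = rng.standard_normal(20000)
for t in range(2, 20000):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + eps[t]
```

When the data truly follow an AR(2), the recovered angle is consistent for the spectral peak; the talk above studies what this procedure estimates when the data only possess a peaked spectral density.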
| 4:00pm - 6:00pm | Topics in functional data analysis Location: 1.012 Session Chair: Siegfried Hörmann |
|
|
Measuring dependence between a categorical response and a functional covariate Graz University of Technology, Austria We suggest a dependence coefficient between a categorical variable and some general variable taking values in a metric space. In particular, this framework includes functional data. We derive important theoretical properties and study the large-sample behaviour of our suggested estimator. Moreover, we develop an independence test and prove that it is consistent against any violation of independence. The test is also applicable to the classical $K$-sample problem with possibly high- or infinite-dimensional distributions. Rate-optimal estimation for synchronously sampled functional data Philipps-Universität Marburg, Germany We obtain minimax-optimal convergence rates in the supremum norm. Beyond the positive drift: Comparing historical and current daily temperature patterns based on two-sample statistics for unbalanced dense-sparse functional data Marburg University, Germany The two-sample problem for functional data is investigated for discrete, synchronous designs in each sample, in settings in which one sample is densely observed while the other is only relatively sparsely observed. This is motivated by comparing historical and more current daily temperature patterns, where more recent devices take measurements every 10 minutes, while historical measurements in the time period 1952 to 1972 are available only every hour. We use recently developed methods from transfer learning for functional data to estimate the difference of the mean functions at optimal rates in the supremum norm. Further, we derive a central limit theorem in the space of continuous functions and discuss the construction of uniform confidence bands using the multiplier bootstrap. We also show how our methods can be extended to functional time series. |
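A multiplier bootstrap for sup-norm inference on the difference of two mean curves, as used in the temperature comparison above, can be sketched for curves observed on a common grid (Gaussian multipliers; sample sizes, grid, and toy data are illustrative, and the sketch ignores the dense-sparse imbalance treated in the talk):

```python
import numpy as np

def two_sample_sup_band(X, Y, n_boot=1000, level=0.95, rng=None):
    """Multiplier bootstrap critical value for the sup-norm of the
    difference of two sample mean curves; rows are curves, columns grid points."""
    rng = rng if rng is not None else np.random.default_rng(0)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    sups = np.empty(n_boot)
    for b in range(n_boot):
        gx = rng.standard_normal(len(X))[:, None]   # Gaussian multipliers
        gy = rng.standard_normal(len(Y))[:, None]
        sups[b] = np.abs((gx * Xc).mean(axis=0) - (gy * Yc).mean(axis=0)).max()
    return diff, float(np.quantile(sups, level))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))           # sample 1: mean curve zero
Y = 1.0 + rng.standard_normal((100, 20))     # sample 2: mean shifted by 1
diff, crit = two_sample_sup_band(X, Y, rng=rng)
```

Grid points where |diff| exceeds the critical value lie outside the uniform band, indicating where the two mean curves differ.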
| 7:30pm - 10:00pm | Dinner |
| Date: Friday, 20/Mar/2026 | |
| 8:50am - 10:20am | Time Series Econometrics Location: 0.001 Session Chair: Carsten Jentsch |
|
|
Pitfalls of Inference in Panels with Cross-Dependence of Uncertain Strength TU Dortmund, Germany When panel data exhibit cross-sectional dependence, particular care is required, as cross-dependence may be induced by omitting relevant variables. If these variables correlate with the regressors, rendering them endogenous, sophisticated approaches such as the CCE approach or the PC estimator are recommended. These approaches may, however, be difficult to implement or may build on strong assumptions. Therefore, if regressor endogeneity can reasonably be excluded, it is common to resort to simpler estimators in conjunction with panel-robust standard errors. Structural analysis in matrix-autoregressive models TU Dortmund University, Germany We consider a structural matrix-autoregressive (SMAR) model to conduct impulse response analysis for structural shocks to matrix-valued time series. The MAR model of order $p$ offers a parsimonious and interpretable framework for these time series, thus addressing issues of high dimensionality in corresponding vector-autoregressive (VAR) models. To interpret the dynamics, we resort to impulse response analysis as a popular tool from the SVAR context. Its conclusions rely on the valid identification of structural shocks that are mutually contemporaneously uncorrelated and interpretable. In contrast to the existing literature, the proposed SMAR model enables the identification of multiple structural shocks. To address the restrictive nature of the single-term MAR($p$) model, we discuss the extension to a multi-term SMAR($p$) model as a compromise between the single-term SMAR and the (unrestricted) SVAR model, trading off parsimony against flexibility. We discuss its identification, focusing in particular on issues that arise due to the typical Kronecker-product structure of the coefficient matrices in the MAR framework.
Further, we discuss estimation and inference in the general multi-term SMAR($p$) model, including a bootstrap method to compute confidence bands for the impulse response curves. In this context, a key point concerns model misspecification and the use of MAR models to approximate more general SVAR data generating processes. Finally, we demonstrate the performance and practical use of our approach by Monte Carlo simulations and a real data application. Specification Tests for Vector Multiplicative Error Models Charles University, Czech Republic Vector Multiplicative Error Models (vMEMs) provide a flexible framework for modeling multivariate non-negative time series. Within this framework, each variable is expressed as the product of its conditional mean, modeled as a function of past observations, and a positive innovation with unit expectation. Consequently, the model can capture dynamic cross-dependencies and has proven useful in applications such as modeling durations, volatilities, and trading volumes. This contribution focuses on goodness-of-fit (GOF) tests for vMEMs, aiming to assess whether the model structure and the assumed innovation distribution adequately reflect the properties of the observed data. We propose a GOF test statistic and derive its asymptotic distribution under the null hypothesis. The performance of a bootstrap version of the test is illustrated through Monte Carlo simulations. |
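For a reduced-form VAR(1) with impact matrix B, the impulse responses referred to above are simply powers of the coefficient matrix applied to B; a toy sketch (the matrices are illustrative and do not implement the SMAR identification scheme):

```python
import numpy as np

def impulse_responses(A, B, horizon):
    """IRFs of x_t = A x_{t-1} + B eps_t: the response at horizon h is A^h B."""
    out = [np.asarray(B, float)]
    for _ in range(horizon):
        out.append(np.asarray(A, float) @ out[-1])
    return np.stack(out)              # shape (horizon + 1, k, k)

A = np.array([[0.5, 0.1],
              [0.0, 0.4]])            # stable coefficient matrix (toy values)
B = np.eye(2)                         # impact matrix of the structural shocks
irf = impulse_responses(A, B, horizon=10)
```

Column j of irf[h] is the response of the system h periods after a unit shock in the j-th structural innovation; for a stable A the responses decay geometrically.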
| 8:50am - 10:20am | Discrete time series Location: 0.002 Session Chair: Christian H. Weiß |
|
|
A universal time series model (for discrete data) Helmut Schmidt University Hamburg, Germany A novel time series framework is proposed which addresses all relevant empirical properties of a time series, making it an essentially universal model. More specifically, the dynamics in all conditional moments of a suitable continuous or discrete distribution are modeled jointly and without the need to make restrictive assumptions about the functional form of the link functions. Furthermore, all considered explanatory variables are allowed to exhibit nonlinear and potentially time-varying effects on the conditional moments. This can be achieved by employing a simple feedforward neural network with a single hidden layer and an output for each conditional moment (parameter). In contrast to many (deep) neural network approaches, the proposed model is stochastically interpretable and allows for the calculation of standard errors, and in particular, confidence intervals. Many conventional time series frameworks such as (integer-valued) GARCH can be interpreted as simplified special cases of the proposed model. Several empirical applications are presented to illustrate the capabilities and the implementation. A Feature-Based Approach to Generate Time Series of Counts 1LIAAD INESC TEC, Faculdade de Economia da Universidade do Porto; 2Universidade de Aveiro, CIDMA; 3Faculdade de Engenharia da Universidade do Porto, CIDMA Research on count time series has grown substantially, leading to the development of numerous models designed to capture key characteristics such as trends, seasonality, overdispersion, outliers, and complex dependence structures. Despite these advances, the evaluation of such models remains challenging due to the limited availability of real-world count time series. This scarcity often forces researchers to illustrate new methods using only a few datasets, which restricts systematic comparison and hinders robust performance assessment. 
Addressing this gap is essential for advancing methodological development and ensuring practical applicability in diverse domains. This work is financed by National Funds through the FCT - Fundação para a Ciência e a Tecnologia, I.P. (Portuguese Foundation for Science and Technology) within the project TSP2Net, with reference 2023.13039.PEX, https://doi.org/10.54499/2023.13039.PEX A new class of generalized INARMA models: estimation and testing against INGARCH alternatives Karlsruhe Institute of Technology, Germany INAR- and INGARCH-type processes are widely used approaches to model time series of counts. In this talk, I will speak about a class of generalized INARMA (integer-valued autoregressive moving-average) models which contains both of the aforementioned types of models as special cases. Notably, I will outline a generalization of the INAR model which parallels the extension of the INARCH to the INGARCH process. Special attention is given to inference questions. These include maximum likelihood, moment-based and Gaussian quasi-likelihood techniques for parameter estimation. Moreover, I will discuss various testing problems. The developed methods are illustrated in simulation studies and a data example on childhood diseases in the German state of Bavaria. |
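A standard generator in the INAR family discussed above is the Poisson INAR(1) with binomial thinning; a minimal simulation sketch (the parameter values are illustrative):

```python
import numpy as np

def simulate_inar1(alpha, lam, n, rng):
    """Poisson INAR(1) via binomial thinning:
    X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(lam).
    The stationary marginal is Poisson with mean lam / (1 - alpha),
    and the lag-1 autocorrelation equals alpha.
    """
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))          # start in stationarity
    for t in range(1, n):
        # binomial thinning: each of the x[t-1] counts survives w.p. alpha
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)
    return x
```

Generators of this kind are exactly what a feature-based approach needs in order to produce benchmark count series with controlled mean, dispersion, and dependence.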
| 8:50am - 10:20am | High-dimensional statistics and learning Location: 0.004 Session Chair: Martin Wahl |
|
|
Self-regularized learning methods University of Stuttgart, Germany We introduce a new framework for the theoretical analysis of learning algorithms called self-regularization. In a nutshell, self-regularized learning algorithms implicitly guarantee that they produce sufficiently regular prediction functions. Central examples of self-regularized learning algorithms include gradient descent and regularized empirical risk minimization. We establish a general theory for the statistical analysis of self-regularized algorithms which in many cases yields minimax-optimal learning rates. Max Schölpple, Ingo Steinwart; Institut für Stochastik und Anwendungen, Universität Stuttgart, Pfaffenwaldring 57. Concentration and moment inequalities for heavy-tailed random matrices Universität Wien, Austria Fuk–Nagaev and Rosenthal-type inequalities are proven for sums of independent random matrices, focusing on the situation when the norms of the matrices possess finite moments of only low orders. The bounds depend on intrinsic dimensional characteristics, such as the effective rank, as opposed to the dimension of the ambient space. The advantages of such results are illustrated in several applications, including new moment inequalities for sample covariance matrices and the corresponding eigenvectors of heavy-tailed random vectors. Authors: Moritz Jirak, Stanislav Minsker, Yiqiu Shen, Martin Wahl Laplacian eigenmaps for bounded manifolds and the Neumann Laplacian Universität Bielefeld, Germany The spectrum of the Laplace-Beltrami operator encodes essential geometric information about a smooth manifold. In practice, the manifold is unknown, but supports a finite sample of random points. It is then standard to approximate its spectrum by the spectrum of the resulting graph Laplacian. When the manifold is bounded, it is known that the eigenvalues and eigenvectors of the graph Laplacian converge to those of the Neumann Laplacian.
However, finite-sample results, such as convergence rates, are still lacking and are at the center of this talk. |
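The graph-Laplacian-to-Neumann connection has an exact one-dimensional analogue: the unnormalized Laplacian of the path graph is a discrete Neumann Laplacian with closed-form spectrum. A small sketch (the graph size is illustrative):

```python
import numpy as np

n = 50
# Adjacency and unnormalized Laplacian of the path graph P_n
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(axis=1)) - A
evals = np.sort(np.linalg.eigvalsh(L))

# Closed-form spectrum: 2 - 2 cos(k * pi / n), k = 0, ..., n-1,
# with cosine eigenvectors, i.e. discrete Neumann boundary behavior
theory = 2 - 2 * np.cos(np.pi * np.arange(n) / n)
```

Rescaling by n^2 sends the low eigenvalues to (k*pi)^2, the Neumann eigenvalues of the Laplacian on [0,1], which is the deterministic template behind the random-sample convergence discussed in the talk.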
| 8:50am - 10:20am | Contributions to Mathematical Statistics Location: 1.002 Session Chair: Mathias Trabs |
|
|
Local polynomial estimation of quantile density functions University of Hamburg, Germany A new approach for nonparametric estimation of quantile density functions is proposed. The new approach uses a local polynomial regression on (F_n(X_i), Q_n(F_n(X_i))), where F_n denotes the empirical distribution function and Q_n the empirical quantile function. The new approach has more advantageous properties at the boundary than classical quantile density estimators. Keywords: asymptotic normality, bias rates, boundary adaptation, empirical quantiles. Model checks for copula regression Ruhr-Universität Bochum, Germany There is a great variety of statistical models expressing relations between response variables of interest and explanatory variables, ranging from classical conditional mean regression to fully distributional regression models. We are particularly interested in expressing regression models by means of copulas, which are a valuable tool to separate marginal distributions and dependencies. New goodness-of-fit tests and new measures of deviation can be developed based on such copula representations. These tests are desirable since regression models often impose parametric or semiparametric assumptions to overcome the curse of dimensionality, running a risk of misspecification. We present a new goodness-of-fit test for the classical mean regression model. More importantly, we also introduce a new measure of deviation between the true regression function and the imposed parametric assumption. By self-normalization, we develop pivotal inference for this measure, including tests for relevant hypotheses. These inference tools are illustrated via simulated and empirical data. Rank-based association measures for zero-inflated data 1Eindhoven University of Technology, the Netherlands; 2University of Windsor, Canada; 3University of Quebec in Trois-Rivières, Canada; 4Université Libre de Bruxelles, Belgium Rank-based association measures, including Spearman’s rho, Gini’s gamma and Spearman’s footrule, are well established in continuous settings, but become problematic when ties are present.
We investigate these measures in the context of zero-inflated data, where continuous random variables have an increased probability mass at zero and there is a substantial number of ties. Such data are commonly found in fields such as insurance, health care and weather forecasting. Traditional rank-based estimators exhibit a large bias in these settings. To overcome this problem, we derive new formulations of the association measures and propose plug-in estimators. In a simulation study, we show that these outperform state-of-the-art estimators. Additionally, we make the estimator interpretable by deriving its achievable bounds. |
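The classical rank-based baseline that becomes biased under zero inflation is easy to state; a sketch of Spearman's rho with midranks for ties (this is the traditional estimator, not the corrected formulations proposed in the talk):

```python
import numpy as np

def spearman_rho(x, y):
    """Sample Spearman's rho using midranks (tied values get averaged ranks)."""
    def midrank(v):
        order = np.argsort(v, kind="stable")
        r = np.empty(len(v))
        r[order] = np.arange(1, len(v) + 1)
        for val in np.unique(v):          # average ranks within tied groups
            mask = v == val
            r[mask] = r[mask].mean()
        return r

    rx, ry = midrank(np.asarray(x)), midrank(np.asarray(y))
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))
```

With a large shared point mass at zero, the midranked zeros form one big tied block, which is exactly the source of the bias the talk addresses.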
| 8:50am - 10:20am | Random Matrix Theory Location: 1.012 Session Chair: Nestor Parolya |
|
|
Nonlinear higher-order shrinkage estimation of the large dimensional covariance and precision matrices 1Delft University of Technology, The Netherlands; 2Linköping University, Sweden In this paper, we develop nonlinear higher-order shrinkage estimators for both covariance and precision matrices. Our framework applies to settings in which the sample size n is either larger or smaller than p, the dimensionality of the data-generating process. The proposed estimators incorporate higher-order moments up to an arbitrary order and therefore encompass linear shrinkage estimators as special cases. We derive recursive representations of these higher-order nonlinear shrinkage estimators using partial exponential Bell polynomials. Through simulation studies, the proposed methods are compared with the oracle nonlinear shrinkage estimator and are shown to be particularly effective in settings where no closed-form expressions for nonlinear shrinkage estimators are available. The theoretical derivations rely on mild assumptions on the underlying model, including the existence of fourth moments and a bounded spectrum of the true population covariance matrix. The finite-sample performance of the proposed estimators is evaluated in an extensive simulation study and benchmarked against existing approaches. Our main finding is that the higher-order shrinkage estimators can outperform well-established nonlinear shrinkage methods, particularly when the concentration ratio p/n is large. Monitoring for a phase transition in a time series of Wigner matrices 1Aarhus University, Denmark; 2Colorado State University We develop methodology and theory for the detection of a phase transition in a time series of high-dimensional random matrices. In the model we study, at each time point $t = 1, 2, \ldots$, we observe a deformed Wigner matrix $\mathbf{M}_t$, where the unobservable deformation represents a latent signal.
This signal is detectable only in the supercritical regime, and our objective is to detect the transition to this regime in real time, as new matrix-valued observations arrive. Central limit theorems for linear eigenvalue statistics of random geometric graphs Leiden University, The Netherlands Random geometric graphs provide a fundamental model for spatially embedded networks, yet their spectral fluctuations remain poorly understood. In this talk, I will present the first rigorous results on Gaussian fluctuations of linear eigenvalue statistics for such graphs. Specifically, we establish central limit theorems for quantities of the form $\mathrm{Tr}[\phi(A)]$, where $A$ denotes the adjacency matrix and $\phi$ belongs to a broad class of test functions, including non-polynomial functions. In the polynomial setting, we go further and prove a quantitative central limit theorem with an explicit rate of convergence to the limiting Gaussian distribution. I will also discuss extensions of these results to other canonical spatial networks, such as $k$-nearest neighbor graphs and relative neighborhood graphs. Together, these results highlight new mechanisms governing spectral fluctuations in random spatial structures and reveal a delicate interplay between geometry, local dependence, and spectral behavior. The talk is based on joint work with Christian Hirsch (Aarhus) and Kyeongsik Nam (Seoul). |
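The sub/supercritical dichotomy for deformed Wigner matrices can be illustrated numerically via the BBP phase transition: a rank-one deformation theta * v v' pushes the top eigenvalue out of the semicircle bulk to theta + 1/theta only once theta > 1. A sketch (dimension and tolerances are illustrative):

```python
import numpy as np

def top_eig_deformed_wigner(n, theta, rng):
    """Top eigenvalue of W + theta * v v^T, where W is a GOE-type Wigner
    matrix with semicircle support [-2, 2] and v is a unit vector."""
    A = rng.standard_normal((n, n))
    W = (A + A.T) / np.sqrt(2 * n)            # entrywise variance ~ 1/n
    v = np.ones(n) / np.sqrt(n)
    return float(np.linalg.eigvalsh(W + theta * np.outer(v, v))[-1])

rng = np.random.default_rng(0)
```

Below the critical deformation strength the top eigenvalue sticks to the bulk edge 2, which is why the latent signal in the monitoring problem above is invisible in the subcritical regime.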
| 10:20am - 10:50am | Coffee break 5 |
| 10:50am - 11:50am | Time Series Econometrics Location: 0.001 Session Chair: Carsten Jentsch |
|
|
A two-sample smooth test for multivariate dependent data Vrije Universiteit Amsterdam, The Netherlands In this talk, we consider a two-sample smooth test for the equality of multivariate distributions. Dependence between the two samples is allowed; for instance, the data may be mixing. The asymptotic distribution under the null hypothesis is derived, and consistency of the two-sample smooth test for dependent samples is shown. Satterthwaite Approximation and Gaussian Time Series 1UCLouvain, Belgium; 2Université Libre de Bruxelles, Belgium Satterthwaite (1941, 1946) proposed a very simple approximation to the distribution of linear combinations of chi-squared random variables. It can be used in univariate time series analysis to approximate the distribution of the sample variance and the periodogram of Gaussian time series; we provide Wasserstein bounds and rates of convergence of the approximation towards the true distribution. Similarly, Tan & Gupta (1983) proposed an approximation to the distribution of linear combinations of Wishart random matrices. This, however, has not yet been applied to the framework of multivariate time series: we take advantage of a special case of the matrix normal distribution to propose a feasible approximation to the distribution of the sample covariance matrix of Gaussian time series. |
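The Satterthwaite idea mentioned in the second abstract above — approximating a linear combination of chi-squared variables by a scaled chi-squared with matched first two moments — can be sketched as follows. This is an illustrative Monte Carlo check, not the Wasserstein-bound analysis of the talk:

```python
import numpy as np
from scipy import stats

def satterthwaite_params(lam):
    """Scale g and degrees of freedom h such that g * chi2(h) matches
    the mean and variance of Q = sum_i lam_i * chi2_1."""
    lam = np.asarray(lam, dtype=float)
    g = (lam ** 2).sum() / lam.sum()
    h = lam.sum() ** 2 / (lam ** 2).sum()
    return g, h

lam = np.array([3.0, 1.0, 0.5, 0.25])
g, h = satterthwaite_params(lam)

# Monte Carlo draw of Q versus the Satterthwaite quantile
rng = np.random.default_rng(1)
Q = (lam * rng.chisquare(1, size=(100_000, lam.size))).sum(axis=1)
q95_mc = np.quantile(Q, 0.95)            # empirical 95% quantile
q95_sat = g * stats.chi2.ppf(0.95, h)    # Satterthwaite 95% quantile
print(q95_mc, q95_sat)                   # the two should be close
```

By construction, g·h equals the mean of Q and 2·g²·h equals its variance; the approximation quality of the upper quantiles is what bounds such as those in the talk quantify.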
| 10:50am - 11:50am | Discrete time series Location: 0.002 Session Chair: Christian H. Weiß |
|
|
Estimating parameters for long-range dependence via ordinal patterns 1Siegen University, Germany; 2University of Twente, The Netherlands; 3Ruhr University Bochum, Germany The ordinal structure of long-range dependent time series is analyzed. To this end, so-called ordinal patterns are used, which describe the relative position of consecutive data points. Two estimators are provided for the probabilities of ordinal patterns, and we prove limit theorems in different settings, namely for functions of Hermite rank 1 and 2. In the second setting, a Rosenblatt distribution arises in the limit. In the context of fractional Gaussian noise, the limit distribution is derived for an estimator of the Hurst parameter H when H is larger than 3/4. The theorems thus complement results for lower values of H found in the literature. Transcripts and Algebraic Distances in Time Series: Stochastic Properties and Nonparametric Dependence Tests 1Helmut Schmidt University, Hamburg, Germany; 2Universidad Miguel Hernández, Elche, Spain The use of ordinal patterns (OPs) for analyzing the dependence structure of univariate and continuously distributed processes has gained popularity in recent years. Here, we go one step further and consider the transcripts computed from successive OPs in the time series. Transcripts constitute a kind of "difference" between successive OPs and thus naturally relate to two algebraic distances between OPs, the Cayley and Kendall distances. We transform the original time series into a sequence of transcripts or distances, respectively, and derive important stochastic properties thereof. We show that these properties differ substantially between different types of original processes. This motivates the development of various statistics based on transcripts and algebraic distances in order to investigate the dependence structure of the original process. 
In particular, we derive the asymptotic distribution of these statistics under the null hypothesis of serial independence, which is then used to develop nonparametric tests for serial dependence. A simulation study shows that these novel dependence tests have appealing power properties, often outperforming existing OP-based dependence tests. We conclude with a real-world data example, where we illustrate the application and interpretation of the proposed approaches in practice. |
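The ordinal-pattern counting that both abstracts above build on can be sketched in a few lines (illustrative only; the estimators and tests of the talks are considerably more involved). A pattern of order m is the rank ordering of a window of m consecutive observations; for an i.i.d. continuous series, each of the m! patterns has probability 1/m!:

```python
import numpy as np
from itertools import permutations

def ordinal_pattern_freqs(x, m=3):
    """Relative frequencies of the m! ordinal patterns of order m,
    i.e. the rank orderings of windows (x_t, ..., x_{t+m-1})."""
    x = np.asarray(x, dtype=float)
    counts = {p: 0 for p in permutations(range(m))}
    for t in range(len(x) - m + 1):
        counts[tuple(np.argsort(x[t:t + m]))] += 1
    n_windows = len(x) - m + 1
    return {p: c / n_windows for p, c in counts.items()}

rng = np.random.default_rng(2)
freqs = ordinal_pattern_freqs(rng.standard_normal(20_000), m=3)
# For an i.i.d. continuous series every pattern has probability 1/6
print(max(abs(f - 1 / 6) for f in freqs.values()) < 0.02)  # prints True
```

Deviations of the empirical pattern frequencies from the i.i.d. benchmark are exactly what the dependence tests and long-range dependence estimators above exploit.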
| 10:50am - 11:50am | Inference in Wasserstein Spaces and Optimal Transport Location: 0.004 Session Chair: Ansgar Steland |
|
|
Sliced-Wasserstein distance based change detection with sequential empirical processes 1University of Bamberg; 2RWTH Aachen University; 3Delft University of Technology We study the problem of detecting changes in the marginal distributions of a multivariate time series with a novel CUSUM-type detector statistic based on the (maximum-) sliced-Wasserstein distance. This projection-based approach has two appealing properties. Firstly, unlike the family of Wasserstein distances, it does not suffer from the curse of dimensionality. Secondly, by means of the Kantorovich duality, asymptotic properties of the so-defined detector statistic can be derived from results for (sequential) empirical processes for nonstationary time series. This talk presents new weak limit theorems for sequential empirical processes under the functional dependence measure and their application to the given testing problem. Practical implications, limitations and possible extensions are discussed. Distributional Convergence of Empirical Entropic Optimal Transport and Applications Georg August Universität Göttingen, Germany The statistical properties of empirical entropic optimal transport (EOT) have attracted great interest, as this quantity has proven useful for complex data analysis, not least because of its computational efficiency. In several applications, it has been realized that, in addition to the optimal value, the EOT plan also carries important information. For example, in cell biology, colocalization analysis based on the EOT plan has been introduced as a measure quantifying the spatial proximity of different protein assemblies. Despite recent progress in the analysis of its risk properties, a precise understanding of its statistical fluctuations, needed to make it accessible for inference, remains elusive to some extent. 
We derive asymptotic weak convergence results for a large class of functionals of the EOT plan, including the colocalization process. As an application, we obtain uniform confidence bands for colocalization curves and bootstrap consistency. Our theory is supported by simulation studies and illustrated by a real-world data analysis of mitochondrial protein colocalization. |
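The sliced-Wasserstein distance underlying the first abstract above can be sketched via random projections and 1-D quantile distances. This is an illustrative Monte Carlo version assuming equal sample sizes, not the talk's CUSUM detector:

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between two equally
    sized samples: average 1-D W2 over random projection directions."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # unit directions
    sw2 = 0.0
    for th in theta:
        u, v = np.sort(X @ th), np.sort(Y @ th)             # 1-D empirical quantiles
        sw2 += np.mean((u - v) ** 2)                        # 1-D squared W2
    return np.sqrt(sw2 / n_proj)

rng = np.random.default_rng(3)
X = rng.standard_normal((1_000, 5))
Y = rng.standard_normal((1_000, 5)) + 1.0    # mean shift in every coordinate
print(sliced_wasserstein(X, X) == 0.0, sliced_wasserstein(X, Y) > 0.5)  # prints True True
```

Each 1-D distance costs only a sort, which is why the projection-based approach sidesteps the curse of dimensionality that afflicts the full multivariate Wasserstein distance.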
| 11:55am - 12:55pm | Plenary Lecture 4 Location: 0.004 |
|
|
Unlocking the Regression Space Queen Mary University of London, United Kingdom This paper introduces and analyzes a framework that accommodates general heterogeneity in regression modeling. It demonstrates that regression models with fixed or time-varying parameters can be estimated using OLS and time-varying OLS methods, respectively, across a broad class of regressors and noise processes not covered by existing theory. The proposed setting facilitates the development of asymptotic theory and the estimation of robust standard errors. The resulting robust confidence interval estimators accommodate substantial heterogeneity in both regressors and noise. The robust standard error estimates coincide with White’s (1980) heteroskedasticity-consistent estimator but apply under much broader conditions, including models with missing data. The methods are computationally simple and perform well in Monte Carlo simulations, making them highly suitable for empirical applications. The paper also provides a brief empirical illustration. |
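White's (1980) heteroskedasticity-consistent standard errors referenced in the abstract above can be sketched in a few lines. This is a textbook HC0 illustration, not the paper's time-varying extension:

```python
import numpy as np

def ols_with_white_se(X, y):
    """OLS estimates with White's (1980) HC0 robust standard errors:
    cov = (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                           # OLS residuals
    meat = X.T @ (e[:, None] ** 2 * X)         # X' diag(e^2) X
    cov = XtX_inv @ meat @ XtX_inv             # sandwich estimator
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(4)
n = 5_000
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + np.abs(x) * rng.standard_normal(n)   # heteroskedastic noise
beta, se = ols_with_white_se(X, y)
print(beta, se)   # estimates near (1, 2), with valid standard errors
```

The sandwich form is valid without specifying the error-variance function, which is the sense in which the paper's confidence intervals accommodate heterogeneous noise.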
| 12:55pm - 1:00pm | Closing Location: 0.004 |
| 1:00pm - 2:00pm | Lunch break 3 |

