Conference Agenda
Overview and details of the sessions of this conference. Please select a date or location to show only the sessions on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).
Agenda Overview
| Date: Wednesday, 18/Mar/2026 | |
| 8:50am - 9:00am | Opening Location: 0.004 |
| 9:00am - 10:00am | Plenary Lecture 1 Location: 0.004 |
A unified theory of order flow, market impact and volatility Ecole Polytechnique, France We propose a microstructural model for the order flow in financial markets that distinguishes between core orders and reaction flow, both modeled as Hawkes processes. This model has a natural scaling limit that reconciles a number of salient empirical properties: persistent signed order flow, rough trading volume and volatility, and power-law market impact. In our framework, all these quantities are pinned down by a single statistic H_0, which measures the persistence of the core flow. Specifically, the signed flow converges to the sum of a fractional process with Hurst index H_0 and a martingale, while the limiting traded volume is a rough process with Hurst index H_0-1/2. No-arbitrage constraints imply that volatility is rough, with Hurst parameter 2H_0-3/2, and that the price impact of trades follows a power law with exponent 2-2H_0. The analysis of signed order flow data yields an estimate H_0 close to 3/4. This is not only consistent with the square-root law of market impact, but also turns out to match estimates for the roughness of traded volumes and volatilities remarkably well. |
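As an illustrative aside (not code from the talk): a univariate Hawkes process with exponential kernel, the basic building block of order-flow models like the one above, can be simulated with Ogata's thinning algorithm. The parameter values below are arbitrary.

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate a univariate Hawkes process with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    via Ogata's thinning algorithm."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < horizon:
        # With an exponential kernel the intensity decreases between events,
        # so the intensity at the current time bounds it until the next event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:  # accept candidate with prob lam_t/lam_bar
            events.append(t)
    return events

events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=100.0)
```

Here the branching ratio alpha/beta = 0.5 keeps the process subcritical; persistence of the flow, measured by H_0 in the talk, grows as this ratio approaches one.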
| 10:00am - 10:40am | Coffee break 1 |
| 10:40am - 12:10pm | Statistics in natural sciences and technology Location: 0.001 Session Chair: Gaby Schneider Session Chair: Ansgar Steland |
Self-Normalization for CUSUM-based Change Detection in Locally Stationary Time Series FH Aachen, Germany
A novel self-normalization procedure for CUSUM-based change detection in the mean of a locally stationary time series is introduced. Classical self-normalization relies on the factorization of a constant long-run variance and a stochastic factor. In this case, the CUSUM statistic can be divided by another statistic proportional to the long-run variance, so that the latter cancels. Thereby, a tedious estimation of the long-run variance can be avoided.
Under local stationarity, the partial sum process converges to $\int_0^t \sigma(x)\,dB_x$ and no such factorization is possible. To overcome this obstacle, a self-normalized test statistic is constructed from a carefully designed bivariate partial-sum process. Weak convergence of the process implies that the resulting self-normalized test attains asymptotic level α under the null hypothesis of no change, while being consistent against a broad class of alternatives. Extensive simulations demonstrate better finite-sample properties compared to existing methods. Applications to real data illustrate the method’s practical effectiveness.
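For intuition, a minimal sketch of the classical (stationary-case) self-normalized CUSUM statistic mentioned above; the bivariate construction of the talk is not reproduced here, and the exact normalization convention is our assumption. The key property, that the long-run variance cancels between numerator and normalizer, shows up as scale invariance.

```python
import numpy as np

def self_normalized_cusum(x):
    """Classical self-normalized CUSUM for a change in the mean: both the
    CUSUM numerator and the normalizer are proportional to the same
    long-run variance, which therefore cancels in the ratio."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.cumsum(x)
    k = np.arange(1, n + 1)
    bridge = s - (k / n) * s[-1]              # partial-sum bridge S_k - (k/n) S_n
    num = np.max(np.abs(bridge)) / np.sqrt(n)
    v2 = np.sum(bridge ** 2) / n ** 2         # self-normalizer, no kernel estimation
    return float(num / np.sqrt(v2))
```

Because both numerator and denominator scale with the same factor, the statistic is invariant under affine transformations of the data, which is exactly why long-run variance estimation is avoided.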
Prior shift estimation for positive unlabeled data through the lens of kernel embedding 1Warsaw University of Technology, Poland; 2Institute of Computer Science; 3Nicolaus Copernicus University We study estimation of a class prior for unlabeled target samples which possibly differs from that of the source population. Moreover, it is assumed that the source data is partially observable: only samples from the positive class and from the whole population are available (PU learning scenario). We introduce a novel direct estimator of the class prior which avoids estimation of posterior probabilities in both populations and has a simple geometric interpretation. It is based on a distribution matching technique together with kernel embedding in a Reproducing Kernel Hilbert Space and is obtained as an explicit solution to an optimisation task. We establish its asymptotic consistency as well as an explicit non-asymptotic bound on its deviation from the unknown prior, which is calculable in practice. We study finite sample behaviour for synthetic and real data and show that the proposal performs consistently on par with or better than its competitors. Asymptotic studies of adapted threshold detectors based on density processes RWTH Aachen University, Germany Control statistics are widely used to monitor the quality of processes in various fields such as industry, healthcare, and machine learning. These statistics give an alarm when observed data exceed a threshold, traditionally set as a constant value to maintain a desired false alarm rate. We focus on a new setting: when monitoring a sequence of observations, there may be additional information that potentially affects the law of the observations, and we would like to change the design by using adapted thresholds, which are functions of the additional information. |
| 10:40am - 12:10pm | Discrete time series Location: 0.002 Session Chair: Christian H. Weiß |
Overview of the STINARMA Class of Models and its STINAR and STINMA Subclasses 1Institute of Electronics and Informatics Engineering of Aveiro (IEETA) and Department of Electronics, Telecommunications and Informatics (DETI), University of Aveiro, Aveiro, Portugal; Intelligent Systems Associate Laboratory (LASI), University of Aveiro, Portugal; 2Center for Computational and Stochastic Mathematics (CEMAT), Department of Mathematics, IST, University of Lisbon, Lisbon, Portugal; 3Department of Mathematics and Statistics, Helmut Schmidt University, Hamburg, Germany Spatio-temporal count data arise in many applied fields, where observations are collected over time across multiple spatial units. In these settings, it is crucial to jointly capture temporal and spatial dynamics. The spatio-temporal integer-valued autoregressive and moving average (STINARMA) class of models provides a flexible framework to address these challenges within the class of integer-valued processes. This work presents an overview of the STINARMA class of models, together with its main subclasses, those of the STINAR and STINMA models. The STINARMA models can be viewed as the natural spatio-temporal extension of univariate INARMA models. Moreover, they are the integer counterpart of the continuous STARMA models, which is achieved by replacing the multiplication operator with the matrix binomial thinning operator and by considering component-wise independent discrete innovations. The general class of STINARMA models is introduced, followed by a discussion of its autoregressive and moving average subclasses. Key probabilistic properties are briefly presented through first- and second-order moments. Estimation approaches based on the method of moments, conditional least squares and conditional maximum likelihood are also outlined.
The practical relevance of the STINARMA class is illustrated using spatio-temporal health data from Portugal and Germany, and its performance is compared with multivariate models that do not explicitly account for spatial dependence. References: Martins, A., Scotto, M. G., Weiß, C. H., Gouveia, S., Space-time integer-valued ARMA modelling for time series of counts, Electronic Journal of Statistics, 17 (2), (2023), 3472-3511. Franke, J., Subba Rao, T., Multivariate First-Order Integer-Valued Autoregressions, Technical Report, University of Kaiserslautern, (1993). Pfeifer, P. E., Deutsch, S. J., A Three-Stage Iterative Procedure for Space-Time Modeling, Technometrics, 22 (1), (1980), 35-47. Steutel, F. W., Van Harn, K., Discrete Analogues of Self-Decomposability and Stability, The Annals of Probability, 7 (5), (1979), 893-899. Integer-valued random field models Helmut-Schmidt-Universität, Germany Ghodsi et al. (2012) have introduced the first-order integer-valued autoregressive model for count random fields as a planar analogue of the classical INAR(1) model, designed for count data observed on a regular lattice. We extend this framework to higher-order dependence structures and derive key stochastic properties of the resulting models. Building on this approach, we further propose two additional count random field models: the CINAR random field model and the INMA random field model. For each model, we investigate fundamental properties and provide a comparative analysis highlighting their respective strengths and limitations. Ghodsi, A., Shitan, M., & Bakouch, H. S. (2012). A first-order spatial integer-valued autoregressive SINAR (1, 1) model. Communications in Statistics-Theory and Methods, 41(15), 2773-2787.
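To make the binomial thinning operator concrete, here is a minimal simulation of a univariate INAR(1) process with Poisson innovations (a toy sketch, not the spatio-temporal STINARMA model itself; the parameter values are arbitrary):

```python
import math
import random

def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t, where '∘' is binomial
    thinning (alpha ∘ X ~ Binomial(X, alpha)) and eps_t ~ Poisson(lam).
    The stationary mean is lam / (1 - alpha)."""
    rng = random.Random(seed)

    def poisson(mu):
        # Knuth's inversion method; adequate for small mu.
        limit, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    x = poisson(lam / (1 - alpha))  # start near the stationary mean
    path = [x]
    for _ in range(n - 1):
        thinned = sum(rng.random() < alpha for _ in range(x))  # binomial thinning
        x = thinned + poisson(lam)
        path.append(x)
    return path
```

Thinning keeps the state integer-valued, which is exactly what replaces scalar multiplication when moving from ARMA to INARMA-type models.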
Influence network reconstruction from discrete time-series of count data modelled by multidimensional Hawkes processes University of Surrey, United Kingdom Identifying key influencers from time series data without a known prior network structure is a challenging problem in various applications, from crime analysis to social media. While much work has focused on event-based time series (timestamp) data, fewer methods address count data, where event counts are recorded in fixed intervals. We develop network inference methods for both batched and sequential count data. Here, strong network connections represent the key influences among the nodes. We introduce an ensemble-based algorithm, rooted in the expectation-maximization (EM) framework, and demonstrate its utility in identifying node dynamics and connections through a discrete-time Cox or Hawkes process. For the linear multidimensional Hawkes model, we employ a majorization-minimization (MM) approach, allowing for parallelized inference of networks. For sequential inference, we use a second-order approximation of the Bayesian inference problem. Under certain assumptions, a rank-1 update for the covariance matrix reduces computational costs. We validate our methods on synthetic data and real-world datasets, including email communications within European academic communities. Our approach effectively reconstructs underlying networks, accounting for both excitation and diffusion influences. This work advances network reconstruction from count data in real-world scenarios. |
| 10:40am - 12:10pm | Multivariate Statistics and Copulas Location: 0.004 Session Chair: Sebastian Fuchs |
Measures and Models of Non-Monotonic Dependence 1University of York, United Kingdom; 2McGill University, Montreal, Canada; 3University College Dublin, Ireland We propose a margin-free measure of bivariate association generalizing Spearman’s rho to the case of non-monotonic dependence that is defined in terms of two square integrable functions on the unit interval. We investigate properties of generalized Spearman correlation when the functions are piecewise continuous and strictly monotonic, with particular focus on the special cases where the functions are drawn from orthonormal bases defined by Legendre polynomials and cosine functions. For continuous random variables, generalized Spearman correlation is treated as a copula-based measure and shown to depend on a pair of uniform-distribution-preserving (udp) transformations determined by the underlying functions. We derive bounds for generalized Spearman correlation and we use a novel technique that we refer to as stochastic inversion of udp transformations to construct singular copulas that attain the bounds and parametric copulas with densities that interpolate between the bounds and model different degrees of non-monotonic dependence. We also propose sample analogues of generalized Spearman correlation and investigate their asymptotic and small-sample properties. Potential applications of the theory are demonstrated including: exploratory analyses of the dependence structures of datasets and their symmetries; elicitation of functions maximizing generalized Spearman correlation via expansions in orthonormal basis functions; and construction of tractable probability densities to model a wide variety of non-monotonic dependencies.
Multivariate tail dependence: further insights with an application to the Spanish banking sector 1Università del Salento, Italy; 2Universidad de Valladolid, Spain Extending bivariate dependence concepts to higher dimensions is a challenging but essential task for a comprehensive understanding of multivariate dependence. Moreover, measuring overall dependence based on averages across the full domain of the joint distribution may fail to discern changes in dependence across different segments of the distribution, especially in the tails. In order to incorporate these features, we present the multivariate tail concentration function (TCF) as a graphical tool to assess both global and tail dependence. We show that this tool makes it possible to represent multivariate dependence in a 2D plot regardless of the number of dimensions, that it quantifies both lower and upper tail dependence at a finite scale, and that it relates to multivariate Blomqvist’s beta. We propose to estimate the TCF non-parametrically using two methods and we compare their finite sample performance through a simulation study. To illustrate its practical application, we use the TCF to evaluate co-movements among the six Spanish banks included in the IBEX35 stock index. Multivariate Kendall regression coefficients University of Applied Sciences Merseburg, Germany In multivariate regression analysis, the multiple linear correlation coefficient is a commonly used association measure. This measure focuses on a linear relationship between a response variable and predictor variables. When moving away from the linearity of the functional relationship, we arrive at Kendall's tau and its multivariate versions, among others. In an earlier paper by the author (2021), the Kendall regression coefficient was introduced. Here, we extend the coefficient to vector responses Y and discuss its properties.
The coefficient we introduce describes to what degree the response variable Y can be approximated by a monotone function of the regressors. These regressors are combined in a random vector. One advantage of this approach is that the association measure is based only on the copula (it does not depend on the marginal distributions) and is hence robust against outliers. |
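As a baseline for the rank-based measures in this session, Kendall's tau (the tau-a version, assuming no ties) reduces to counting concordant and discordant pairs; a minimal sketch:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / C(n, 2).
    It depends only on the ranks, hence only on the copula for
    continuous random variables."""
    pairs = list(combinations(range(len(x)), 2))
    conc = sum((x[i] - x[j]) * (y[i] - y[j]) > 0 for i, j in pairs)
    disc = sum((x[i] - x[j]) * (y[i] - y[j]) < 0 for i, j in pairs)
    return (conc - disc) / len(pairs)
```

Because only ranks enter, the value is unchanged under strictly increasing transformations of either margin, which is the robustness property stressed in the abstract above.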
| 10:40am - 12:10pm | Data Science Perspectives from Industry Location: 1.002 Session Chair: Rainer Göb |
Deploying Deep Learning for Real-Time Optical Sorting: A Case Study in Hazelnut Quality Control 1prognostica GmbH; 2IFSYS Integrated Feeding Systems GmbH Optical sorting is widely used in industrial quality control, yet conventional rule-based vision systems often struggle when quality cues are subtle, heterogeneous, or hard to formalize. We present an industry data science case study on deploying deep learning for real-time optical sorting of hazelnuts, driven by the practical need to grade product quality from fine-grained appearance characteristics under strict throughput and latency constraints. The talk traces the path from an early prototype to an industrialized system that has been transferred into a market-ready product and is operated in practice. We summarize the end-to-end solution: multi-camera image acquisition, a supervised learning pipeline built on a representative labeled dataset, domain-specific preprocessing and targeted data augmentation, and a neural image classifier designed for on-premise inference. We emphasize industrial aspects that proved central for making the system operational: formalizing expert grading into maintainable classes, managing imbalance and borderline cases during data preparation, data labeling and training, and setting decision thresholds based on acceptance criteria. We then cover deployment realities for industrial environments, e.g. latency, throughput, robustness, and the interface between the ML component and machine control. Finally, we describe how the solution was productized and extended beyond hazelnuts to additional crops, enabling new application scenarios and market opportunities for the customer. We conclude with practical considerations for lifecycle management and periodic re-calibration. 
Bridging the Gap: Operational Realities and Emerging Trends in Supply Chain Forecasting prognostica GmbH, Germany While forecasting remains a cornerstone of strategic decision-making, its industrial application involves challenges that extend beyond model accuracy. In the context of supply chain management, a forecast must not only be precise but also interpretable and actionable within specific operational constraints. This talk provides insights into how practitioners bridge the gap between theoretical models and business requirements, focusing on the following key areas:
The presentation demonstrates that the value of Generative AI in forecasting lies not only in potential accuracy gains but also in its capacity to handle unstructured context and significantly improve interactability with the forecasts. By highlighting these real-world requirements and current technical frontiers, the talk seeks to provide practical impulses and identify open questions for further academic research in the field of applied AI and time series analysis. |
| 10:40am - 12:10pm | High-dimensional statistics and learning Location: 1.012 Session Chair: Martin Wahl |
Supervised classification for Ornstein-Uhlenbeck diffusions with separation condition Humboldt University of Berlin, Germany We study binary supervised classification based on repeated independent observations of continuous sample paths. Our focus is a diffusion classification model in which the features follow an Ornstein-Uhlenbeck process with class-dependent drifts. We consider plug-in classifiers constructed from drift estimators and analyze their performance via the excess risk. Under a separation condition on the drift parameters, we establish upper bounds on the excess risk, which are explicitly parametrized by the separation distance quantifying the difficulty of the problem. Specifically, when the drift distance is bounded away from zero, the plug-in classifiers achieve a fast convergence rate of order $n^{-1}$ (up to logarithmic factors) in the constant drift scenario. Furthermore, we discuss extensions of this framework to time-inhomogeneous drift functions. The theoretical approach utilizes the Wiener chaos representation and spectral theory to characterize the log-likelihood ratio as a quadratic form of Gaussian random variables, enabling a precise analysis of margin properties and concentration results. This extends the fast-rate results from classification problems with linear and Gaussian white noise models to dynamical diffusion systems with Gaussian structure under separation conditions. Asymptotic Bounds and Online Algorithms for Average-Case Matrix Discrepancy 1Johns Hopkins University, USA; 2FAU Erlangen-Nürnberg, Germany; 3Yale University, USA
We study the matrix discrepancy problem in the average-case setting. Given a sequence of $m \times m$ symmetric matrices $A_1,\ldots,A_n$, its discrepancy is defined as the minimal spectral norm over all signed sums $\sum_{i=1}^n x_i A_i$ with $x_1,\ldots,x_n \in \{\pm 1\}$. Our contributions are twofold. First, we study the asymptotic discrepancy of random matrices. When the matrices belong to the Gaussian orthogonal ensemble, we provide a sharp characterization of the asymptotic discrepancy and show that the limiting distribution is concentrated around $\Theta(\sqrt{nm}\,4^{-(1+o(1))n/m^2})$, under the assumption $m^2 \ll n/\log n$. We observe that the trivial bound $O(\sqrt{nm})$ cannot be improved when $n \ll m^2$ and show that this phenomenon occurs for a broad class of random matrices. In the case $n = \Omega(m^2)$, we provide a matching upper bound. Second, we analyse the matrix hyperbolic cosine algorithm, an online algorithm for matrix discrepancy minimization due to Zouzias (2011), in the average-case setting. We show that the algorithm achieves with high probability a discrepancy of $O(m \log m)$ for a broad class of random matrices, including Wigner matrices with entries satisfying a hypercontractive inequality and Gaussian Wishart matrices.
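For very small instances, the discrepancy defined above can be evaluated by exhaustive search over the $2^n$ sign vectors (purely illustrative; the algorithms discussed in the talk are far more efficient):

```python
import numpy as np
from itertools import product

def discrepancy(mats):
    """Minimal spectral norm of sum_i x_i * A_i over all sign vectors
    x in {-1, +1}^n. Exhaustive search, feasible only for small n."""
    best = float("inf")
    for signs in product((-1.0, 1.0), repeat=len(mats)):
        signed_sum = sum(x * A for x, A in zip(signs, mats))
        best = min(best, np.linalg.norm(signed_sum, 2))  # spectral norm
    return best
```

For example, two identical matrices can always be cancelled exactly, giving discrepancy zero, while a single matrix has discrepancy equal to its own spectral norm.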
Asymptotic confidence bands for centered purely random forests Karlsruhe Institute of Technology, Germany In this talk we study asymptotic uniform confidence bands for centered purely random forests in a multivariate nonparametric regression setting. The most popular member of this class, the uniformly centered purely random forest, is well known to suffer from suboptimal rates. We therefore propose a new type of purely random forest, the Ehrenfest centered purely random forest, which achieves minimax optimal rates. Our main confidence band theorem applies to both random forests. The proof is based on an interpretation of random forests as generalized U-statistics together with a Gaussian approximation of the supremum of empirical processes. |
| 12:10pm - 1:30pm | Lunch break 1 |
| 1:30pm - 3:30pm | New developments in nonparametric classification and estimation based on the nearest neighbor method Location: 0.001 Session Chair: Hajo Holzmann |
Chatterjee's graph correlation University of Washington, United States of America This talk will survey recent advances in understanding Chatterjee's nearest neighbor graph-based correlation coefficient. I will introduce, for the first time, a comprehensive theoretical framework for statistical inference based on this coefficient. The framework involves results on asymptotic normality, bias correction, and the (in)consistency of bootstrap methods. Nearest Neighbor Estimates for Dependent Data University of Manitoba, Canada This paper considers the nonparametric estimation problem for a class of nonlinear time series. Nearest Neighbor matching: from Average Treatment Effects to Transfer Learning ENSAI-CREST, France Estimating some mathematical expectations from partially observed data, and in particular missing outcomes, is a central problem encountered in numerous fields such as transfer learning, counterfactual analysis or causal inference. Matching estimators, i.e. estimators based on k-nearest neighbors, are widely used in this context. Under suitable regularity conditions, one can show that the variance of such estimators can converge to zero at a parametric rate. However, their bias can have a slower rate when the dimension of the covariates is larger than 2. This makes analysis of this bias particularly important. In this paper, we provide higher-order properties of the bias. In contrast to the existing literature on this topic, we do not assume that the support of the target distribution of the covariates is strictly included in that of the source, and we discuss two geometric conditions on the support that prevent boundary bias issues. We show that these conditions are much more general than the usual convex support assumption, leading to an improvement of existing results.
Furthermore, we show that the matching estimator studied by Abadie and Imbens (2006) for the average treatment effect can be asymptotically efficient when the dimension of the covariates is less than 4, a result previously known only in dimension 1. Multivariate Root-N-Consistent Smoothing Parameter Free Matching Estimators and Estimators of Inverse Density Weighted Expectations 1Universität Rostock, Germany; 2Philipps-Universität Marburg, Germany Expected values weighted by the inverse of a multivariate density or, equivalently, Lebesgue integrals of regression functions with multivariate regressors occur in various areas of application, including the estimation of average treatment effects, nonparametric estimation in random coefficient regression models, and deconvolution in Berkson errors-in-variables models. The frequently used nearest-neighbor and matching estimators suffer from bias problems in multiple dimensions. By using polynomial least squares fits on each cell of the Kth-order Voronoi tessellation for sufficiently large K, we develop novel modifications of nearest-neighbor and matching estimators which again converge at the parametric root-n rate under mild smoothness assumptions on the unknown regression function and without any smoothness conditions on the unknown density of the covariates. We stress that, in contrast to competing methods for correcting the bias of matching estimators, our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters. We complement the upper bounds with appropriate lower bounds derived from information-theoretic arguments, which show that some smoothness of the regression function is indeed required to achieve the parametric rate. Simulations illustrate the practical feasibility of the proposed methods. |
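A minimal one-dimensional sketch of the k-nearest-neighbor matching idea discussed in this session (illustrative only; the function name and the simple absolute-distance metric are our assumptions): each target covariate value is imputed with the average outcome of its k nearest source points, and the imputations are averaged.

```python
import numpy as np

def nn_matching_mean(x_source, y_source, x_target, k=1):
    """k-NN matching estimate of E[Y] under the target covariate
    distribution when outcomes Y are observed only in the source."""
    xs = np.asarray(x_source, dtype=float)
    ys = np.asarray(y_source, dtype=float)
    imputed = []
    for x in np.asarray(x_target, dtype=float):
        dist = np.abs(xs - x)                 # 1-D distance to all source points
        nearest = np.argsort(dist)[:k]        # indices of the k nearest matches
        imputed.append(ys[nearest].mean())
    return float(np.mean(imputed))
```

In one dimension the matching bias is negligible, which is consistent with the dimension thresholds discussed in the abstracts above; in higher dimensions the nearest match is systematically farther away, and the bias corrections of the talks become necessary.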
| 1:30pm - 3:30pm | Discrete time series Location: 0.002 Session Chair: Christian H. Weiß |
Asymptotic Inference for Rank Correlations 1Karlsruhe Institute of Technology; 2Heidelberg Institute for Theoretical Studies; 3Goethe University Frankfurt; 4Helmut-Schmidt-University Kendall's tau and Spearman's rho are widely used tools for measuring dependence. Surprisingly, when it comes to asymptotic inference for these rank correlations, some fundamental results and methods have not yet been developed, in particular for discrete random variables and in the time series case, and concerning variance estimation in general. Consequently, asymptotic confidence intervals are not available. We provide a comprehensive treatment of asymptotic inference for classical rank correlations, including Kendall's tau, Spearman's rho, Goodman-Kruskal's gamma, Kendall's tau-b, and grade correlation. We derive asymptotic distributions for both iid and time series data, resorting to asymptotic results for U-statistics, and introduce consistent variance estimators. This enables the construction of confidence intervals and tests, generalizes classical results for continuous random variables and leads to corrected versions of widely used tests of independence. We analyze the finite-sample performance of our variance estimators, confidence intervals, and tests in simulations and illustrate their use in case studies. Inference for INAR Models with Structural Breaks: Classical and Bayesian Approaches 1Universidade de Aveiro; CIDMA, Portugal; 2ESTGA, Universidade de Aveiro; CIDMA, Portugal; 3Universidade de Aveiro, Portugal Integer-valued autoregressive (INAR) models provide a flexible framework for modeling count time series through thinning operators that emulate autoregressive dynamics while respecting the discrete nature of the data. These models naturally accommodate both equidispersion and overdispersion, features commonly observed in count-valued processes. 
This paper investigates INAR models with structural breaks, with particular emphasis on the detection and estimation of parameter changes over time—an issue of critical importance in dynamic settings such as epidemics, policy interventions, and other regime-shifting phenomena. We consider both classical and Bayesian inferential approaches for identifying change points and estimating model parameters. The classical framework is based on maximum likelihood estimation, where structural changes are detected using a CUSUM-based procedure, followed by a focused grid search within a window centered around the candidate breakpoint. The Bayesian approach employs advanced Markov Chain Monte Carlo (MCMC) techniques, incorporating hidden Markov chains to model latent regimes and infer structural shifts probabilistically. A comprehensive simulation study is conducted under a variety of scenarios, including differing regime lengths, sample size proportions, and distributional characteristics. Finally, the proposed methodologies are illustrated through an application to real-world health indicator data, demonstrating their practical effectiveness in capturing complex dynamics and structural changes in count time series. Model diagnostics and semi-parametric inference for count time series 1TU Dortmund University, Germany; 2TU Dortmund University, Germany; 3Helmut-Schmidt-University Hamburg, Germany; 4Cyprus Academy of Sciences, Letters, and Arts, Cyprus For modeling the serial dependence in discrete-valued time series, various approaches have been proposed in the literature. In particular, models based on a recursive, autoregressive-type structure such as the integer-valued autoregressive (INAR) models for count time series are very popular in practice.
While their estimation typically relies on purely parametric approaches that impose restrictive assumptions on the innovation distribution, we consider semi-parametric estimation techniques that jointly estimate the autoregressive coefficients and the innovation distribution without requiring parametric specification. Building on this, we propose a general semi-parametric bootstrap procedure for INAR models and prove its consistency for general classes of statistics that are functions of the estimated model coefficients and the estimated innovation distribution. This semi-parametric bootstrap approach can be leveraged for various statistical tasks such as goodness-of-fit testing, predictive inference, and dispersion analysis. Additionally, we introduce novel semi-parametric goodness-of-fit tests tailored for the INAR model class. Relying on the INAR-specific shape of the joint probability generating function, our approach allows for model validation of INAR models without specifying the parametric family of the innovation distribution. We derive the limiting null distribution of our proposed test statistics, prove consistency under fixed alternatives and discuss their asymptotic behavior under local alternatives. Moreover, when it comes to predictive inference for discrete-valued time series, this task cannot be implemented through the construction of prediction intervals, as these are generally not able to retain a desired coverage level either in finite samples or asymptotically. To address this problem, we propose to reverse the construction principle by considering preselected sets of interest and estimating the corresponding predictive probability. The accuracy of this prediction is then evaluated by quantifying the uncertainty associated with the estimation of these predictive probabilities.
In this context, we consider parametric and non-parametric approaches and derive asymptotic as well as bootstrap theory, which also covers the practically important case of model misspecification. Nonparametric symmetry tests for integer-valued time series Friedrich-Schiller-Universität Jena, Germany During the last years, there have been many proposals for modelling integer-valued time series. We propose tests of hypotheses related to certain symmetry and antisymmetry properties. For example, we consider the hypotheses that the conditional mean is an odd function or that the conditional variance is an even function. The proposed test statistics are nonparametric and have non-standard limit distributions. We show that the wild bootstrap offers a simple method of generating asymptotically correct critical values. The talk is based on joint work with Paul Doukhan and Christian Weiß. |
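For concreteness, a minimal conditional least squares (CLS) sketch for an INAR(1) model, one standard estimation route for the model class in this session. This is an illustration under our own conventions, not the semi-parametric procedures of the talks; it exploits only the linear conditional mean and so requires no parametric innovation assumption.

```python
import numpy as np

def inar1_cls(counts):
    """Conditional least squares for INAR(1): E[X_t | X_{t-1}] is linear,
    alpha * X_{t-1} + lam, so alpha and lam follow from a simple
    regression of X_t on X_{t-1}; no innovation distribution is assumed."""
    x = np.asarray(counts, dtype=float)
    y, z = x[1:], x[:-1]                       # pairs (X_t, X_{t-1})
    alpha = np.cov(y, z, bias=True)[0, 1] / np.var(z)
    lam = y.mean() - alpha * z.mean()
    return float(alpha), float(lam)
```

On a long simulated Poisson-INAR(1) path the estimates recover the true thinning probability and innovation mean to within sampling error.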
| 1:30pm - 3:30pm | Multivariate Statistics and Copulas Location: 0.004 Session Chair: Eckhard Liebscher |
Characterization of multi-way binary tables with uniform margins and fixed correlations 1Politecnico di Torino, Italy; 2Eindhoven University of Technology, the Netherlands; 3Università di Genova, Italy In many applications involving binary variables, only pairwise dependence measures, such as correlations, are available. However, for multi-way tables involving more than two variables, these quantities do not uniquely determine the joint distribution, but instead define a family of admissible distributions that share the same pairwise dependence while potentially differing in higher-order interactions. In this talk, we introduce a geometric framework to describe the entire feasible set of such joint distributions with uniform margins. We show that this admissible set forms a convex polytope, analyze its symmetry properties, and characterize its extreme rays. These extremal distributions provide fundamental insights into how higher-order dependence structures may vary while preserving the prescribed pairwise information. Unlike traditional methods for table generation, which return a single table, our framework makes it possible to explore and understand the full admissible space of dependence structures, enabling more flexible choices for modeling and simulation. We illustrate the usefulness of our theoretical results through examples and a real case study on rater agreement. Copula robustness in quantitative risk management Saarland University, Germany Characteristics of d-variate risks, such as downside risk measures of aggregate positions or optimal portfolio values, play a central role in financial and actuarial applications. This talk addresses the question of when such characteristics are robust to (small) misspecifications in the copula. 
Directional Footrule Coefficients University of Almería, Spain Measures of association based on ranks, such as Spearman’s footrule [1], play a central role in multivariate statistics due to their robustness and invariance properties. However, classical versions of these coefficients are often unable to capture directional dependence structures that arise in high-dimensional settings. Motivated by this limitation and by recently introduced coefficients [2, 3], we introduce a novel family of directional Spearman’s footrule coefficients designed to quantify multivariate dependence along prescribed directions in the unit d-dimensional hypercube. The proposed coefficients are formulated within the framework of copula theory, which allows for a clear separation between marginal behavior and the underlying dependence structure. Our construction extends the classical Spearman’s footrule by incorporating directional information, enabling the detection of dependence patterns that remain undetected by standard measures. We establish a general definition for arbitrary dimensions and directions and investigate the main theoretical properties of the proposed coefficients. In particular, we analyze their behavior under independence and maximal positive dependence, their relation to stochastic orders, as well as their relationship with marginal distributions and lower-dimensional structures. These properties are shown to be consistent with those of the classical footrule coefficient. To facilitate practical implementation, we also introduce nonparametric estimators based on ranks. These estimators are easy to compute and suitable for multivariate data. Their asymptotic behavior is discussed, highlighting consistency and stability properties analogous to those of existing rank-based dependence measures. Several illustrative examples are provided to demonstrate the usefulness of the proposed coefficients. 
Explicit expressions are derived for well-known families of d-copulas, including the Farlie–Gumbel–Morgenstern and Cuadras–Augé families, allowing for a detailed analysis of how directional dependence varies with model parameters. These examples show that the proposed coefficients are able to distinguish different directional dependence patterns even when classical global measures coincide. Overall, this work provides a new tool for directional dependence analysis in multivariate settings, complementing existing rank-based measures and offering a finer understanding of complex dependence structures with applications in finance, reliability, and multivariate risk analysis. [1] Spearman, C. (1906). ‘Footrule’ for measuring correlation. British Journal of Psychology, 2, 89-108. [2] Úbeda-Flores, M. (2004). Multivariate versions of Blomqvist’s beta and Spearman’s footrule. Ann. Inst. Statist. Math., 57(4), 781-788. [3] Decancq, K., Pérez, A., Prieto-Alaiz, M. (2025). Multivariate Dependence Based on Diagonal Sections: Spearman’s Footrule and Related Measures. In: Steland, A., Rafajłowicz, E., Parolya, N. (eds) Stochastic Models, Statistics and Their Applications. SMSA 2024. Springer Proceedings in Mathematics & Statistics, vol 499. Springer, Cham. Estimating Portfolio Risk with Product Copulas: A GARCH-EVT Approach Applied to Financial Data Hochschule Merseburg, Germany This talk introduces a sophisticated GARCH-EVT-Copula framework designed […] A key innovation presented is the application of product copulas to model the […] Our empirical analysis demonstrates the superior performance of the product […] |
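As background for the directional extension above, the classical bivariate Spearman footrule of [1] is straightforward to compute from ranks. A minimal sketch, assuming no ties and using a hypothetical function name:

```python
import numpy as np

def spearman_footrule(x, y):
    """Classical bivariate Spearman footrule:
    phi = 1 - 3 * sum_i |R_i - S_i| / (n^2 - 1),
    where R_i, S_i are the ranks of x_i and y_i (no ties assumed)."""
    x, y = np.asarray(x), np.asarray(y)
    rx = np.argsort(np.argsort(x)) + 1   # ranks of x
    ry = np.argsort(np.argsort(y)) + 1   # ranks of y
    n = len(x)
    return 1.0 - 3.0 * np.sum(np.abs(rx - ry)) / (n * n - 1)
```

Identical orderings give the maximal value 1, while reversed orderings give a negative value; the directional coefficients of the talk generalize this statistic along directions of the unit hypercube.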
| 1:30pm - 3:30pm | Statistics in sports Location: 1.002 Session Chair: Jakob Söhl |
|
|
The Best of Both Worlds: Predicting Coverage Schemes in American Football with Supervised and Unsupervised Learning 1TU Dortmund; 2WU Vienna; 3Bielefeld University Choosing between man and zone coverage is one of the most critical strategic decisions a defensive coordinator must make before each offensive play in American football. In simple terms, in man coverage each defender is assigned to guard a specific offensive player, while zone coverage requires defenders to protect designated areas of the field. This choice fundamentally shapes how the defense reacts to offensive formations and movements. Traditionally, experienced offensive coordinators and quarterbacks rely on visual cues, such as defenders’ alignment or pre-snap motion, to infer these defensive schemes. However, with the increasing availability of high-resolution player tracking data, statistical models can now uncover such tactical patterns quantitatively rather than relying solely on expert intuition. In this project, we first employ an elastic net and an XGBoost classifier to predict whether a defense is in man or zone coverage based on all players’ positions once both teams are set before the snap. The models thus capture spatial configurations that often reveal underlying defensive intentions. In a second step, we incorporate dynamic information from pre-snap player movements. Finally, in a third step, we employ features derived from a hidden Markov model (HMM). Specifically, we use an HMM to represent defenders’ movement trajectories over time. The hidden states correspond to potential offensive players being covered by each defender. From the decoded state sequences, we extract summary statistics, such as the number of state (defender) switches. Including these HMM-based features in the aforementioned models significantly enhances the models’ predictive accuracy. Beyond the pure classification performance, our approach also enables deeper tactical analyses. 
For instance, it allows us to explore how pre-snap motion helps offenses identify defensive coverages: comparing the predicted coverage probabilities before and after a motion provides insight into how well offensive movements reveal defensive strategies. Overall, this framework demonstrates how modern machine learning techniques in combination with a statistical model can provide quantitative insights into complex team sports tactics. While developed within an American football context, the methodology may generalize to other sports where spatial positioning and interaction dynamics play similarly crucial roles. Modelling momentum in tennis: A latent-state approach to point outcomes and rally lengths 1Bielefeld University, Germany; 2TU Dortmund, Germany Tennis matches are often characterised by momentum shifts – i.e., changes in match dynamics over time – marked by transitions between phases where either player 1 or player 2 dominates. While dominance is clearly reflected in a player’s point wins, rally lengths provide additional valuable information for modelling momentum; short rallies suggest strong momentum, whereas long rallies and point losses indicate pressure. To model momentum shifts effectively, we therefore propose considering both the outcomes of the points and the rally lengths. These sequentially observed outcomes reflect the current dynamics of the match (i.e., the level of pressure a player exerts on their opponent), which we regard as an unobserved state process. Thus, we employ a latent-state approach to investigate these momentum shifts. Specifically, we model the outcomes of server wins and rally lengths jointly using Markov-modulated marked Poisson processes (MMMPPs). This flexible framework allows us to relate the events (server wins or loses the point) and the event times (rally length) to an underlying latent state process, modelled as a continuous-time Markov chain. 
Its states determine the distribution of the outcomes and can be interpreted as proxies for the players’ momentum. For data from all Grand Slam tournaments from 2016 to 2024, we identify momentum shifts within tennis matches using MMMPPs with two latent states, accounting for player- and match-specific effects such as player rankings and court surfaces. The Accuracy–Complexity Trade-Off in the Expected Threat model for Football 1TU Delft, The Netherlands; 2AFC Ajax, The Netherlands The Expected Threat model is a possession value model in football (soccer) with a Markov chain structure that allows for interpretation and visualization. To create a Markov chain, the pitch is discretized into different Markov states. However, selecting the right discretization of the pitch is still a challenging design choice. A model with more game states can better distinguish between different scenarios, but has fewer samples per state when estimating the Markov chain. This creates a trade-off between the model complexity in terms of the number of Markov states and the accuracy of the probability estimates. Theoretical analysis of the model gives error bounds, but interpretation of the results indicates that these might be on the conservative side. Simulations provide a more accurate characterization of the model’s error, which is indeed more optimistic than the theoretical bound. Finally, these insights are converted into a practical rule of thumb to help practitioners choose the right balance between the number of Markov states and accuracy of the probability estimates of the Expected Threat model. |
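The Markov-chain structure of the Expected Threat model can be illustrated on a toy discretization. All zone probabilities below are made up for illustration; a real model would estimate them from event data on a much finer grid.

```python
import numpy as np

# Toy Expected Threat (xT): the pitch is discretized into a handful of
# zones (Markov states). In each zone the team either shoots (scoring with
# a zone-specific probability) or moves the ball to another zone. The xT
# value of each zone solves the fixed point v = p_shot*p_goal + p_move*(T @ v).
p_shot = np.array([0.01, 0.05, 0.15, 0.40])   # hypothetical shot probability per zone
p_goal = np.array([0.02, 0.05, 0.10, 0.30])   # hypothetical conversion rate given a shot
p_move = 1.0 - p_shot                         # otherwise the ball is moved

# Hypothetical transition matrix between zones given a move (rows sum to 1).
T = np.array([[0.60, 0.30, 0.10, 0.00],
              [0.20, 0.50, 0.25, 0.05],
              [0.05, 0.25, 0.50, 0.20],
              [0.00, 0.10, 0.40, 0.50]])

v = np.zeros(4)
for _ in range(500):                          # fixed-point iteration (a contraction)
    v = p_shot * p_goal + p_move * (T @ v)
```

The trade-off discussed in the talk is visible here: more zones mean a larger `T` whose entries must each be estimated from fewer observed transitions.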
| 1:30pm - 3:30pm | Computational Biostatistics Location: 1.012 Session Chair: Dennis Dobler |
|
|
Computational and Biostatistical Challenges in Polygenic Score Modelling and Gene–Environment Integration 1IUF - Leibniz Research Institute for Environmental Medicine; 2TU Dortmund University Polygenic scores (PGS) quantify genetic predisposition to complex traits and clinical outcomes based on genotype data. This talk addresses recent computational and biostatistical challenges in PGS modelling, including their integration with environmental risk factors. First, training PGS models on high-dimensional and large-scale genotype data with hundreds of thousands of genetic variants and individuals requires scalable yet interpretable statistical learning methods. Second, the transferability of PGS models to diverse populations with different ancestries remains limited, as models are typically trained on cohorts predominantly of European ancestry. Third, the evaluation of predictive performance is complicated by different and sometimes conflicting definitions of the commonly used R-squared measure on test data. To address these challenges, scalable statistical learning approaches for PGS modelling based on individual-level genotype data are presented, including boosting and anchor regression. Finally, open problems and directions for future research are highlighted, with the aim of improving robustness, interpretability and gene–environment integration in personalized medicine. Robust Feature Selection for High-Dimensional Mixtures of Cox Models University of Augsburg, Germany Time-to-event analysis is fundamental for studying patient survival in modern biomedical research, particularly in the presence of high-dimensional covariate information. When survival data are collected over long time horizons, population heterogeneity naturally arises due to evolving clinical practices and patient characteristics. Mixtures of Cox proportional hazards models offer an effective way to account for such heterogeneity by modeling latent subpopulations with distinct risk profiles. 
In high-dimensional settings, feature selection is crucial for improving model interpretability and predictive performance. This talk presents a robust feature selection approach for mixtures of Cox models based on a combined ℓ1–ℓ2 penalty, which encourages sparsity while stabilizing estimation across mixture components. The resulting optimization problem is non-smooth and challenging to solve within mixture models. We address this challenge by developing an efficient Expectation–Maximization (EM) algorithm that effectively handles the non-smooth penalty structure. Empirical results demonstrate that the proposed method improves patient-specific survival time prediction across heterogeneous populations while achieving stable and interpretable feature selection. A regularized Cox model for selecting interactions and time-varying covariate effects 1Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University of Bonn; 2Department of Mathematics, Informatics and Technology, Koblenz University of Applied Sciences, RheinAhrCampus Remagen, The Cox proportional hazards model is a widely used method for analyzing clinical time-to-event data. In its standard form, the Cox model assumes the covariate effects on the hazard function to be constant over time. However, in many clinical settings, covariate effects may vary with time, and covariate interactions may significantly influence survival. Selecting interactions and time-varying effects within the Cox model framework may be challenging and often requires manual pre-screening followed by model selection steps. These selection steps are often carried out through automated stepwise procedures, which, however, can be unstable or even infeasible—particularly if a large number of potential effects is considered. We introduce a linked-shrinkage adaptive elastic net procedure for selecting two-way interactions and time-varying effects in Cox regression models. 
The proposed approach integrates an adaptive elastic net with penalty weights derived from an initial ridge regression that includes main effects only. Time-varying effects are modeled as piecewise constant functions. Penalty weights for interactions and time-varying terms are specified using a linked-shrinkage strategy based on the pre-estimated main effects, such that these effects are penalized more strongly than the main effects. We assessed the proposed modeling approach through a simulation study based on Weibull-distributed survival times, incorporating various structures of time-varying covariate effects, and compared it with several established approaches, including the classical elastic net extended to the Cox regression model. Model performance was assessed in terms of the mean squared error (MSE) of the estimated survival probabilities and the accuracy of variable selection. The proposed method reliably identified true time-varying and two-way interaction effects, with true positive rates between 80% and 90% depending on the scenario. Compared to standard regularized Cox regression models, the proposed method yielded better performance in terms of MSE and selected informative main, interaction, and time-varying effects more precisely. Furthermore, we illustrate the proposed approach by analyzing real-world data from the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) program. By addressing the limitations of manual covariate selection and stepwise procedures, the proposed method extends penalized estimation techniques to Cox regression with time-varying coefficients. Further, it facilitates the simultaneous selection of relevant interaction terms and time-varying covariate effects. 
Inferring Individual-Level Cell Type-Specific Transcriptomic Profiles from Bulk RNA-Seq Using a Bayesian Hierarchical Model University of North Carolina Wilmington, United States of America The high cost of single-cell sequencing often compels large cohort studies to rely on bulk RNA-seq, which presents challenges in resolving tissue heterogeneity and understanding the roles of individual cell types. In bulk RNA-seq analysis, deconvolution is essential for extracting cell-type-specific information. Most tools focus on estimating cell type proportions, but only a few aim to infer cell-type-specific gene expression profiles (ctsGEPs). Among these, very few estimate ctsGEPs at the individual sample level. The technical challenges of this task highlight the need for more advanced approaches capable of generating accurate individual-level ctsGEP estimates. Such estimates are critical for downstream analyses, including cell-type-specific differential expression and expression quantitative trait locus studies. To address this, we developed a novel deconvolution method to estimate individual-level ctsGEPs and cell type proportions simultaneously from bulk RNA-seq data. Using a hierarchical Bayesian framework, our method captures the stochastic variation of ctsGEPs across individuals. Parameters are estimated via Markov Chain Monte Carlo (MCMC), with hyperparameters optimized for robust inference. We benchmarked our method using 48 in silico mixtures generated from single-cell RNA-seq data of human brain donors. The results demonstrated strong performance, with correlations of ~0.9 for ctsGEP estimates and >0.6 for gene expression variation across samples for ~80% of genes. Our method outperformed existing tools, reducing Root-Mean-Square Errors by ~16%. Additionally, we showcased its application in cell-type-specific differential expression analysis. 
Our method provides a powerful tool to computationally unravel cell-type-specific expression profiles in bulk RNA-seq data, enabling advances in understanding cellular heterogeneity in biological and pathological contexts. |
| 3:30pm - 4:00pm | Coffee break 2 |
| 4:00pm - 5:00pm | Plenary Lecture 2 Location: 0.004 |
|
|
Statistical Optimal Transport in Action: From Theory to Applications University of Göttingen, Germany While optimal transport has been a long-standing mathematical, physical and economic concept for more than two centuries, recent developments in statistics, optimization and machine learning suggest its use as a tool for modern data analysis. Extensions such as Gromov–Wasserstein transport respect the inner metric structure of data sets and have proven useful for image registration and object matching. In this talk we introduce some basic statistical methods related to optimal transport and illustrate these with examples from cell biology and biometric identification. |
| 5:05pm - 6:35pm | Applied Econometrics Location: 0.001 Session Chair: Yannick Hoga |
|
|
The impact of central bank backstops on sovereign risk premia: Evidence from the ECB's Transmission Protection Instrument European Central Bank, Germany We study the effects of central bank backstops on sovereign risk premia using the Eurosystem’s Transmission Protection Instrument (TPI) announced in July 2022. We develop a nonlinear non-Gaussian state-space model that decomposes euro area sovereign yields into expected short rates, a common term premium, and country-specific default, redenomination, liquidity, and convenience premia. Structural shocks are identified through heteroscedasticity and fat tails. Using euro area data from 2015 to 2025, we extract latent risk premia and assess the impact of the TPI using event-time and difference-in-differences designs. The results show that the TPI primarily increased the convenience value of sovereign bonds and reduced the volatility of a subset of shocks, while leaving other risk premia largely unchanged. Lower convenience-adjusted yields partially dampened the transmission of policy rate hikes to medium-term sovereign yields. Forecast Combination for Tail Risk: Virtues of the Harmonic Mean University of Freiburg, Germany This paper examines the properties of the loss functions used for forecasting Value-at-Risk (VaR) and Expected Shortfall (ES). We show that the weighted arithmetic average commonly used to construct a forecast combination utilises the convexity property of the loss function only in the case of Value-at-Risk. This paper introduces a novel forecasting combination approach for Expected Shortfall, which is constructed using weighted harmonic means. We show that only in this case is the insurance against model risk guaranteed. To construct combination weights consistent with this aggregation result, we propose a novel forecast combination for tail risk measures based on the Bagged Pretested Forecast Combination (BPFC) algorithm. 
The combination weights assigned to candidate models are determined by their predictive performance using the Model Confidence Set (MCS) test. Unlike many traditional combination methods, BPFC adapts to changing market conditions while simultaneously facilitating model selection and improving forecast stability. We evaluate the performance of forecasting combinations for VaR and ES within the framework of consistent loss functions, highlighting the role of convexity in performance improvements. Our results show that the advantages of combining forecasts are especially evident when there is substantial disagreement among candidate models, a situation that commonly arises during turbulent financial periods. To empirically validate our approach, we apply it to a dataset of 90 stocks spanning various market capitalizations and covering periods of severe financial stress, including the Global Financial Crisis and the COVID-19 pandemic. The results illustrate the ability of BPFC to dynamically select and combine the most effective models from a pool of over 60 candidates, continuously adjusting weights based on each model’s forecasting performance and evolving market conditions. Systemic Risk Surveillance 1Goethe University Frankfurt, Germany; 2University Duisburg-Essen, Germany Following several episodes of financial market turmoil in recent decades, changes in systemic risk have drawn growing attention. Therefore, we propose surveillance schemes for systemic risk, which make it possible to detect misspecified systemic risk forecasts in an “on-line” fashion. This enables daily monitoring of the forecasts while controlling for the accumulation of false test rejections. Such online schemes are vital in taking timely countermeasures to avoid financial distress. Our monitoring procedures allow multiple series to be monitored at once, thus increasing the likelihood and the speed at which early signs of trouble may be picked up. 
The tests hold size by construction, such that the null of correct systemic risk assessments is only rejected during the monitoring period with (at most) a pre-specified probability. Monte Carlo simulations illustrate the good finite-sample properties of our procedures. An empirical application to US banks during multiple crises demonstrates the usefulness of our surveillance schemes for both regulators and financial institutions. |
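The contrast between arithmetic and harmonic averaging of ES forecasts described in the harmonic-mean talk above can be written down directly. A minimal sketch with hypothetical function names, assuming ES forecasts are reported as positive loss magnitudes:

```python
import numpy as np

def combine_es_arithmetic(es, w):
    """Weighted arithmetic mean of Expected Shortfall forecasts."""
    return float(np.sum(w * es))

def combine_es_harmonic(es, w):
    """Weighted harmonic mean of Expected Shortfall forecasts, as advocated
    in the talk; forecasts are assumed strictly positive."""
    return float(1.0 / np.sum(w / es))
```

By the weighted AM-HM inequality the harmonic combination never exceeds the arithmetic one; for forecasts (2, 4) with equal weights the arithmetic mean is 3 while the harmonic mean is 8/3.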
| 5:05pm - 6:35pm | Statistical Inverse Problems Location: 0.002 Session Chair: Frank Werner |
|
|
Linear methods for non-linear inverse problems 1Delft University of Technology; 2Bocconi University, Italy We propose a novel Bayesian linearization approach for non-linear PDE constrained inverse problems. We split the non-linear inverse problem into a linear statistical and a non-linear analytic component. We derive optimal posterior contraction rates, reliable uncertainty quantification, data-driven tuning and scalable approximations. The general approach is applied to specific examples, including Darcy flow and the heat equation with an absorption term. Learning with Heavy Tails TU Braunschweig, Germany We examine the performance of ridge regression in reproducing kernel Hilbert spaces in the presence of noise that exhibits a finite number of higher moments. We establish excess risk bounds consisting of subgaussian and polynomial terms based on the well-known integral operator framework. The dominant subgaussian component allows us to achieve convergence rates that have previously only been derived under subexponential noise – a prevalent assumption in related work from the last two decades. These rates are optimal under standard eigenvalue decay conditions, demonstrating the asymptotic robustness of regularized least squares against heavy-tailed noise. Our derivations are based on a Fuk-Nagaev inequality for Hilbert-space valued random variables. Comparing regularisation paths of (conjugate) gradient estimators in ridge regression 1Humboldt-Universität zu Berlin, Germany; 2Aarhus Universitet, Denmark We consider standard gradient descent, gradient flow and conjugate gradients as iterative algorithms for minimising a penalised ridge criterion in linear regression. While it is well known that conjugate gradients exhibit fast numerical convergence, the statistical properties of their iterates are more difficult to assess due to inherent non-linearities and dependencies. 
On the other hand, standard gradient flow is a linear method with well-known regularising properties when stopped early. By an explicit non-standard error decomposition we are able to bound the prediction error for conjugate gradient iterates by a corresponding prediction error of gradient flow at transformed iteration indices. This way, the risk along the entire regularisation path of conjugate gradient iterations can be compared to that for regularisation paths of standard linear methods like gradient flow and ridge regression. In particular, the oracle conjugate gradient iterate shares the optimality properties of the gradient flow and ridge regression oracles up to a constant factor. Numerical examples show the similarity of the regularisation paths in practice. |
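The correspondence between early-stopped iterative methods and explicit penalisation that underlies the comparison above can be illustrated on simulated data. The sketch below uses synthetic data and plain gradient descent rather than conjugate gradients; it traces both the ridge path and the iterate path.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = X @ np.ones(p) + 0.5 * rng.normal(size=n)

def ridge(lam):
    """Ridge estimator; varying lam traces the ridge regularisation path."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Gradient descent on the least-squares objective: the iterates traced out
# while descending form a regularisation path of their own, with early
# stopping playing the role of the penalty parameter.
eta = 1.0 / np.linalg.norm(X, 2) ** 2     # step size 1 / ||X||_op^2
beta = np.zeros(p)
path = [beta.copy()]
for t in range(5000):
    beta = beta - eta * X.T @ (X @ beta - y)
    path.append(beta.copy())
```

Early iterates stay close to heavily penalised ridge solutions (small norm), while for large iteration counts the path approaches the unpenalised least-squares estimator, mirroring `ridge(lam)` as `lam` decreases to zero.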
| 5:05pm - 6:35pm | Inference in Wasserstein Spaces and Optimal Transport Location: 0.004 Session Chair: Ansgar Steland |
|
|
Statistical Aspects of Optimal Transport: Regularization, Estimation, and Applications University of Twente, The Netherlands In recent years, statistical methodology based on optimal transport (OT) has witnessed a considerable increase in practical and theoretical interest. A central reason for this trend is the ability of optimal transport to efficiently compare data in a geometrically meaningful way. This development was further amplified by computational advances spurred by the introduction of entropy regularized optimal transport (EOT). In applications, the OT or EOT cost is often estimated through an empirical plug-in approach, raising statistical questions about the performance and uncertainty of these estimators. This talk surveys recent theoretical and methodological insights into these topics and discusses future opportunities. This talk is based on joint work with Thomas Staudt, Marcel Klatt, Michel Groppe, Alberto-Gonzáles-Sanz, Gilles Mordant, Christoph Weitkamp, and Axel Munk. On the cut-offs of Optimal Transport based statistical tests University of British Columbia, Canada Tests for equality of distributions based on Optimal Transport functionals are often referred to as not being distribution-free: asymptotic laws for test statistics depend on the underlying true distributions, and this dependence seems unavoidable. Here we show that these tests are “almost” distribution-free, in the sense that there exist cut-offs independent of the true distributions that result in tests with a given level of significance. These cut-offs are easy to compute and may serve as a rule-of-thumb-type heuristic, making Optimal Transport based tests more accessible for practical applications. 
Detecting change-points of univariate time series using the empirical Wasserstein distance 1RWTH Aachen University, Germany; 2Delft University of Technology, Netherlands In this talk we are interested in detecting change-points of univariate nonstationary time series in a nonparametric setting. We introduce statistics based on the Wasserstein distance between local empirical distribution functions of the time series which are suitable to detect change-points. The one-dimensional Wasserstein distance is characterized by the sequential quantile process, and we show that this weakly converges to a Gaussian limit. Due to the nonlinearity of the quantile process, difficulties arise from the localization. A new Bahadur representation result is needed to address this, which allows us to consider the asymptotic behavior of the empirical process instead of the quantile process. The proof of this requires further study of the modulus of continuity of the empirical process. As the limit distributions of the test statistics depend on the unknown underlying distributions, a Gaussian multiplier bootstrap scheme is introduced. Lastly, a simulation study shows how well the significance level is retained under the null hypothesis of no change, and an outlook towards the power of the tests will be given. |
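The local-window statistic described in the change-point talk above is easy to prototype: in one dimension, the empirical 1-Wasserstein distance between two equal-size samples reduces to the mean absolute difference of their order statistics. A minimal sketch with hypothetical names; the actual procedure of the talk uses a sequential quantile process with multiplier-bootstrap calibration.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size samples:
    the mean absolute difference of the order statistics."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def scan_changepoint(series, window):
    """Slide two adjacent windows over the series and record the Wasserstein
    distance between their local empirical distributions; a pronounced peak
    suggests a change-point."""
    n = len(series)
    dists = np.full(n, np.nan)
    for t in range(window, n - window):
        dists[t] = wasserstein_1d(series[t - window:t], series[t:t + window])
    return dists
```

On a series with a mean shift, the scan peaks exactly at the shift, where the two windows contain purely pre- and post-change observations.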
| 5:05pm - 6:35pm | Advances in Latent Variable Models Location: 1.002 Session Chair: Daniele Tancini |
|
|
A multilevel discrete latent variable model for joint modeling of response accuracy and times 1University of Milano-Bicocca, Italy; 2University of Perugia, Italy In recent years, the widespread adoption of computer-based testing has produced large volumes of data on examinee behavior. Beyond traditional binary indicators of correct responses, these datasets now typically include item-level response times, providing a richer and more informative perspective on performance. The Bradley–Terry Stochastic Block Model University College Dublin, Ireland The Bradley-Terry model is widely used for the analysis of pairwise comparison data and, in essence, produces a ranking of the items under comparison. We embed the Bradley-Terry model within a stochastic block model, allowing items to cluster. The resulting Bradley-Terry SBM (BT-SBM) ranks clusters so that items within a cluster share the same tied rank. We develop a fully Bayesian specification in which all quantities (the number of blocks, their strengths, and item assignments) are jointly learned via a fast Gibbs sampler derived through a Thurstonian data augmentation. Despite its efficiency, the sampler yields coherent and interpretable posterior summaries for all model components. Our motivating application analyzes men's tennis results from ATP tournaments over the seasons 2000-2022. We find that the top 100 players can be broadly partitioned into three or four tiers in most seasons. Moreover, the size of the strongest tier was small from the mid-2000s to 2018 and has increased since, providing evidence that men's tennis has become more competitive in recent years. A latent space approach for jointly modelling social influence on binary outcomes in networks 1University of Cambridge, United Kingdom; 2University College Dublin, Ireland A central task in network analysis is to model social influence, that is, how individual behaviours and outcomes are shaped by their social environment. 
Classical regression models are not suitable for this purpose, as they frequently rely on independence assumptions that are violated in network data, where individuals' behaviours are inherently interdependent. Although several methods have been proposed to address this problem, existing approaches either treat the network as fixed, rely on multi-step estimation procedures, or are limited to continuous outcome variables. |
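The Bradley-Terry building block of the BT-SBM above is compact enough to state in code; in the block-model version, every item in a cluster shares the same strength parameter. A sketch with hypothetical names, not the authors' Gibbs sampler:

```python
import numpy as np

def bt_win_prob(lam_i, lam_j):
    """Bradley-Terry probability that item i beats item j,
    given positive strength parameters lam_i and lam_j."""
    return lam_i / (lam_i + lam_j)

def bt_log_likelihood(lam, wins):
    """Log-likelihood of a wins matrix, where wins[i, j] counts
    how often item i beat item j."""
    ll = 0.0
    k = len(lam)
    for i in range(k):
        for j in range(k):
            if i != j and wins[i, j] > 0:
                ll += wins[i, j] * np.log(bt_win_prob(lam[i], lam[j]))
    return ll
```

With equal strengths every comparison is a coin flip; clustering items as in the BT-SBM amounts to constraining `lam` to take only as many distinct values as there are blocks.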
| 5:05pm - 6:35pm | Contributions to Computational Biostatistics and Data Science Location: 1.012 Session Chair: Dennis Dobler |
|
|
Bootstrap-based inference in regression using jackknife pseudo-observations 1RWTH Aachen University, Germany; 2Aarhus University, Denmark The pseudo-observation regression approach provides a flexible alternative to the omnipresent proportional hazards model when modeling time-to-event outcomes. In this approach, estimands representable as expectations are fitted to regression models using covariates of interest. Exemplary estimands that fit this framework are the restricted mean time lost (in competing risks models) or the survival function at a fixed time-point (in simple survival models). Likelihood-Based Inference for Dirichlet Mixture Models via Unconstrained Parameterization 1TU Kaiserslautern, Germany; 2LMU Munich, Germany Dirichlet mixture models (DMMs) provide a flexible and interpretable framework for clustering and modeling compositional data and have found widespread application in genomics, ecology, and the social sciences. Despite their popularity, formal likelihood-based inference for DMM parameters remains underdeveloped, primarily due to the presence of simplex constraints on mixture weights and the complex dependence structure induced by latent component memberships. In this paper, we develop a unified framework for classical likelihood-based inference in Dirichlet mixture models by working on an unconstrained parameterization that combines an additive log-ratio transformation of the mixture weights with the original Dirichlet concentration parameters. Within this framework, we derive closed-form expressions for score functions and observed Fisher information matrices, including full cross-component information terms obtained via the Louis identity. These results enable the construction of Wald, score (Lagrange Multiplier), and likelihood ratio tests for a broad class of regular parametric hypotheses, including fixed-value restrictions and equality constraints across mixture components. 
We show how the proposed methods apply seamlessly to both soft and hard EM-based estimation schemes and provide a numerically stable implementation that yields consistent standard errors and confidence intervals on the original parameter scale. Through simulation experiments and a real-data application, we demonstrate that the proposed inferential procedures perform well in finite samples and provide meaningful uncertainty quantification for DMM parameters. |
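The unconstrained parameterization of the mixture weights via the additive log-ratio (ALR) transform, as described in the abstract above, can be sketched as follows (a minimal illustration of the transform only; the function names are my own, not from the paper):

```python
import numpy as np

def alr(w):
    """Additive log-ratio transform: simplex weights -> unconstrained R^{K-1},
    using the last component as the reference."""
    w = np.asarray(w, dtype=float)
    return np.log(w[:-1] / w[-1])

def alr_inv(z):
    """Inverse ALR: unconstrained R^{K-1} -> weights on the simplex."""
    z = np.asarray(z, dtype=float)
    e = np.exp(np.concatenate([z, [0.0]]))
    return e / e.sum()

w = np.array([0.2, 0.3, 0.5])
z = alr(w)                      # unconstrained coordinates
w_back = alr_inv(z)             # round-trip recovers the simplex weights
print(w_back)
```

Working on the unconstrained scale is what makes standard Wald/score/likelihood-ratio machinery applicable without simplex constraints; estimates are mapped back via `alr_inv`.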
| 6:40pm - 8:30pm | Welcome Reception |
| Date: Thursday, 19/Mar/2026 | |
| 8:50am - 9:50am | Plenary Lecture 3 Location: 0.004 |
|
|
Statistical and computational challenges in unsupervised learning: focus on ranking University of Potsdam, Germany Ranking problems are prevalent in modern statistical, machine learning, and computer science literature. This includes a variety of practical situations ranging from ranking experts/workers in crowd-sourced data, ranking players in a tournament or equivalently sorting objects based on pairwise comparisons. A main challenge in this field is to construct an estimator of the rank of the experts, based on incomplete and noisy data. |
| 9:50am - 10:20am | Coffee break 3 |
| 10:20am - 12:20pm | Statistics in natural sciences and technology Location: 0.001 Session Chair: Gaby Schneider Session Chair: Ansgar Steland |
|
|
Time-varying degree-corrected stochastic block models ISBA/LIDAM, UC Louvain, Belgium Recent interest has emerged in community detection for dynamic networks which are observed along a trajectory of points in time. In this talk, we present a time-varying degree-corrected stochastic block model to fit a dynamic network which allows evolving heterogeneity in the degrees of nodes within a community over time. Considering the influence of the varying time window on the aggregation of network information from different time points, in the parameter estimation, we propose a smoothing-based method to recover time-varying degree parameters and communities. In particular we provide rates of consistency of our smoothed estimators for degree parameters and communities using a time-localised profile-likelihood approach. We illustrate our method by some comparative simulation studies and an application to a real data set. Learning population and individual structure in dynamic networks with degree heterogeneity UCLouvain, Belgium Dynamic networks provide a powerful framework for characterizing time-varying functional connectivity in neuroimaging studies. In practice, such networks are typically collected from multiple subjects across time and exhibit both temporal dynamics and subject-specific heterogeneity. Brain functional connectivity networks also contain hub nodes, defined as highly connected regions that play critical roles in understanding brain functional connectivity. In this talk, we propose a mixed-effect dynamic stochastic block model with degree heterogeneity, which simultaneously disentangles the population connectivity structure from individual variability and recovers the trajectories of hub regions through time-varying degree parameters. We develop an efficient local approximate estimation procedure and evaluate its performance through extensive simulations and a case study of dynamic functional connectivity from the Human Connectome Project. 
How to build your latent Markov model — the role of time and space Bielefeld University, Germany Statistical models that involve latent Markovian state processes have become immensely popular tools for analysing time series and other sequential data. However, the plethora of model formulations, the inconsistent use of terminology, and the various inferential approaches and software packages can be overwhelming to practitioners, especially when they are new to this area. Here we aim to provide guidance for both statisticians and practitioners working with latent Markov models by offering a unifying view on what otherwise are often considered separate model classes, from hidden Markov models through state-space models to Markov-modulated Poisson processes. In particular, we provide a roadmap for identifying a suitable latent Markov model formulation given the data to be analysed. Furthermore, we emphasise that it is key to applied work with any of these model classes to understand how recursive techniques exploiting the models' dependence structure can be used for inference. The R package LaMa adopts this unified view and provides an easy-to-use framework for fast numerical maximum likelihood estimation, allowing users to flexibly tailor a latent Markov model to their data using a Lego-type approach. Real-data examples from ecology, medicine and finance will be used to illustrate the modelling workflow. A Simple and Robust Multi-Fidelity Data Fusion Method for Effective Modelling of Citizen-Science Air Pollution Data 1University of Glasgow, United Kingdom; 2ETH Zürich We propose a robust multi-fidelity Gaussian process for integrating sparse, high-quality reference monitors with dense but noisy citizen-science sensors. 
The approach replaces the Gaussian log-likelihood in the high-fidelity channel with a global Huber loss applied to precision-weighted residuals, yielding bounded influence on all parameters, including the cross-fidelity coupling, while retaining the flexibility of co-kriging. We establish attenuation and unbounded influence of the Gaussian maximum likelihood estimator under low-fidelity contamination and derive explicit finite bounds for the proposed estimator that clarify how whitening and mean-shift sensitivity determine robustness. Monte Carlo experiments with controlled contamination show that the robust estimator maintains stable MAE and RMSE as anomaly magnitude and frequency increase, whereas the Gaussian MLE deteriorates rapidly. In an empirical study of PM2.5 concentrations in Hamburg, combining UBA monitors with openSenseMap data, the method consistently improves cross-validated predictive accuracy and yields coherent uncertainty maps without relying on auxiliary covariates. The framework remains computationally scalable through diagonal or low-rank whitening and is fully reproducible with publicly available code. |
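The bounded-influence idea in the abstract above, replacing the Gaussian quadratic term by a global Huber loss on precision-weighted (whitened) residuals, can be sketched as follows (illustrative only; the whitening matrix `L_inv` is a hypothetical placeholder, not the authors' construction):

```python
import numpy as np

def huber(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond,
    so the influence (derivative) is bounded by delta."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def robust_objective(y, mean, L_inv, delta=1.345):
    """Global Huber loss applied to whitened residuals L_inv @ (y - mean),
    in place of the Gaussian log-likelihood's quadratic form."""
    z = L_inv @ (y - mean)
    return huber(z, delta).sum()

# a gross outlier contributes only linearly, not quadratically
r = np.array([0.5, -8.0])
print(huber(r))
```

The key property is visible directly: the contribution of the outlying residual grows linearly in its magnitude, which is what yields the bounded-influence behavior under low-fidelity contamination.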
| 10:20am - 12:20pm | High-dimensional estimation and concentration phenomena Location: 0.002 Session Chair: Marie Düker |
|
|
Copula tensor count autoregressions 1University of Rome Tor Vergata; 2Vrije Universiteit Amsterdam This paper presents a novel copula-based autoregressive framework for multi-layer arrays of integer-valued time series with tensor structure. Our framework generalizes recent advances in tensor time series models for real-valued data to a context that accounts for the unique properties of integer-valued data, such as discreteness and non-negativity. The model incorporates feedback effects for the counts’ temporal dynamics and introduces identification constraints. An asymptotic theory is developed for a Two-Stage Maximum Likelihood Estimator (2SMLE) for the model’s parameters. The estimator balances the challenges of parameter dimensionality, interdependence of the different count series, and computational stability. Together, this substantially pushes the frontier for modeling multi-dimensional, structured tensor time series of counts. An application to tensor crime counts demonstrates the practical usefulness of the proposed methodology. High-Dimensional Inference for Network Stochastic Differential Equations University of Hamburg, Germany We consider the setting where the state dynamics at each node in a network depend on interactions with its neighbors. We model this using the general framework of Network Stochastic Differential Equations (N-SDEs). The evolution at each node arises from three components: intrinsic dynamics (a momentum term), feedback from adjacent nodes (a network term), and a stochastic volatility component driven by Brownian motion. Our goals are twofold: (i) parameter estimation for N-SDE systems and (ii) recovery of the underlying graph. Based on joint works with S.M. Iacus and N. Yoshida. Testing approximate sphericity for high-dimensional covariance matrices Aarhus University, Denmark Exact testing of model assumptions is often of limited relevance, especially in high-dimensional settings. 
Structural assumptions on large-dimensional covariance matrices, such as sphericity, are rarely expected to hold exactly for real data, and practitioners are often primarily interested in whether such model assumptions are approximately satisfied. In this work, we propose a test for approximate sphericity of high-dimensional covariance matrices, where the tolerated level of deviation from sphericity can be chosen by the user. Our test statistic is based on estimators of the largest and smallest eigenvalues of the population covariance matrix in a high-dimensional regime, where the corresponding sample eigenvalues are not consistent. We derive theoretical guarantees showing that the test keeps the prescribed asymptotic level under the null hypothesis and is power consistent under the alternative. Our key theoretical contribution is a joint central limit theorem for the estimators of the extreme eigenvalues of the population covariance matrix, provided the corresponding eigenvalues exceed the critical phase transition threshold. Principal Components Analysis for Irregular Data 1ETH Zurich, Switzerland; 2EPFL, Switzerland Functional principal component analysis (FPCA) is a fundamental tool for exploring variation in samples of random curves or surfaces. We propose a new approach to FPCA for functional data observed irregularly and sparsely over their domains, based on smoothing directly at the level of the eigenfunctions. Our formulation leads to an efficient optimization-based procedure whose computational and storage costs are comparable to those of standard multivariate PCA for regularly observed data. The method is flexible with respect to domain geometry and model class, accommodates structural constraints and penalties, and facilitates uncertainty quantification via resampling and asymptotic theory. |
| 10:20am - 12:20pm | Theory of Machine Learning: Insights from Women Researchers Location: 0.004 Session Chair: Mahsa Taheri |
|
|
Effects of Depth in Deep Learning: Independence vs Recurrence LMU Munich, Germany Depth plays a central role in modern deep learning, yet its probabilistic effects are subtle and are not fully captured by classical theories that primarily focus on the infinite-width limit. This talk explores how jointly scaling depth and width shapes the signal-propagation statistics of wide neural networks under two contrasting regimes: fully connected feedforward networks with independent weights across layers, and recurrent networks with shared weights. In feedforward networks, standard infinite-width analyses make it possible to stabilize forward and backward variance, ensuring well-behaved initialization. However, finite-width fluctuations accumulate with depth, breaking convergence to the Neural Tangent Kernel (NTK) regime. In contrast, in linear recurrent networks, finite-width effects already destabilize the forward-propagation variance, rendering conventional initialization schemes inadequate for long input sequences. Together, these results show that depth affects feedforward and recurrent architectures in qualitatively distinct ways that cannot be captured by infinite-width approximations. Theoretical guarantees for diffusion models — beyond log-concavity University of Hamburg, Germany Score-based generative modeling, implemented through probability flow ODEs, has shown impressive results in numerous practical settings. However, most convergence guarantees rely on restrictive regularity assumptions on the target distribution—such as strong log-concavity or bounded support. This work establishes non-asymptotic convergence bounds in the 2-Wasserstein distance for a general class of probability flow ODEs under considerably weaker assumptions: weak log-concavity and Lipschitz continuity of the score function. 
Our framework accommodates non-log-concave distributions, such as Gaussian mixtures, and explicitly accounts for initialization errors, score approximation errors, and effects of discretization via an exponential integrator scheme. Addressing a key theoretical challenge in diffusion-based generative modeling, our results extend convergence theory to more realistic data distributions and practical ODE solvers. We provide concrete guarantees for the efficiency and correctness of the sampling algorithm, complementing the empirical success of diffusion models with rigorous theory. Moreover, from a practical perspective, our explicit rates might be helpful in choosing hyperparameters, such as the step size in the discretization. Random Quadratic Form on a Sphere: Synchronization by Common Noise University of Amsterdam, The Netherlands We introduce the Random Quadratic Form (RQF): a stochastic differential equation which formally corresponds to the gradient flow of a random quadratic functional on a sphere. While the one-point motion of the system is a Brownian motion on a sphere and thus has no preferred direction, the two-point motion exhibits nontrivial synchronizing behaviour. In this work we study synchronization of the RQF, namely we give both distributional and path-wise characterizations of the solutions by studying invariant measures and random attractors of the system. Minimax rate of distribution regression Hong Kong University of Science and Technology, Hong Kong S.A.R. (China) Distribution regression seeks to estimate the conditional distribution of a multivariate response given a continuous covariate. This approach offers a more complete characterization of dependence than traditional regression methods. Classical nonparametric techniques often assume that the conditional distribution has a well-defined density, an assumption that fails in many real-world settings. 
These include cases where data contain discrete elements or lie on complex low-dimensional structures within high-dimensional spaces. In this work, we establish minimax convergence rates for distribution regression under nonparametric assumptions, focusing on scenarios where both covariates and responses lie on low-dimensional manifolds. We derive lower bounds that capture the inherent difficulty of the problem and propose a new hybrid estimator that combines adversarial learning with simultaneous least squares to attain matching upper bounds. Our results reveal how the smoothness of the conditional distribution and the geometry of the underlying manifolds together determine the estimation accuracy. |
| 10:20am - 12:20pm | Mathematical Statistics Location: 1.012 Session Chair: Mathias Trabs |
|
|
Alternative argmin method in the non-unique case and application for gradual regression changes 1University of Hamburg, Germany; 2Charles University of Prague, Czech Republic Assume one wants to estimate the true parameter $\vartheta_0$, which is the {\it maximal} value {\it minimizing} a function $M(\vartheta)$ over $\vartheta$. Let $M_n(\vartheta)$ be a consistent estimator for $M(\vartheta)$ uniformly in $\vartheta$. Although uniform convergence holds, one cannot apply the argmin theorem in the non-unique minimum case. Using the {\it maximal} value {\it minimizing} the function $M_n(\vartheta)$ over $\vartheta$ generally does not give a consistent estimator. We consider a special case with a real-valued parameter, and define a new consistent estimator. This method is then applied to estimate the gradual (smooth) change point $\vartheta_0$ of a nonparametric regression model $Y=m(X)+\varepsilon$ with real-valued covariates and a continuous regression function $m$, where $\vartheta_0$ is the maximal point at which $m$ is zero. Flow Matching as a forecasting model 1Ruhr-Universität Bochum, Germany; 2Karlsruher Institut für Technologie Flow Matching (introduced by Lipman et al.) and associated models have recently attracted significant interest due to their simulation-free training via a straightforward least squares criterion and the extremely broad and consequently adaptable underlying ordinary differential equation framework. Despite being a generative model that aims to mimic an unknown distribution, its possible applications extend far beyond the core task of generating new samples. The cheap generation of new samples opens the door to efficient distribution estimation, an essential component of forecasting tasks such as weather prediction. In this talk, we first adapt the Flow Matching method to smooth conditional density estimation. We show that the resulting estimator is closely related to the Nadaraya-Watson estimator. 
Then, we bridge the gap between proper scoring rules, the established method of evaluating predictions, and the fundamental concept of risk in statistical learning. Building on this, we show that the Nadaraya-Watson estimator achieves a minimax optimal anisotropic rate of convergence with respect to the risk associated with the Fourier score. In the end, we transfer this result to the Flow Matching estimator and demonstrate its capability in practice. Maximum likelihood estimation of the location of a symmetric convex body 1Georgia Tech, United States; 2Universität Bielefeld, Germany Consider data points sampled independently from the uniform distribution on a known symmetric convex body in high-dimensional Euclidean space with unknown location parameter. In this setting, the set of maximum likelihood estimators (MLE set) is a convex body containing the true location parameter. The goal of this talk is to present non-asymptotic upper and lower bounds for the diameter of the MLE set. Permutation testing under local differential privacy University of Warwick, United Kingdom In this talk I will discuss recent work on two-sample testing under a local differential privacy constraint where a permutation procedure is used to calibrate the tests. While permutation testing is a classical resampling technique, popular due to its ease of implementation and uniform Type I error control, its use under local privacy constraints is complicated by the fact that access to the data is limited. In this work we design appropriate mechanisms for private data collection, both interactive and non-interactive, that allow for permutation tests. Our analysis shows that these lead to minimax optimal separation rates in both discrete and continuous settings, with interactive procedures being significantly more powerful. This is recent joint work with Alexander Kent and Yi Yu (https://arxiv.org/abs/2505.24811). |
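The Nadaraya-Watson-type conditional density estimator that the Flow Matching talk above refers to can be sketched in its generic textbook form (this is my own illustration, not the speakers' exact estimator): kernel weights localize around the covariate value $x$, and a kernel density in $y$ is averaged with those weights.

```python
import numpy as np

def gauss(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def nw_cond_density(y, x, X, Y, h=0.3, b=0.3):
    """Nadaraya-Watson-type conditional density estimate of f(y | x):
    a kernel density in y, weighted by kernel localization around x."""
    wx = gauss((x - X) / h)
    wx = wx / wx.sum()                    # normalized covariate weights
    return np.sum(wx * gauss((y - Y) / b) / b)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 2000)
Y = X + 0.1 * rng.normal(size=2000)       # Y | X=x roughly N(x, 0.1^2)
est = nw_cond_density(0.0, 0.0, X, Y)     # density estimate at y=0 given x=0
print(est)
```

By construction the estimate integrates to one in $y$ for every $x$, since it is a mixture of kernel densities; the bandwidths `h` and `b` govern the smoothing in the covariate and response directions, respectively.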
| 12:20pm - 1:30pm | Lunch break 2 |
| 1:30pm - 3:30pm | Statistics in natural sciences and technology Location: 0.001 Session Chair: Gaby Schneider Session Chair: Ansgar Steland |
|
|
MEWMA control charts for the covariance matrix -- on the validity of a certain approximation to achieve a feasible ARL integral equation 1RWTH Aachen / HSU Hamburg, Germany; 2HSU Hamburg, Germany In this talk, we consider the problem of monitoring changes in the covariance matrices of a sequence of multivariate normally distributed random vectors. To this end, we introduce a Multivariate Exponentially Weighted Moving Average (MEWMA) control chart in which, at each time step, the empirical covariance matrix is computed and vectorized. The control limit and the corresponding Average Run Length (ARL) are determined not only by Monte Carlo simulation, but also by numerically solving an integral equation for the ARL. In order to set up this integral equation, the exact transition density of the monitoring statistic is approximated by its asymptotic transition density. This approximation exploits the fact that the asymptotic transition density is invariant under rotations of the sample covariance matrix. Finally, we provide an outlook on an application of the proposed control chart to data from a bridge monitoring project. EWMA control charts for the correlation coefficient Helmut Schmidt University Hamburg, Germany Many EWMA control charts for various parameters are available. However, there is none for monitoring the linear correlation coefficient ρ. Although the explicit distribution of the estimator of ρ has been known for a long time, it seems never to have been used for setting up a control chart. Here, we build an EWMA chart utilizing this estimator, namely the Pearson correlation, and calculate the most popular performance measure, the zero-state average run length (ARL), by means of various numerical methods. Not surprisingly, the two standard methods work poorly for certain chart designs. We solve these problems by utilizing piece-wise collocation. Moreover, we examine further configuration details and provide some guidelines. 
Two applications illustrate the usefulness of monitoring the ρ level. Integrated Modelling of Age- and Sex-Structured Wildlife Population Dynamics: The Example of Hartebeest University of Hohenheim, Germany Biodiversity underpins life on Earth, yet it is declining at an accelerating pace, sharpening the need for interventions that can slow, halt, or reverse these losses. Designing such interventions requires clear insight into the processes driving population declines in particular species—and into the relative importance of those processes—insight most directly generated by population dynamics models. Yet appropriate population dynamics models for quantifying declines and guiding conservation management of wild herbivore populations remain scarce, leaving a critical gap in both evidence and practice. To address this gap, we develop an integrated Bayesian state-space population dynamics model, using the Mara-Serengeti hartebeest population as a case study. The model extends and generalizes an earlier framework we developed and illustrated for the Mara-Serengeti topi (Mukhopadhyay et al. 2024), adding multiple features designed to improve realism, inference, and management relevance. The model fuses ground demographic surveys with aerial monitoring data, explicitly representing population age–sex structure and key life-history traits and strategies. It links birth rates, age-specific survival rates, and sex ratios to meteorological covariates, prior population density, environmental seasonality, predation risk, and several environmental and anthropogenic covariates. Operating on a monthly time step, it enables fine-grained estimation of reproductive seasonality, phenology, synchrony, and birth prolificacy, as well as juvenile and adult recruitment dynamics. We evaluate performance using balanced bootstrap sampling and by comparing model predictions with empirical aerial estimates of population size. 
We perform a detailed assessment of model robustness, including checking for parameter redundancy, estimability and identifiability, performing sensitivity analyses of the priors and running multiple MCMC chains. Implemented as a hierarchical Bayesian model using MCMC methods for parameter estimation, prediction, and inference, the model reproduces several well-established features of the hartebeest population, including a steep and persistent decline, weakly seasonal births, and juvenile and adult recruitment patterns. The framework is general, flexible, and easily adaptable to other species. References Mukhopadhyay, S., Piepho, H. P., Bhattacharya, S., Dublin, H. T., & Ogutu, J. O. (2024). Hierarchical Bayesian integrated modeling of age- and sex-structured wildlife population dynamics. Journal of Agricultural, Biological and Environmental Statistics, 1-26. Joseph O. Ogutu, Hans-Peter Piepho et al. University of Hohenheim, Institute of Crop Science, Biostatistics Unit, Fruwirthstrasse 23, 70599 Stuttgart, Germany The second-order generalization of the Hájek-Le Cam asymptotic minimax theorem Nanzan University, Japan In establishing the basic results of the asymptotic theory of estimation and testing, Le Cam (1960) introduced the so-called locally asymptotically normal (LAN) family of distributions. The convolution theorem for the LAN case was obtained by Hájek (1970) and extended by Le Cam (1972) to more general situations. These results are sometimes called the Hájek-Le Cam asymptotic minimax theorem. In this talk we derive the second-order generalization of Hájek's convolution theorem. Furthermore, as an application, we obtain the second-order Hájek-Le Cam asymptotic minimax theorem, which automatically provides the conditions that second-order asymptotically efficient estimators should satisfy. |
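The generic EWMA recursion underlying the correlation-monitoring chart in this session can be sketched as follows (illustrative only: successive windowed Pearson correlations are smoothed by the standard EWMA update, whereas the talk itself works with the exact distribution of the estimator and exact ARL computations):

```python
import numpy as np

def ewma_corr_chart(x, y, window=30, lam=0.1):
    """Apply the EWMA recursion Z_t = (1-lam)*Z_{t-1} + lam*r_t to
    Pearson correlations r_t computed on non-overlapping windows.
    Returns the raw correlations and the smoothed chart statistic."""
    n = len(x) // window
    rs = [np.corrcoef(x[i*window:(i+1)*window],
                      y[i*window:(i+1)*window])[0, 1] for i in range(n)]
    z = np.empty(n)
    prev = rs[0]                 # start the chart at the first observed correlation
    for t, r in enumerate(rs):
        prev = (1 - lam) * prev + lam * r
        z[t] = prev
    return np.array(rs), z

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.5 * x + np.sqrt(1 - 0.25) * rng.normal(size=300)   # true correlation 0.5
rs, z = ewma_corr_chart(x, y)
print(z[-1])   # smoothed correlation level near 0.5
```

In an actual chart, `z` would be compared against control limits calibrated to a target in-control ARL, which is where the numerical methods discussed in the abstract come in.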
| 1:30pm - 3:30pm | Statistics for Stochastic Processes Location: 0.002 Session Chair: Fabian Mies |
|
|
A nonparametric statistic for rank changes of volatility functions of Ito semimartingales Christian-Albrechts-Universität, Germany The change of the rank of the volatility function in Ito semimartingales poses a complicated signal-detection problem. In their 2013 paper, Jacod & Podolskij derived a statistic to detect whether the rank of the volatility function is constant over the observation period. Based on their results, we develop a statistic that detects local jumps in the rank, based on random perturbations of the high-frequency observations of an Ito semimartingale. This statistic can be used to estimate the time points at which the rank jumps occur. We illustrate our results with some simulated data. Nonparametric density estimation for the small jumps of Lévy processes Université Versailles Saint Quentin, France We consider the problem of estimating the density of the process associated with the small jumps of a pure jump Lévy process, possibly of infinite variation, from discrete observations of one trajectory. The interest of such a question lies in the observation that even when the Lévy measure is known, the density of the increments of the small jumps of the process cannot be computed in closed form. We discuss results both from low- and high-frequency observations. In a low-frequency setting, assuming the Lévy density associated with the jumps larger than $\epsilon\in(0,1)$ in absolute value is known, a spectral estimator relying on the convolution structure of the problem achieves a parametric rate of convergence with respect to the integrated $L_2$ loss, up to a logarithmic factor. In a high-frequency setting, we remove the assumption on the knowledge of the Lévy measure of the large jumps and show that the rate of convergence depends both on the sampling scheme and on the behavior of the Lévy measure in a neighborhood of zero. We show that the rate we find is minimax up to a logarithmic factor. 
An adaptive penalized procedure is studied to select the cutoff parameter. These results are extended to encompass the case where a Brownian component is present in the Lévy process. Furthermore, we numerically illustrate the performances of our procedures. Fractional interacting particle system: drift parameter estimation via Malliavin calculus Universitat Pompeu Fabra, Spain We address the problem of estimating the drift parameter in a system of $N$ interacting particles driven by additive fractional Brownian motion of Hurst index $H \geq 1/2$. Considering continuous observation of the interacting particles over a fixed interval $[0, T]$, we examine the asymptotic regime as $N \to \infty$. Our main tool is a random variable reminiscent of the least squares estimator but unobservable due to its reliance on the Skorohod integral. We demonstrate that this object is consistent and asymptotically normal by establishing a quantitative propagation of chaos for Malliavin derivatives, which holds for any $H \in (0,1)$. Leveraging a connection between the divergence integral and the Young integral, we construct computable estimators of the drift parameter. These estimators are shown to be consistent and asymptotically Gaussian. Finally, a numerical study highlights the strong performance of the proposed estimators. Adaptive denoising diffusion modelling via random time reversal 1Kiel University, Germany; 2Heidelberg University, Germany; 3University of Stuttgart, Germany We introduce a new class of generative diffusion models that, unlike conventional denoising diffusion models, achieve a time-homogeneous structure for both the noising and denoising processes, allowing the number of steps to adaptively adjust based on the noise level. This is accomplished by conditioning the forward process using Doob’s h-transform, which terminates the process at a suitable sampling distribution at a random time. 
The model is particularly well suited for generating data with lower intrinsic dimensions, as the termination criterion simplifies to a first hitting rule. A key feature of the model is its adaptability to the target data, enabling a variety of downstream tasks using a pre-trained unconditional generative model. We highlight this point by demonstrating how our generative model may be used as an unsupervised learning algorithm: in high dimensions the model outputs with high probability the metric projection of a noisy observation $y$ of some latent data point $x$ onto the lower-dimensional support of the data – which we don't assume to be analytically accessible but to be only represented by the unlabeled training data set of the generative model. |
| 1:30pm - 3:30pm | Multivariate Statistics and Copulas Location: 0.004 Session Chair: Eckhard Liebscher |
|
|
Tests for independence between random vectors University of Leuven (KU Leuven), Belgium In this talk the focus is on copula-based procedures for testing whether a finite collection of continuous random vectors is mutually independent. In particular, we look into the class of meta-elliptical copulas and test the hypothesis whether the copula correlation matrix is a block diagonal matrix. The test statistic is a $\Phi$-dependence measure of a rank-based correlation matrix estimator, whose asymptotic distribution under the null is obtained for general $\Phi$ functions and general elliptical generators. In case of the Gaussian copula, we also develop asymptotics when optimal transport dependence measures are used for testing the null hypothesis of independent random vectors. Some numerical studies, including comparisons with existing methods, are reported on. Irène Gijbels, Steven De Keyser University of Leuven (KU Leuven), Belgium. Restrictions of PCBNs for integration-free computations Delft University of Technology, The Netherlands Pair-copula Bayesian networks (PCBNs) are graphical models composed of a directed acyclic graph (DAG) that represents (conditional) independence in a joint distribution. The nodes of the DAG are associated with marginal densities, and arcs are assigned bivariate (conditional) copulas following a prescribed collection of parental orders. The choice of marginal densities and copulas is unconstrained. However, the simulation and inference of a PCBN model may necessitate possibly high-dimensional integration. A nonparametric copula-based imputation method Free University of Bozen-Bolzano, Italy Missing values in multivariate dependent data are common in many applied settings and pose challenges for standard imputation methods, particularly when complex dependence structures are present. We introduce NPCoImp, a nonparametric copula-based approach for imputing multivariate missing data. 
The method relies on the empirical beta copula to estimate conditional distribution functions of missing variables given the observed ones, allowing the imputation process to account for the radial symmetry or asymmetry of the joint dependence structure. NPCoImp is highly flexible and can accommodate arbitrary missingness patterns in multivariate settings. We assess its performance through an extensive Monte Carlo simulation study, comparing it with classical imputation methods, the CoImp algorithm, and the machine-learning-based missForest approach. The results show that NPCoImp performs particularly well in preserving dependence structures across different sample sizes, missingness levels, and dependence strengths. The practical relevance of the method is illustrated through applications to real data from the agricultural sector. An ordering for the strength of functional dependence Paris Lodron Universität Salzburg, Austria We introduce a new dependence order, termed the conditional convex order, whose minimal and maximal elements characterize independence and perfect dependence. Moreover, it characterizes conditional independence, satisfies information monotonicity, and exhibits several invariance properties. Consequently, it is an ordering for the strength of functional dependence of a random variable Y on a random vector X. As we show, various recently studied dependence measures---including Chatterjee's rank correlation, Wasserstein correlations, and rearranged dependence measures---are increasing in this order and inherit their fundamental properties from it. We characterize the conditional convex order by the Schur order and by the concordance order, and we verify it in settings such as additive error models, the multivariate normal distribution, and various copula-based models. Our results offer a unified perspective on the behavior of dependence measures across statistical models. |
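The empirical beta copula on which NPCoImp relies has a simple closed form: each rank $R_{ij}$ contributes a Beta$(R_{ij},\, n+1-R_{ij})$ cdf factor. A minimal sketch of evaluating it (my own implementation for illustration, not the NPCoImp code):

```python
import numpy as np
from scipy.stats import beta, rankdata

def empirical_beta_copula(u, data):
    """Evaluate the empirical beta copula at a point u in [0,1]^d:
    C(u) = (1/n) * sum_i prod_j F_{Beta(R_ij, n+1-R_ij)}(u_j),
    where R_ij is the rank of observation i in coordinate j."""
    n, d = data.shape
    R = np.apply_along_axis(rankdata, 0, data)          # ranks per coordinate
    terms = np.ones(n)
    for j in range(d):
        terms *= beta.cdf(u[j], R[:, j], n + 1 - R[:, j])
    return terms.mean()

rng = np.random.default_rng(2)
data = rng.normal(size=(50, 2))                          # independent coordinates
c = empirical_beta_copula(np.array([0.5, 0.5]), data)
print(c)   # near 0.25 = 0.5 * 0.5 under independence
```

Unlike the raw empirical copula, this estimator is itself a genuine (smooth) copula, which is what makes it convenient for estimating the conditional distributions used in the imputation step.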
| 1:30pm - 3:30pm | Topics in functional data analysis Location: 1.012 Session Chair: Siegfried Hörmann |
|
|
Tests of symmetry for functional data Charles University, Czech Republic We present a test of symmetry of the distribution and a test of time symmetry for functional data. These tests are Cramér–von Mises-type tests based on empirical characteristic functionals. Specific variants of time symmetry, including the time symmetry of the Wiener process, are proposed. In general, the test statistics assume a relatively simple form if we use a Gaussian measure to construct the test. Then, we use bootstrap or permutation techniques to estimate the asymptotic critical values for the test statistics. Making Event Study Plots Honest: A Functional Data Approach to Causal Inference University of Bonn, Germany Event study plots are the centerpiece of Difference-in-Differences (DiD) analysis, but current plotting methods cannot provide honest causal inference when the parallel trends and/or no-anticipation assumption fails. We introduce a novel functional data approach to DiD that directly enables honest causal inference via event study plots. Our DiD estimator converges to a Gaussian process in the Banach space of continuous functions, enabling powerful simultaneous confidence bands. This theoretical contribution allows us to turn an event study plot into a rigorous, honest causal inference tool through equivalence and relevance testing: Honest reference bands can be validated using equivalence testing in the pre-treatment period, and honest causal effects can be tested using relevance testing in the post-treatment period. We demonstrate the performance of our method in simulations and two case studies. Kernel Expansions in Sobolev Spaces and Applications to Stochastic Processes TU Graz, Austria Mercer's celebrated theorem is refined and extended for (weakly) differentiable symmetric kernels by associating not the common $L^2$-integral operator but a slightly more complex operator that additionally takes into account information encoded in the (weak) derivatives of the kernel.
The natural domain for this associated operator is the Sobolev space $H^k(\Theta) = W^{k,2}(\Theta) \subset L^2(\Theta)$, where $\Theta \subset \mathbb{R}^d$ is a bounded domain and $k \in \mathbb{N}_0$ depends on the order of weak differentiability. The spectral decomposition of this operator then leads to a Mercer-type expansion of the kernel, which converges with respect to the $H^k$-norm and, if $k>d$, also uniformly \emph{without} requiring the kernel to be positive-definite. If the kernel is also positive-definite and differentiable in the strong sense, a refinement of Mercer's theorem is obtained that additionally provides uniform convergence of the term-wise derivatives of the expansion to the respective derivatives of the kernel. Uncertainty of Functional Data Reconstruction Masaryk University, Czech Republic We revisit the classic situation in functional data analysis in which data items such as curves are observed at discrete (possibly sparse and irregular) arguments with observation noise. We focus on the reconstruction of individual curves, especially on prediction intervals and prediction bands for them. The standard approach is to proceed in two steps: First, one estimates the mean and covariance function of curves and the observation noise variance function by smoothing techniques such as penalized splines. Second, under Gaussian assumptions, one derives the conditional distribution of a curve given its noisy discrete observations and constructs prediction sets with the required properties (usually employing sampling from the predictive distribution). This approach is well established, commonly used, and theoretically valid, but in practice it surprisingly fails in its key property: prediction sets constructed this way often do not have the required coverage. The actual coverage is lower than the nominal one. This has been little reported and studied in the literature. We investigate the cause of this issue and propose a remedy. |
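Mercer-type expansions as discussed in the TU Graz talk above can be explored numerically by discretizing an integral operator whose eigenpairs are known in closed form. A small sketch for the Brownian-motion covariance kernel K(s,t) = min(s,t) on [0,1] (the grid size and tolerance are illustrative choices):

```python
import numpy as np

# Nystrom discretization of the integral operator with kernel
# K(s, t) = min(s, t) on [0, 1]; its Mercer eigenpairs are known:
#   lambda_k = ((k - 1/2) * pi)**(-2),  phi_k(t) = sqrt(2) * sin((k - 1/2) * pi * t)
n = 400
t = (np.arange(n) + 0.5) / n                        # midpoint quadrature grid
K = np.minimum.outer(t, t)                          # Gram matrix of the kernel
evals = np.sort(np.linalg.eigvalsh(K / n))[::-1]    # quadrature-weighted spectrum

lam = ((np.arange(1, 4) - 0.5) * np.pi) ** (-2.0)   # first 3 Mercer eigenvalues
```

The discrete spectrum matches the continuous one up to a quadrature error of order 1/n; the truncated sum over eigenpairs then reconstructs the kernel uniformly.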
| 3:30pm - 4:00pm | Coffee break 4 |
| 4:00pm - 6:00pm | Computational Statistics Location: 0.001 Session Chair: Ostap Okhrin |
|
|
Tensor changepoint detection and eigenbootstrap Charles University, Czech Republic Tensor data consisting of multivariate outcomes over the items and across the subjects with longitudinal and cross-sectional dependence are considered. A completely distribution-free and tuning-parameter-free detection procedure for changepoints at different locations is designed, which does not require training data. A CUSUM-type test statistic is employed, and its asymptotic properties are derived for a large number of available individual profiles. The considered test is shown to be consistent. The aim is to propose an eigenbootstrap superstructure that overcomes the computational curse of dimensionality without any loss of information, while it preserves all the dependencies within and between the panels. The validity of this new and fast resampling algorithm is proved in this general setting. The empirical properties of the detection technique are investigated through a simulation study. The fully data-driven test is applied to real-world data from EEG and psychometrics. Functional-based claims reserving with ProfileLadder Charles University, Czech Republic Risk reserving is a fundamental task in non-life insurance and is performed on a regular basis. It is typically carried out using parametric estimation and prediction methods applied to aggregated data structured in so-called run-off triangles. In this talk, we present nonparametric, functional-based reserving alternatives that rely on the completion of MNAR functional segments in the underlying run-off triangles. In addition to the theoretical and methodological framework, we focus on algorithmic details implemented in the recent R package ProfileLadder. The package offers flexible and computationally efficient tools for pointwise and distributional reserve prediction and includes relevant visualization and diagnostic tools implemented via standard S3 methods.
These nonparametric approaches provide modern, transparent, and extensible alternatives to classical reserving methods used by researchers, actuarial scientists, or insurance practitioners. Proxy-identification of a structural MGARCH model for asset returns Matthias R. Fengler, Professor of Econometrics, University of St.Gallen, Switzerland We identify shocks in a structural MGARCH model of asset returns using news-based proxy instruments. Structural parameters, including an orthogonal matrix, are estimated via Riemannian optimization. We study daily returns on the S&P500, the 10-year Treasury yield, and the USD index. The proxies identify an equity valuation shock, capturing shifts in expected dividend growth and risk premia, and a bond valuation shock, reflecting fundamental shocks in safe-haven asset pricing. The dynamic impact matrix is asymmetric, and sign changes in the bond valuation shock loading drive switches between negative and positive stock–bond co-movement. A decomposition of the COVID-19 episode shows that bond valuation shocks partially offset equity market stress and explain the temporary yield surge in mid-March 2020. Estimating ``Realized'' Skewness using Convolutional Neural Network 1Technische Universität Dresden, Germany; 2University of Lausanne, Switzerland We propose a new estimator of low-frequency skewness that exploits high-frequency data through a direct functional mapping consisting of layers of convolutional neural networks followed by layers of MLPs. We show that the relevant high-frequency features converge to a continuous limit and that the latent skewness admits a continuous functional representation. This allows us to establish the unbiasedness of our NN estimator using classical universal approximation results and Rademacher complexity arguments. 
Monte Carlo experiments under stochastic volatility models, with and without jumps, show that the estimator reduces finite-sample bias relative to existing realized-skewness estimators and remains accurate under model misspecification. Empirically, our estimator exhibits temporal stability and delivers superior cross-sectional pricing performance in skewness-sorted portfolios. Another application finds no evidence that ESG-oriented firms exhibit lower crash risk. Overall, the results demonstrate how learning-based functionals can improve the estimation of nonlinear distributional characteristics from high-frequency data. |
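As background to the CUSUM-type statistic used in the first talk of this session, a minimal mean-change CUSUM sketch (plain variance normalization on a univariate series; this illustrates only the generic template, not the eigenbootstrap procedure itself):

```python
import numpy as np

def cusum_stat(x):
    """Max-type CUSUM statistic for a change in the mean.

    Returns the maximizer (candidate changepoint) and the statistic
    max_k |S_k - (k/n) * S_n| / (sigma_hat * sqrt(n)).
    """
    x = np.asarray(x, float)
    n = len(x)
    s = np.cumsum(x)
    dev = np.abs(s - (np.arange(1, n + 1) / n) * s[-1])  # bridge deviations
    k_hat = int(np.argmax(dev)) + 1                      # candidate changepoint
    return k_hat, dev.max() / (x.std(ddof=1) * np.sqrt(n))
```

Under the null the statistic behaves like the supremum of a Brownian bridge; a pronounced mean shift makes it large and locates the change at the maximizer.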
| 4:00pm - 6:00pm | Statistics for Stochastic Processes Location: 0.002 Session Chair: Fabian Mies |
|
|
Sharp adaptive nonparametric testing for a constant volatility Albert-Ludwigs-Universität Freiburg, Germany Based on discrete observations within the nonparametric Gaussian white noise model $dY_t = \sigma(t)\,dW_t$, we develop a test to infer whether the volatility function $\sigma(\cdot)$ is constant. In particular, at prescribed significance, we simultaneously identify those time intervals where a violation of the constancy hypothesis occurs without a priori knowledge of their number and size. The testing procedure is shown to be minimax-optimal and adaptive for infill asymptotics, and these results entail that a deviation from the null hypothesis of constancy is best measured in terms of $\sup_{t\in[0,1]}|\sigma(t)^2/\|\sigma\|_{L^2}^2 - 1|$. The derivation of the optimal constants requires building hypotheses with height solving $F_n(x)=0$ for given functions $F_n$ and understanding the asymptotic behavior of their solution, which is done using the implicit function theorem. Geometric ergodicity of Langevin dynamics and its discretizations Taras Shevchenko National University of Kyiv, Ukraine We study the Langevin stochastic differential equation and its discrete approximations: the Euler–Maruyama scheme, commonly referred to as the Unadjusted Langevin Algorithm (ULA), and direct sampling from the continuous-time process. We show that the ULA process is geometrically ergodic in $\mathbb{R}^d$ under suitable conditions and derive a corresponding drift condition using a Foster–Lyapunov test function. We then analyze time-inhomogeneous approximations with diminishing step sizes and establish geometric recurrence for both chains, the ULA and the directly sampled chain. Topology Matters for High-Frequency Inference: Weak Convergence of Stochastic Integrals in M1 University of Luxembourg, Luxembourg Statistical analysis of stochastic processes increasingly relies on functional limit theorems for path-dependent estimators, particularly in the presence of jumps.
Many estimators in econometrics and time series analysis, such as statistics used for cointegration testing, self-normalized inference, or high-frequency volatility estimation, can be expressed as functionals of stochastic integrals with random, data-dependent integrands, or as continuous-time limits thereof. Their asymptotic validity therefore hinges on weak convergence results that remain stable beyond the classical continuous-path regime. In particular, Skorokhod’s M1 topology becomes increasingly relevant, since it captures convergence in situations where large discontinuities are approximated by clusters of smaller jumps, a behavior that is typically not captured in the classical framework of the J1. Such phenomena arise naturally in econometrics and high-frequency data settings. This talk develops a weak limit theory for stochastic integrals on the space of càdlàg paths under Skorokhod’s M1 topology. I present a new, self-contained approach based on good decompositions of semimartingale integrators, yielding tractable conditions under which Itô integration is continuous jointly in the integrator and integrand. The results unify classical J1 continuity theorems and provide new conclusions in M1. I also show that for families of local martingales, M1-tightness implies J1-tightness under a mild localised uniform integrability condition. I conclude with a discussion of applications, including anomalous diffusion models represented as stochastic integrals with respect to continuous-time random walks. |
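The Unadjusted Langevin Algorithm analyzed in the second talk above is short to state in code. A sketch for a standard Gaussian target (step size and run length are illustrative; for this target the ULA stationary variance carries a known O(step) bias):

```python
import numpy as np

def ula(grad_log_pi, x0, step, n_iter, rng):
    """Unadjusted Langevin Algorithm (Euler-Maruyama for the Langevin SDE):
    x_{k+1} = x_k + step * grad_log_pi(x_k) + sqrt(2 * step) * xi_k.
    """
    x = np.asarray(x0, float).copy()
    out = np.empty((n_iter, x.size))
    for k in range(n_iter):
        x = x + step * grad_log_pi(x) + np.sqrt(2 * step) * rng.standard_normal(x.size)
        out[k] = x
    return out

# Target: standard Gaussian, so grad log pi(x) = -x
rng = np.random.default_rng(1)
chain = ula(lambda x: -x, np.zeros(1), step=0.05, n_iter=50000, rng=rng)
```

For the Gaussian target the ULA chain is an AR(1) process with stationary variance 1/(1 - step/2), which makes its geometric ergodicity and discretization bias explicit.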
| 4:00pm - 6:00pm | Nonparametric statistics Location: 0.004 Session Chair: Anne Leucht |
|
|
Nonparametric spectral density estimation using interactive mechanisms under local differential privacy 1CREST, ENSAE, IP PARIS, France; 2University of Kassel, Germany; 3University of Vienna, Austria We are interested in the spectral density of a centered stationary Gaussian time series under local differential privacy constraints. Specifically, we propose new interactive privacy mechanisms for three tasks: recovering a single covariance coefficient, recovering the spectral density at a fixed frequency, and recovering it globally. Our approach achieves faster rates through a two-stage process: we first apply the Laplace mechanism to the truncated values and then use this privatized sample to gain knowledge of the dependence structure of the time series. For spectral densities belonging to Hölder and Sobolev smoothness classes, we demonstrate that our algorithms improve upon the non-interactive mechanism of Kroll (2024) for small privacy parameter α, since the pointwise rates depend on nα² instead of nα⁴. Moreover, we show that the rate 1/(nα⁴) is optimal for estimating a covariance coefficient with non-interactive mechanisms. However, the L2 rate of our interactive estimator is slower than the pointwise rate. We show how to use these procedures to provide a bona fide, locally differentially private estimator of the full covariance matrix. Detecting Periodicity of a General Stationary Time Series via AR(2)-Model Fitting 1TU Braunschweig, Germany; 2University of Cyprus; 3Cyprus Academy of Sciences, Letters and Arts Estimating the periodicity of a stationary time series by fitting a second-order stationary autoregressive (AR(2)) model goes back to the seminal paper of Yule (1927). We investigate properties of this procedure when applied to general stationary processes possessing a spectral density with a dominant peak at some frequency λ0 in (0,π).
Conditionally specified graphical modeling of stationary multivariate time series 1Texas A&M University, United States of America; 2Universität Heidelberg, Germany Graphical models are ubiquitous for summarizing conditional relations in multivariate data. In many applications involving multivariate time series, it is of interest to learn an interaction graph that treats each individual time series as a node of the graph, with the presence of an edge between two nodes signifying conditional dependence given the others. Typically, the partial covariance is used as a measure of conditional dependence. However, in many applications, the outcomes may not be Gaussian and/or could be a mixture of different outcomes. For such time series, using the partial covariance as a measure of conditional dependence may be restrictive. In this article, we propose a broad class of time series models which are specifically designed to succinctly encode process-wide conditional independence in their parameters. For each univariate component in the time series, we model its conditional distribution with a distribution from the exponential family. We develop a notion of process-wide compatibility under which such conditional specifications can be stitched together to form a well-defined strictly stationary multivariate time series. We call this construction a conditionally exponential stationary graphical model (CEStGM). A central quantity underlying CEStGM is a positive kernel which we call the interaction kernel. Spectral properties of such positive kernel operators constitute a core technical foundation of this work. We establish process-wide local and global Markov properties of CEStGM exploiting a Hammersley–Clifford-type decomposition of the interaction kernel. Further, we study various probabilistic properties of CEStGM and show that it is geometrically mixing. An approximate Gibbs sampler is also developed to simulate sample paths of CEStGM. |
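Yule's AR(2) device from the periodicity talk in this session can be sketched in a few lines: fit an AR(2) model by Yule-Walker and read the implied frequency off the angle of the complex characteristic roots (the simulated example is illustrative):

```python
import numpy as np

def ar2_frequency(x):
    """Fit AR(2) via Yule-Walker; return the angle of the complex
    characteristic roots as the implied dominant frequency in (0, pi)."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    r = [x[: n - k] @ x[k:] / n for k in range(3)]   # autocovariances, lags 0..2
    phi = np.linalg.solve([[r[0], r[1]], [r[1], r[0]]], [r[1], r[2]])
    roots = np.roots([1.0, -phi[0], -phi[1]])        # z^2 - phi1 * z - phi2 = 0
    return float(np.abs(np.angle(roots)).max())

# Simulate an AR(2) with characteristic roots 0.95 * exp(+/- i * pi/4)
rng = np.random.default_rng(0)
phi1, phi2 = 2 * 0.95 * np.cos(np.pi / 4), -0.95 ** 2
x = np.zeros(20000)
eps = rng.standard_normal(20000)
for t in range(2, 20000):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + eps[t]
```

When the data truly follow an AR(2), the recovered angle is consistent for the spectral peak; the talk above studies what this procedure estimates when the data only possess a peaked spectral density.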
| 4:00pm - 6:00pm | Topics in functional data analysis Location: 1.012 Session Chair: Siegfried Hörmann |
|
|
Measuring dependence between a categorical response and a functional covariate Graz University of Technology, Austria We suggest a dependence coefficient between a categorical variable and some general variable taking values in a metric space. In particular, this framework includes functional data. We derive important theoretical properties and study the large-sample behaviour of our suggested estimator. Moreover, we develop an independence test and prove that it is consistent against any violation of independence. The test is also applicable to the classical $K$-sample problem with possibly high- or infinite-dimensional distributions. Rate-optimal estimation for synchronously sampled functional data Philipps-Universität Marburg, Germany We obtain minimax-optimal convergence rates in the supremum norm. Beyond the positive drift: Comparing historical and current daily temperature patterns based on two-sample statistics for unbalanced dense-sparse functional data Marburg University, Germany The two-sample problem for functional data is investigated for discrete, synchronous designs in each sample, in settings in which one sample is densely observed while the other is only relatively sparsely observed. This is motivated by comparing historical and more current daily temperature patterns, where more recent devices take measurements every 10 minutes, while historical measurements in the time period 1952 to 1972 are available only every hour. We use recently developed methods from transfer learning for functional data to estimate the difference of the mean functions at optimal rates in the supremum norm. Further, we derive a central limit theorem in the space of continuous functions and discuss the construction of uniform confidence bands using the multiplier bootstrap. We also show how our methods can be extended to functional time series. |
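A multiplier bootstrap for sup-norm inference on the difference of two mean curves, as used in the temperature comparison above, can be sketched for curves observed on a common grid (Gaussian multipliers; sample sizes, grid, and toy data are illustrative, and the sketch ignores the dense-sparse imbalance treated in the talk):

```python
import numpy as np

def two_sample_sup_band(X, Y, n_boot=1000, level=0.95, rng=None):
    """Multiplier bootstrap critical value for the sup-norm of the
    difference of two sample mean curves; rows are curves, columns grid points."""
    rng = rng if rng is not None else np.random.default_rng(0)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    sups = np.empty(n_boot)
    for b in range(n_boot):
        gx = rng.standard_normal(len(X))[:, None]   # Gaussian multipliers
        gy = rng.standard_normal(len(Y))[:, None]
        sups[b] = np.abs((gx * Xc).mean(axis=0) - (gy * Yc).mean(axis=0)).max()
    return diff, float(np.quantile(sups, level))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))           # sample 1: mean curve zero
Y = 1.0 + rng.standard_normal((100, 20))     # sample 2: mean shifted by 1
diff, crit = two_sample_sup_band(X, Y, rng=rng)
```

Grid points where |diff| exceeds the critical value lie outside the uniform band, indicating where the two mean curves differ.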
| 7:30pm - 10:00pm | Dinner |
| Date: Friday, 20/Mar/2026 | |
| 8:50am - 10:20am | Time Series Econometrics Location: 0.001 Session Chair: Carsten Jentsch |
|
|
Pitfalls of Inference in Panels with Cross-Dependence of Uncertain Strength TU Dortmund, Germany When panel data exhibit cross-sectional dependence, particular care is required, as cross-dependence may be induced by omitting relevant variables. If these variables correlate with the regressors, rendering them endogenous, sophisticated approaches such as the CCE approach or the PC estimator are recommended. These approaches may, however, be difficult to implement or may build on strong assumptions. Therefore, if regressor endogeneity can reasonably be excluded, it is common to resort to simpler estimators in conjunction with panel-robust standard errors. Structural analysis in matrix-autoregressive models TU Dortmund University, Germany We consider a structural matrix-autoregressive (SMAR) model to conduct impulse response analysis for structural shocks to matrix-valued time series. The MAR model of order $p$ offers a parsimonious and interpretable framework for these time series, thus addressing issues of high dimensionality in corresponding vector-autoregressive (VAR) models. To interpret the dynamics, we resort to impulse response analysis as a popular tool from the SVAR context. Its conclusions rely on the valid identification of structural shocks that are mutually contemporaneously uncorrelated and interpretable. In contrast to the existing literature, the proposed SMAR model enables the identification of multiple structural shocks. To address the restrictive nature of the single-term MAR($p$) model, we discuss the extension to a multi-term SMAR($p$) model as a compromise between the single-term SMAR and the (unrestricted) SVAR model, trading off parsimony against flexibility. We discuss its identification, focusing in particular on issues that arise due to the typical Kronecker-product structure of the coefficient matrices in the MAR framework.
Further, we discuss estimation and inference in the general multi-term SMAR($p$) model, including a bootstrap method to compute confidence bands for the impulse response curves. In this context, a key point concerns model misspecification and the use of MAR models to approximate more general SVAR data generating processes. Finally, we demonstrate the performance and practical use of our approach by Monte Carlo simulations and a real data application. Specification Tests for Vector Multiplicative Error Models Charles University, Czech Republic Vector Multiplicative Error Models (vMEMs) provide a flexible framework for modeling multivariate non-negative time series. Within this framework, each variable is expressed as the product of its conditional mean, modeled as a function of past observations, and a positive innovation with unit expectation. Consequently, the model can capture dynamic cross-dependencies and has proven useful in applications such as modeling durations, volatilities, and trading volumes. This contribution focuses on goodness-of-fit (GOF) tests for vMEMs, aiming to assess whether the model structure and the assumed innovation distribution adequately reflect the properties of the observed data. We propose a GOF test statistic and derive its asymptotic distribution under the null hypothesis. The performance of a bootstrap version of the test is illustrated through Monte Carlo simulations. |
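For a reduced-form VAR(1) with impact matrix B, the impulse responses referred to above are simply powers of the coefficient matrix applied to B; a toy sketch (the matrices are illustrative and do not implement the SMAR identification scheme):

```python
import numpy as np

def impulse_responses(A, B, horizon):
    """IRFs of x_t = A x_{t-1} + B eps_t: the response at horizon h is A^h B."""
    out = [np.asarray(B, float)]
    for _ in range(horizon):
        out.append(np.asarray(A, float) @ out[-1])
    return np.stack(out)              # shape (horizon + 1, k, k)

A = np.array([[0.5, 0.1],
              [0.0, 0.4]])            # stable coefficient matrix (toy values)
B = np.eye(2)                         # impact matrix of the structural shocks
irf = impulse_responses(A, B, horizon=10)
```

Column j of irf[h] is the response of the system h periods after a unit shock in the j-th structural innovation; for a stable A the responses decay geometrically.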
| 8:50am - 10:20am | Discrete time series Location: 0.002 Session Chair: Christian H. Weiß |
|
|
A universal time series model (for discrete data) Helmut Schmidt University Hamburg, Germany A novel time series framework is proposed which addresses all relevant empirical properties of a time series, making it an essentially universal model. More specifically, the dynamics in all conditional moments of a suitable continuous or discrete distribution are modeled jointly and without the need to make restrictive assumptions about the functional form of the link functions. Furthermore, all considered explanatory variables are allowed to exhibit nonlinear and potentially time-varying effects on the conditional moments. This can be achieved by employing a simple feedforward neural network with a single hidden layer and an output for each conditional moment (parameter). In contrast to many (deep) neural network approaches, the proposed model is stochastically interpretable and allows for the calculation of standard errors, and in particular, confidence intervals. Many conventional time series frameworks such as (integer-valued) GARCH can be interpreted as simplified special cases of the proposed model. Several empirical applications are presented to illustrate the capabilities and the implementation. A Feature-Based Approach to Generate Time Series of Counts 1LIAAD INESC TEC, Faculdade de Economia da Universidade do Porto; 2Universidade de Aveiro, CIDMA; 3Faculdade de Engenharia da Universidade do Porto, CIDMA Research on count time series has grown substantially, leading to the development of numerous models designed to capture key characteristics such as trends, seasonality, overdispersion, outliers, and complex dependence structures. Despite these advances, the evaluation of such models remains challenging due to the limited availability of real-world count time series. This scarcity often forces researchers to illustrate new methods using only a few datasets, which restricts systematic comparison and hinders robust performance assessment. 
Addressing this gap is essential for advancing methodological development and ensuring practical applicability in diverse domains. This work is financed by National Funds through the FCT - Fundação para a Ciência e a Tecnologia, I.P. (Portuguese Foundation for Science and Technology) within the project TSP2Net, with reference 2023.13039.PEX, https://doi.org/10.54499/2023.13039.PEX A new class of generalized INARMA models: estimation and testing against INGARCH alternatives Karlsruhe Institute of Technology, Germany INAR- and INGARCH-type processes are widely used approaches to model time series of counts. In this talk, I will speak about a class of generalized INARMA (integer-valued autoregressive moving-average) models which contains both of the aforementioned types of models as special cases. Notably, I will outline a generalization of the INAR model which parallels the extension of the INARCH to the INGARCH process. Special attention is given to inference questions. These include maximum likelihood, moment-based and Gaussian quasi-likelihood techniques for parameter estimation. Moreover, I will discuss various testing problems. The developed methods are illustrated in simulation studies and a data example on childhood diseases in the German state of Bavaria. |
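A standard generator in the INAR family discussed above is the Poisson INAR(1) with binomial thinning; a minimal simulation sketch (the parameter values are illustrative):

```python
import numpy as np

def simulate_inar1(alpha, lam, n, rng):
    """Poisson INAR(1) via binomial thinning:
    X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(lam).
    The stationary marginal is Poisson with mean lam / (1 - alpha),
    and the lag-1 autocorrelation equals alpha.
    """
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))          # start in stationarity
    for t in range(1, n):
        # binomial thinning: each of the x[t-1] counts survives w.p. alpha
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)
    return x
```

Generators of this kind are exactly what a feature-based approach needs in order to produce benchmark count series with controlled mean, dispersion, and dependence.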
| 8:50am - 10:20am | High-dimensional statistics and learning Location: 0.004 Session Chair: Martin Wahl |
|
|
Self-regularized learning methods University of Stuttgart, Germany We introduce a new framework for the theoretical analysis of learning algorithms called self-regularization. In a nutshell, self-regularized learning algorithms implicitly guarantee that they produce sufficiently regular prediction functions. Central examples of self-regularized learning algorithms include gradient descent and regularized empirical risk minimization. We establish a general theory for the statistical analysis of self-regularized algorithms which in many cases yields minimax-optimal learning rates. Max Schölpple, Ingo Steinwart; Institut für Stochastik und Anwendungen, Universität Stuttgart, Pfaffenwaldring 57. Concentration and moment inequalities for heavy-tailed random matrices Universität Wien, Austria Fuk–Nagaev and Rosenthal-type inequalities are proven for sums of independent random matrices, focusing on the situation when the norms of the matrices possess finite moments of only low orders. The bounds depend on intrinsic dimensional characteristics, such as the effective rank, as opposed to the dimension of the ambient space. The advantages of such results are illustrated in several applications, including new moment inequalities for sample covariance matrices and the corresponding eigenvectors of heavy-tailed random vectors. Authors: Moritz Jirak, Stanislav Minsker, Yiqiu Shen, Martin Wahl Laplacian eigenmaps for bounded manifolds and the Neumann Laplacian Universität Bielefeld, Germany The spectrum of the Laplace-Beltrami operator encodes essential geometric information about a smooth manifold. In practice, the manifold is unknown, but supports a finite sample of random points. It is then standard to approximate its spectrum by the spectrum of the resulting graph Laplacian. When the manifold is bounded, it is known that the eigenvalues and eigenvectors of the graph Laplacian converge to those of the Neumann Laplacian.
However, finite-sample results, such as convergence rates, are still lacking and are at the center of this talk. |
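The graph-Laplacian-to-Neumann connection has an exact one-dimensional analogue: the unnormalized Laplacian of the path graph is a discrete Neumann Laplacian with closed-form spectrum. A small sketch (the graph size is illustrative):

```python
import numpy as np

n = 50
# Adjacency and unnormalized Laplacian of the path graph P_n
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(axis=1)) - A
evals = np.sort(np.linalg.eigvalsh(L))

# Closed-form spectrum: 2 - 2 cos(k * pi / n), k = 0, ..., n-1,
# with cosine eigenvectors, i.e. discrete Neumann boundary behavior
theory = 2 - 2 * np.cos(np.pi * np.arange(n) / n)
```

Rescaling by n^2 sends the low eigenvalues to (k*pi)^2, the Neumann eigenvalues of the Laplacian on [0,1], which is the deterministic template behind the random-sample convergence discussed in the talk.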
| 8:50am - 10:20am | Contributions to Mathematical Statistics Location: 1.002 Session Chair: Mathias Trabs |
|
|
Local polynomial estimation of quantile density functions University of Hamburg, Germany A new approach for nonparametric estimation of quantile density functions is proposed. The new approach uses a local polynomial regression on (F_n(X_i), Q_n(F_n(X_i))), where F_n denotes the empirical distribution function and Q_n the empirical quantile function. The new approach has more advantageous properties at the boundary than classical quantile density estimators. Keywords: asymptotic normality, bias rates, boundary adaptation, empirical quantiles. Model checks for copula regression Ruhr-Universität Bochum, Germany There is a great variety of statistical models expressing relations between response variables of interest and explanatory variables, ranging from classical conditional mean regression to fully distributional regression models. We are particularly interested in expressing regression models by means of copulas, which are a valuable tool to separate marginal distributions and dependencies. New goodness-of-fit tests and new measures of deviation can be developed based on such copula representations. These tests are desirable since regression models often impose parametric or semiparametric assumptions to overcome the curse of dimensionality, running a risk of misspecification. We present a new goodness-of-fit test for the classical mean regression model. More importantly, we also introduce a new measure of deviation between the true regression function and the imposed parametric assumption. By self-normalization, we develop pivotal inference for this measure, including tests for relevant hypotheses. These inference tools are illustrated via simulated and empirical data. Rank-based association measures for zero-inflated data 1Eindhoven University of Technology, the Netherlands; 2University of Windsor, Canada; 3University of Quebec in Trois-Rivières, Canada; 4Université Libre de Bruxelles, Belgium Rank-based association measures, including Spearman’s rho, Gini’s gamma and Spearman’s footrule, are well established in continuous settings, but become problematic when ties are present.
We investigate these measures in the context of zero-inflated data, where continuous random variables have an increased probability mass at zero and there is a substantial number of ties. Such data are commonly found in fields such as insurance, health care and weather forecasting. Traditional rank-based estimators exhibit a large bias in these settings. To overcome this problem, we derive new formulations of the association measures and propose plug-in estimators. In a simulation study, we show that these outperform state-of-the-art estimators. Additionally, we make the estimator interpretable by deriving its achievable bounds. |
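The classical rank-based baseline that becomes biased under zero inflation is easy to state; a sketch of Spearman's rho with midranks for ties (this is the traditional estimator, not the corrected formulations proposed in the talk):

```python
import numpy as np

def spearman_rho(x, y):
    """Sample Spearman's rho using midranks (tied values get averaged ranks)."""
    def midrank(v):
        order = np.argsort(v, kind="stable")
        r = np.empty(len(v))
        r[order] = np.arange(1, len(v) + 1)
        for val in np.unique(v):          # average ranks within tied groups
            mask = v == val
            r[mask] = r[mask].mean()
        return r

    rx, ry = midrank(np.asarray(x)), midrank(np.asarray(y))
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))
```

With a large shared point mass at zero, the midranked zeros form one big tied block, which is exactly the source of the bias the talk addresses.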
| 8:50am - 10:20am | Random Matrix Theory Location: 1.012 Session Chair: Nestor Parolya |
|
|
Nonlinear higher-order shrinkage estimation of the large dimensional covariance and precision matrices 1Delft University of Technology, The Netherlands; 2Linköping University, Sweden In this paper, we develop nonlinear higher-order shrinkage estimators for both covariance and precision matrices. Our framework applies to settings in which the sample size n is either larger or smaller than p, the dimensionality of the data-generating process. The proposed estimators incorporate higher-order moments up to an arbitrary order and therefore encompass linear shrinkage estimators as special cases. We derive recursive representations of these higher-order nonlinear shrinkage estimators using partial exponential Bell polynomials. Through simulation studies, the proposed methods are compared with the oracle nonlinear shrinkage estimator and are shown to be particularly effective in settings where no closed-form expressions for nonlinear shrinkage estimators are available. The theoretical derivations rely on mild assumptions on the underlying model, including the existence of fourth moments and a bounded spectrum of the true population covariance matrix. The finite-sample performance of the proposed estimators is evaluated in an extensive simulation study and benchmarked against existing approaches. Our main finding is that the higher-order shrinkage estimators can outperform well-established nonlinear shrinkage methods, particularly when the concentration ratio p/n is large. Monitoring for a phase transition in a time series of Wigner matrices 1Aarhus University, Denmark; 2Colorado State University We develop methodology and theory for the detection of a phase transition in a time series of high-dimensional random matrices. In the model we study, at each time point $t = 1, 2, \ldots$, we observe a deformed Wigner matrix $\mathbf{M}_t$, where the unobservable deformation represents a latent signal.
This signal is detectable only in the supercritical regime, and our objective is to detect the transition to this regime in real time, as new matrix-valued observations arrive. Central limit theorems for linear eigenvalue statistics of random geometric graphs Leiden University, The Netherlands Random geometric graphs provide a fundamental model for spatially embedded networks, yet their spectral fluctuations remain poorly understood. In this talk, I will present the first rigorous results on Gaussian fluctuations of linear eigenvalue statistics for such graphs. Specifically, we establish central limit theorems for quantities of the form $\mathrm{Tr}[\phi(A)]$, where $A$ denotes the adjacency matrix and $\phi$ belongs to a broad class of test functions, including non-polynomial functions. In the polynomial setting, we go further and prove a quantitative central limit theorem with an explicit rate of convergence to the limiting Gaussian distribution. I will also discuss extensions of these results to other canonical spatial networks, such as $k$-nearest neighbor graphs and relative neighborhood graphs. Together, these results highlight new mechanisms governing spectral fluctuations in random spatial structures and reveal a delicate interplay between geometry, local dependence, and spectral behavior. The talk is based on joint work with Christian Hirsch (Aarhus) and Kyeongsik Nam (Seoul). |
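The sub/supercritical dichotomy for deformed Wigner matrices can be illustrated numerically via the BBP phase transition: a rank-one deformation theta * v v' pushes the top eigenvalue out of the semicircle bulk to theta + 1/theta only once theta > 1. A sketch (dimension and tolerances are illustrative):

```python
import numpy as np

def top_eig_deformed_wigner(n, theta, rng):
    """Top eigenvalue of W + theta * v v^T, where W is a GOE-type Wigner
    matrix with semicircle support [-2, 2] and v is a unit vector."""
    A = rng.standard_normal((n, n))
    W = (A + A.T) / np.sqrt(2 * n)            # entrywise variance ~ 1/n
    v = np.ones(n) / np.sqrt(n)
    return float(np.linalg.eigvalsh(W + theta * np.outer(v, v))[-1])

rng = np.random.default_rng(0)
```

Below the critical deformation strength the top eigenvalue sticks to the bulk edge 2, which is why the latent signal in the monitoring problem above is invisible in the subcritical regime.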
| 10:20am - 10:50am | Coffee break 5 |
| 10:50am - 11:50am | Time Series Econometrics Location: 0.001 Session Chair: Carsten Jentsch |
|
|
A two-sample smooth test for multivariate dependent data Vrije Universiteit Amsterdam, The Netherlands In this talk, we consider a two-sample smooth test for the equality of multivariate distributions. Dependence between the two samples is allowed; for instance, the data may be mixing. The asymptotic distribution under the null hypothesis is derived, and consistency of the two-sample smooth test for dependent samples is shown. Satterthwaite Approximation and Gaussian Time Series 1UCLouvain, Belgium; 2Université Libre de Bruxelles, Belgium Satterthwaite (1941, 1946) proposed a very simple approximation to the distribution of linear combinations of chi-squared random variables. It can be used in univariate time series analysis to approximate the distribution of the sample variance and the periodogram of Gaussian time series; we provide Wasserstein bounds and rates of convergence of the approximation towards the true distribution. Similarly, Tan & Gupta (1983) proposed an approximation to the distribution of linear combinations of Wishart random matrices. This, however, has not yet been applied to the framework of multivariate time series: we take advantage of a special case of the matrix normal distribution to propose a feasible approximation to the distribution of the sample covariance matrix of Gaussian time series. |
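The Satterthwaite idea mentioned in the second abstract above — approximating a linear combination of chi-squared variables by a scaled chi-squared with matched first two moments — can be sketched as follows. This is an illustrative Monte Carlo check, not the Wasserstein-bound analysis of the talk:

```python
import numpy as np
from scipy import stats

def satterthwaite_params(lam):
    """Scale g and degrees of freedom h such that g * chi2(h) matches
    the mean and variance of Q = sum_i lam_i * chi2_1."""
    lam = np.asarray(lam, dtype=float)
    g = (lam ** 2).sum() / lam.sum()
    h = lam.sum() ** 2 / (lam ** 2).sum()
    return g, h

lam = np.array([3.0, 1.0, 0.5, 0.25])
g, h = satterthwaite_params(lam)

# Monte Carlo draw of Q versus the Satterthwaite quantile
rng = np.random.default_rng(1)
Q = (lam * rng.chisquare(1, size=(100_000, lam.size))).sum(axis=1)
q95_mc = np.quantile(Q, 0.95)            # empirical 95% quantile
q95_sat = g * stats.chi2.ppf(0.95, h)    # Satterthwaite 95% quantile
print(q95_mc, q95_sat)                   # the two should be close
```

By construction, g·h equals the mean of Q and 2·g²·h equals its variance; the approximation quality of the upper quantiles is what bounds such as those in the talk quantify.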
| 10:50am - 11:50am | Discrete time series Location: 0.002 Session Chair: Christian H. Weiß |
|
|
Estimating parameters for long-range dependence via ordinal patterns 1Siegen University, Germany; 2University of Twente, The Netherlands; 3Ruhr University Bochum, Germany The ordinal structure of long-range dependent time series is analyzed. To this end, so-called ordinal patterns are used, which describe the relative position of consecutive data points. Two estimators are provided for the probabilities of ordinal patterns, and we prove limit theorems in different settings, namely for functions of Hermite rank 1 and 2. In the second setting, a Rosenblatt distribution arises in the limit. In the context of fractional Gaussian noise, the limit distribution is derived for an estimator of the Hurst parameter H when H is larger than 3/4. The theorems thus complement results for lower values of H found in the literature. Transcripts and Algebraic Distances in Time Series: Stochastic Properties and Nonparametric Dependence Tests 1Helmut Schmidt University, Hamburg, Germany; 2Universidad Miguel Hernández, Elche, Spain The use of ordinal patterns (OPs) for analyzing the dependence structure of univariate and continuously distributed processes has gained popularity in recent years. Here, we go one step further and consider the transcripts computed from successive OPs in the time series. Transcripts constitute a kind of "difference" between successive OPs and thus naturally relate to two algebraic distances between OPs, the Cayley and Kendall distances. We transform the original time series into a sequence of transcripts or distances, respectively, and derive important stochastic properties thereof. We show that these properties differ substantially between different types of original processes. This motivates the development of various statistics based on transcripts and algebraic distances in order to investigate the dependence structure of the original process. 
In particular, we derive the asymptotic distribution of these statistics under the null hypothesis of serial independence, which is then used to develop nonparametric tests for serial dependence. A simulation study shows that these novel dependence tests have appealing power properties, often outperforming existing OP-based dependence tests. We conclude with a real-world data example, where we illustrate the application and interpretation of the proposed approaches in practice. |
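The ordinal-pattern counting that both abstracts above build on can be sketched in a few lines (illustrative only; the estimators and tests of the talks are considerably more involved). A pattern of order m is the rank ordering of a window of m consecutive observations; for an i.i.d. continuous series, each of the m! patterns has probability 1/m!:

```python
import numpy as np
from itertools import permutations

def ordinal_pattern_freqs(x, m=3):
    """Relative frequencies of the m! ordinal patterns of order m,
    i.e. the rank orderings of windows (x_t, ..., x_{t+m-1})."""
    x = np.asarray(x, dtype=float)
    counts = {p: 0 for p in permutations(range(m))}
    for t in range(len(x) - m + 1):
        counts[tuple(np.argsort(x[t:t + m]))] += 1
    n_windows = len(x) - m + 1
    return {p: c / n_windows for p, c in counts.items()}

rng = np.random.default_rng(2)
freqs = ordinal_pattern_freqs(rng.standard_normal(20_000), m=3)
# For an i.i.d. continuous series every pattern has probability 1/6
print(max(abs(f - 1 / 6) for f in freqs.values()) < 0.02)  # prints True
```

Deviations of the empirical pattern frequencies from the i.i.d. benchmark are exactly what the dependence tests and long-range dependence estimators above exploit.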
| 10:50am - 11:50am | Inference in Wasserstein Spaces and Optimal Transport Location: 0.004 Session Chair: Ansgar Steland |
|
|
Sliced-Wasserstein distance based change detection with sequential empirical processes 1University of Bamberg; 2RWTH Aachen University; 3Delft University of Technology We study the problem of detecting changes in the marginal distributions of a multivariate time series with a novel CUSUM-type detector statistic based on the (maximum-) sliced-Wasserstein distance. This projection-based approach has two appealing properties. Firstly, unlike the family of Wasserstein distances, it does not suffer from the curse of dimensionality. Secondly, by means of the Kantorovich duality, asymptotic properties of the so-defined detector statistic can be derived from results for (sequential) empirical processes for nonstationary time series. This talk presents new weak limit theorems for sequential empirical processes under the functional dependence measure and their application to the given testing problem. Practical implications, limitations and possible extensions are discussed. Distributional Convergence of Empirical Entropic Optimal Transport and Applications Georg August Universität Göttingen, Germany The statistical properties of empirical entropic optimal transport (EOT) have attracted great interest, as this quantity has proven useful for complex data analysis, not least because of its computational efficiency. In several applications, it has been realized that, in addition to the optimal value, the EOT plan also carries important information. For example, in cell biology, colocalization analysis based on the EOT plan has been introduced as a measure quantifying the spatial proximity of different protein assemblies. Despite recent progress in the analysis of its risk properties, a precise understanding of its statistical fluctuations, needed to make it accessible for inference, remains elusive to some extent. 
We derive asymptotic weak convergence results for a large class of functionals of the EOT plan, including the colocalization process. As an application, we obtain uniform confidence bands for colocalization curves and bootstrap consistency. Our theory is supported by simulation studies and illustrated by a real-world data analysis of mitochondrial protein colocalization. |
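The sliced-Wasserstein distance underlying the first abstract above can be sketched via random projections and 1-D quantile distances. This is an illustrative Monte Carlo version assuming equal sample sizes, not the talk's CUSUM detector:

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between two equally
    sized samples: average 1-D W2 over random projection directions."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # unit directions
    sw2 = 0.0
    for th in theta:
        u, v = np.sort(X @ th), np.sort(Y @ th)             # 1-D empirical quantiles
        sw2 += np.mean((u - v) ** 2)                        # 1-D squared W2
    return np.sqrt(sw2 / n_proj)

rng = np.random.default_rng(3)
X = rng.standard_normal((1_000, 5))
Y = rng.standard_normal((1_000, 5)) + 1.0    # mean shift in every coordinate
print(sliced_wasserstein(X, X) == 0.0, sliced_wasserstein(X, Y) > 0.5)  # prints True True
```

Each 1-D distance costs only a sort, which is why the projection-based approach sidesteps the curse of dimensionality that afflicts the full multivariate Wasserstein distance.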
| 11:55am - 12:55pm | Plenary Lecture 4 Location: 0.004 |
|
|
Unlocking the Regression Space Queen Mary University of London, United Kingdom This paper introduces and analyzes a framework that accommodates general heterogeneity in regression modeling. It demonstrates that regression models with fixed or time-varying parameters can be estimated using OLS and time-varying OLS methods, respectively, across a broad class of regressors and noise processes not covered by existing theory. The proposed setting facilitates the development of asymptotic theory and the estimation of robust standard errors. The resulting robust confidence interval estimators accommodate substantial heterogeneity in both regressors and noise. The robust standard error estimates coincide with White’s (1980) heteroskedasticity-consistent estimator but apply under much broader conditions, including models with missing data. The methods are computationally simple and perform well in Monte Carlo simulations, making them highly suitable for empirical applications. The paper also provides a brief empirical illustration. |
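White's (1980) heteroskedasticity-consistent standard errors referenced in the abstract above can be sketched in a few lines. This is a textbook HC0 illustration, not the paper's time-varying extension:

```python
import numpy as np

def ols_with_white_se(X, y):
    """OLS estimates with White's (1980) HC0 robust standard errors:
    cov = (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                           # OLS residuals
    meat = X.T @ (e[:, None] ** 2 * X)         # X' diag(e^2) X
    cov = XtX_inv @ meat @ XtX_inv             # sandwich estimator
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(4)
n = 5_000
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + np.abs(x) * rng.standard_normal(n)   # heteroskedastic noise
beta, se = ols_with_white_se(X, y)
print(beta, se)   # estimates near (1, 2), with valid standard errors
```

The sandwich form is valid without specifying the error-variance function, which is the sense in which the paper's confidence intervals accommodate heterogeneous noise.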
| 12:55pm - 1:00pm | Closing Location: 0.004 |
| 1:00pm - 2:00pm | Lunch break 3 |

