Invited speakers

Rémi Bardenet (CNRS, Université de Lille): Tutorial - What is the Gaussian process of point processes?

Consider a kernel function, say positive semi-definite, and think of a distribution over functions parametrized by that kernel. There are chances that you are thinking of a Gaussian process, a probability distribution that has found many uses in uncertainty quantification, Bayesian optimization, etc. Now try to think of a natural *point process* parametrized by that same kernel. The situation is more complicated, and more than one candidate come to mind. To wit, point processes are random sets of isolated points, just like Gaussian processes are random functions. In this tutorial, I will present a few point processes parametrized by kernels, and survey a few theoretical results that support their interest in numerical integration and signal processing.

Margaux Brégère (EDF Lab Paris Saclay, Sorbonne Université): Explainability of electricity demand forecasting models: a Shapley value approach for positive component decomposition

Electricity is difficult to store at large scale, except at prohibitive cost, which is why the balance between generation and consumption must be strictly maintained at all times. It is traditionally managed by anticipating the demand as well as the intermittent production (wind, solar), in order to adjust the flexible production (hydro dams, nuclear power plants, coal and gas plants) accordingly. Accurate forecasts of electricity demand and renewable production are therefore essential to guarantee grid performance and stability. State-of-the-art demand forecasting models are complex online machine learning or statistical models that take weather and calendar variables as inputs. Explainability is needed to understand the models, analyze forecasting errors and interpret changes in the demand curve (quantification of eco-actions, penetration of new uses such as electric vehicles, etc.). In practice, this is currently only feasible if the model proposes an intrinsic decomposition according to each variable. That is one of the reasons why Generative Additive Models (GAMs, see among others Pierrot and Goude [2011]) are very often used, as their additive structure can be used for explainability. Yet, some recent and powerful models such as deep neural networks (Keisler et al. [2024]) actually suffer from a lack of interpretability: they are no longer additive are present a huge number of parameters. We propose a Shapley value approach to decompose the forecasts in positive component.

Lucas Drumetz (IMT Atlantique): Spatialized Bayesian inference and uncertainty quantification on constrained co-domains with Gaussian Processes: application to the simplex

In this talk, we are interested in Bayesian inference and uncertainty quantification (UQ) for inverse problems, under constraints. The existing literature typically tackles constrained domains (e.g. real or vector valued functions on the sphere in geosciences); in contrast we are interested in the case of fields on R^n (e.g. images for n=2) with constrained co-domains (or values). This has received much less attention despite clear applicative potential. We will present a way to perform tractable Bayesian inference in such situations by adapting Gaussian Processes to handle constrained co-domains. Beyond inference, we will show how to make sense of UQ diagnostics and visualizations for such types of non Euclidean data. We will illustrate these concepts in the case of simplex constraints with several applications in remote sensing and Machine Learning.

Pierre Gloaguen (Université Bretagne Sud): Bayesian Modelling of Abundance Data in Ecology Using Joint Species Distribution Models

In this talk, I will discuss the use of joint species distribution models (JSDMs) in ecology to analyze data from species abundance surveys. Using a recent Bayesian model as a guiding example, I will explain how two sources of dependence in the data are modeled: spatial dependence and residual dependence between species, the latter often being considered a proxy for ecological interactions. I will also discuss the inference challenges that arise in this context.
This is joint work with Heloïse Rozier, Baptiste Alglave, and François Septier.

Javier González-Delgado (ENSAI): Inference after human genetic clustering

Human genetic variation is both highly complex and deeply hierarchical, making its characterization challenging. Clustering techniques play a central role in how we interpret genetic variation and reconstruct population history, being widely used in genome-wide association studies. However, genetic research has shown that humans cannot be divided into biologically discrete subgroups: most genetic variation is distributed along continuous gradients, and clear boundaries between populations cannot be objectively defined. Consequently, the use of clusters risks reinforcing typological thinking, that is, the mistaken assumption that humans fall into a limited set of distinct types. It may also artificially amplify within-group homogeneity and between-group differences.

In this context, equipping clustering methods (which are inherently exploratory) with inferential tools may help strengthen the reliability of genetic data analyses. Such tools would make it possible to assess whether individuals assigned to different clusters truly differ genetically or, conversely, whether two clusters returned by an algorithm should be regarded as representing the same group. In this talk, we examine the applicability of post-clustering inference in human genetic clustering. Our results show that it performs well for some families of clustering algorithms, while also revealing theoretical limitations that emerge when the inference is applied to more sophisticated methods tailored to genetic datasets.

Sylvain Le Corff (Sorbonne Université): On Forgetting and Stability of Score-based Generative models

Understanding the stability and long-time behavior of generative models is a fundamental problem in modern machine learning. This talks provides quantitative bounds on the sampling error of score-based generative models by leveraging stability and forgetting properties of the Markov chain associated with the reverse-time dynamics. Under weak assumptions, we provide the two structural properties to ensure the propagation of initialization and discretization errors of the backward process: a Lyapunov drift condition and a Doeblin-type minorization condition. A practical consequence is quantitative stability of the sampling procedure, as the reverse diffusion dynamics induces a contraction mechanism along the sampling trajectory. Our results clarify the role of stochastic dynamics in score-based models and provide a principled framework for analyzing propagation of errors in such approaches.

Apolline Louvet (INRAE Avignon): Percolation, weeds and Bayesian inference - Assessing seed bank influence on plant metapopulation dynamics

Understanding the drivers of biodiversity in urban environments is a key question in urban ecology. As urban environments are very fragmented and subject to frequent disruptions, the biological traits associated to survival are expected to differ from the ones selectively advantaged in natural environments. In particular, one such trait could be the ability to enter a dormant stage and form a seed bank. However, direct detection of seed banks remains challenging and requires intense monitoring efforts, which motivates the use of indirect methods based on plant observations.
In this talk, we consider inference frameworks for seed bank and plant metapopulation parameters based on Hidden Markov Models. We show that uncertainty on dormancy parameters is often caused by the influence of the seed bank on the observed metapopulation dynamics being limited. As a result, we introduce metrics accounting for the effect of the (potential) seed bank on the observed dynamics, and show that their estimation is less data-demanding than raw dormancy parameter values. We then apply our framework to yearly floristic inventories carried out in 1324 tree bases in Paris, France.

Natalie Maus (MIT)
Pierre Tandeo (IMT Atlantique): Quantifying uncertainty in climate reanalyses

Climate reanalyses such as ERA5 allow the study of numerous variables over long periods. These reanalyses integrate a large number of historical observations. These observations are assimilated into a general circulation model. Most of the time, reanalyses data are obtained using ensemble methods that allow for the estimation of uncertainties in the reanalyzed fields. These assimilation uncertainties, also known as a posteriori errors, depend on two factors: the model error covariance, denoted Q and referring to the a priori error, and the observation error covariance, denoted R, which takes into account instrumentation error and representativeness error.

In this talk, I will demonstrate the importance of the Q and R covariances in estimating the uncertainty of data assimilation algorithms. I will review several algorithms for jointly estimating the Q and R matrices. I will also introduce a new metric to quantify the uncertainties of reanalyses: the credibility score. It is based on the notion of confidence interval, coverage probability, and allows us to check whether uncertainties are underestimated or overestimated.

Privacy | Accessibility: non-compliant