Invited speakers
Consider a kernel function, say positive semi-definite, and think of a distribution over functions parametrized by that kernel. Chances are you are thinking of a Gaussian process, a probability distribution that has found many uses in uncertainty quantification, Bayesian optimization, etc. Now try to think of a natural *point process* parametrized by that same kernel. The situation is more complicated, and more than one candidate comes to mind. To wit, point processes are random sets of isolated points, just as Gaussian processes are random functions. In this tutorial, I will present a few point processes parametrized by kernels, and survey theoretical results that support their interest in numerical integration and signal processing.
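To make the contrast concrete, here is a minimal numpy sketch (my own illustration, not material from the tutorial): the same squared-exponential kernel parametrizes both a Gaussian process, a random *function*, and a determinantal point process, a random *subset* of the grid. The grid, the lengthscale, and the choice of a DPP as the candidate point process are all assumptions on my part.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
# Squared-exponential (positive semi-definite) kernel on the grid
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2)

# 1) Gaussian process: a random function with covariance K
#    (tiny jitter added for numerical positive definiteness).
f = rng.multivariate_normal(np.zeros(len(x)), K + 1e-9 * np.eye(len(x)))

# 2) Determinantal point process with L-ensemble L = K: a random set of
#    grid points. The inclusion probability of each point is the diagonal
#    of the marginal kernel M = L (L + I)^{-1}.
M = K @ np.linalg.inv(K + np.eye(len(x)))
incl = np.diag(M)  # P(x_i belongs to the random set), each in [0, 1]
```

The diagonal of `M` already hints at the repulsive nature of the DPP: no point is ever included with probability one, however large the kernel values.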
Electricity is difficult to store at large scale, except at prohibitive cost, which is why the balance between generation and consumption must be strictly maintained at all times. It is traditionally managed by anticipating the demand as well as the intermittent production (wind, solar), in order to adjust the flexible production (hydro dams, nuclear power plants, coal and gas plants) accordingly. Accurate forecasts of electricity demand and renewable production are therefore essential to guarantee grid performance and stability. State-of-the-art demand forecasting models are complex online machine learning or statistical models that take weather and calendar variables as inputs. Explainability is needed to understand the models, analyze forecasting errors and interpret changes in the demand curve (quantification of eco-actions, penetration of new uses such as electric vehicles, etc.). In practice, this is currently only feasible if the model provides an intrinsic decomposition according to each variable. That is one of the reasons why Generalized Additive Models (GAMs, see among others Pierrot and Goude [2011]) are very often used, as their additive structure lends itself to explainability. Yet, some recent and powerful models such as deep neural networks (Keisler et al. [2024]) suffer from a lack of interpretability: they are no longer additive and involve a huge number of parameters. We propose a Shapley value approach to decompose the forecasts into per-variable components.
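As a hedged illustration (not the speakers' method): for any black-box forecaster, exact interventional Shapley values can be computed by enumerating coalitions over a small number of features. The toy model `f`, the background data, and every name below are my own assumptions; the efficiency property guarantees that the components sum to the forecast minus the mean forecast, which is what makes Shapley values usable as an additive decomposition of non-additive models.

```python
import numpy as np
from itertools import combinations
from math import factorial

rng = np.random.default_rng(1)

# Toy non-additive "forecast" model (stand-in for a neural network)
def f(X):
    return X[:, 0] + 2.0 * X[:, 1] * X[:, 2]

Xbg = rng.normal(size=(200, 3))   # background data
x = np.array([1.0, 0.5, -0.2])    # instance to explain
n = 3

def v(S):
    """Value of coalition S: average f with features in S fixed to x."""
    Z = Xbg.copy()
    Z[:, list(S)] = x[list(S)]
    return f(Z).mean()

phi = np.zeros(n)
for j in range(n):
    rest = [k for k in range(n) if k != j]
    for r in range(n):
        for S in combinations(rest, r):
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi[j] += w * (v(S + (j,)) - v(S))

# Efficiency: the per-variable components sum to the forecast for x
# minus the mean forecast over the background data.
print(phi, phi.sum())
```

Exact enumeration costs O(2^n) coalition evaluations, which is why practical explainers for models with many inputs rely on sampling approximations.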
In this talk, we are interested in Bayesian inference and uncertainty quantification (UQ) for inverse problems, under constraints. The existing literature typically tackles constrained domains (e.g. real- or vector-valued functions on the sphere in geosciences); in contrast we are interested in the case of fields on R^n (e.g. images for n=2) with constrained co-domains (or values). This has received much less attention despite clear applicative potential. We will present a way to perform tractable Bayesian inference in such situations by adapting Gaussian Processes to handle constrained co-domains. Beyond inference, we will show how to make sense of UQ diagnostics and visualizations for such types of non-Euclidean data. We will illustrate these concepts in the case of simplex constraints with several applications in remote sensing and Machine Learning.
In this talk, I will discuss the use of joint species distribution models (JSDMs) in ecology to analyze data from species abundance surveys. Using a recent Bayesian model as a guiding example, I will explain how two sources of dependence in the data are modeled: spatial dependence and residual dependence between species, the latter often being considered a proxy for ecological interactions. I will also discuss the inference challenges that arise in this context.
Human genetic variation is both highly complex and deeply hierarchical, making its characterization challenging. Clustering techniques play a central role in how we interpret genetic variation and reconstruct population history, being widely used in genome-wide association studies. However, genetic research has shown that humans cannot be divided into biologically discrete subgroups: most genetic variation is distributed along continuous gradients, and clear boundaries between populations cannot be objectively defined. Consequently, the use of clusters risks reinforcing typological thinking, that is, the mistaken assumption that humans fall into a limited set of distinct types. It may also artificially amplify within-group homogeneity and between-group differences.
Understanding the stability and long-time behavior of generative models is a fundamental problem in modern machine learning. This talk provides quantitative bounds on the sampling error of score-based generative models by leveraging stability and forgetting properties of the Markov chain associated with the reverse-time dynamics. Under weak assumptions, we identify two structural properties that control the propagation of initialization and discretization errors of the backward process: a Lyapunov drift condition and a Doeblin-type minorization condition. A practical consequence is quantitative stability of the sampling procedure, as the reverse diffusion dynamics induces a contraction mechanism along the sampling trajectory. Our results clarify the role of stochastic dynamics in score-based models and provide a principled framework for analyzing the propagation of errors in such approaches.
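As an illustrative sketch (not the talk's construction): for a one-dimensional Gaussian target, the forward Ornstein-Uhlenbeck noising process has a closed-form score, so the reverse-time dynamics can be simulated with no learned component, isolating the initialization and discretization errors the talk is about. All numerical choices below are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = 2.0, 0.5            # target law: N(m, s^2)
T, n_steps, n = 5.0, 500, 20000
dt = T / n_steps

def mu_sig2(t):
    # Marginal of the forward OU process dx = -x dt + sqrt(2) dW
    # started at N(m, s^2): a Gaussian with these moments.
    e = np.exp(-t)
    return m * e, s**2 * e**2 + 1.0 - e**2

def score(x, t):
    mu, sig2 = mu_sig2(t)
    return -(x - mu) / sig2  # exact score of the Gaussian marginal

# Initialize at the (near-)stationary law N(0, 1) -- an initialization
# error -- and integrate the reverse-time SDE
#   dx = [-x - 2 * score(x, t)] dt + sqrt(2) dW
# backwards from T to 0 with Euler-Maruyama (a discretization error).
x = rng.normal(0.0, 1.0, size=n)
for k in range(n_steps, 0, -1):
    t = k * dt
    x = x - dt * (-x - 2.0 * score(x, t)) + np.sqrt(2.0 * dt) * rng.normal(size=n)

print(x.mean(), x.std())  # close to m and s despite both error sources
```

Both error sources shrink along the trajectory, which is the contraction mechanism the abstract refers to; with a learned score, a third (approximation) error would propagate the same way.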
Understanding the drivers of biodiversity in urban environments is a key question in urban ecology. As urban environments are very fragmented and subject to frequent disruptions, the biological traits associated with survival are expected to differ from those selectively advantaged in natural environments. In particular, one such trait could be the ability to enter a dormant stage and form a seed bank. However, direct detection of seed banks remains challenging and requires intense monitoring efforts, which motivates the use of indirect methods based on plant observations.
Climate reanalyses such as ERA5 allow the study of numerous variables over long periods. These reanalyses assimilate a large number of historical observations into a general circulation model. Most of the time, reanalysis data are obtained using ensemble methods that allow for the estimation of uncertainties in the reanalyzed fields. These assimilation uncertainties, also known as a posteriori errors, depend on two factors: the model error covariance, denoted Q and referring to the a priori error, and the observation error covariance, denoted R, which accounts for instrumentation error and representativeness error.
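As a schematic illustration (actual reanalyses use far more elaborate variational and ensemble schemes): a single Kalman analysis step shows how the a priori error covariance Q and the observation error covariance R combine into the a posteriori error covariance. The two-dimensional state, the observation operator, and all numbers below are made up.

```python
import numpy as np

Q = np.array([[1.0, 0.5],
              [0.5, 1.0]])    # a priori (model) error covariance
R = np.array([[0.25]])        # observation error covariance
H = np.array([[1.0, 0.0]])    # observe only the first state component
xb = np.array([0.0, 0.0])     # background (a priori) state
y = np.array([1.0])           # observation

# Kalman gain: weighs the a priori error against the observation error
K = Q @ H.T @ np.linalg.inv(H @ Q @ H.T + R)
xa = xb + K @ (y - H @ xb)           # analysis state, pulled toward y
A = (np.eye(2) - K @ H) @ Q          # a posteriori error covariance

print(xa)          # [0.8, 0.4]
print(np.diag(A))  # [0.2, 0.8]: smaller than diag(Q) = [1, 1]
```

Note that even the unobserved second component is corrected and its uncertainty reduced, through the off-diagonal terms of Q; shrinking R pulls the analysis further toward the observation.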