Doctoral Theses

2019

2018

Asymptotic Inference in Sparse High-dimensional Models
Jana Jankova
PhD thesis, ETH Zürich, 2018.

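High-dimensional data with sparse structure arise in many areas of science, industry and entertainment. Diverse applications motivate the need for efficient statistical methods to analyze high-dimensional datasets. While the methodology for point estimation is sophisticated and generally well understood, many applications require quantifying statistical uncertainty by providing confidence intervals, p-values or tests. In this thesis, we focus on developing efficient methodology and theory for uncertainty estimation in particular high-dimensional settings with sparse structure. In Chapter 2, we study estimation of high-dimensional inverse covariance matrices. Building on Lasso-regularized estimators, we propose a simple method for constructing asymptotically normal estimators of the entries of the precision matrix. Two explicit constructions are provided: one based on a global method that maximizes the joint likelihood, and one based on a local (nodewise) method that applies the Lasso sequentially. When applied to Gaussian graphical models, the proposed estimators lead to confidence intervals for edge weights or to recovery of the edge structure. We evaluate their empirical performance in an extensive simulation study. Theoretical guarantees for the methods are obtained under a sparsity condition relative to the sample size and under mild distributional and regularity conditions. Additionally, we apply the results derived in this chapter to construct confidence intervals for edge weights in directed acyclic graphs. In Chapter 3, we construct confidence intervals for loadings in high-dimensional principal component analysis. The non-convexity of the problem is handled by proposing a computationally efficient two-step procedure which yields a near-oracle estimator of the loadings vector. We derive oracle inequalities for the estimator and propose a de-biasing scheme to obtain an asymptotically normal estimator. We also provide an asymptotically valid confidence interval for the maximum eigenvalue of the underlying covariance matrix. Asymptotic guarantees are derived under a sparsity condition on the vector of loadings and sparsity in the inverse Hessian of the population risk function, under mild distributional and regularity conditions. In Chapter 4, motivated by robust regression, we explore the construction of confidence intervals in settings where the loss function may not be differentiable. We show that differentiability of the loss function is not essential and may be replaced by differentiability of the expected loss and an entropy condition measuring the complexity of the considered class of functions. We apply these results to particular estimators which arise in robust regression and show that a de-biased estimator has an entry-wise Gaussian limiting distribution. The price we pay for non-differentiability is a stronger sparsity condition on the high-dimensional parameter.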
Chapter 5 explores the asymptotic efficiency of de-biased estimators in high-dimensional linear regression and Gaussian graphical models. The classical theory on asymptotic lower bounds on variance is not directly applicable in high-dimensional settings because the model changes with the sample size. We derive lower bounds on the variance of estimators which are strongly asymptotically unbiased, roughly meaning that their squared bias is of smaller order than their variance. For the linear model under Gaussianity, we show that a de-biased estimator based on the Lasso achieves the asymptotic lower bound and is in this sense efficient, under sparsity conditions on both the high-dimensional parameter and the Fisher information matrix. We provide analogous results for Gaussian graphical models. As a by-product of our analysis, we establish oracle inequalities for the ℓ1-error of the Lasso, which hold in expectation.
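
For orientation, the global construction mentioned for Chapter 2 can be sketched in the form commonly used in the de-sparsified graphical Lasso literature. The notation here is an assumption and may differ from the thesis: \hat{\Sigma} denotes the sample covariance matrix, \hat{\Theta} a Lasso-regularized precision matrix estimate, n the sample size, and z_{1-\alpha/2} a standard normal quantile. The variance expression shown is the one that applies under Gaussianity.

```latex
\hat{T} = 2\hat{\Theta} - \hat{\Theta}\,\hat{\Sigma}\,\hat{\Theta},
\qquad
\hat{\sigma}_{ij}^{2} = \hat{\Theta}_{ii}\hat{\Theta}_{jj} + \hat{\Theta}_{ij}^{2},
\qquad
\text{CI}_{ij} = \Bigl[\,\hat{T}_{ij} \pm z_{1-\alpha/2}\,\frac{\hat{\sigma}_{ij}}{\sqrt{n}}\,\Bigr].
```

The correction term \hat{\Theta} - \hat{\Theta}\hat{\Sigma}\hat{\Theta} removes the leading part of the bias introduced by the ℓ1-penalty, which is what makes an entry-wise normal approximation plausible under suitable sparsity conditions.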

2017

Ensemble Kalman Particle Filters for High-dimensional Data Assimilation
Sylvain Robert
PhD thesis, ETH Zürich, Zürich, 2017.

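Data assimilation is the task of estimating the state of a system, such as the atmosphere in numerical weather prediction (NWP), by combining information from the laws governing the system's dynamics with a sequence of observations. Because of observation noise and uncertainty in the initial conditions, a probabilistic rather than a deterministic approach is preferred. The goal is therefore to estimate the time evolution of the distribution of the system state conditional on all past observations. Ensemble data assimilation methods, such as the ensemble Kalman filter (EnKF), address this problem by representing the state distribution with a finite sample of particles that evolve according to the system dynamics. What makes data assimilation particularly challenging in geophysical applications is that the state to be estimated is of very high dimension (of the order of 100 million), while the ensemble size is limited to fewer than 100 because of the heavy computational cost. At the same time, the increasing resolution of physical models makes the Gaussian assumptions on which the EnKF relies less and less valid. In this thesis, we propose extensions to the ensemble Kalman particle filter (EnKPF), a hybrid algorithm that relaxes some of the Gaussian assumptions by combining the EnKF with the particle filter (PF). The goal of these extensions is to make the EnKPF suitable for very high-dimensional applications. The first contribution consists in proposing two localized versions of the algorithm: the naive-LEnKPF and the block-LEnKPF. The naive-LEnKPF is similar to the local EnKF (LEnKF) and works by assimilating data in local windows and then pasting the results together. It has the advantage of being simple and efficient, but it does not address the issue of discontinuities introduced by the PF part of the algorithm. The block-LEnKPF, on the other hand, assimilates the observations by blocks and limits their influence to a local area while smoothing out the introduced discontinuities. Both local EnKPFs are applied to an artificial model of cumulus convection of medium dimensionality. The results of the numerical experiments show that the new algorithms perform at a similar level to the LEnKF, and bring some noticeable improvements for non-Gaussian variables such as the precipitation field. The second main contribution of this thesis is a new algorithm, the ensemble transform Kalman particle filter (ETKPF). It is based on a reformulation of the EnKPF in ensemble space, which allows it to be easily and efficiently implemented in an existing full-scale NWP data assimilation framework. Furthermore, the ETKPF replaces the stochastic part of the algorithm with a deterministic scheme, so that its second moment is exact rather than correct only in expectation. The algorithm was tested on a challenging high-dimensional application at convective scale with COSMO, in a setup similar to the one used operationally at MeteoSwiss. The results of the experiments show the feasibility of the new algorithm in real-world applications and encourage further developments in the direction of localized hybrid particle filters for high-dimensional data assimilation.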
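
To make the EnKF building block concrete, here is a minimal sketch of one stochastic (perturbed-observation) EnKF analysis step in Python with NumPy. It is not the EnKPF or ETKPF from the thesis, and it uses no localization. The function name `enkf_analysis` and its arguments are hypothetical, and the dense covariance matrix is formed only for illustration; this would not be feasible for states with millions of components.

```python
import numpy as np


def enkf_analysis(ensemble, y, H, R, rng):
    """One stochastic EnKF analysis step (illustrative sketch).

    ensemble: (d, N) array, one state vector per column
    y:        (m,) observation vector
    H:        (m, d) linear observation operator
    R:        (m, m) observation error covariance
    rng:      numpy.random.Generator used to perturb the observations
    """
    n_members = ensemble.shape[1]
    # Sample covariance of the forecast ensemble.
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (n_members - 1)
    # Kalman gain K = P H^T (H P H^T + R)^{-1}, computed via a linear solve.
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T
    # Each member assimilates its own perturbed copy of the observation.
    noise = rng.multivariate_normal(np.zeros(y.size), R, size=n_members).T
    innovations = y[:, None] + noise - H @ ensemble
    return ensemble + K @ innovations
```

A call such as `enkf_analysis(prior, y, H, R, np.random.default_rng(0))` returns an updated ensemble with the same shape as `prior`. The update is linear in the observations, which is where the Gaussian assumption enters and what a hybrid with a particle filter is meant to relax.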

Asymptotic Confidence Regions and Sharp Oracle Results under Structured Sparsity
Benjamin Stucky
PhD thesis, ETH Zürich, Zürich, 2017.

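Restricting ourselves to the realm of sparse solutions has become a new paradigm in modern statistics and machine learning, and in particular in high-dimensional linear regression models. The goal of sparse solutions is to represent information with a small core of active explanatory parameters. As a pleasant side effect, the resulting models are simple and comparatively easy to interpret. This vague notion of sparsity is usually represented by the ℓ1-norm, which is a convex relaxation of the number of active variables. Reducing complexity in this way is well understood. In practical applications, however, we often know more about how the active variables may be arranged. Structured sparsity has therefore emerged in recent years as a promising way to represent prior knowledge of the underlying sparsity structure. In this thesis, we focus on embodying prior knowledge of the underlying sparsity patterns through general norm penalties. Weak decomposability is a fundamental concept for understanding the sparsity structure of a norm. The idea of weak decomposability is further generalized to LASSO-type estimators with concave penalties. We also see that sharp oracle results can be obtained in the multivariate model. The square root LASSO is generalized to all weakly decomposable norm penalties, for which sharp oracle results are given. The scaling properties of these square root estimators have useful applications in constructing χ² confidence regions for the LASSO. Furthermore, the problem of quantifying uncertainty in high dimensions for structured sparsity estimators is tackled by means of two related frameworks.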
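
As a reminder of the two central notions, the following gives the definitions as they are usually stated in the structured sparsity literature. The notation is an assumption and may differ from the thesis: \Omega is a norm on \mathbb{R}^p, S \subset \{1,\dots,p\} an index set, \Omega^{S^c} a norm on the coordinates in S^c, and \lambda > 0 a tuning parameter. The first line states weak decomposability of \Omega for S, the second the square root LASSO with penalty \Omega.

```latex
\Omega(\beta) \;\ge\; \Omega(\beta_S) + \Omega^{S^c}(\beta_{S^c})
\quad \text{for all } \beta \in \mathbb{R}^p,
\\[4pt]
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p}
\left\{ \frac{\lVert Y - X\beta \rVert_2}{\sqrt{n}} + \lambda\,\Omega(\beta) \right\}.
```

The ℓ1-norm is the simplest example: it satisfies the first line with equality for every S, taking \Omega^{S^c} to be the ℓ1-norm on S^c.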

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

1997