High-dimensional Discrete Inference

Sponsor: National Science Foundation (NSF)

Award Number: 1107067

PI: Luis Carvalho

Abstract:

Advances over the last decade have brought attention to the analysis of high-dimensional data and, in particular, to estimation on high-dimensional spaces. Such spaces are often structured, either through constraints on specific components of the space or through prior information that identifies co-dependence patterns between components and helps carry out the inference. Given the recent predominance of discrete inference problems in many influential fields, the investigator takes on the critically important task of discrete estimation in high-dimensional settings and aims to lay foundational principles under which high-dimensional discrete spaces can be estimated and characterized efficiently, with the structural properties of the space adequately taken into account. More specifically, the PI explores estimators that are formally derived from statistical decision theory and based on loss functions that more naturally capture the features of the discrete space, so that the resulting estimators are arguably better representatives of the ensemble. If the discrete space is constrained, obtaining an efficient estimation procedure is of prime concern given the large size of the space; to this end, the PI also proposes to develop a general framework that can be used to design efficient inference procedures, assess the computational complexity of the proposed estimation, and derive approximation schemes when needed. In addition, the investigator applies the proposed foundations to highlight important features of the discrete space, such as regions of high concentration of probability mass, and studies a method to jointly elucidate features and identify good subspace representatives.

Many problems from fields such as genetics, the social sciences, molecular biology, and environmental studies can be cast as statistical inference problems on a large number of unknowns. Even though modern, high-throughput technology has enabled the collection of large datasets, these problems remain hard because the number of parameters describing the data-generating process grows with the number of observations. In this setting, it is helpful to impose structure on the model in order to guide the inference. The investigator studies novel, principled estimators that address two issues in this high-dimensional regime: effectively capturing structural relationships among variables in the model, and efficiently deriving solutions through computationally feasible routines. The PI intends to implement and publish the resulting methods as open-source software that benefits both academia and industry and further fosters the development of algorithms and practical implementations. Through this research project the PI also intends to promote the integration of research and education by developing new courses and by raising awareness of state-of-the-art methods for the statistical analysis of high-dimensional data and for inference on discrete spaces. Finally, the PI expects to encourage collaborations between statisticians and researchers from other fields and to promote statistical methods in interdisciplinary areas.
