Statistical Inference and Feature Selection in Complex Datasets
Pankaj Mehta
Junior Faculty Fellow, Hariri Institute for Computing
Assistant Professor, Physics Department
Boston University
Abstract: Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many modern datasets. I will discuss a new approach we have developed, the Bayesian Ising Approximation (BIA), to perform feature selection in this limit. Our results exploit the observation that the evidence takes a universal form for strongly regularizing priors: priors that have a large effect on the posterior probability even in the infinite-data limit. We illustrate the power of the BIA using examples drawn from genomics and machine learning.
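To make the regime described in the abstract concrete, the following is a minimal sketch (not the BIA itself, which is presented in the talk) of feature selection when the number of variables far exceeds the number of samples. It generates synthetic data with 500 features but only 100 samples, of which only 5 features influence the response, and ranks features by absolute correlation with the response as a simple baseline selector; all names and parameters here are illustrative assumptions.

```python
import numpy as np

# Illustrative p >= n setting: 100 samples, 500 candidate features,
# only 5 of which actually influence the response. This is NOT the
# Bayesian Ising Approximation; it is a simple correlation-screening
# baseline to show what "feature selection in this limit" means.
rng = np.random.default_rng(0)
n_samples, n_features, n_relevant = 100, 500, 5

X = rng.standard_normal((n_samples, n_features))
true_support = np.arange(n_relevant)  # features 0..4 are the relevant ones
y = X[:, true_support].sum(axis=1) + 0.1 * rng.standard_normal(n_samples)

# Score each feature by |sample correlation| with the response.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
scores = np.abs(Xc.T @ yc) / (n_samples * X.std(axis=0) * y.std())

# Keep the top-scoring features as the selected subset.
selected = np.argsort(scores)[::-1][:n_relevant]
print(sorted(selected.tolist()))
```

With many more features than samples, spurious correlations between irrelevant features and the response become comparable in size to the true signal, which is why naive screening degrades and motivates methods like the BIA that account for the prior's effect on the evidence.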
Bio: Pankaj Mehta is an Assistant Professor of Physics at BU. His group works on theoretical problems at the interface of physics and biology. He is interested in understanding how large-scale, collective behaviors observed in biological systems emerge from the interaction of many individual molecular elements, and how these interactions allow cells to perform complex computations in response to environmental cues. He is the winner of a Sloan Research Fellowship and was recently appointed a Simons Investigator in the Mathematical Modeling of Living Systems.