CAREER: Rich and Scalable Optimization for Modern Bayesian Nonparametric Learning
Sponsor: National Science Foundation (NSF)
Award Number: 1452903
PI: Brian Kulis
Abstract: Large-scale data analysis has become an indispensable tool throughout academia and industry. When the amount of data is very large, one often faces a tradeoff between the richness, flexibility, and potential predictive power of a model and its computational requirements. While recent advances in statistics and machine learning provide a rich set of models and tools, many of these cannot be applied in contemporary applications due to the sheer volume of data available for analysis. In particular, Bayesian nonparametric models are a rich class of models that has largely been restricted to small-scale settings. This class of methods, in contrast to standard parametric statistical models, does not fix the complexity of the model in advance, instead allowing the data to determine how complex the resulting model is. While these models appear well-suited for large-scale data analysis, current inference methods for Bayesian nonparametric models have not been shown to work at scale.

In terms of broader impacts, applications of machine learning continue to emerge in fields such as medicine, engineering, and the humanities; this research has the potential to impact a number of problems in these fields. Further, the PI's research has applications in computer vision, which itself has emerging impacts in autonomous driving and eldercare. The project also includes an effort to further integrate coursework in computer science with coursework in statistics, aiming to continue to bridge the gap between the fields. Finally, the research will introduce undergraduate and high school students to research, and will yield new software for nonparametric problems that can be applied by practitioners outside the machine learning field.
This CAREER project explores scalable optimization methods aimed at making Bayesian nonparametric models applicable in large-scale settings. The research in this project is focused on three general themes:
1) Small-variance asymptotics for scalable nonparametric modeling. This part of the project aims to develop new, scalable algorithms for nonparametric models, applicable to problems such as topic modeling, image segmentation, and image feature learning. The goals include extending the technique of small-variance asymptotics to new Bayesian nonparametric models, improving the theoretical underpinnings of these asymptotic techniques, and developing large-scale software for several problems.
2) New variational inference techniques for nonparametric problems. This part of the project focuses on extending variational inference methods to new settings. In particular, the PI will apply variational inference to gamma process models and develop variational inference methods for the emerging class of exponential-family completely random measures.
3) New applications for large-scale Bayesian nonparametrics. This part of the project uses the results in the previous two parts to explore applications of Bayesian nonparametric models that were previously unattainable. Applications include large-scale image modeling, social network analysis, and large-scale document analysis.
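To make theme 1 concrete: the canonical result of small-variance asymptotics is the DP-means algorithm (Kulis and Jordan, 2012), obtained by letting the cluster variances in a Dirichlet process mixture shrink to zero. The Bayesian sampler collapses into a fast, k-means-like procedure in which a point far from every existing cluster opens a new one, so the number of clusters is determined by the data rather than fixed in advance. The following is a minimal sketch of that idea; the function name, parameter names, and stopping rule are illustrative, not part of the project's software.

```python
import numpy as np

def dp_means(X, lam, max_iter=100):
    """Illustrative DP-means sketch: the small-variance asymptotic limit
    of Gibbs sampling in a Dirichlet process mixture (Kulis & Jordan, 2012).
    A point whose squared distance to every existing centroid exceeds the
    penalty lam opens a new cluster, so the data determine k."""
    centroids = [X.mean(axis=0)]              # start with a single global cluster
    z = np.zeros(len(X), dtype=int)           # cluster assignments
    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(X):
            d2 = [np.sum((x - c) ** 2) for c in centroids]
            j = int(np.argmin(d2))
            if d2[j] > lam:                   # too far from every cluster:
                centroids.append(x.copy())    # open a new one at this point
                j = len(centroids) - 1
            if z[i] != j:
                z[i], changed = j, True
        # update each non-empty cluster's centroid; leave empty ones in place
        centroids = [X[z == k].mean(axis=0) if np.any(z == k) else c
                     for k, c in enumerate(centroids)]
        if not changed:
            break
    # drop clusters that ended up empty and relabel assignments compactly
    keep = sorted(set(z.tolist()))
    z = np.array([keep.index(k) for k in z])
    return np.array([centroids[k] for k in keep]), z
```

Because each pass is essentially a k-means sweep with an extra distance test, the procedure scales to datasets far beyond the reach of the samplers it approximates, which is the premise of this research theme.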