III: Small: Structural Matrix Completion for Data Mining Applications

Sponsor: National Science Foundation (NSF)

Award Number: 1421759

PI: Mark Crovella

Co-I/Co-PI: Evimaria Terzi

Abstract:

A common problem arising in science and engineering is that a dataset may only be partially measured. Often the complete dataset is naturally expressed as a matrix – for example, traffic flows in a city, gene expression across a set of treatments, or ratings of movies for users. Recently, a new solution strategy has emerged for the problem of inferring the missing entries in such datasets, but the power and limits of this new “structural” approach are not fully understood as yet. This project will develop a better understanding of this structural approach and apply that understanding to a number of important problems. In addition, the project will develop new course materials for data science education, and train both graduate and undergraduate students.

The matrix completion problem seeks to infer the missing entries of a matrix, under a low-rank assumption. To date, most matrix completion methods do not actually check whether the known entries contain sufficient information to complete the matrix. Recently, however, a new and very different class of “structural” methods have emerged, which analyze the information content of the visible matrix entries, and so can determine whether accurate completion is possible. From a data mining standpoint, the implications of structural matrix completion methods are largely unexplored. This project will investigate how to leverage structural matrix completion methods to attack a host of data analysis problems, including developing new methods for active matrix completion, new approaches to cross-validating matrix completion results, and new strategies for general matrix completion.

For more information, click here.