Divide-and-Conquer Learning for Big Data
With Ameet Talwalkar
National Science Foundation Postdoctoral Fellow
Electrical Engineering and Computer Sciences
University of California, Berkeley
Faculty Host: Ioannis Paschalidis
Refreshments will be served outside Room 339 at 3:45 p.m.
Abstract: Data analyses that are practical for modest-sized datasets are often entirely infeasible for the terabyte and petabyte datasets that are fast becoming the norm. To handle the massive sizes of modern datasets, the field of machine learning must embrace parallel and distributed computing. In this talk, Ameet Talwalkar will present scalable learning algorithms for matrix factorization and estimator quality assessment that leverage modern distributed computing architectures. These algorithms retain existing base algorithms that have proven their value at smaller scales but apply them to subsamples of the data, and thus scale linearly in space and time with the number of data points. Naively employing this strategy typically yields poor results. However, as Talwalkar will show, by carefully selecting and operating on subsamples of the data, it is possible to preserve the empirical performance and the theoretical guarantees of these base algorithms once they are embedded in a computationally motivated divide-and-conquer procedure.
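The flavor of the divide-and-conquer strategy described in the abstract can be illustrated with a minimal sketch (not the speaker's actual algorithms): a base estimator, here ordinary least squares, is fit independently on disjoint subsamples of the data, and the resulting estimates are combined by averaging. The function name and parameters below are illustrative assumptions.

```python
import numpy as np

def divide_and_conquer_fit(X, y, n_subsets=4, seed=None):
    """Illustrative sketch: fit a least-squares base estimator on each of
    n_subsets disjoint random subsamples, then average the coefficients.
    Each subproblem touches only 1/n_subsets of the data, so the pieces
    can run in parallel on a distributed architecture."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # shuffle before splitting
    coefs = []
    for part in np.array_split(idx, n_subsets):
        w, *_ = np.linalg.lstsq(X[part], y[part], rcond=None)
        coefs.append(w)
    return np.mean(coefs, axis=0)          # combine step: simple average

# Synthetic data: y = X @ w_true + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=4000)

w_hat = divide_and_conquer_fit(X, y, n_subsets=4, seed=1)
```

For well-behaved problems like this one, the averaged estimate stays close to the full-data solution; the talk's point is that achieving comparable guarantees for harder problems, such as matrix factorization, requires careful choices in how subsamples are drawn and combined.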
About the Speaker: Ameet Talwalkar is an NSF postdoctoral fellow in the AMPLab at the University of California, Berkeley. His research focuses on devising scalable machine learning algorithms and, more recently, on interdisciplinary approaches for connecting advances in machine learning to large-scale problems in science and technology. He graduated summa cum laude from Yale University and obtained his Ph.D. at New York University. He was awarded the Janet Fabri Prize for the best doctoral dissertation in NYU's computer science department, Yale's undergraduate prize in computer science, and a Westinghouse Science Talent Search scholarship.