Ryan Adams - Harvard

Title: Accelerating Exact MCMC with Subsets of Data

Abstract: One of the challenges of building statistical models for large datasets is balancing the correctness of inference procedures against computational realities. In the context of Bayesian procedures, the pain of such computations has been particularly acute as it has appeared that algorithms such as Markov chain Monte Carlo necessarily need to touch all of the data at each iteration in order to arrive at a correct answer. Several recent proposals have been made to use subsets (or "minibatches") of data to perform MCMC in ways analogous to stochastic gradient descent. Unfortunately, these proposals have only provided approximations, although in some cases it has been possible to bound the error of the resulting stationary distribution. In this talk I will discuss two new, complementary algorithms for using subsets of data to perform faster MCMC. In both cases, these procedures yield stationary distributions that are exactly the desired target posterior distribution. The first of these, "Firefly Monte Carlo", is an auxiliary variable method that uses randomized subsets of data to achieve valid transition operators, with connections to recent developments in pseudo-marginal MCMC. The second approach I will discuss, parallel predictive prefetching, uses subsets of data to parallelize Markov chain Monte Carlo across multiple cores, while still leaving the target distribution intact. These methods have both yielded significant gains in wallclock performance in sampling from posterior distributions with millions of data.

When: 4:00 pm to 5:00 pm on Thursday, April 17, 2014
Location: MCS 148