Implicit methods for principled estimation with large data sets (Panagiotis Toulis -- Harvard University)

Starts: 11:00 am on Tuesday, March 29, 2016
Ends: 12:00 pm on Tuesday, March 29, 2016
Location: Hariri Institute

Abstract: Stochastic optimization procedures, such as stochastic gradient descent, have gained popularity for parameter estimation with large data sets. However, standard stochastic optimization procedures cannot effectively combine numerical stability with statistical efficiency, and their statistical properties are not well understood. Here, we introduce an implicit stochastic gradient descent procedure, the iterates of which are defined through fixed-point equations, known as implicit updates. Intuitively, the estimator implied by the implicit procedure is a shrunken version of the estimator implied by the standard one. The amount of shrinkage depends on the observed Fisher information matrix, which does not need to be computed directly, thus increasing stability without increasing the computational burden. We formally quantify the statistical efficiency of both implicit and standard stochastic gradient descent estimators, establishing the first such result in this area of research. We also derive non-asymptotic bounds for the implicit estimator, which show that the implicit procedure resolves the stability issues of the standard procedure without sacrificing statistical efficiency. This unique combination of fast computation, numerical stability, and exact standard errors provides the theoretical foundation for principled statistical analysis of large data sets with implicit stochastic gradient descent. We illustrate our theory through extensive simulations and real-world data analyses, carried out with our "sgd" R package, which suggest that implicit methods are poised to become the workhorse of estimation with large data sets.
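To make the fixed-point idea concrete, the sketch below shows implicit SGD for the simplest case, least-squares linear regression, where the implicit update has a closed form. This is an illustrative Python sketch, not the authors' "sgd" R package: the function name, learning-rate schedule, and data are assumptions for demonstration. In the standard update the gradient is evaluated at the current iterate; in the implicit update it is evaluated at the *next* iterate, which for the linear model works out to the standard gradient step shrunk by a factor 1/(1 + g‖x‖²) — the shrinkage the abstract describes.

```python
import numpy as np

def implicit_sgd_linear(X, y, lr0=1.0):
    """Implicit SGD sketch for least-squares regression.

    Standard SGD:  theta_n = theta_{n-1} + g_n * (y_n - x_n . theta_{n-1}) * x_n
    Implicit SGD:  theta_n = theta_{n-1} + g_n * (y_n - x_n . theta_n)     * x_n
    For the linear model the fixed-point equation solves in closed form:
    the standard step multiplied by the shrinkage 1 / (1 + g_n * ||x_n||^2).
    """
    n, p = X.shape
    theta = np.zeros(p)
    for i in range(n):
        g = lr0 / (1.0 + lr0 * (i + 1))      # decaying rate (illustrative choice)
        x, yi = X[i], y[i]
        resid = yi - x @ theta               # residual at the current iterate
        shrink = 1.0 / (1.0 + g * (x @ x))   # data-adaptive shrinkage factor
        theta = theta + g * shrink * resid * x
    return theta
```

Because the shrinkage factor is always in (0, 1], a too-large learning rate is automatically damped, which is the source of the numerical stability claimed for implicit updates; the standard procedure, by contrast, can diverge when g is set too aggressively.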