PCPs and the Hardness of Generating Private Synthetic Data

Friday, February 18, 2011, 3:00 pm in MCS 135
Speaker: Jon Ullman, Harvard University


We study the computational complexity of differentially private data release for simple, natural queries.  Without computational constraints, the work of Blum, Ligett, and Roth (STOC ’08) showed that it is possible to produce differentially private “synthetic databases”

that approximately preserve rich classes of statistics about the original database.  Dwork et. al. (STOC ’09) showed that the inefficiency of this mechanism is inherent for certain families of queries, but left open whether or not simple statistics can be approximately preserved efficiently and with differential privacy.

We extend the hardness results of Dwork et. al. to simple, natural families of queries.  In particular, assuming the existence of one-way functions, we show that there is no polynomial-time, differentially private algorithm A that takes a database D in ({0,1}^n)^d and outputs a synthetic database D’ all of whose two-way marginals are approximately equal to those of D. (A two-way marginal is the fraction of database rows x in {0,1}^d with a given pair of values in a given pair of columns.) Our proof combines a construction of hard-to-sanitize databases based on digital signatures from Dwork et. al. with encodings based on the PCP Theorem.

Joint work with Salil Vadhan.