About
Adam Smith is a professor of computer science and a founding member of the Faculty of Computing & Data Sciences. He found himself infatuated with the concept of differential privacy: the way data collectors can share information about a data set while withholding information about the individuals in it.

Hospitals want to aggregate data to share interesting statistics and trends with one another. The US Census Bureau wants to paint a rich picture of the country's demographics. But when it comes to releasing their data, these organizations, and others, always wind up leaking at least a small amount of sensitive information.
“And so this leads to this basic conundrum of trying to understand how can you release as much as possible without ending up releasing the raw data? Which in many of these data sets is very sensitive,” Smith says. “So what I’ve been interested in is where that line is and how to formalize it.”
That’s where differentially private algorithms come in.
“I’ve done lots of work on algorithms that satisfy these constraints so that they don’t leak too much about any one individual,” he says, “for useful statistical processing of data, training machine learning models, doing statistical analysis, basically doing just the kind of things people want to do with these data.”
It’s important to design these algorithms so that no one can take, for example, the census data and reverse engineer it to recover individuals' records. The same approach helps hospitals protect sensitive information. And it even aids tech companies like Google when they program machine learning tools to provide users with word predictions.
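One standard building block behind this idea is the Laplace mechanism: add noise, calibrated to how much any single person can change an answer, before releasing it. The sketch below is a minimal illustration of that mechanism, not Smith's own code; the `private_count` helper, the data, and the choice of epsilon are hypothetical.

```python
import numpy as np

def private_count(values, predicate, epsilon):
    """Release a count with differential privacy via the Laplace mechanism.

    Adding or removing one individual changes a counting query by at most 1
    (sensitivity = 1), so noise drawn from Laplace(1/epsilon) masks any one
    person's contribution to the released statistic.
    """
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: roughly how many patients in a hypothetical hospital data set
# are over 65, without revealing whether any one person is among them.
ages = [23, 67, 45, 71, 34, 80, 52, 68]
print(private_count(ages, lambda age: age > 65, epsilon=0.5))
```

Smaller values of epsilon add more noise and give stronger privacy; larger values give more accurate answers but leak more about individuals, which is exactly the line Smith describes trying to formalize.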