Data Security & Privacy FAQs

What is secure Multiparty Computation (MPC)?

Secure MPC is a technology that allows multiple parties to analyze data collaboratively without revealing the underlying private or confidential data in the process. This short article and video provide a very high-level overview of the technology.
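
To make the idea above concrete, here is a minimal sketch of one common MPC building block, additive secret sharing, used to compute a sum without any party seeing another party's input. It is an illustration only and not necessarily the protocol used in any deployment mentioned in this FAQ. The names MODULUS, share, and secure_sum are hypothetical, and the sketch assumes all parties follow the protocol honestly and that the true total is smaller than the modulus.

```python
import secrets

# Assumed public modulus; it only needs to be larger than the true total.
MODULUS = 2**61 - 1

def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def secure_sum(private_values):
    """Simulate all parties in one process; real parties would exchange shares over a network."""
    n = len(private_values)
    # Party i splits its value and sends the j-th share to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds the shares it received; each partial sum looks random on its own.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the combination of all partial sums reveals the total.
    return sum(partial_sums) % MODULUS

salaries = [52_000, 61_500, 48_250]  # made-up private inputs, one per party
total = secure_sum(salaries)
```

Any single share, or any single partial sum, is a uniformly random number and carries no information about an individual input; only the final total is revealed.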

What scenarios are best-suited for the use of secure MPC?

There are many settings in which the use of secure MPC could be highly beneficial. Of particular note are settings in which existing regulations (such as HIPAA and FERPA) prohibit the release of data sets beyond the boundaries of the agencies authorized to collect, store, or analyze these data sets. If multiple data sets that are subject to these regulations (or which cannot be shared due to confidentiality considerations) need to be combined to compute some aggregated analytics, then secure MPC can be used to do just that.

Are there examples of using secure MPC in the public sector?

Secure MPC was used in a first-of-its-kind study of pay (in)equity across gender and racial dimensions for employees in the City of Boston. Led by the Boston Women's Workforce Council in collaboration with BU, the study has been repeated three times to evaluate progress towards pay equity, with a fourth iteration slated for Fall of 2019. This article from BU Today provides details from the 2017 iteration of this study. An earlier article also sheds some light on the technology.

What is Differential Privacy (DP)?

Differential Privacy is a mechanism to ensure that the release of a communal data set, or of some aggregated analytics based on such data, does not lead to the identification of data about individuals in the underlying community. A “differentially private” release of information ensures that an observer cannot reliably tell whether a particular individual’s information was used in the computation of that information.
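
One standard way to obtain such a guarantee for a counting query is the Laplace mechanism, sketched below. The function names and the privacy parameter epsilon are assumptions introduced for illustration; smaller values of epsilon mean more noise and stronger privacy. The sketch uses Python's general-purpose random module and floating-point noise, which is fine for illustration but not suitable for a production release.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching predicate."""
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1 (sensitivity 1),
    # so Laplace noise with scale 1/epsilon is enough for this query.
    return true_count + laplace_noise(1 / epsilon, rng)

ages = [23, 35, 41, 58, 62, 29, 47]  # made-up data
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0)
```

Because the noise is calibrated to the effect that any one individual can have on the count, the released value looks nearly the same whether or not a given person's record is included.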

Are there examples of using DP in the public sector?

Differential Privacy has been adopted by the Census Bureau as the mechanism via which data from the 2020 Census will be released for researchers. This blog post from the Census Bureau highlights the reason behind the adoption of DP for the 2020 Census (compared to the privacy protection methods deployed to protect data for the 2010 Census). 

What are data de-identification/anonymization practices?

De-identification refers to a strategy or protocol for removing information that can link a person (or an entity in general) to a specific record in a data set. De-identification is typically used to allow the release or sharing of data sets (e.g., involving human subject research) with third parties (e.g., researchers). Common de-identification protocols include deleting or masking names and social security numbers, and blurring other associated information (e.g., using year of birth instead of date of birth, or zip code instead of street address). One form of de-identification, also called anonymization, replaces personally-identifying information (e.g., SSN) with unique random strings. This allows information to be associated with a unique anonymous (as opposed to the real) individual or entity.
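
The practices described above can be sketched in a few lines. This is an illustration under assumed record fields (name, ssn, date_of_birth, street_address, zip_code, diagnosis are all hypothetical), not a compliant de-identification procedure.

```python
import secrets

def deidentify(records):
    """Drop direct identifiers, blur quasi-identifiers, and replace SSNs with random tokens."""
    pseudonyms = {}  # SSN -> random token; kept private or discarded after processing
    output = []
    for rec in records:
        ssn = rec["ssn"]
        if ssn not in pseudonyms:
            pseudonyms[ssn] = secrets.token_hex(8)
        output.append({
            "id": pseudonyms[ssn],                    # unique random string instead of the SSN
            "birth_year": rec["date_of_birth"][:4],   # year of birth instead of full date
            "zip_code": rec["zip_code"],              # zip code instead of street address
            "diagnosis": rec["diagnosis"],            # the attribute being studied is kept
        })
    return output

# Made-up records; dates are assumed to be in YYYY-MM-DD format.
raw = [
    {"name": "A. Example", "ssn": "000-00-0001", "date_of_birth": "1980-04-12",
     "street_address": "1 Main St", "zip_code": "02215", "diagnosis": "asthma"},
    {"name": "A. Example", "ssn": "000-00-0001", "date_of_birth": "1980-04-12",
     "street_address": "1 Main St", "zip_code": "02215", "diagnosis": "flu"},
    {"name": "B. Sample", "ssn": "000-00-0002", "date_of_birth": "1975-09-30",
     "street_address": "2 Oak Ave", "zip_code": "02134", "diagnosis": "flu"},
]
released = deidentify(raw)
```

Records belonging to the same person share the same random id, so they can still be analyzed together, while the name, SSN, exact birth date, and street address never appear in the released data.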

How effective are de-identification/anonymization practices?

While de-identification is used extensively (e.g., under HIPAA for medical data and FERPA for education data), it is now commonly accepted that combining multiple de-identified/anonymized data sets could allow for re-identification. As such, de-identification and anonymization are widely believed to be ineffective against any party determined to undermine such protocols. The US President’s Council of Advisors on Science and Technology (PCAST) found de-identification “somewhat useful as an added safeguard” but not “a useful basis for policy” as “it is not robust against near‐term future re‐identification methods”.
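
The re-identification risk described above can be illustrated with a toy linkage: a de-identified data set is joined with a second, public data set on the fields that both still contain. All records and field names below are made up for illustration, and the link function is a hypothetical helper.

```python
def link(deidentified, public):
    """Match de-identified records to names using shared quasi-identifiers."""
    # Index the public data set by (birth year, zip code).
    index = {}
    for person in public:
        key = (person["birth_year"], person["zip_code"])
        index.setdefault(key, []).append(person["name"])

    matches = {}
    for rec in deidentified:
        names = index.get((rec["birth_year"], rec["zip_code"]), [])
        # A record is re-identified when exactly one public entry fits it.
        if len(names) == 1:
            matches[rec["id"]] = names[0]
    return matches

deidentified = [
    {"id": "a3f9", "birth_year": "1980", "zip_code": "02215", "diagnosis": "asthma"},
    {"id": "77c1", "birth_year": "1975", "zip_code": "02134", "diagnosis": "flu"},
]
public = [  # e.g., a hypothetical public directory
    {"name": "A. Example", "birth_year": "1980", "zip_code": "02215"},
    {"name": "B. Sample", "birth_year": "1975", "zip_code": "02134"},
    {"name": "C. Other", "birth_year": "1975", "zip_code": "02134"},
]
reidentified = link(deidentified, public)
```

Here the first record is re-identified because only one person in the public data shares its birth year and zip code, while the second remains ambiguous between two people. The more data sets an adversary can combine, the fewer records stay ambiguous.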

How do secure MPC and DP technologies compare to de-identification/anonymization practices?

Unlike de-identification and anonymization protocols, MPC and DP technologies do not rely on hiding or suppressing sensitive information at the record level while releasing other information. MPC and DP protect the entirety of the records and provide mathematical guarantees about what information can be gleaned. In general, MPC ensures that no information is “leaked” about the individual records or data sets other than what was intended and pre-approved, whereas DP ensures that the information leaked cannot be used to ascertain whether an individual record was present or not in the data set.

In addition to addressing the vulnerabilities associated with re-identification, MPC has the advantage of allowing analytics to be conducted on multiple data sets without the need to mask or suppress information in these data sets. This allows for proper linkages across data sets (e.g., using social security numbers) and for more granular aggregation (e.g., using exact age and address), thus leading to far more accurate analysis. Of course, MPC does so in a way that does not reveal this sensitive data. Furthermore, MPC can be complemented with DP to further guarantee that aggregate analyses do not leak information that could be used to ascertain whether an individual record was present or not in the data set.

What is Blockchain?

Blockchain technology allows a set of mutually distrusting parties to maintain an incorruptible digital ledger (or log) of transactions that can be checked/audited at any point in the future. A salient feature of blockchain is that it does not rely on any external authority and yet it provides an immutable record of transactions that can be validated for their authenticity. While the most common use of Blockchain technology is to keep a ledger of financial transactions (hence its relevance to cryptocurrencies such as Bitcoin), the technology can be used to keep an incorruptible record of any transaction, including data access logs.
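
The tamper-evidence property described above comes from linking each ledger entry to the hash of the previous one. The sketch below shows only that hash-linking idea for a data access log; it deliberately omits the consensus protocol that lets distrusting parties agree on the ledger without an external authority. The function names and the log entry fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the hash preceding the first block

def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps({"index": index, "prev_hash": prev_hash, "payload": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append(chain, payload):
    """Add a new entry that commits to everything recorded before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({"index": index, "prev_hash": prev_hash, "payload": payload,
                  "hash": block_hash(index, prev_hash, payload)})

def verify(chain):
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True

log = []
append(log, {"user": "analyst1", "action": "read", "dataset": "payroll"})
append(log, {"user": "analyst2", "action": "read", "dataset": "payroll"})
intact = verify(log)

log[0]["payload"]["user"] = "someone_else"  # simulate tampering with an old entry
tampered_ok = verify(log)
```

After the simulated tampering, verification fails because the stored hash of the first entry no longer matches its contents, which is what makes after-the-fact edits detectable by an auditor.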

Are there examples of using Blockchain in the public sector?

Among many other applications, the US General Services Administration (GSA) is using Blockchain in a pilot that aims to automate the process by which it reviews bids on federal contracts. A description of this project is available in an article from GCN.