Institute Receives $1M NSF Grant to Deploy MPC within Cloud Dataverse

The Institute is pleased to announce it has received a $1M grant from the National Science Foundation (NSF) to build a framework that allows for analytics to be performed over shared datasets while protecting the data, even from collaborators.

Led by PI Mayank Varia, Co-Director of the BU RISCS Center, the project will integrate three different systems to “build a data repository and computational framework that enables participants to do analytics over data sets (even ones they cannot read) in a cryptographically-protected manner.” This work is being funded through NSF’s Cybersecurity Innovation for Cyberinfrastructure (CICI) program, which supports projects that “develop, deploy and integrate security solutions that benefit the scientific community by ensuring the integrity, resilience and reliability of the end-to-end scientific workflow.”

Andrei Lapets, Director of Research Development & Research Scientist at the Hariri Institute, will serve as a Co-PI on the project, while Orran Krieger, Director of BU’s Cloud Computing Initiative; Merce Crosas, Chief Data Science & Technology Officer at Harvard’s Institute for Quantitative Social Science and Hariri Institute Visiting Fellow; and Ata Turk, Research Scientist for the Massachusetts Open Cloud project, will contribute as senior personnel on the grant.

The project builds on previous research conducted by the PIs and collaborators, and brings together the power of Dataverse, the Massachusetts Open Cloud, and Conclave. Led by Crosas, the Dataverse project for data sharing and archiving is housed at Harvard University and provides a repository where social scientists can store their datasets. The Massachusetts Open Cloud, led by Krieger, is an open, multi-provider cloud that allows for high-speed computation. Through the ongoing, NSF-funded Modular Approach to Cloud Security (MACS) project, headed by Varia, researchers have developed Conclave, an architecture that allows for scalable and complex secure multi-party computation (MPC). As described in the project’s proposal, the integration of these systems allows for advanced data sharing and data protection:

“Dataverse provides the data management infrastructure, Conclave provides the computational method, and the MOC provides the isolated computational environments and low-latency communication needed to make Conclave efficient.”

The newly funded project is slated to last two years and will include deployment of the infrastructure needed to connect the existing technologies, orchestration of the full secure computing workflow, scalable support for the analytics that data scientists want to run, and tools that let data owners maintain full control over how their data is used. Once complete, the service will be available to thousands of Dataverse users and tens of thousands of potential users from the Massachusetts Green High Performance Computing Center (MGHPCC) member institutions.

Additionally, the work will be made accessible to the broader scientific community through open source, allowing data scientists “to leverage vast stores of data that are too sensitive to share yet too valuable to society to ignore.” The application of this work has the potential to support a variety of fields, such as medicine, finance, homeland and cyber security, cloud engineering, and smart cities. In addition to its core value, the project’s technical work will serve as an enabler of future software development projects, research advances, student training, and industry connections.