Cloud Computing Initiative Advances Research Model through Industry Engagement

By Kaitlin Barnes

More and more, the services and applications we use for communication, security, storage, analytics, and just about every other technological need are based in the cloud. For most users, that’s a vague term meaning “not housed locally.” For the researchers and engineers working with the Massachusetts Open Cloud (MOC) project and Red Hat Collaboratory, both hosted at Boston University, it’s a world of opportunity.

Part of BU’s Cloud Computing Initiative (CCI), the MOC is a unique academic, industry, and government partnership composed of interconnected projects with the same goal: to develop an open, production-quality cloud computing platform. While today’s clouds are owned and operated by a single provider, the multi-institutional MOC is creating a multi-sided marketplace, which will allow multiple providers to collaborate and compete on a level playing field and give users control over the services and resources they consume. Incubated at Boston University, the project’s academic and government partners include Harvard, MIT, UMass, Northeastern, the U.S. Air Force, and the Commonwealth of Massachusetts. The MOC has strong involvement from its core industry partners (Brocade, Cisco, Intel, Lenovo, Red Hat, Two Sigma, and NetApp) as well as support from a variety of additional contributors.

Under the leadership of professor Orran Krieger (Electrical & Computer Engineering), the CCI strives to cultivate integrated initiatives in research, education, and technology development at all layers of the cloud computing ecosystem. Building on the success of the MOC, the initiative announced a five-year, $5 million partnership with Red Hat earlier this year to create the Red Hat Collaboratory at Boston University (Red Hat Collaboratory @BU), which will incubate research projects, provide fellowship opportunities, and support visiting scientists. After an exciting year, students, developers, and researchers had the opportunity to highlight their achievements at two major spring events, the Red Hat Summit and the Boston OpenStack Summit.

The Red Hat Summit drew more than 6,000 participants from around the world. The MOC team, together with Red Hat collaborators, gave a talk describing the broad range of research and development going on in the MOC and took turns staffing a booth to promote the new Red Hat Collaboratory. Students from BU’s popular cloud computing course gave final project demonstrations in a room open to all conference attendees. Facing a truly intimidating “final examination,” the students rose to the occasion and successfully handled a wide range of questions about the projects they developed and the technologies they used.

The spring 2017 OpenStack Summit took place in Boston and hosted more than 5,000 attendees from over 1,000 different companies. A free and open-source software platform, OpenStack is the cloud platform of choice for many academic and scientific communities, and its Scientific Working Group brings researchers and scientists together. MOC team members gave five full-length talks (selected from over 1,100 proposals) and four lightning talks, and participated in numerous technical meetings and discussions. During his keynote speech at the Summit, Chris Wright, Vice President and Chief Technologist at Red Hat, highlighted the ongoing work of the MOC to drive multi-discipline research and create “a place to do rich sharing of information across researchers.”

One project highlighted at the OpenStack Summit was Cloud Dataverse, a collaboration between the MOC and Harvard Dataverse teams. Cloud Dataverse brings together the power and scalability of cloud computing and storage with access to thousands of datasets hosted in a reliable and feature-rich data repository platform. Collaborators Merce Crosas, Chief Data Science and Technology Officer at the Institute for Quantitative Social Science (IQSS) at Harvard University; Orran Krieger, CCI Founding Director and MOC Project Lead; and Piyanai Saowarattitada, Director of Engineering and Infrastructure for the MOC, presented the vision and current status of the project. As Saowarattitada explained, the project provides a platform through which dataset owners and cloud providers can give end users a full stack of big data processing capabilities. The Cloud Dataverse community has control over which compute resources and processing frameworks are used, and how. At the same time, cloud users have rich datasets readily available to them.

Piyanai Saowarattitada presents on Cloud Dataverse at the OpenStack Summit.

The Cloud Dataverse project is one example of CCI’s efforts to provide students with unique opportunities to transform research and classroom learning into applied, real-world solutions. Sarah Ferry, a computer science student at Boston University, is a key developer on the project and works with both researchers and engineers on the MOC and Harvard IQSS teams. She credits her work at the MOC with teaching her how to “dive into a software project without much previous knowledge and learn on the go,” making her “much more qualified for future employment.” Under Saowarattitada’s leadership, the MOC engages students in every aspect of cloud computing; since 2015 the project has sponsored over a hundred student internships. As the academic landscape continues to evolve, it’s imperative for universities to provide students with these experiential learning opportunities as part of their academic journey.

MOC student interns are supported by engineers and researchers who guide them through the complexities of open source communities. Jeremy Freudberg, a BU undergraduate student and MOC intern, has been able to directly contribute to upstream OpenStack. The work he is engaged in is crucial in ensuring that the requirements and research of the MOC collaboration are supported by the upstream code base. He credits his internship with allowing him to “learn under the mentorship of developers from around the world,” while having a “direct impact on the future of the cloud as a tool to incentivize research.”

Increasingly, employers are looking for graduates to have proven track records, and MOC partner companies are no exception. As Brian Riordan, Director of Software Engineering at Red Hat, recognizes, “beyond the performance improvements, a key benefit of this kind of collaboration is the tangible experience it offers students as they make contributions to upstream open source communities.”

One example of this is the ongoing work around making data access efficient for large “Datalakes,” low-cost storage systems used to store massive collections of data sets. In the Big Data as a Service talk, presented at the OpenStack Summit, MOC researchers demonstrated the type of innovative solutions that can be achieved through close collaboration between industry and academia. A major initiative involving a number of MOC partners (Intel, Lenovo, Brocade, and Red Hat) had planned to use the cache tiering functionality that was under development in the Ceph community. The idea behind the “tiering” approach was that “hot” (frequently used) data sets would be automatically identified and moved into a fast, solid state tier of storage, while datasets that are less frequently accessed would migrate into a slower (cheaper) spinning hard drive tier of storage.  Unfortunately, it turned out to be too difficult to efficiently identify and move hot data sets, and industry partners engaged with the MOC research community to explore alternatives.

After a number of iterations, a team of MOC graduate students, Ugur Kaynar of BU and Mania Abdi and Mohammad Hajkazimi of Northeastern University, developed and demonstrated a successful “caching” approach, where data is copied into fast solid state storage as it is accessed. The work has been submitted to major academic forums for publication and has already been accepted by the upstream Ceph community as an experimental feature. Furthermore, the students have been hired as summer interns by Red Hat to work with the company’s engineers on refining and augmenting the solution’s functionality.
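
For readers curious about the difference between the two approaches, the caching idea can be illustrated with a few lines of Python. This is only a toy sketch and not the students' implementation or anything from Ceph: the `ReadThroughCache` class, the `datalake` dictionary, and the least-recently-used eviction rule are all assumptions made for illustration. The point it shows is that nothing needs to predict which data sets are “hot”; data simply lands in the fast tier the first time it is read.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy model of a small fast tier in front of a large slow store.

    Hypothetical illustration only; eviction by least-recent use is an
    assumption, not a detail taken from the MOC project.
    """

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # stands in for the slow, cheap tier
        self.capacity = capacity            # max objects held in the fast tier
        self.fast_tier = OrderedDict()      # stands in for solid state storage
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.fast_tier:
            # Served from fast storage; mark as most recently used.
            self.hits += 1
            self.fast_tier.move_to_end(key)
            return self.fast_tier[key]
        # Miss: fetch from the slow tier and copy into the fast tier on access.
        self.misses += 1
        value = self.backing_store[key]
        self.fast_tier[key] = value
        if len(self.fast_tier) > self.capacity:
            # Drop the least recently used copy; the slow tier still has the data.
            self.fast_tier.popitem(last=False)
        return value


# Made-up data lake with five small "data sets".
datalake = {f"dataset-{i}": f"contents of dataset {i}" for i in range(5)}
cache = ReadThroughCache(datalake, capacity=2)

for name in ["dataset-0", "dataset-1", "dataset-0", "dataset-2", "dataset-0"]:
    cache.read(name)

print(cache.hits, cache.misses)  # prints: 2 3
```

In this sketch the data never leaves the slow tier, so evicting an item from the fast tier is cheap. A tiering design, by contrast, has to decide in advance which data sets to migrate between tiers, which is the step the partners found difficult.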

In addition to MOC internships, the CCI is constantly developing experiential learning opportunities to immerse students in cloud computing tools and techniques. For several years, Krieger has co-taught a cloud computing course with Peter Desnoyers (Northeastern), Ata Turk (BU), and Michael Daitzman (Vecna), in which student teams are supported by industry advisors to develop a cloud application or service. This past spring, Rudolph Pienaar (Boston Children’s Hospital) and Dan McPherson (Red Hat) mentored a team of students to use Red Hat’s OpenShift platform on the MOC as a computational resource for Boston Children’s Hospital’s image processing service. The resulting project demonstrated the platform’s ability to employ the computational power of the cloud to solve complex image processing problems in real time. As Red Hat’s Chris Wright noted in his OpenStack Summit keynote speech, this could have major implications for medicine by dramatically reducing image processing time and improving patient care. What began as a student project has now opened the door for a formal collaboration with Boston Children’s Hospital.

The MOC is successfully demonstrating a new model for how academic institutions can work with industry and provide unique, applied research opportunities for students and faculty. As the federal funding landscape for research continues to contract, BU is working to increase collaboration with industry through innovative initiatives that leverage the University’s disciplinary breadth and domain expertise.

Orran Krieger sees the CCI and projects such as the MOC and Red Hat Collaboratory as “creating a meeting ground where a community of researchers can work with industry partners to solve large scale challenges.” Speaking to the value of being an MOC partner, Riordan (Red Hat) notes that “Boston University and Red Hat share an interest in using open source in a research context,” and that the MOC is “a vehicle [to] enable meaningful data science at scale.”

Through the work of the CCI and the Hariri Institute for Computing, BU is excited to continue identifying new opportunities that expand an already successful model and deliver unparalleled gains for the University and its partners.
