Projects

Are you a Collaboratory contributor and don’t see your project here? Please contact us and we’d be happy to work with you to get your project listed here.


Project (Link to Red Hat Research Page)
Years Active (Jan 1 – Dec 31)
Summary
Project Team
Improving Cyber Security Operations using Knowledge Graphs
2023 – 2024
The objective of this project is to improve the workflow and performance of security operation centers, including automating several of their tasks, by leveraging the vast amount of structured and unstructured real-world data available on threats, attacks, and mitigations. Toward this end, this project designs novel methods based on knowledge graphs to model and derive insights from cyber security data. These methods aggregate and represent knowledge about cyber data of various kinds (e.g., threat databases, cyber security incidents, user access events, application usage, etc.) and make decisions with that knowledge. The research entails developing ontologies to characterize entities, their properties, and relationships between entities, and using the ontologies to produce knowledge graphs out of existing data. In turn, the project explores applications of knowledge graphs for various cyber security activity purposes, including uncovering hidden relationships, identifying patterns and trends, and querying the data.
  • David Staorbinski
  • David Sastre Medina
  • Zhenpeng Shi
  • Şevval Şimşek
Minimal Mobile Systems via Cloud-based Adaptive Task Processing
2023 – 2024
The high cost of robots today has hindered their widespread use. Specifically, a limiting factor involves extensive hardware and software computational resources required to run various real-time robot functions, from intensive inference with large neural network models to costly storage and compute (e.g., GPUs). How can cloud-enabled mechanisms efficiently bring about low-cost but highly-functional robots today?
In this project, our goal is to develop an efficient distributed computing platform between a robot and the cloud. We will develop an adaptive robot-cloud task management system that can intelligently offload real-time computation to the cloud while enabling highly affordable and efficient on-board operation. We will also work to integrate various cloud-enabled functionalities with existing open-source tools for robotics development.
  • Eshed Ohn-Bar
  • Renato Mancuso
  • Sanjay Arora
  • Hee Jae Kim
  • Lei Lai
  • Bassel Mabsout
Co-Ops: Collaborative Open Source and Privacy-Preserving Training for Learning to Drive
2023 – 2024
Current development of autonomous vehicles, a socially transformative technology, has been slow, costly, and inefficient. Deployment has been limited to restricted operational domains, e.g., a handful of cities and routes, where systems often fail to scale and generalize to new settings such as a new vehicle, city, weather, or geographical location. How can we enable urgently needed scalability in the development of autonomous driving AI models?
To address this critical need, our project introduces Co-Ops, a novel framework for collaborative development of training AI models at scale. Co-Ops will work towards enabling generalized, seamless and accessible collaboration among individuals, institutions and companies through two main innovations. First, we will develop a standardized and privacy-preserving platform for flexible and incentivized participation in distributed AI model training. Second, we will design principled AI models and training techniques that can effectively learn from unconstrained and heterogeneous data, including different geographical locations, vehicles, rules of the road, social norms, and traffic regulations. With these two innovations, Co-Ops will enable high-capacity and open-source models that can easily scale across various locations and conditions, from Boston’s narrow roads to Singapore’s left-hand traffic.
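The kind of distributed, privacy-preserving training the platform would standardize can be sketched at toy scale with federated averaging: each participant trains on its private data and shares only model parameters, which are aggregated weighted by local dataset size. Everything below is an illustrative stand-in, not the Co-Ops design:

```python
def local_update(w, data, lr=0.1):
    """One gradient step of 1-D linear regression y = w*x on local data."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_average(updates, sizes):
    """Aggregate local models, weighted by each local dataset's size."""
    return sum(u * n for u, n in zip(updates, sizes)) / sum(sizes)

# Two participants whose raw data never leaves them; only updates are shared.
datasets = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]   # both imply w = 2
w = 0.0
for _ in range(100):
    updates = [local_update(w, d) for d in datasets]
    w = federated_average(updates, [len(d) for d in datasets])
print(round(w, 2))  # 2.0
```

The global model converges to the shared solution even though no party ever pools the raw driving data, which is the basic property a collaborative training framework builds on.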
  • Eshed Ohn-Bar
  • Adam Smith
  • Erik Erlandson
  • Michael Clifford
  • Lance Galletti
  • Sanjay Arora
  • Ruizhao Zhu
  • Jimuyang Zhang
  • Yuanming (John) Chai
CoDes: A co-design research lab to advance specialized hardware projects
2023 – 2024
The CoDes research lab provides the infrastructure and engineering foundation needed to support co-design-based specialized hardware research. The lab is currently located at Boston University, as part of the Red Hat – Boston University collaboratory. At its core, CoDes targets:
  • Automation: Replacing developer expertise requirements with Machine Learning, and automating tasks to improve productivity.
  • Scalability: Having a common development ecosystem across the cloud-edge continuum; this includes, in particular, hardware blocks that can be scaled based on user and system constraints.
  • Tunability/Configurability: Enabling developers to configure both hardware and software to better match their requirements.
  • Features: Focusing not only on features that are critical for specialized hardware workloads, but also features that can significantly boost productivity and enable exploration into innovative methods for how specialized hardware can be used.
  • Portability: Moving as much functionality as possible out of chip/board-specific blocks, and minimizing the role of components in board support packages.
  • Uniformity: Having uniform abstractions between different parts of the ecosystem. This includes both uniformity of interfaces between hardware components, as well as uniformity between devices so that specialized hardware can be leveraged by existing software stacks as much as possible.
  • Renato Mancuso
  • Ulrich Drepper
  • Ahmed Sanaullah
  • Jason Schlessman
  • Manos Athanassoulis
  • Martin Herbordt
  • Sahan Bandara
  • Hafsah Shahzad
  • Reza Sajjadinasab
  • Zaid Tahir
  • Robert Munafo
  • Ju Hyoung Mun
  • Shahin Roozkhosh
  • Tarikul Islam Papon
Prototyping a Distributed, Asynchronous Workflow for Iterative Near-Term Ecological Forecasting
2023 – 2024
The ongoing data revolution has begun to fuel the growth of near-term iterative ecological forecasts: continually-updated predictions about the future state (daily to multi-year) of ecosystems and their services that allow society to anticipate environmental challenges and improve decisions on actionable timescales, while allowing researchers to accelerate scientific discovery and answer fundamental research questions about the predictability of nature.
To fuel the growth of the ecological forecasting community, there is a need to openly develop and deploy accessible, reusable, and scalable community cyberinfrastructure (CI) that can be broadly applied to make large numbers of ecological forecasts on a repeatable, frequent basis.
This RHC project will prototype the beginnings of such a system, focusing on developing a cloud-native workflow that can handle an asynchronous, event-driven, and distributed approach to execution.
  • Michael Dietze
  • Christopher Tate
  • Ioannis (Yannis) Paschalidis
FHELib: Fully Homomorphic Encryption Hardware Library for Privacy-preserving Computing
2023 – 2024
In today’s data-driven society, we frequently share our private data with third-party cloud service providers. To maintain the privacy of this personal data, typically the data is transmitted and stored in an encrypted form. Unfortunately, a cloud service provider needs to decrypt this data to process it, in turn creating a window of opportunity for data leaks and exposure. Fully homomorphic encryption (FHE) is an emerging class of encryption technology that allows a cloud service provider to keep the data in encrypted form while it is being processed. Currently, there exists a variety of FHE schemes that operate on encrypted data. As part of the International Organization for Standardization’s (ISO’s) efforts to standardize HE schemes, the organization is considering the following four FHE schemes: Brakerski-Gentry-Vaikuntanathan (BGV), Brakerski/Fan-Vercauteren (B/FV), Cheon-Kim-Kim-Song (CKKS), and Fully Homomorphic Encryption scheme over the Torus (TFHE).
Our long-term vision is to design a practical and efficient hardware accelerator that supports all four schemes being considered by ISO, and then deploy this design in the Open Cloud to enable privacy-preserving computing systems research in Red Hat Collaboratory. To achieve this long-term vision, we propose to develop FHELib, an RTL hardware library that supports all four FHE schemes: BGV, B/FV, CKKS, and TFHE. This library can be leveraged to design both FPGA-based and ASIC-based custom accelerator solutions (that would eventually be deployed in the cloud) that support all four schemes.
  • Ajay Joshi
  • Zahra Azad
SECURE-ED: Open-Source Infrastructure for Student Learning Disability Identification and Treatment
2023 – 2024
The project aims to develop an infrastructure that would enable users to input data about an individual student and receive back information about the student’s risk profile and the likelihood of responding to a particular intervention. We will leverage Red Hat MPC Tools and OpenShift to address the challenges of integrating and analyzing educational data from multiple sources in an efficient and privacy-preserving manner. The project outcome will involve innovative solutions for personalized education and improved student outcomes while ensuring the privacy and security of sensitive data.
  • Hank Fien
  • Eshed Ohn-Bar
  • Ola Ozernov-Palchik
  • Kasey Tenggren
  • Sam Lindberg
  • Catherine Chan-Tse
Relational Memory Controller
2023 – 2024
Data movement through the memory hierarchy is a fundamental bottleneck for computing systems. A key reason is that data access patterns do not always follow how data is stored in memory. To address this, we propose to build a memory controller that can transform data on the fly, pushing only the relevant data, tightly packed, through the memory hierarchy toward the CPU, thus increasing locality and efficiency. To do that, we propose to build a customizable DDR4 memory controller that implements the functionality of Relational Memory, a software/hardware co-designed approach for on-the-fly data transformation.
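The transformation the controller performs can be sketched in software: rows live contiguously in memory, but a query touching a single column receives only that column's values, tightly packed. This is an illustration in Python rather than in a DDR4 controller, with a made-up three-field schema:

```python
import struct

ROW_FMT = "<iif"          # schema: (id: int32, qty: int32, price: float32)
ROW_SIZE = struct.calcsize(ROW_FMT)

rows = [(1, 10, 9.5), (2, 3, 4.25), (3, 7, 1.5)]
row_store = b"".join(struct.pack(ROW_FMT, *r) for r in rows)

def project_column(buf, col_offset, col_fmt):
    """Walk the row-major buffer and emit one column, tightly packed."""
    out = []
    for base in range(0, len(buf), ROW_SIZE):
        (val,) = struct.unpack_from("<" + col_fmt, buf, base + col_offset)
        out.append(val)
    return out

# Column 'qty' starts 4 bytes into each row (after the int32 'id').
print(project_column(row_store, 4, "i"))  # [10, 3, 7]
```

In Relational Memory this projection happens between DRAM and the CPU, so the cache hierarchy only ever sees the packed column rather than whole rows.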
  • Manos Athanassoulis
  • Renato Mancuso
  • Ulrich Drepper
  • Ahmed Sanaullah
  • Ju Hyoung Mun
  • Tarikul Islam Papon
  • Shahin Roozkhosh
  • Francesco Ciraolo
  • Denis Hoornaert
Learned Cost-Models for Robust Tuning
2023 – 2024
Data systems’ performance is tuned via analytical cost models that take into account all tuning knobs and predict performance. However, as the complexity of data systems increases, there are more tuning knobs and, as a result, analytical cost models become cumbersome to use and, in some cases, impossible to derive. We propose to augment cost-based tuning with Learned Cost Models that (i) can be trained using analytical cost models to allow for more flexible tuning and (ii) can learn from past and (targeted) future executions to capture the increased system complexity.
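Idea (i) can be sketched at toy scale: sample a hypothetical analytical cost model and fit a learned model to those samples by least squares. The cost function, the single knob, and the feature choice below are illustrative assumptions, not the project's actual models:

```python
def analytical_cost(knob):
    # Hypothetical analytical model: one term shrinks with the knob,
    # one grows, e.g. a read-cost vs. write-cost trade-off.
    return 100.0 / knob + 2.0 * knob

# Fit cost ~ w1*(1/knob) + w2*knob by least squares (2x2 normal equations).
samples = [(k, analytical_cost(k)) for k in range(1, 21)]
s11 = sum((1 / k) ** 2 for k, _ in samples)
s12 = sum((1 / k) * k for k, _ in samples)
s22 = sum(k ** 2 for k, _ in samples)
b1 = sum((1 / k) * c for k, c in samples)
b2 = sum(k * c for k, c in samples)
det = s11 * s22 - s12 * s12
w1 = (b1 * s22 - s12 * b2) / det
w2 = (s11 * b2 - s12 * b1) / det

print(round(w1, 2), round(w2, 2))  # recovers ~100.0 and ~2.0
```

Once the learned model matches the analytical one on synthetic samples, point (ii) would refit the same weights against measured executions, letting the model absorb system complexity the analytical formula cannot express.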
  • Manos Athanassoulis
  • Evimaria Terzi
  • Andy Huynh
  • Harshal Chaudhari
  • Josh Berkus
DISL: A Dynamic Infrastructure Services Layer for Reconfigurable Hardware
2023 – 2024
As modern data center workloads become increasingly complex, constrained and critical, mainstream “CPU-centric” computing can no longer keep pace. Future data centers are moving towards a more fluid model, with computation and communication no longer localized to commodity computers and communication devices. Next generation “data-centric” data centers will “compute everywhere,” whether data is stationary (in memory) or on the move (in network). Reconfigurable hardware, in the form of Field Programmable Gate Arrays (FPGAs), is transforming ordinary clouds into massive supercomputers. Currently, however, there are high costs and overheads associated with productivity in reconfigurable hardware ecosystems. These are predominantly vendor and deployment specific, which leads to limited portability. To enable developers to effectively leverage reconfigurable hardware we propose a Dynamic Infrastructure Services Layer, or DISL for short. DISL taps into the tried and tested abstractions used in software systems and will enable substantially higher productivity through reduced developer effort, well-defined developer roles, standard interfaces, and greater code compatibility across products and vendors.
  • Martin Herbordt
  • Ulrich Drepper
  • Ahmed Sanaullah
  • Zaid Tahir
  • Sahan Bandara
  • Yingqing Chen
AI for Cloud Ops
2022 – 2024
Today’s Continuous Integration/Continuous Development (CI/CD) trends encourage rapid design of software using a wide range of customized, off-the-shelf, and legacy software components, followed by frequent updates that are immediately deployed on the cloud. Altogether, this component diversity and breakneck pace of development amplify the difficulty in identifying, localizing, or fixing problems related to performance, resilience, and security. Existing approaches that rely on human experts have limited applicability to modern CI/CD processes, as they are fragile, costly, and often not scalable.
This project aims to address this gap in effective cloud management and operations with a concerted, systematic approach to building and integrating AI-driven software analytics into production systems. We aim to provide a rich selection of heavily-automated “ops” functionality as well as intuitive, easily-accessible analytics to users, developers, and administrators. In this way, our longer-term aim is to improve performance, resilience, and security in the cloud without incurring high operation costs.
  • Ayse Coskun
  • Alan Liu
  • Lesley Zhou
  • Gianluca Stringhini
  • Saad Ullah
  • Mert Toslali
  • Anthony Byrne
  • Marcel Hild
  • Daniel Riek
  • Steven Huels
D3N: A Multi-Layer Cache for Data Centers
2022 – 2023
This project designs and develops D3N, a novel multi-layer cooperative caching architecture that mitigates network imbalances by caching data on the access side of each layer of hierarchical network topology. A prototype implementation, which incorporates a two-layer cache, is highly-performant (can read cached data at 5GB/s, the maximum speed of our SSDs) and significantly improves the performance of big-data jobs.
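The layered lookup behavior can be sketched with plain Python dictionaries (a behavioral toy, not the actual SSD-backed D3N implementation): reads check the access-side cache first, then the aggregation-layer cache, then the backing store, promoting data on the way back.

```python
from collections import OrderedDict

def make_lru(capacity):
    return {"cap": capacity, "data": OrderedDict()}

def lru_put(cache, key, val):
    cache["data"][key] = val
    cache["data"].move_to_end(key)
    if len(cache["data"]) > cache["cap"]:
        cache["data"].popitem(last=False)    # evict least-recently-used

def read(key, l1, l2, origin, stats):
    for name, layer in (("l1", l1), ("l2", l2)):
        if key in layer["data"]:
            stats[name] += 1
            layer["data"].move_to_end(key)
            val = layer["data"][key]
            if name == "l2":
                lru_put(l1, key, val)        # promote to the access-side layer
            return val
    stats["origin"] += 1                     # miss everywhere: fetch from store
    val = origin[key]
    lru_put(l2, key, val)
    lru_put(l1, key, val)
    return val

origin = {f"obj{i}": f"data{i}" for i in range(10)}
l1, l2 = make_lru(2), make_lru(4)
stats = {"l1": 0, "l2": 0, "origin": 0}
for key in ["obj1", "obj1", "obj2", "obj3", "obj1"]:
    read(key, l1, l2, origin, stats)
print(stats)  # {'l1': 1, 'l2': 1, 'origin': 3}
```

The final read of `obj1` misses the small L1 but hits the larger L2, which is exactly the cooperative behavior that keeps traffic off the oversubscribed upper network layers.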
  • Orran Krieger
  • Emine Ugur Kaynar
  • Mania Abdi
  • Peter Desnoyers
  • Matt Benjamin
  • Brett Niver
  • Ali Maredia
  • Mark Kogan
  • Amin Mosayyebzadeh
  • Mohammad Hossein Hajkazemi
Unikernel Linux
2022 – 2023
Unikernels are small, lightweight, single address space operating systems with the kernel included as a library within the application. Because unikernels run a single application, there is no sharing or competition for resources among different applications, improving performance and security. Unikernels have thus far seen limited production deployment. This project aims to turn the Linux kernel into a unikernel with the following characteristics: 1) it is easily compiled for any application, 2) it uses battle-tested, production Linux and glibc code, 3) it allows the entire upstream Linux developer community to maintain and develop the code, and 4) it allows applications normally run on vanilla Linux to benefit from unikernel performance and security advantages.
  • Ali Raza
  • Tommy Unger
  • Eric Munson
  • Matthew Boyd
  • Parul Sohal
  • Ulrich Drepper
  • Daniel Bristot de Oliveira
  • Richard Jones
  • Larry Woodman
  • Jonathan Appavoo
  • Orran Krieger
  • Renato Mancuso
  • James Cadden
  • Isaiah Stapleton
An Optimizing Operating System: Accelerating Execution With Speculation
2022 – 2023
Kernel Techniques to Optimize Memory Bandwidth with Predictable Latency
2022 – 2023
This project will conduct a full evaluation of memory bandwidth and latency for typical modern systems. This includes analyzing the individual components involved independently and also analyzing how they interact with each other. We will explore how the Linux kernel can take advantage of the newest CPU hardware features and system memory topology. A significant amount of work has been done on NUMA placement in order to reduce remote memory access, but this work will likely be mutually exclusive with the work necessary for proper scheduling of hyperthreads due to internal CPU cache and TLB conflicts. The use of NVDIMMs running in memory mode is the near-term future of computers and must also be optimized. Currently, the Linux kernel has no way of evenly distributing the pages of NVDIMM memory throughout the DRAM cache, and this results in unpredictable memory bandwidth and latency. A technique known as page coloring will be investigated and evaluated.
  • Ulrich Drepper
  • Orran Krieger
  • Parul Sohal
  • Larry Woodman
Elastic Secure Infrastructure
2022 – 2023
Today many organizations choose to host their physically deployed clusters outside of the cloud for security, price, or performance reasons. Such organizations form a large section of the economy, including financial companies, medical institutions, and government agencies. Organizations host these clusters in their private data centers or rent a colocation facility. Clusters are typically created with enough capacity to service peak demand, resulting in silos of under-utilized hardware. Elastic Secure Infrastructure (ESI) is a platform, created at the Massachusetts Open Cloud (MOC), that enables physically deployed clusters to break these silos. It enables rapid multiplexing of bare-metal servers between clusters with different security requirements. The RUCS scheduler and FLOCX marketplace software are important components of ESI described in the ESI overview presentation. This project encompasses work in several areas to design, build, and evaluate secure bare-metal elastic infrastructure for data centers. Additional research focuses on market-based models for resource allocation.
  • Tzu-Mainn Chen
  • Peter Desnoyers
  • Lars Kellogg-Stedman
  • Gagan Kumar
  • Leo McGann
  • Apoorve Mohan
  • Amin Mosayyebzadeh
  • Shripad J. Nadgowda
  • Danni Shi
  • Sahil Tikale
  • Mania Abdi
  • Orran Krieger
  • Rohan Shriniwas Devasthale
Towards high performance and energy efficiency in open-source stream processing
2022 – 2023
BU faculty members Vasiliki Kalavri and Jonathan Appavoo will work with Red Hat researcher Sanjay Arora to create an open-source Mass Open Cloud (MOC)-hosted stream processing system using Apache Flink software. The researchers will leverage the open nature of the software to build a platform that optimizes trade-offs between energy efficiency and performance while maintaining transparency and the easy sharing of knowledge. “This project aims to demonstrate that energy efficiency and the myriad layers of software that go into an open source streaming platform need not be incompatible,” the team wrote.
  • Vasiliki (Vasia) Kalavri
  • Jonathan Appavoo
  • Sanjay Arora
Symbiotes: A New Step in Linux’s Evolution
2022 – 2023
Computers and the way we use them have dramatically evolved since the inception of UNIX in the 1970s. Linux’s ability to be evolved and adapted has proved invaluable in enabling everything from data center-scale cloud computing to tiny wearable smart devices. There is a line, however, that has not been crossed. UNIX has always enforced a strict boundary between what constitutes the core (kernel) of the running operating system and the application programs (processes) running on top of it. While this boundary is very useful in ensuring that programs cannot corrupt other programs, it also means that writing applications that can directly use any part of the hardware or directly integrate OS kernel functionality is very difficult.
This work explores how a new kind of software entity, a symbiote, might bridge this gap. By gaining the ability to shed the boundary that separates it from the OS kernel, application software is free to integrate, modify, and evolve into a hybrid that is both application and OS.
  • Jonathan Appavoo
Robust Data Systems Tuning
2022 – 2023
BU faculty members Manos Athanassoulis and Evimaria Terzi will work on building a new robust tuning framework for LSM-based data systems that will allow users to deploy and effectively tune data systems even when the workload information is inaccurate or noisy. This approach has the potential to revolutionize tuning and deployment of cloud-based data systems, which face volatility in resource availability and workload as multiple applications are collocated on shared virtualized infrastructure. The proposed paradigm addresses the observed workload uncertainty by (re-)formulating the tuning problem as a min-max optimization problem that seeks to find the best tuning for the worst-case workload within an uncertainty region. “The proposed robust tuning paradigm will also improve the collective understanding of robust tuning for data systems in general, a useful tool as new applications face increased workload volatility,” the team wrote.
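The min-max formulation can be illustrated with a deliberately simple, hypothetical cost model (one knob, one workload parameter); the numbers below are stand-ins, not an actual LSM cost model. Under these assumptions the robust tuning differs from the one chosen for the nominal workload:

```python
def cost(knob, read_fraction):
    # Hypothetical: a large knob value favors reads, a small one favors writes.
    return read_fraction * (10.0 / knob) + (1 - read_fraction) * knob

expected_reads, radius = 0.7, 0.3            # uncertainty region [0.4, 1.0]
region = [expected_reads - radius, expected_reads + radius]
knobs = [k / 2 for k in range(1, 21)]        # candidate tunings 0.5 .. 10.0

def worst_case(knob):
    # Cost is linear in read_fraction, so the worst case is at an endpoint.
    return max(cost(knob, r) for r in region)

nominal = min(knobs, key=lambda k: cost(k, expected_reads))
robust = min(knobs, key=worst_case)
print(nominal, robust)  # 5.0 3.5
```

The nominal tuning (5.0) is best if the workload guess is exact, but the robust tuning (3.5) minimizes cost under the worst workload in the uncertainty region, which is precisely the trade the min-max formulation makes.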
  • Manos Athanassoulis
  • Evimaria Terzi
  • Andy Huynh
  • Harshal Chaudhari
  • Josh Berkus
Learned Cost-Models for Robust Tuning
2022 – 2023
Data systems’ performance is tuned via analytical cost models that take into account all tuning knobs and predict performance. However, as the complexity of data systems increases, there are more tuning knobs and, as a result, analytical cost models become cumbersome to use and, in some cases, impossible to derive. We propose to augment cost-based tuning with Learned Cost Models that (i) can be trained using analytical cost models to allow for more flexible tuning and (ii) can learn from past and (targeted) future executions to capture the increased system complexity.
  • Manos Athanassoulis
  • Evimaria Terzi
  • Andy Huynh
  • Harshal Chaudhari
  • Josh Berkus
Serverless Streaming Graph Analytics
2022 – 2023
Streaming graph analytics is an emerging field of applications that aim to extract knowledge from evolving networks in a timely and efficient manner. Graph streams are (possibly unbounded) sequences of timestamped events that represent relationships between entities: user interactions in social networks, online financial transactions, product purchases, driver and user locations in ride-sharing services.
In this project, we will focus on graph streams that can be used to model distributed systems, where workers are represented as nodes connected with edges that denote communication or dependencies. In this model, monitoring and performance analysis can be expressed as graph streaming queries. For example, if the dynamic topology of an OpenShift cluster fleet is modeled as a graph, a streaming query can continuously detect disconnected regions. We will design a prototype open-source streaming graph analytics system on top of Apache Flink Stateful Functions and develop a temporal graph processing API for expressing continuous and ad-hoc queries on graph streams.
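The disconnected-regions query above can be sketched in plain Python (not Flink Stateful Functions) as a continuous computation over a stream of edge-addition events, maintained with a union-find structure; the node names and events are made up. Note this toy handles only additions, whereas a real temporal graph API must also handle deletions and timestamps:

```python
parent = {}

def find(x):
    """Union-find lookup with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

def num_components():
    return len({find(x) for x in parent})

# Stream of timestamped "edge appeared" events between cluster nodes.
stream = [("n1", "n2"), ("n3", "n4"), ("n2", "n3")]
for a, b in stream:
    union(a, b)
    print(num_components())  # prints 1, then 2, then 1
```

After the second event the monitored topology is briefly split into two regions, and the continuous query reports it immediately, before the third event reconnects them.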
  • Vasiliki (Vasia) Kalavri
Code2Vec: Learning code representations
2022 – 2023
Creating a global open research platform to better understand social sustainability using data from a real-life smart village
2022 – 2023
A BU team is working with SmartaByar and the Red Hat Social Innovation Program in order to create a global secure open source research platform allowing universities and researchers to study what social sustainability means using actual data from Veberöd, Sweden (or from its digital twin) as a test village supported by SmartaByar. The goal of this project is to build an open source technological infrastructure which has never been built before, so that researchers can collaborate on this platform effectively and securely to study topics that will ultimately define a link between well-being and eco-smart cities and provide smart services to their citizens. During the first year of the project, the team studied the available data and designed an open source platform that interfaces with the data sources (IoT devices) and supports a digital twin model of the village on which smart services can be tested. A specific use case (a smart traffic light) was identified from which a smart service is being built. The second-year goals are to (a) complete the open source platform to include large-scale data exchange APIs which are aligned with open standards used in digital twin and application development platforms for smart cities, and (b) test the use case using actual real-time data and scale up the use case to the entire smart village and beyond.
  • Christos Cassandras
  • Mayank Varia
  • Vasiliki (Vasia) Kalavri
  • John Liagouris
  • Alexandra Machado
  • Jim Craig
  • Christopher Tate
  • Jan Malmgren
  • Yingqing Chen
OSMOSIS: Open-Source Multi-Organizational Collaborative Training for Societal-Scale AI Systems
2022 – 2023
The goal of our project is to develop a novel framework and cloud-based implementation for facilitating collaboration among highly heterogeneous research, development, and educational settings. Currently, AI models for real-world intelligent systems are rarely trained as part of a collaborative process across multiple entities. However, collaboration among different companies and institutions can increase AI model robustness and resource efficiency. Towards a more efficient development process of AI systems at massive scale, we propose a general framework for AI model sharing and incentivization structures for seamless collaboration across diverse models, devices, use cases, and underlying data distributions. Through distributed sharing of AI models in a secured, privacy-preserving, and incentivized manner, our proposed framework enables significant cost reduction of system development as well as increased system robustness and scalability.
  • Eshed Ohn-Bar
  • Jimuyang Zhang
  • Ruizhao Zhu
  • Shun Zhang
Fuzzing Device Emulation in QEMU
2022 – 2023
Hypervisors—the software that allows a computer to simulate multiple virtual computers—form the backbone of cloud computing. Because they are both ubiquitous and essential, they are security-critical applications that make attractive targets for potential attackers. Past vulnerabilities demonstrate that implementations of virtual devices are the most common site for security bugs in hypervisors. To address this problem, we have developed a novel method for fuzzing virtual devices and implemented it for the popular open source QEMU hypervisor. Our fuzzer combines a standard coverage-guided strategy with further guidance based on hypervisor-specific behaviors. It guarantees reproducible input execution and can, optionally, take advantage of existing virtual device test cases. In our evaluation, we found and reported previously unknown bugs in devices such as serial and virtio-net, ranging from memory corruptions to denial-of-service vulnerabilities. Our evaluation demonstrated that combining well known coverage guidance techniques with domain-specific feedback results in promising fuzzer performance, even for complex targets such as hypervisors.
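The coverage-guided strategy at the heart of such fuzzers can be sketched in a few lines: keep any input that reaches new branches, and mutate kept inputs to make further progress. This toy loop is our own illustration, far simpler than the QEMU virtual-device fuzzer, and the "device" under test is a hypothetical stand-in:

```python
import random

random.seed(0)

def device_under_test(data):
    """Stand-in target: returns the set of branch ids this input exercised."""
    cov = {0}
    if len(data) > 2:
        cov.add(1)
        if data[0] == 0x42:
            cov.add(2)
            if data[1] == 0x13:
                cov.add(3)          # deepest branch: stands in for a bug site
    return cov

corpus = [b"\x00\x00\x00"]          # seed input
seen = set()                        # branches covered so far
for _ in range(20000):
    parent_input = random.choice(corpus)
    child = bytearray(parent_input)
    child[random.randrange(len(child))] = random.randrange(256)
    cov = device_under_test(bytes(child))
    if not cov <= seen:             # new coverage: keep this input as a seed
        seen |= cov
        corpus.append(bytes(child))
print(sorted(seen))
```

Because inputs that unlock one branch are retained and mutated further, the loop walks progressively deeper into nested conditions that blind random testing would almost never reach; the QEMU fuzzer adds hypervisor-specific feedback on top of exactly this mechanism.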
  • Bandan Das
  • Paolo Bonzini
  • Stefan Hajnoczi
  • Manuel Egele
Quest-V, a Partitioning Hypervisor for Latency-Sensitive Workloads
2022 – 2023
Outfitting QEMU/KVM with Partitioning Hypervisor Functionality
2022 – 2023
Privacy-Preserving Cloud Computing using Homomorphic Encryption
2022 – 2023
In today’s data-driven world, a large amount of data is collected by billions of devices (cell phones, autonomous cars, handheld game consoles, etc.), and this data is then processed in the cloud. A common approach to maintaining data privacy in the cloud is to keep the data in encrypted form and decrypt it only when it needs to be processed. However, this approach requires efficient key management techniques, which are susceptible to attacks. There exists a ground-breaking technology called homomorphic encryption (HE), which allows us to operate on encrypted data and in turn maintain data privacy without the need to store and protect the secret keys. However, HE-based computing is multiple orders of magnitude slower than operating on unencrypted data. To make HE-based computing viable and practical, we need custom hardware designs and support for floating point numbers. In this project, we propose to design and prototype (using FPGAs in the Open Cloud Testbed) an efficient hardware solution for implementing the Cheon-Kim-Kim-Song (CKKS) HE scheme. Our design will be parametrized to support different polynomial lengths and coefficient bit widths, and will be optimized to minimize the time for HE-based privacy-preserving computing. We will perform an end-to-end evaluation of our hardware solution for an image classification-based healthcare application.
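The homomorphic property itself can be demonstrated with textbook RSA, which is multiplicatively homomorphic. This is NOT the CKKS scheme the project accelerates (CKKS supports approximate arithmetic on encrypted vectors), and parameters this small are never secure; it only shows what "operating on encrypted data" means:

```python
# Toy RSA keypair (insecure, illustration only).
p, q, e = 61, 53, 17
n = p * q                               # public modulus
d = pow(e, -1, (p - 1) * (q - 1))       # private exponent (Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 6
c = (enc(a) * enc(b)) % n               # multiply ciphertexts only
print(dec(c))                           # 42: the product, computed under encryption
```

The server multiplying the two ciphertexts never learns 7, 6, or 42; a fully homomorphic scheme like CKKS extends this idea to both addition and multiplication, which is what makes arbitrary encrypted computation possible (and expensive enough to warrant hardware acceleration).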
  • Ajay Joshi
  • Rashmi Agrawal
  • Zahra Azad
Secure cross-site analytics on OpenShift logs
2022 – 2023
The project aims to explore whether cryptographically secure Multi-Party Computation, or MPC for short, can be used to perform secure cross-site analytics on OpenShift logs with minimum client participation. MPC enables mutually distrusting parties (in our case Red Hat clients) to compute arbitrary functions (e.g., identifying common trends involving crashes or failures) over their collective private data (in our case log files) while keeping their data siloed from each other and from external adversaries. Contrary to traditional MPC approaches that require data owners to act as computing parties using private resources, we will focus on a setting where clients outsource certain queries on their logs to untrusted non-colluding entities (e.g., Red Hat and ORCI) while retaining the full security guarantees of MPC. The proposed research will build on prior and ongoing work by the PI and, in particular, on Secrecy, a novel MPC framework for secure outsourced analytics with no information leakage that we are building at BU.
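One core MPC primitive behind such outsourced analytics is additive secret sharing over a finite field: each client splits a private value into random shares, each computing party sums only the shares it holds, and recombining the partial sums reveals the aggregate and nothing else. This is a minimal sketch of the primitive, not the Secrecy framework, and the per-client statistics are hypothetical:

```python
import random

random.seed(1)
P = 2**31 - 1                           # prime modulus for the share field

def share(secret, n_parties=3):
    """Split a secret into n_parties additive shares modulo P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

client_crash_counts = [12, 7, 30]       # hypothetical per-client log statistics
all_shares = [share(v) for v in client_crash_counts]

# Each computing party sums the one share it received from every client;
# no single party can reconstruct any client's input.
party_sums = [sum(col) % P for col in zip(*all_shares)]
total = sum(party_sums) % P
print(total)                            # 49: the aggregate, nothing else
```

Each share on its own is uniformly random, so the untrusted, non-colluding computing parties learn nothing about individual clients' logs while still producing the cross-site total.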
  • John Liagouris
  • Muhammad Faisal
  • Jingyu Su
  • Adarsh Verma
  • Derek Ewers
Practical programming of FPGAs in the data center and on the edge
2022 – 2023
OpenInfra Labs
2022 – 2023
OpenInfra Labs is an OpenStack Foundation project connecting open source projects to production to advance open source infrastructure. The project goals are to build a community, created by and for operators, to test open source code in production, and publish complete, reproducible stacks for existing and emerging workloads.
The project will deliver open source tools to run cloud, container, AI, machine learning and edge workloads repeatedly and predictably by encouraging work in three focus areas: 1) integrated testing of all the components necessary to provide a complete use case; 2) documentation of operational and functional gaps required to run upstream projects in a production environment; 3) shared code repositories for operational tooling and the “glue” code that is often written independently by users.
  • Orran Krieger
Automatic Configuration of Complex Hardware
2022 – 2023
A modern network interface card (NIC), such as the Intel X520 10 GbE, is complex, with hardware registers that control every aspect of the NIC’s operation from device initialization to dynamic runtime configuration. The Intel X520 datasheet documents over 5600 registers; yet only about 1890 are initialized by a modern Linux kernel. It is thus unclear what the performance impact of tuning these registers on a per-application basis will be. In this project, we pursue three goals towards this understanding: 1) identify, via a set of microbenchmarks, application characteristics that will illuminate mappings between hardware register values and their corresponding microbenchmark performance impact, 2) use these mappings to frame NIC configuration as a set of learning problems such that an automated system can recommend hardware settings corresponding to each network application, and 3) introduce either new dynamic or application-instrumented policy into the device driver in order to better attune dynamic hardware configuration to application runtime behavior.
We have shown manually that tuning a single parameter in NICs (the time delay between interrupts) can result in significant power savings by decreasing instruction and interrupt counts, while increasing frequency of sleep states, and maintaining stringent SLAs in the hundreds of microseconds. We are continuing to explore methods to automate this process.
  • Sanjay Arora
  • Han Dong
  • Yara Awad
  • Orran Krieger
  • James Cadden
Near-Data Data Transformation
2022 – 2023
BU faculty members Manos Athanassoulis and Renato Mancuso will work with Red Hat researchers Uli Drepper and Ahmed Sanaullah to create a hardware-software co-design paradigm for data systems that implements near-memory processing. The approach has the potential to revolutionize data management by bridging the gap between analytical and transactional processing. This paradigm addresses the performance bottleneck caused by memory bandwidth and will allow both cloud and edge systems to efficiently handle mixed transactional and analytical data-intensive workloads with a better trade-off between bandwidth and latency. “The proposed software-hardware co-design methodology will also improve the collective understanding of new design models and resource management strategies that are possible in systems with programmable memory hierarchies,” the team wrote.
  • Manos Athanassoulis
  • Ulrich Drepper
  • Renato Mancuso
  • Ahmed Sanaullah
  • Ju Hyoung Mun
  • Tarikul Islam Papon
  • Shahin Roozkhosh
  • Denis Hoornaert
Practical Programming of FPGAs with Open Source Tools
2022 – 2023
As artificial intelligence and the collection and processing of vast amounts of data continue to revolutionize every aspect of technology, data centers are moving from being “compute-centric” to “data-centric”: from processing data in place to “compute everywhere” — whether data is stationary (in memory) or on the move (in network). A type of computing device known as the Field Programmable Gate Array (FPGA) is integral to this transformation: FPGAs are simultaneously communication and computation devices. They are also reconfigurable: rather than fitting the application to the hardware, the hardware is modified to best run the application. Currently, however, programming FPGAs requires expertise that most programmers do not have. In this project we are augmenting common software development tools with machine learning capability so that the tools can learn to transform ordinary programs into high-quality FPGA configurations.
  • Martin Herbordt
  • Ulrich Drepper
  • Ahmed Sanaullah
  • Robert Munafo
  • Reza Sajjadinasab
  • Hafsah Shahzad
Open Cloud Testbed
2022 – 2023
Today’s cloud testbeds have become a critical tool for systems researchers, providing access to large-scale raw hardware, support for reproducible experiments, and automation for deploying complex environments. For example, in just the last three years nearly 3,400 researchers have used the CloudLab testbed to conduct over 70,000 experiments and have published results in many top-level research venues, including SIGCOMM, OSDI, SOSP, NSDI, and FAST. Unfortunately, today’s testbeds are isolated: they are deployed on dedicated infrastructure and made available to a specific community of researchers for which funding (e.g., from the NSF) was obtained. This isolation results in a number of problems. First, with fixed resources and a community with similar deadlines (e.g., important academic conferences), it is difficult to efficiently handle peak demand. Second, limiting the use of the testbed to a specific community limits who can enhance and extend the testbed; testbed capabilities could have enormous value for a broad industry and open source community. Third, the testbed is isolated from production environments, meaning that it has no direct way to provide researchers with access to production information, real datasets, and real users, which in turn limits the ability of these researchers to pursue certain research efforts. Finally, the combination of these challenges introduces barriers to another research goal: transitioning research developed in the testbed to practice.
The Open Cloud Testbed (OCT) project addresses the challenges inherent in an isolated testbed by integrating testbed capabilities into the Mass Open Cloud (MOC), an existing cloud for academic users. In particular, the project will 1) add testbed-dedicated resources, including a cluster of FPGA-enhanced nodes, in the MGHPCC data center used by the MOC; 2) harden the MOC’s Elastic Secure Infrastructure (ESI) mechanism, which allows physical servers to be elastically and securely moved between different services; 3) integrate ESI with CloudLab’s provisioning mechanisms; and 4) provide systems researchers access to cloud telemetry and datasets and the ability to expose experimental services to users of the MOC.
  • Peter Desnoyers
  • Martin Herbordt
  • Miriam Leeser
  • Emmanuel Cecchet
  • David Irwin
  • Orran Krieger
  • Michael Daitzman
  • Michael Zink
Performance Management for Serverless Computing
2022 – 2023
Serverless computing gives developers the freedom to build and deploy applications without worrying about infrastructure. The resources (memory, CPU, location) specified for a function affect both the performance and the cost of a serverless platform, so configuring these resources properly is critical. COSE uses a statistical learning approach to dynamically adapt the configurations of serverless functions while meeting QoS/SLA metrics and lowering the cost of cloud usage. This project evaluates COSE on a commercial serverless platform (AWS Lambda) as well as in multiple simulated scenarios, demonstrating its efficacy.
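The core trade-off COSE navigates can be sketched in a few lines: among candidate memory sizes, pick the cheapest one whose observed tail latency still meets the SLA. The sample latencies and Lambda-style cost model (cost proportional to memory × duration) below are illustrative assumptions, not COSE’s actual algorithm or AWS pricing.

```python
# Sketch: choose the cheapest serverless memory configuration that meets an
# SLA, given measured invocation latencies. Numbers are hypothetical.

def pick_config(samples, sla_ms):
    """samples: {memory_mb: [observed latencies in ms]}.
    Returns the memory size with the lowest cost whose p95 latency
    is within the SLA, or None if no configuration qualifies."""
    best = None
    for mem_mb, lats in sorted(samples.items()):
        p95 = sorted(lats)[int(0.95 * (len(lats) - 1))]
        if p95 > sla_ms:
            continue  # violates the SLA
        # Lambda-style billing: cost scales with memory * mean duration
        cost = mem_mb * (sum(lats) / len(lats))
        if best is None or cost < best[1]:
            best = (mem_mb, cost)
    return best[0] if best else None

samples = {128: [220, 250, 300], 256: [110, 130, 150], 512: [70, 80, 90]}
print(pick_config(samples, sla_ms=200))  # -> 256
```

Here 128 MB is cheapest per GB-second but misses the SLA, while 512 MB meets it at higher cost, so 256 MB wins; COSE’s contribution is learning such trade-offs online from sampled invocations rather than from an exhaustive sweep.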
  • Ali Raza
The Open Education Project (OPE)
2022 – 2023
The open source movement has reshaped the way we think about computing and how we create, curate, and run software. We believe that the tools and processes of open source can be brought to bear on the development and distribution of educational materials that are not only openly available but whose construction, source material, and use begin and remain in the public domain.
In this project we are developing an exemplar set of materials for an introductory computer systems class that exploits Jupyter, Jupyter Books, OpenShift, and the Mass Open Cloud to develop and deliver a unique educational experience for learning how computer systems work. The goal is that all materials — textbook, lecture presentations, and lab manual — are developed in open formats, hosted in public repositories, and deployed in a live, interactive manner that requires only access to a web browser.
OPE leverages modern open source technologies to create an open environment and platform in which educators can create, publish, and operationalize high-quality open source materials, while students require no more than access to a web browser to access them. To achieve this, we have built an open ownership model that starts with high-performance, open data centers providing the hardware resources. This model allows us to exploit Linux and build a rich environment of tools and services to support a novel approach to educational material.
  • Jonathan Appavoo
  • Orran Krieger
  • Larry Woodman
  • Heidi Dempsey
  • Danni Shi
  • Arlo Albelli
Linux Computational Caching
2022 – 2023
Despite decades of research, computer systems are no more “intelligent” today than they were when they were first constructed. That is to say, they do not systematically incorporate the ability to exploit their past to improve their current or future operation.
In this speculative work we are exploring a biologically motivated conjecture on how memory of past computing can be stored and recalled to automatically improve a system’s behavior. Building on our prior work, our goal is to combine machine learning mechanisms with a representation of a virtual machine’s execution as the frames of a movie. The aim is to construct a distributed virtual machine runtime, based on Linux KVM, that extracts such movies to create a form of associative cache used to recognize and recall information from past execution. This recalled information is then used to synthesize information specific to the current computation and permit its acceleration. The first steps, however, are to create and demonstrate the ability to extract and recognize “useful” patterns using existing machine learning techniques.
  • Jonathan Appavoo
Intelligent Data Synchronization for Hybrid Clouds
2022 – 2023
Data synchronization is a core functionality for hybrid clouds, ensuring consistency among diverse, possibly geographically distant computing platforms. This technology plays a key role in many computing environments, keeping data synchronized across file systems, mobile devices, databases, and security configuration lists. In this context, synchronization services that are reliable and performant can be hugely beneficial to edge computing deployments. Such deployments must support devices with a wide range of computing and communication capabilities, from low-power sensors to powerful smartphones.
The goal of this project is to design configurable synchronization solutions on a common platform for a wide range of edge computing scenarios relevant to Red Hat. These solutions will be thoroughly validated on a state-of-the-art testbed capable of emulating realistic environments (e.g., smart cities).
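The basic mechanism underlying such synchronizers can be sketched simply: replicas exchange content digests and transfer only the entries that differ, so low-bandwidth edge devices avoid re-sending unchanged data. This is a generic hash-comparison sketch, not the project’s protocol; the keys and values are illustrative.

```python
# Sketch: hash-based delta computation between two replicas.
# Only entries the remote is missing, or holds a stale copy of,
# need to be transferred.
import hashlib

def digest(value: bytes) -> str:
    return hashlib.sha256(value).hexdigest()

def delta(local: dict, remote_digests: dict) -> dict:
    """Return the local entries whose digest differs from (or is
    absent in) the remote replica's digest map."""
    return {k: v for k, v in local.items()
            if remote_digests.get(k) != digest(v)}

local = {"a.txt": b"hello", "b.txt": b"world", "c.txt": b"new"}
remote = {"a.txt": digest(b"hello"), "b.txt": digest(b"old")}
print(sorted(delta(local, remote)))  # -> ['b.txt', 'c.txt']
```

Configurability in practice means tuning choices this sketch fixes — digest granularity, push vs. pull direction, and how often digest maps are exchanged — to each device class’s compute and bandwidth budget.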
  • David Starobinski
  • Şevval Şimşek