Colloquium Series
Contacts: Ari Trachtenberg, Professor, Electrical & Computer Engineering
Renato Mancuso, Assistant Professor, Computer Science
Zoom Link
Time: 12 – 1 pm ET
Fall 2023 Colloquium Series Events
November 16: Sherard Griffin, Director of Engineering, OpenShift AI, Red Hat, AI Product Strategies and Research Topics
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave., Room 1101 (11th floor)
- Time: 12 – 1 pm ET
Abstract & Bio
- Abstract:
Sherard Griffin will share experiences and strategies for open source AI that work in the real world. He’ll provide insights into what customers need and how that relates to AI research challenges.
- Bio:
Sherard Griffin was responsible for the development of Open Data Hub, a community-driven reference architecture for building an AI-as-a-service platform on OpenShift. He also leads engineering for Red Hat’s OpenShift AI products and services.
October 12: Murat Demirbas, State Machine Replication and the Art of Abstraction
- Location: Photonics Building, 8 St. Mary’s St., Boston, MA, Room PHO 210
- Time: 12 – 1 pm ET
Abstract & Bio
- Abstract:
State Machine Replication (SMR) serves as the backbone of dependable distributed systems, including cloud systems at AWS, Meta, Google, Azure, and renowned databases like DynamoDB, Aurora, Spanner, MongoDB, and CockroachDB. SMR ensures replication of operations across nodes as well as their consistent sequencing using Paxos variants for consensus.
This talk delves into optimizing consensus and refining the SMR abstraction to craft customized high-performance solutions. We spotlight wide-area network and high-throughput SMR solutions, and introduce efficient strategies for performing strongly-consistent reads. We also offer hints for guiding distributed systems design.
- Bio:
Murat Demirbas is a Principal Applied Scientist at AWS and a Professor of Computer Science & Engineering at the University at Buffalo, SUNY (on leave). He developed several influential protocols and systems, including hybrid logical clocks, WPaxos, PigPaxos, and PQR. Murat received a National Science Foundation CAREER award in 2008 and School of Engineering and Applied Sciences Senior Researcher of the Year Award in 2016. He maintains a popular blog on distributed systems at http://muratbuffalo.blogspot.com
Spring 2023 Colloquium Series Events
January 26: Matthew Miller, A Conversation with the Fedora Project Leader
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 1646 (16th floor)
Abstract & Bio
- Abstract: Join us for a conversation about community-driven software development and the future of Linux distributions and related technology — and maybe a little reminiscing about the days of BU Linux (Boston University’s own Fedora-based distro from the early 2000s!).
- Bio: Matthew Miller is a Distinguished Engineer at Red Hat and is the leader of the Fedora Project, which creates the Fedora Linux distribution.
March 16: Adam Belay, MIT, LDB: An Efficient, Full-Program Latency Profiler
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 1101 (11th floor)
Abstract & Bio
- Abstract:
Maintaining low tail latency is critical for the efficiency and performance of large-scale datacenter systems. Software bugs that cause tail latency problems, however, are notoriously difficult to debug. In this talk, I will present LDB, a new latency profiling tool that aims to overcome this challenge by precisely identifying the specific functions that are responsible for tail latency anomalies. LDB observes the latency of all functions in a running program. It uses a novel, software-only technique called stack sampling, where a busy-spinning stack scanner thread polls light-weight metadata recorded in call frames, shifting instrumentation cost away from program threads. In addition, LDB records request boundaries and inter-thread synchronization to generate per-request timelines and to find the root cause of complex tail latency problems such as lock contention in multi-threaded programs. Our results show that LDB has low overhead and can rapidly analyze recordings, making it feasible to use in production settings.
- Bio:
Adam Belay is an Associate Professor of Computer Science at the Massachusetts Institute of Technology, where he works on operating systems, runtime systems, and distributed systems. During his Ph.D. at Stanford, he developed Dune, a system that safely exposes privileged CPU instructions to userspace; and IX, a dataplane operating system that significantly accelerates I/O performance. Dr. Belay’s current research interests lie in developing systems that cut across hardware and software layers to increase datacenter efficiency and performance. He is a member of the Parallel and Distributed Operating Systems Group, and a recipient of a Google Faculty Award, a Facebook Research Award, and the Stanford Graduate Fellowship. http://abelay.me
March 30: Dionisio de Niz, Carnegie Mellon University: Mixed-Trust Real-Time Computation
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 365 (3rd floor)
Abstract & Bio
- Bio:
Dionisio de Niz is a Principal Researcher and the Technical Director of the Assuring Cyber-Physical Systems directorate at the Software Engineering Institute at Carnegie Mellon University. He received a Master of Science in Information Networking and a Ph.D. in Electrical and Computer Engineering, both from Carnegie Mellon University. His research interests include cyber-physical systems (CPS), real-time systems, model-based engineering (MBE), and CPS security. In the real-time arena he has focused on multicore processors and mixed-criticality scheduling, and more recently on real-time mixed-trust computing. In MBE, he has focused on the symbolic integration of analysis using analysis contracts. Dr. de Niz co-edited and co-authored the book “Cyber-Physical Systems,” in which the authors discuss different application areas of CPS and its foundational domains, including real-time scheduling, logical verification, and CPS security. He has helped organize multiple industry workshops on real-time multicore systems (two co-sponsored by the FAA and three by different services of the US military) and on safety assurance of nuclear energy. He is a member of the executive committee of the IEEE Technical Committee on Real-Time Systems and regularly serves on the technical program committees of real-time systems conferences such as RTSS, RTAS, and RTCSA, where he also publishes much of his work.
- Abstract:
Certification authorities (e.g., FAA) allow the validation of different parts of a system with different degrees of rigor depending on their level of criticality. Formal methods have been recognized as important to verify safety-critical components. Unfortunately, a verified property can be easily compromised if the verified components are not protected from misbehaviors of the unverified ones (e.g., due to bugs). Thus, trust requires that both verification and protection of components are jointly considered.
A key challenge to building trust is that the underlying operating systems (OSs) that implement protection mechanisms are so complex that they are extremely hard (if even possible) to thoroughly verify. Thus, there has been a trend to minimize the trusted computing base (TCB) by developing small verified hypervisors (HVs) and microkernels, e.g., seL4, CertiKOS, and uberXMHF. In these systems, trusted and untrusted components co-exist on a single hardware platform but in a completely isolated and disjoint manner. We thus call this approach disjoint-trust computing. The fundamental limitation of disjoint-trust computing is that it does not allow the use of untrusted components in critical functionality whose safety must be assured through verification.
In this talk, we present the real-time mixed-trust computing (RT-MTC) framework. Unlike disjoint-trust computing, it gives the flexibility to use untrusted components even for CPS critical functionality. In this framework, untrusted components are monitored by verified components ensuring that the output of the untrusted components always leads to safe states (e.g., avoiding crashes). These monitoring components are known as logical enforcers. To ensure trust, these enforcers are protected by a verified micro-hypervisor. To preserve the timing guarantees of the system, RT-MTC uses temporal enforcers, which are small, self-contained code blocks that perform a default safety action (e.g., hover in a quadrotor) if the untrusted component has not produced a correct output by a specified time. Temporal enforcers are contained within the verified micro-hypervisor. Our framework incorporates two schedulers: (i) a preemptive fixed-priority scheduler in the VM to run the untrusted components and (ii) a non-preemptive fixed-priority scheduler within the HV to run trusted components. To verify the timing correctness of safety-critical applications in our mixed-trust framework, we develop a new task model and schedulability analysis. We also present the design and implementation of a coordination protocol between the two schedulers to preserve the synchronization between the trusted and untrusted components while preventing dependencies that can compromise the trusted component.
Finally, we discuss the extension of this framework for trusted mode degradation. While a number of real-time modal models have been proposed, they fail to address the challenges presented here in at least two important respects. First, previous models consider mode transitions as simple task parameter changes without taking into account the computation required by the transition and the synchronization between the modes and the transition. Second, previous work does not address the challenges imposed by the need to preserve safety guarantees during the transition. Our work addresses these issues by extending the RT-MTC framework to include degradation modes and creating a schedulability model based on the digraph model that supports this extension.
May 4: Joydeep Banerjee and Randy George, Red Hat: Challenges and opportunities customers have with real-world observability
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 1101 (11th floor)
Abstract & Bio
- Abstract:
Red Hat’s Joydeep Banerjee and Randy George have, between them, decades of experience working with customers and developing monitoring solutions to meet their real-world needs. In this seminar, they will share what their customers are doing today, the challenges those customers face, and opportunities where academic research could be applied.
May 25: Roxana Geambasu, Managing Privacy as a Computing Resource in User-Data Workloads
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 1101 (11th floor)
Abstract & Bio
- Abstract:
In this talk, I present the perspective that user privacy should be recognized as a crucial computing resource in user-data workloads and managed accordingly. These workloads, prevalent in today’s companies, constantly compute statistics or train machine learning models on user data, making these “products” of the data available to internal analysts, external partners, and even the general population. However, these products often leak significant information about individual users. Differential privacy (DP) offers a rigorous way to limit such data leakage by constraining the data products to noisy aggregates. The talk discusses our group’s work over the past few years on (1) designing a multi-dimensional privacy resource using DP to suit common user-data workloads and (2) integrating support for this resource into popular resource management systems like Kubernetes and caching components. This allows for proper management, including monitoring, scheduling, conservation, payment, and identification of bottlenecks for the privacy resource. By treating privacy as a computing resource, we put it on par with other computing resources that are routinely managed in computer systems (such as CPU, GPU, and RAM), and we acknowledge that user-data workloads consume something beyond these traditional resources.
The talk highlights the main lessons I have learned from our experience building these systems. Firstly, considering privacy as a computing resource helps address certain limitations of DP for practical use. Secondly, while DP is close to practical in certain settings, incorporating it into effective systems requires further evolution of DP theory alongside system design. Lastly, I believe the systems research community is uniquely positioned to tackle the remaining challenges of implementing DP in practice, so my talk serves as a call to action for systems researchers to help bring this much-needed privacy technology to practice.
- Bio:
Roxana Geambasu is an Associate Professor of Computer Science at Columbia University and a member of Columbia’s Data Sciences Institute. She joined Columbia in Fall 2011 after finishing her Ph.D. at the University of Washington. For her work in data privacy, she received: an Alfred P. Sloan Faculty Fellowship, an NSF CAREER award, a Microsoft Research Faculty Fellowship, several Google Faculty awards, a “Brilliant 10” Popular Science nomination, the Honorable Mention for the 2013 inaugural Dennis M. Ritchie Doctoral Dissertation Award, a William Chan Dissertation Award, two best paper awards at top systems conferences, and the first Google Ph.D. Fellowship in Cloud Computing.
This list includes full abstracts of past Colloquium Series events.
-
Red Hat Collaboratory at Boston University Colloquium
Sage Weil
Ceph Project Lead; Sr Distinguished Engineer, Red Hat
Pause for democracy: Leaving open source storage to help Americans vote in 2020
Abstract
In 2020 I took a leave of absence to work on democracy- and voting-related efforts with VoteAmerica, a national get-out-the-vote organization with a focus on voter registration, voter turnout, and voter protection. This was a huge departure from my usual role at Red Hat leading Ceph, an open source distributed storage project, both in terms of the technical work and the underlying motivation and purpose. This talk will cover a mix of...
-
Red Hat Collaboratory at Boston University Colloquium
Jonathan Bryce
Executive Director, Open Infrastructure Foundation
Open Technology to Unlock Human Potential
Abstract
Throughout human history, a conflict between closed innovation and open sharing has tugged at the progress made possible by technological advancement. When I create something new, do I control the benefits and retain the value for myself, or do I share the advancement broadly and let others move it forward while risking a lower return for myself? I imagine these questions have been asked every day for millennia by inventors and scientists, military leaders and governments, entrepreneurs and CEOs.
We now live in a hyper-connected...
-
Red Hat Collaboratory at Boston University Colloquium
Hugh Brock
Research Director, Red Hat
The Platform of the Future: What Is an Open Hybrid Cloud and What Does It Mean for Open Source?
Abstract
One of the early developments in computing that made it an essential part of both scientific research and business operations was a basic abstraction layer that hid the grungy implementation details of the computing machine from the programmer trying to write an application to run on it. This layer developed to serve the computer industry's need to sell products to more than one customer: If you could design one computer, but sell it...
-
Red Hat Collaboratory at Boston University Colloquium
Rania Khalaf
Director of AI Platforms and Runtimes at IBM Research
Making AI faster, easier, and safer
Abstract
Artificial intelligence is being infused into applications at an ever-increasing rate. The proliferation of machine learning models in production has surfaced the need to bridge between the worlds of machine learning and software engineering in order to scale these deployments in a fast, safe, and repeatable way. Finally, it is important to consider the applications that these models are deployed within and the context that brings to improving business outcomes effectively through ML. In this talk, we will discuss...
-
Red Hat Collaboratory at Boston University Colloquium
Mike Zink
Associate Professor, Electrical and Computer Engineering Department,
University of Massachusetts
Open Cloud Testbed: Developing a Testbed for the Research Community Exploring Next-Generation Cloud Platforms
Abstract
Cloud testbeds are critical for enabling research into new cloud technologies - research that requires experiments that potentially change the operation of the cloud itself. Several such testbeds have been created in the recent past (e.g., Chameleon, CloudLab, etc.) with the goal of supporting the CISE systems research community. It has been shown that these testbeds are very popular and heavily used by the research community. Testbed utilization often reaches 100%, especially...
-
Red Hat Collaboratory at Boston University Colloquium
Ulrich Drepper
Engineer, Office of the CTO, Red Hat
Software-Configured Compute Environments
Abstract
Hardware and software environments are designed as a compromise between many different requirements. This sacrifices performance, among other aspects, while at the same time the need for compute increases.
Specialists can certainly create more optimized systems. The challenge is to automate this.
To research these new systems we need hardware specialists to create re-configurable processors, compiler writers to deduce the best architecture from source code and generate configurations for hardware and OS, improved OSes to efficiently run that code. All that while preserving API and ideally ABI...
-
Red Hat Collaboratory at Boston University Colloquium
Kate Keahey
Senior Fellow, Computation Institute, University of Chicago and Computer Scientist, Mathematics and Computer Science Division, Argonne National Laboratory
Chameleon: New Capabilities for Experimental Computer Science
Abstract
Computer Science experimental testbeds allow investigators to explore a broad range of different state-of-the-art hardware options, assess scalability of their systems, and provide conditions that allow deep reconfigurability and isolation so that one user does not impact the experiments of another. An experimental testbed is also in a unique position to provide methods facilitating experiment analysis and crucially, improve repeatability and reproducibility of experiments both from the perspective of the...
-
Red Hat Collaboratory at Boston University Colloquium
Rodrigo Fonseca
Associate Professor, Computer Science Department, Brown University
Networking as a First-Class Cloud Resource
Abstract
Tenants in a cloud can specify, and are generally charged by, resources such as CPU, storage, and memory. There are dozens of different bundles of these resources tenants can choose from, and many different pricing schemes, including spot markets for leftover resources. This is not the case for networking, however. Most of the time, networking is treated as basic infrastructure, and tenants, apart from connectivity, have very little to choose from in terms of network properties such as priorities, bandwidth, or...
-
Red Hat Collaboratory at Boston University Colloquium
Mark Little
Red Hat, Vice President of Engineering and JBoss Middleware CTO
The Future of Enterprise Application Development in the Cloud
Abstract
Since the dawn of the cloud, developers have been inundated with a range of different recommended architectural approaches such as Web Services, REST, or microservices, as well as just as many different frameworks or stacks, including AWS, Java EE, Spring Boot, and now Eclipse MicroProfile. Throw in the explosion of programming languages, such as Golang and Swift, and it's no wonder a developer today could be forgiven for being confused about where is the right place...
-
Red Hat Collaboratory at Boston University Colloquium
Daniel S. Berger
2018 Mark Stehlik Postdoctoral Fellow in the Computer Science Department at Carnegie Mellon University
Towards Tail Latency-Aware Caching in Large Web Services
Abstract
Tail latency is of great importance in user-facing web services. However, achieving low tail latency is challenging, because typical user requests result in multiple queries to a variety of complex backends (databases, recommender systems, ad systems, etc.), where the request is not complete until all of its queries have completed.
In this talk we present our findings for the case of several large web services at Microsoft. We analyze production system request structures...