Colloquium Series
Contacts: Ari Trachtenberg, Professor, Electrical & Computer Engineering
Renato Mancuso, Assistant Professor, Computer Science
Zoom Link
Time: 12 – 1 pm ET
Fall 2023 Colloquium Series Events
November 16: Sherard Griffin, Director of Engineering, OpenShift AI, Red Hat, AI Product Strategies and Research Topics
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave., Room 1101 (11th floor)
- Time: 12 – 1 pm ET
Abstract & Bio
- Abstract:
Sherard Griffin will share experiences and strategies for open source AI that work in the real world. He’ll provide insights into what customers need and how that relates to AI research challenges.
- Bio:
Sherard Griffin was responsible for the development of Open Data Hub, a community-driven reference architecture for building an AI-as-a-service platform on OpenShift. He also leads engineering for Red Hat’s OpenShift AI products and services.
October 12: Murat Demirbas, State Machine Replication and the Art of Abstraction
- Location: Photonics Building, 8 St. Mary’s St., Boston, MA, Room PHO 210
- Time: 12 – 1 pm ET
Abstract & Bio
- Abstract:
State Machine Replication (SMR) serves as the backbone of dependable distributed systems, including cloud systems at AWS, Meta, Google, Azure, and renowned databases like DynamoDB, Aurora, Spanner, MongoDB, and CockroachDB. SMR ensures replication of operations across nodes as well as their consistent sequencing using Paxos variants for consensus.
This talk delves into optimizing consensus and refining the SMR abstraction to craft customized high-performance solutions. We spotlight wide-area network and high-throughput SMR solutions, and introduce efficient strategies for performing strongly-consistent reads. We also offer hints for guiding distributed systems design.
- Bio:
Murat Demirbas is a Principal Applied Scientist at AWS and a Professor of Computer Science & Engineering at the University at Buffalo, SUNY (on leave). He developed several influential protocols and systems, including hybrid logical clocks, WPaxos, PigPaxos, and PQR. Murat received a National Science Foundation CAREER award in 2008 and School of Engineering and Applied Sciences Senior Researcher of the Year Award in 2016. He maintains a popular blog on distributed systems at http://muratbuffalo.blogspot.com
Spring 2023 Colloquium Series Events
January 26: Matthew Miller, A Conversation with the Fedora Project Leader
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 1646 (16th floor)
Abstract & Bio
- Abstract: Join us for a conversation about community-driven software development and the future of Linux distributions and related technology — and maybe a little reminiscing about the days of BU Linux (Boston University’s own Fedora-based distro from the early 2000s!).
- Bio: Matthew Miller is a Distinguished Engineer at Red Hat and is the leader of the Fedora Project, which creates the Fedora Linux distribution.
March 16: Adam Belay, MIT, LDB: An Efficient, Full-Program Latency Profiler
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 1101 (11th floor)
Abstract & Bio
- Abstract:
Maintaining low tail latency is critical for the efficiency and performance of large-scale datacenter systems. Software bugs that cause tail latency problems, however, are notoriously difficult to debug. In this talk, I will present LDB, a new latency profiling tool that aims to overcome this challenge by precisely identifying the specific functions that are responsible for tail latency anomalies. LDB observes the latency of all functions in a running program. It uses a novel, software-only technique called stack sampling, where a busy-spinning stack scanner thread polls light-weight metadata recorded in call frames, shifting instrumentation cost away from program threads. In addition, LDB records request boundaries and inter-thread synchronization to generate per-request timelines and to find the root cause of complex tail latency problems such as lock contention in multi-threaded programs. Our results show that LDB has low overhead and can rapidly analyze recordings, making it feasible to use in production settings.
- Bio:
Adam Belay is an Associate Professor of Computer Science at the Massachusetts Institute of Technology, where he works on operating systems, runtime systems, and distributed systems. During his Ph.D. at Stanford, he developed Dune, a system that safely exposes privileged CPU instructions to userspace; and IX, a dataplane operating system that significantly accelerates I/O performance. Dr. Belay’s current research interests lie in developing systems that cut across hardware and software layers to increase datacenter efficiency and performance. He is a member of the Parallel and Distributed Operating Systems Group, and a recipient of a Google Faculty Award, a Facebook Research Award, and the Stanford Graduate Fellowship. http://abelay.me
March 30: Dionisio de Niz, Carnegie Mellon University: Mixed-Trust Real-Time Computation
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 365 (3rd floor)
Abstract & Bio
- Bio:
Dionisio de Niz is a Principal Researcher and the Technical Director of the Assuring Cyber-Physical Systems directorate at the Software Engineering Institute at Carnegie Mellon University. He received a Master of Science in Information Networking and a Ph.D. in Electrical and Computer Engineering, both from Carnegie Mellon University. His research interests include cyber-physical systems (CPS), real-time systems, model-based engineering (MBE), and CPS security. In the real-time arena he has focused on multicore processors and mixed-criticality scheduling, and more recently on real-time mixed-trust computing. In MBE, he has focused on the symbolic integration of analysis using analysis contracts. Dr. de Niz co-edited and co-authored the book “Cyber-Physical Systems,” in which the authors discuss different application areas of CPS and its foundational domains, including real-time scheduling, logical verification, and CPS security. He has helped organize multiple industry workshops on real-time multicore systems (two co-sponsored by the FAA and three by different services of the US military) and on safety assurance of nuclear energy. He is a member of the executive committee of the IEEE Technical Committee on Real-Time Systems and regularly serves on the technical program committees of real-time systems conferences such as RTSS, RTAS, and RTCSA, where he also publishes much of his work.
- Abstract:
Certification authorities (e.g., FAA) allow the validation of different parts of a system with different degrees of rigor depending on their level of criticality. Formal methods have been recognized as important to verify safety-critical components. Unfortunately, a verified property can be easily compromised if the verified components are not protected from misbehaviors of the unverified ones (e.g., due to bugs). Thus, trust requires that both verification and protection of components are jointly considered.
A key challenge to building trust is that the underlying operating systems (OSs) that implement protection mechanisms are so complex that they are extremely hard (if even possible) to thoroughly verify. Thus, there has been a trend to minimize the trusted computing base (TCB) by developing small verified hypervisors (HVs) and microkernels, e.g., seL4, CertiKOS, and uberXMHF. In these systems, trusted and untrusted components co-exist on a single hardware platform but in a completely isolated and disjoint manner. We thus call this approach disjoint-trust computing. The fundamental limitation of disjoint-trust computing is that it does not allow the use of untrusted components in critical functionality whose safety must be assured through verification.
In this talk, we present the real-time mixed-trust computing (RT-MTC) framework. Unlike disjoint-trust computing, it gives the flexibility to use untrusted components even for CPS critical functionality. In this framework, untrusted components are monitored by verified components ensuring that the output of the untrusted components always leads to safe states (e.g., avoiding crashes). These monitoring components are known as logical enforcers. To ensure trust, these enforcers are protected by a verified micro-hypervisor. To preserve the timing guarantees of the system, RT-MTC uses temporal enforcers, which are small, self-contained code blocks that perform a default safety action (e.g., hover in a quadrotor) if the untrusted component has not produced a correct output by a specified time. Temporal enforcers are contained within the verified micro-hypervisor. Our framework incorporates two schedulers: (i) a preemptive fixed-priority scheduler in the VM to run the untrusted components and (ii) a non-preemptive fixed-priority scheduler within the HV to run trusted components. To verify the timing correctness of safety-critical applications in our mixed-trust framework, we develop a new task model and schedulability analysis. We also present the design and implementation of a coordination protocol between the two schedulers to preserve the synchronization between the trusted and untrusted components while preventing dependencies that can compromise the trusted component.
Finally, we discuss the extension of this framework for trusted mode degradation. While a number of real-time modal models have been proposed, they fail to address the challenges presented here in at least two important respects. First, previous models consider mode transitions as simple task parameter changes without taking into account the computation required by the transition and the synchronization between the modes and the transition. Second, previous work does not address the challenges imposed by the need to preserve safety guarantees during the transition. Our work addresses these issues by extending the RT-MTC framework to include degradation modes and creating a schedulability model based on the digraph model that supports this extension.
May 4: Joydeep Banerjee and Randy George, Red Hat: Challenges and opportunities customers have with real-world observability
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 1101 (11th floor)
Abstract & Bio
- Abstract:
Red Hat’s Joydeep Banerjee and Randy George have, between them, decades of experience working with customers and developing monitoring solutions to meet their real-world needs. In this seminar, they will share what their customers are doing today, the challenges those customers face, and opportunities where academic research could be applied.
May 25: Roxana Geambasu, Managing Privacy as a Computing Resource in User-Data Workloads
- Location: Center for Computing & Data Sciences, 665 Commonwealth Ave, Room 1101 (11th floor)
Abstract & Bio
- Abstract:
In this talk, I present the perspective that user privacy should be recognized as a crucial computing resource in user-data workloads and managed accordingly. These workloads, prevalent in today’s companies, constantly compute statistics or train machine learning models on user data, making these “products” of the data available to internal analysts, external partners, and even the general population. However, these products often leak significant information about individual users. Differential privacy (DP) offers a rigorous way to limit such data leakage by constraining the data products to noisy aggregates. The talk discusses our group’s work over the past few years on (1) designing a multi-dimensional privacy resource using DP to suit common user-data workloads and (2) integrating support for this resource into popular resource management systems like Kubernetes and caching components. This allows for proper management, including monitoring, scheduling, conservation, payment, and identification of bottlenecks for the privacy resource. By treating privacy as a computing resource, we put it on par with other computing resources that are routinely managed in computer systems (such as CPU, GPU, and RAM), and we acknowledge that user-data workloads consume something beyond these traditional resources.
The talk highlights the main lessons I have learned from our experience building these systems. Firstly, considering privacy as a computing resource helps address certain limitations of DP for practical use. Secondly, while DP is close to practical in certain settings, incorporating it into effective systems requires further evolution of DP theory alongside system design. Lastly, I believe the systems research community is uniquely positioned to tackle the remaining challenges of implementing DP in practice, so my talk serves as a call to action for systems researchers to help bring this much-needed privacy technology to practice.
- Bio:
Roxana Geambasu is an Associate Professor of Computer Science at Columbia University and a member of Columbia’s Data Sciences Institute. She joined Columbia in Fall 2011 after finishing her Ph.D. at the University of Washington. For her work in data privacy, she received: an Alfred P. Sloan Faculty Fellowship, an NSF CAREER award, a Microsoft Research Faculty Fellowship, several Google Faculty awards, a “Brilliant 10” Popular Science nomination, the Honorable Mention for the 2013 inaugural Dennis M. Ritchie Doctoral Dissertation Award, a William Chan Dissertation Award, two best paper awards at top systems conferences, and the first Google Ph.D. Fellowship in Cloud Computing.
This list includes full abstracts of past Colloquium Series events.
-
Red Hat Collaboratory at Boston University Colloquium
Sage Weil
Ceph Project Lead; Sr Distinguished Engineer, Red Hat
Pause for democracy: Leaving open source storage to help Americans vote in 2020
Abstract
In 2020 I took a leave of absence to work on democracy- and voting-related efforts with VoteAmerica, a national get-out-the-vote organization with a focus on voter registration, voter turnout, and voter protection. This was a huge departure from my usual role at Red Hat leading Ceph, an open source distributed storage project, both in terms of the technical work and the underlying motivation and purpose. This talk will cover a mix of...
-
Red Hat Collaboratory at Boston University Colloquium
Jonathan Bryce
Executive Director, Open Infrastructure Foundation
Open Technology to Unlock Human Potential
Abstract
Throughout human history, a conflict between closed innovation and open sharing has tugged at the progress made possible by technological advancement. When I create something new, do I control the benefits and retain the value for myself, or do I share the advancement broadly and let others move it forward while risking a lower return for myself? I imagine these questions have been asked every day for millennia by inventors and scientists, military leaders and governments, entrepreneurs and CEOs.
We now live in a hyper-connected...
-
Red Hat Collaboratory at Boston University Colloquium
Hugh Brock
Research Director, Red Hat
The Platform of the Future: What Is an Open Hybrid Cloud and What Does It Mean for Open Source?
Abstract
One of the early developments in computing that made it an essential part of both scientific research and business operations was a basic abstraction layer that hid the grungy implementation details of the computing machine from the programmer trying to write an application to run on it. This layer developed to serve the computer industry's need to sell products to more than one customer: If you could design one computer, but sell it...
-
Red Hat Collaboratory at Boston University Colloquium
Rania Khalaf
Director of AI Platforms and Runtimes at IBM Research
Making AI faster, easier, and safer
Abstract
Artificial intelligence is being infused into applications at an ever-increasing rate. The proliferation of machine learning models in production has surfaced the need to bridge between the worlds of machine learning and software engineering in order to scale these deployments in a fast, safe, and repeatable way. Finally, it is important to consider the applications that these models are deployed within and the context that brings to improving business outcomes effectively through ML. In this talk, we will discuss...
-
Red Hat Collaboratory at Boston University Colloquium
Mike Zink
Associate Professor, Electrical and Computer Engineering Department,
University of Massachusetts
Open Cloud Testbed: Developing a Testbed for the Research Community Exploring Next-Generation Cloud Platforms
Abstract
Cloud testbeds are critical for enabling research into new cloud technologies - research that requires experiments that potentially change the operation of the cloud itself. Several such testbeds have been created in the recent past (e.g., Chameleon, CloudLab, etc.) with the goal of supporting the CISE systems research community. It has been shown that these testbeds are very popular and heavily used by the research community. Testbed utilization often reaches 100%, especially...
-
Red Hat Collaboratory at Boston University Colloquium
Ulrich Drepper
Engineer, Office of the CTO, Red Hat
Software-Configured Compute Environments
Abstract
Hardware and software environments are designed as a compromise between many different requirements. This sacrifices performance, among other aspects, while at the same time the need for compute increases.
Specialists can certainly create more optimized systems. The challenge is to automate this.
To research these new systems we need hardware specialists to create re-configurable processors, compiler writers to deduce the best architecture from source code and generate configurations for hardware and OS, improved OSes to efficiently run that code. All that while preserving API and ideally ABI...
-
Red Hat Collaboratory at Boston University Colloquium
Kate Keahey
Senior Fellow, Computation Institute, University of Chicago and Computer Scientist, Mathematics and Computer Science Division, Argonne National Laboratory
Chameleon: New Capabilities for Experimental Computer Science
Abstract
Computer Science experimental testbeds allow investigators to explore a broad range of different state-of-the-art hardware options, assess scalability of their systems, and provide conditions that allow deep reconfigurability and isolation so that one user does not impact the experiments of another. An experimental testbed is also in a unique position to provide methods facilitating experiment analysis and crucially, improve repeatability and reproducibility of experiments both from the perspective of the...
-
Red Hat Collaboratory at Boston University Colloquium
Rodrigo Fonseca
Associate Professor, Computer Science Department, Brown University
Networking as a First-Class Cloud Resource
Abstract
Tenants in a cloud can specify, and are generally charged by, resources such as CPU, storage, and memory. There are dozens of different bundles of these resources tenants can choose from, and many different pricing schemes, including spot markets for leftover resources. This is not the case for networking, however. Most of the time, networking is treated as basic infrastructure, and tenants, apart from connectivity, have very little to choose from in terms of network properties such as priorities, bandwidth, or...
-
Red Hat Collaboratory at Boston University Colloquium
Mark Little
Red Hat, Vice President of Engineering and JBoss Middleware CTO
The Future of Enterprise Application Development in the Cloud
Abstract
Since the dawn of the cloud, developers have been inundated with a range of different recommended architectural approaches such as Web Services, REST, or microservices, as well as just as many different frameworks or stacks, including AWS, Java EE, Spring Boot, and now Eclipse MicroProfile. Throw in the explosion of programming languages, such as Golang and Swift, and it's no wonder a developer today could be forgiven for being confused about where is the right place...
-
Red Hat Collaboratory at Boston University Colloquium
Daniel S. Berger
2018 Mark Stehlik Postdoctoral Fellow in the Computer Science Department at Carnegie Mellon University
Towards Tail Latency-Aware Caching in Large Web Services
Abstract
Tail latency is of great importance in user-facing web services. However, achieving low tail latency is challenging, because typical user requests result in multiple queries to a variety of complex backends (databases, recommender systems, ad systems, etc.), where the request is not complete until all of its queries have completed.
In this talk we present our findings for the case of several large web services at Microsoft. We analyze production system request structures...