Delete: The Forgotten Operator

The Computer Science Research Seminar Series features a 30-minute virtual talk followed by Q&A. All BU students and faculty are welcome to attend.

Abstract: When building new data systems and their underlying storage managers, we typically optimize for performance (access time or throughput). However, as data management tasks increasingly move to the cloud and involve massive data sets, other metrics are being considered, including cost (space amplification, monetary, energy) and deployment time. In this work we focus on deletion, and we discuss how the typical deletion-through-invalidation approach comes at a cost with respect to space amplification and, potentially, the privacy of the users. We focus on the concept of out-of-place deletes, which are frequently employed by storage engines that use the log-structured merge (LSM) tree as their backend. LSM-Trees support high ingestion rates with low read/write interference. These benefits, however, come at the cost of treating deletes as second-class citizens. A delete inserts a tombstone that invalidates older instances of the deleted key. State-of-the-art LSM engines do not provide guarantees as to how fast a tombstone will propagate to persist the deletion. Further, LSM engines only support deletion on the sort key. To delete on another attribute (e.g., timestamp), the entire tree is read and re-written. In this talk we present how LSM-Trees work and present a family of new techniques that allow for timely deletion without compromising privacy or space amplification, leading to small read performance benefits and a near-zero increase in the amortized write amplification.

Speaker Bio: Manos Athanassoulis is an Assistant Professor of Computer Science at Boston University, Director and Founder of the BU Data-intensive Systems and Computing Laboratory, and co-director of the BU Massive Data Algorithms and Systems Group. His research is in the area of data management, focusing on building data systems that efficiently exploit modern hardware (computing units, storage, and memories), are deployed in the cloud, and can adapt to the workload both at setup time and, dynamically, at runtime. Before joining Boston University, Manos was a postdoctoral researcher at the Harvard School of Engineering and Applied Sciences. Manos obtained his PhD from EPFL, Switzerland, and spent one summer at IBM Research, Watson. Manos’ work is published in top conferences and journals of the community, such as ACM SIGMOD, PVLDB, ACM TODS, VLDBJ, and others, and has been recognized by awards such as “Best of SIGMOD” in 2016, “Best of VLDB” in 2010 and 2017, and “Most Reproducible Paper” at SIGMOD in 2017. Manos has served as a program committee member and technical reviewer for top data management conferences and journals for the past ten years and has chaired ACM SIGMOD Reproducibility since Fall 2020.
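
For readers unfamiliar with the out-of-place deletes mentioned in the abstract, the following is a minimal Python sketch of the general idea. It is an illustration only, not the speaker's technique or any real engine's API: the class `TinyLSM`, its methods, and the `buffer_capacity` parameter are all hypothetical names. It shows a delete inserting a tombstone, the old value remaining in storage until a merge runs, and a delete on a non-sort attribute requiring a rewrite of all data.

```python
TOMBSTONE = object()  # sentinel marking a deleted key (hypothetical representation)


class TinyLSM:
    """Toy LSM-style store: one in-memory buffer plus immutable flushed runs."""

    def __init__(self, buffer_capacity=2):
        self.buffer = {}                      # newest entries, mutable
        self.runs = []                        # flushed sorted runs, newest first
        self.buffer_capacity = buffer_capacity

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            self.flush()

    def delete(self, key):
        # Out-of-place delete: insert a tombstone instead of erasing old data.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Freeze the buffer as a new sorted run.
        if self.buffer:
            self.runs.insert(0, dict(sorted(self.buffer.items())))
            self.buffer = {}

    def get(self, key):
        # Search newest to oldest; the first entry found wins.
        for level in [self.buffer, *self.runs]:
            if key in level:
                value = level[key]
                return None if value is TOMBSTONE else value
        return None

    def physical_copies(self, key):
        # Number of non-tombstone values still stored for this key.
        return sum(
            1 for level in [self.buffer, *self.runs]
            if key in level and level[key] is not TOMBSTONE
        )

    def compact(self):
        # Full merge: keep the newest entry per key, then drop tombstones.
        self.flush()
        merged = {}
        for run in reversed(self.runs):       # oldest first, newer overwrites
            merged.update(run)
        live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.runs = [live] if live else []

    def delete_where(self, predicate):
        # Delete on a non-sort attribute: every run is read and rewritten.
        self.compact()
        self.runs = [
            {k: v for k, v in run.items() if not predicate(v)}
            for run in self.runs
        ]
```

In this sketch, after `put("a", ...)`, a flush, and `delete("a")`, a lookup of `"a"` returns nothing, yet the old value is still physically present until `compact()` is called. Nothing bounds how long that takes, which is the space and privacy concern the abstract raises.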

When: 11:00 am to 12:00 pm on Friday, December 11, 2020
Contact Name: Kimberly Crosta
Phone: 617-353-2566
Contact Email: kimrich@bu.edu
Contact Organization: BU MET Department of Computer Science
Fees: Free
Open To: Students, faculty, staff
Speakers: Professor Manos Athanassoulis; moderated by Professor Kia Teymourian