Naiad: a system for incremental, iterative and interactive parallel computation: Frank McSherry, MSR

Starts:
11:00 am on Monday, December 17, 2012
Ends:
12:30 pm on Monday, December 17, 2012
Location:
MCS 148
URL:
https://sites.google.com/site/busecuritygroup/calendar
Abstract: We are developing a new system for large-scale data analysis -- called "Naiad" -- which has the goal of supporting complex iterative queries over dynamic inputs at interactive timescales. Like many existing systems, Naiad supports high-level declarative queries, data-parallel execution, and transparent distribution. Unlike these systems, Naiad can efficiently execute queries with multiple (possibly nested) iterative loops, while simultaneously supporting low-latency incremental changes to the query inputs. As a highlight of its characteristics, Naiad can not only efficiently compute the strongly connected component structure of a 24 hour sliding window of the Twitter @mention graph (using a doubly nested fixed-point computation), but also maintains the computation with sub-second latencies in the face of Twitter's full volume of continuously arriving tweets. I will describe the computational model underlying Naiad, a generalization of traditional incremental dataflow to partially ordered logical times, and work through some of the (very friendly, picture oriented) mathematical details. I will also highlight several new distributed systems challenges faced in order to fully realize the multiple orders-of-magnitude performance improvements Naiad presents. This is joint work with Derek Murray, Rebecca Isaacs, Michael Isard and Martn Abadi. Lunch will follow the talk in 135.