Naiad: a system for incremental, iterative and interactive parallel computation: Frank McSherry, MSR

11:00 am on Monday, December 17, 2012
12:30 pm on Monday, December 17, 2012
MCS 148
Abstract: We are developing a new system for large-scale data analysis -- called "Naiad" -- which has the goal of supporting complex iterative queries over dynamic inputs at interactive timescales. Like many existing systems, Naiad supports high-level declarative queries, data-parallel execution, and transparent distribution. Unlike these systems, Naiad can efficiently execute queries with multiple (possibly nested) iterative loops, while simultaneously supporting low-latency incremental changes to the query inputs. As a highlight of its characteristics, Naiad can not only efficiently compute the strongly connected component structure of a 24 hour sliding window of the Twitter @mention graph (using a doubly nested fixed-point computation), but also maintains the computation with sub-second latencies in the face of Twitter's full volume of continuously arriving tweets. I will describe the computational model underlying Naiad, a generalization of traditional incremental dataflow to partially ordered logical times, and work through some of the (very friendly, picture oriented) mathematical details. I will also highlight several new distributed systems challenges faced in order to fully realize the multiple orders-of-magnitude performance improvements Naiad presents. This is joint work with Derek Murray, Rebecca Isaacs, Michael Isard and Martěn Abadi. Lunch will follow the talk in 135.