Colloquium: Towards Tail Latency-Aware Caching in Large Web Services

Red Hat Collaboratory at Boston University Colloquium

Daniel S. Berger

2018 Mark Stehlik Postdoctoral Fellow in the Computer Science Department at Carnegie Mellon University

Towards Tail Latency-Aware Caching in Large Web Services


Tail latency is of great importance in user-facing web services. However, achieving low tail latency is challenging, because typical user requests result in multiple queries to a variety of complex backends (databases, recommender systems, ad systems, etc.), where the request is not complete until all of its queries have completed.
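To make the fan-out effect concrete, here is a minimal sketch in Python. It is an illustration rather than anything from the talk: the function names are hypothetical, and the probability calculation assumes that queries are slow independently of one another, which real backends need not satisfy.

```python
def request_latency(query_latencies_ms):
    """A request completes only when its slowest query completes."""
    return max(query_latencies_ms)


def prob_request_slow(p_query_slow, fanout):
    """Probability that at least one of `fanout` queries is slow.

    Assumption (for illustration only): each query is slow independently
    with probability `p_query_slow`.
    """
    return 1 - (1 - p_query_slow) ** fanout


if __name__ == "__main__":
    # One slow backend query dominates the whole request.
    print(request_latency([12, 480, 35]))
    # A 1% chance of a slow query grows quickly with the number of queries.
    for fanout in (1, 5, 20):
        print(fanout, round(prob_request_slow(0.01, fanout), 3))
```

Under these assumptions, a request that issues 20 queries is slow far more often than any single query is, which is why rare backend slowdowns surface as common request slowdowns.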

In this talk we present our findings for the case of several large web services at Microsoft. We analyze production system request structures and find that requests vary greatly in the backends that they access and in the number of queries made to each backend. Furthermore, we find that backend query latencies vary by more than two orders of magnitude across backends and vary widely over time, resulting in high request tail latencies.

This talk proposes a novel solution for maintaining low request tail latency: repurpose existing caches to mitigate the effects of backend latency variability. Our solution, RobinHood, dynamically reallocates cache resources from the cache-rich (backends which don’t affect request latency) to the cache-poor (backends which affect request latency). We evaluate RobinHood with production traces on a 50-server cluster with 20 different backend systems. We find that, in the presence of load spikes, RobinHood meets a 150 ms SLO 99.7% of the time, whereas the next best policy only meets this SLO 70% of the time.
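The idea of moving cache space from the cache-rich to the cache-poor can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the function `reallocate`, the `blame` counts (how many slow requests are attributed to each backend), and the `tax_rate` parameter are all hypothetical names chosen for this example.

```python
def reallocate(cache_mb, blame, tax_rate=0.01):
    """Shift a small share of cache toward backends blamed for slow requests.

    cache_mb: dict mapping backend name -> current cache size in MB
    blame:    dict mapping backend name -> number of slow requests
              attributed to that backend (hypothetical metric)
    tax_rate: fraction of every backend's cache collected per round
              (assumed value, for illustration only)
    """
    total_blame = sum(blame.get(b, 0) for b in cache_mb)
    if total_blame == 0:
        # No backend is blamed; keep the current allocation.
        return dict(cache_mb)

    # Collect the same fraction from every backend into a shared pool.
    pool = sum(size * tax_rate for size in cache_mb.values())
    taxed = {b: size * (1 - tax_rate) for b, size in cache_mb.items()}

    # Hand the pool back in proportion to each backend's blame.
    return {
        b: taxed[b] + pool * blame.get(b, 0) / total_blame
        for b in cache_mb
    }


if __name__ == "__main__":
    cache = {"db": 100.0, "ads": 100.0}
    blame = {"db": 9, "ads": 1}
    print(reallocate(cache, blame, tax_rate=0.1))
```

In this sketch the total cache size stays constant: a backend that rarely causes slow requests gradually gives up space, and a backend that often does gradually gains it. Repeating the step at a fixed interval lets the allocation follow changes in backend latency over time.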

The team working on this project includes Benjamin Berg (CMU), Timothy Zhu (Penn State), Mor Harchol-Balter (CMU), and Siddhartha Sen (MSR). The work will appear at USENIX OSDI 2018.


Daniel S. Berger is the 2018 Mark Stehlik Postdoctoral Fellow in the Computer Science Department at Carnegie Mellon University. His research interests intersect systems, mathematical modeling, and performance testing. Daniel’s research explores how caching can be used to reduce tail latency in large web services and CDNs. Daniel received his Ph.D. (2018) from the University of Kaiserslautern, Germany, and has made extended visits to CMU (2015-2017), Warwick University (2014), T-Labs Berlin (2013), ETH Zurich (2012), and the University of Waterloo (2011). Previously, Daniel worked as a data scientist at the German Cancer Research Center (2008-2010) and as a project scientist at CMU (2017-2018).


  • 11:30 AM – 12:00 PM: Pizza & Networking
  • 12:00 – 1:00 PM: Talk and Discussion


Contact the Collaboratory with any questions you may have about this event.

Recording of Event

This talk was held as scheduled. A recording can be accessed here. Slides can be accessed here.
