- Starts: 2:00 pm on Wednesday, June 3, 2026
- Ends: 4:00 pm on Wednesday, June 3, 2026
ECE PhD Thesis Defense: Nicholas Sacco
Title: Straggler Mitigation for Distributed Iterative Methods
Presenter: Nicholas Sacco
Advisor: Professor Bobak Nazer
Chair: Professor Vivek Goyal
Committee: Professor Bobak Nazer, Professor Ashok Cutkosky, Professor Alex Olshevsky, Professor David David Castañón
Google Scholar Link: https://scholar.google.com/citations?user=SNn5y0gAAAAJ&hl=en&oi=ao
Abstract: Many modern scientific and engineering applications involve storing, communicating, and computing with large-scale data matrices. As computational and storage demands continue to grow, distributing the workload across multiple servers is a simple, practical, and effective strategy to obtain significant speedups for a variety of high-dimensional computational problems. However, this parallelization introduces multiple points of failure, as servers often experience delays or errors with completing their assigned tasks and returning results to the main server.
While it is possible to simply wait for these “stragglers” to complete their assigned jobs, coded computation handles the possibility of node delays or errors by preemptively applying an erasure code to the distributed workload. By encoding the target computation with additional redundancy, the main node can reliably reconstruct missing computations from sufficiently many completed tasks, without requiring results from every node.
Various coded computing strategies have been developed for several basic operations, such as matrix-matrix multiplication or matrix-vector multiplication. While many of these techniques can efficiently achieve information-theoretic limits and exactly recovery from straggler-induced erasures, the computational overhead in the reconstruction step may be expensive in certain settings. For instance, iterative methods that refine an estimate over time, or repeatedly perform the same matrix operation may not require exact recovery at every step of the algorithm. Since intermediate results are immediately processed in the next iteration, we may be able to reduce the overhead in each decoding step by relaxing the requirement that missing computations must be recovered exactly.
In this work, we first consider the problem of estimating the principal eigenvector of a positive semi-definite matrix using a distributed version of the power method, subject to random row erasures. We propose a simple recovery strategy that updates coordinates based on data received from the servers, while reusing values from the previous iteration as an estimate for the missing data. We prove that the power method with row erasures converges exponentially fast to the principal eigenvector. The rate of convergence is governed by a modified spectral gap that depends only on the original spectral gap and the erasure ratio. Numerical results validate our theoretical bounds, and we observe in certain settings, our proposed strategy outperforms coded computing and other stochastic competitors.
Next, we consider the problem of solving a system of linear equations using a distributed version of the Jacobi method, again subject to random row erasures. Under the same recovery mechanism, we prove the Jacobi method with row erasures converges exponentially fast to the solution of the system of equations. In this problem, the rate of convergence is governed by a modified infinity norm that depends only on the original infinity norm and the erasure ratio. Simulations also validate these bounds, and we still observe that our approach outperforms coded and other stochastic strategies.
Future work extends these results to generalized linear iterative methods with similar convergence guarantees, and more classes of matrices, such as symmetric matrices.
- Location:
- PHO 339
