Assistant Professor of Computer Science Jonathan Appavoo and colleagues from other institutions have received a grant from the US Department of Energy to support research on “A Fault-Oblivious Extreme-Scale Execution Environment.” Exascale supercomputers are expected to feature billions of threads (concurrent streams of instructions) and tens of millions of CPUs. Building software that can use these machines effectively is a critical challenge. It requires carefully decomposing software into concurrent threads of execution and mapping those threads onto the different hardware components so that communication overheads do not dominate an application's execution time. In addition, at the exascale, software must cope with the fact that the sheer number of physical devices will push the time between failures down to a matter of minutes.
The team proposes the construction of a fault-oblivious extreme-scale execution environment that will address these issues in a general-purpose way, thus enabling and simplifying application development. This is a collaborative effort, with a total award to the team of approximately $2.3 million over three years. In addition to Appavoo, the research team includes Ronald G. Minnich, Curtis L. Janssen (Sandia National Labs), Sriram Krishnamoorthy, Andres Marquez (Pacific Northwest National Lab), Maya Gokhale (Lawrence Livermore National Lab), P. Sadayappan (Ohio State University), Eric Van Hensbergen (IBM), and Jim McKie (Alcatel-Lucent Bell Labs).