{"id":138,"date":"2010-06-26T15:50:01","date_gmt":"2010-06-26T19:50:01","guid":{"rendered":"https:\/\/www.bu.edu\/pasi\/?page_id=138"},"modified":"2011-01-31T18:59:41","modified_gmt":"2011-01-31T23:59:41","slug":"keynote-lectures","status":"publish","type":"page","link":"https:\/\/www.bu.edu\/pasi\/program\/keynote-lectures\/","title":{"rendered":"Keynote Lectures"},"content":{"rendered":"<p><img loading=\"lazy\" class=\"alignleft size-full wp-image-140\" src=\"\/pasi\/files\/2010\/06\/DavidKeyes200x200.png\" alt=\"DavidKeyes200x200\" width=\"200\" height=\"200\" srcset=\"https:\/\/www.bu.edu\/pasi\/files\/2010\/06\/DavidKeyes200x200.png 200w, https:\/\/www.bu.edu\/pasi\/files\/2010\/06\/DavidKeyes200x200-150x150.png 150w\" sizes=\"(max-width: 200px) 100vw, 200px\" \/><\/p>\n<h2>Exaflop\/s, Seriously!<\/h2>\n<h3>David Keyes, Columbia University and King Abdullah University of Science and Technology (KAUST)<\/h3>\n<p>Sustained floating-point computation rates on real applications, as tracked by the ACM Gordon Bell Prize, increased by three orders of magnitude from 1988 (1 Gigaflop\/s) to 1998 (1 Teraflop\/s), and by another three orders of magnitude to 2008 (1 Petaflop\/s). \u00a0Computer engineering provided only a couple of orders of magnitude of improvement for individual cores over that period; the remaining factor came from concurrency, which is approaching one million-fold.<\/p>\n<p>Algorithmic improvements contributed meanwhile to making each flop more valuable scientifically. \u00a0As the semiconductor industry now slips relative to its own roadmap for silicon-based logic and memory, concurrency, especially on-chip many-core concurrency and GPGPU SIMD-type concurrency, will play an increasing role in the next few orders of magnitude, to arrive at the ambitious target of 1 Exaflop\/s, extrapolated for 2018. 
\u00a0An important question is whether today&#8217;s best algorithms are efficiently hosted on such hardware and how much co-design of algorithms and architecture will be required.<\/p>\n<p>From the applications perspective, we illustrate eight reasons why today&#8217;s computational scientists have an insatiable appetite for such performance: resolution, fidelity, dimension, artificial boundaries, parameter inversion, optimal control, uncertainty quantification, and the statistics of ensembles.<\/p>\n<p>The paths to the exascale summit are debated, but all are narrow and treacherous, constrained by fundamental laws of physics, cost, power consumption, programmability and reliability. Drawing on recent reports, workshops, vendor projections, and experiences with scientific codes on contemporary platforms, we propose roles for today&#8217;s graduate researchers in one of the great global scientific quests of the next decade.<\/p>\n<p style=\"height: 50px;\">\n<p><img loading=\"lazy\" class=\"alignleft size-full wp-image-141\" src=\"\/pasi\/files\/2010\/06\/Aoki200x200.png\" alt=\"Aoki200x200\" width=\"200\" height=\"201\" srcset=\"https:\/\/www.bu.edu\/pasi\/files\/2010\/06\/Aoki200x200.png 200w, https:\/\/www.bu.edu\/pasi\/files\/2010\/06\/Aoki200x200-150x150.png 150w\" sizes=\"(max-width: 200px) 100vw, 200px\" \/><\/p>\n<h2>Tsunami Simulation on GPUs<\/h2>\n<h3>Takayuki Aoki, Tokyo Institute of Technology<\/h3>\n<p><strong>Thursday, January 6th 2011<\/strong><\/p>\n<p>Tsunamis are destructive forces of nature, so accurate forecasting and early warning are extremely important. \u00a0To predict a tsunami, the Shallow Water Equations must be solved in real time. 
To solve these equations, the CIP-CSL2 scheme and the method of characteristics can be used.\u00a0Running these computations on GPUs accelerates them drastically in a highly parallel environment.<\/p>\n<p>A single-GPU calculation has been found to be 62 times faster than the same calculation on a single CPU core (Intel i7). We have also applied domain decomposition to solve the problem on a multi-node GPU cluster. Two transfer models were used: synchronous and asynchronous. \u00a0In the synchronous model, computation stops while the transfers are done, whereas in the asynchronous model, computation and transfers proceed simultaneously. Overlapping transfers and computation further accelerated the process by hiding communication. \u00a0Because GPU-to-GPU transfers are not possible, the CPU must be used as a bridge to share information between neighbors. \u00a0Therefore, an asynchronous-copy model was used for the GPU transfers, and the MPI library was used to transfer the data between nodes.<\/p>\n<p>A domain representing real bathymetry was used as our dataset, with a grid size of 4096&#215;8192 and 90 m resolution. \u00a0Our tests on the supercomputer TSUBAME showed excellent scalability. \u00a0Also on the TSUBAME GPU cluster, consisting of Tesla S1060 cards, impressive results were obtained, <em>e.g.<\/em>, 1000 CPUs were required to match the performance of 8 GPUs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Exaflop\/s, Seriously! David Keyes, Columbia University and King Abdullah University of Science and Technology (KAUST) Sustained floating-point computation rates on real applications, as tracked by the ACM Gordon Bell Prize, increased by three orders of magnitude from 1988 (1 Gigaflop\/s) to 1998 (1 Teraflop\/s), and by another three orders of magnitude to 2008 (1 Petaflop\/s). 
[&hellip;]<\/p>\n","protected":false},"author":3344,"featured_media":0,"parent":46,"menu_order":2,"comment_status":"closed","ping_status":"open","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/pasi\/wp-json\/wp\/v2\/pages\/138"}],"collection":[{"href":"https:\/\/www.bu.edu\/pasi\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.bu.edu\/pasi\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/pasi\/wp-json\/wp\/v2\/users\/3344"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/pasi\/wp-json\/wp\/v2\/comments?post=138"}],"version-history":[{"count":16,"href":"https:\/\/www.bu.edu\/pasi\/wp-json\/wp\/v2\/pages\/138\/revisions"}],"predecessor-version":[{"id":1175,"href":"https:\/\/www.bu.edu\/pasi\/wp-json\/wp\/v2\/pages\/138\/revisions\/1175"}],"up":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/pasi\/wp-json\/wp\/v2\/pages\/46"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/pasi\/wp-json\/wp\/v2\/media?parent=138"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}