Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs

dc.contributor.author: Anzt, Hartwig
dc.contributor.author: Kreutzer, Moritz
dc.contributor.author: Ponce, Eduardo
dc.contributor.author: Peterson, Gregory D.
dc.contributor.author: Wellein, Gerhard
dc.contributor.author: Dongarra, Jack
dc.date.accessioned: 2020-04-21
dc.date.available: 2020-04-21
dc.date.created: 2018
dc.date.issued: 2020-04-21
dc.description.abstract: In this paper, we present an optimized GPU implementation for the induced dimension reduction algorithm. We improve data locality, combine it with an efficient sparse matrix vector kernel, and investigate the potential of overlapping computation with communication as well as the possibility of concurrent kernel execution. A comprehensive performance evaluation is conducted using a suitable performance model. The analysis reveals efficiency of up to 90%, which indicates that the implementation achieves performance close to the theoretically attainable bound.
dc.identifier.citation: The International Journal of High Performance Computing Applications 32.2 (2018): 220-230. <https://journals.sagepub.com/doi/full/10.1177/1094342016646844>
dc.identifier.doi: https://doi.org/10.1177/1094342016646844
dc.identifier.opus-id: 13541
dc.identifier.uri: https://open.fau.de/handle/openfau/13541
dc.identifier.urn: urn:nbn:de:bvb:29-opus4-135418
dc.language.iso: en
dc.rights.uri: http://www.gesetze-im-internet.de/urhg/index.html
dc.subject: Induced dimension reduction (IDR)
dc.subject: GPU
dc.subject: co-design
dc.subject: kernel fusion
dc.subject: kernel overlap
dc.subject: roofline performance model
dc.subject.ddc: DDC Classification :: 0 Computer science, information and general works :: 00 Computer science, knowledge and systems :: 000 Computer science, information and general works
dc.title: Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs
dc.type: article
dcterms.publisher: Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
local.journal.issue: 2
local.journal.title: The International Journal of High Performance Computing Applications
local.journal.volume: 32
local.sendToDnb: free*
local.subject.fakultaet: Zentrale Universitätseinrichtung / Regionales Rechenzentrum (RRZE)
local.subject.sammlung: Universität Erlangen-Nürnberg / Alliance licenses: All contributions are freely accessible with the consent of the rights holders under a DFG-funded alliance license. / Alliance licenses 2018
Files
Original bundle
Name: 13541_1094342016646844.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format