A tuning approach for iterative multiple 3d stencil pipeline on GPUs: Anisotropic Nonlinear Diffusion algorithm as case study.

Bibliographic Details
Title: A tuning approach for iterative multiple 3d stencil pipeline on GPUs: Anisotropic Nonlinear Diffusion algorithm as case study.
Authors: Tabik, S. (siham@ugr.es); Peemen, M.; Romero, L. F.
Source: Journal of Supercomputing. Apr 2018, Vol. 74 Issue 4, pp. 1580-1608. 29p.
Subjects: Graphics processing units, OpenCL (Computer program language), Distributed shared memory, Partial differential equations, NVIDIA Corp.
Abstract: This paper focuses on challenging applications that can be expressed as an iterative pipeline of multiple 3d stencil stages and explores their optimization space on GPUs. For this study, we selected a representative example from the field of digital signal processing, the Anisotropic Nonlinear Diffusion algorithm. An open issue to these applications is to determine the optimal fission/fusion level of the involved stages and whether that combination benefits from data tiling. This implies exploring a large space of all the possible fission/fusion combinations with and without tiling, thus making the process non-trivial. This study provides insights to reduce the optimization tuning space and programming effort of iterative multiple 3d stencils. Our results demonstrate that all combinations that fuse the bottleneck stencil with high halos update cost (>25%, this percentage can be measured or estimated experimentally for each single stencil) and high registers and shared memory accesses must not be considered in the exploration process. The optimal fission/fusion combination is up to 1.65× faster than the case in which we fully decompose our stencil without tiling and 5.3× faster with respect to the fully fused version on the NVIDIA GPUs. [ABSTRACT FROM AUTHOR]
Copyright of Journal of Supercomputing is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Header DbId: egs
DbLabel: Engineering Source
An: 128656642
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
Items – Name: Title
  Label: Title
  Group: Ti
  Data: A tuning approach for iterative multiple 3d stencil pipeline on GPUs: Anisotropic Nonlinear Diffusion algorithm as case study.
– Name: Author
  Label: Authors
  Group: Au
  Data: Tabik, S. (siham@ugr.es); Peemen, M.; Romero, L. F.
– Name: TitleSource
  Label: Source
  Group: Src
  Data: Journal of Supercomputing. Apr 2018, Vol. 74 Issue 4, pp. 1580-1608. 29p.
– Name: Subject
  Label: Subjects
  Group: Su
  Data: Graphics processing units, OpenCL (Computer program language), Distributed shared memory, Partial differential equations, NVIDIA Corp.
– Name: Abstract
  Label: Abstract
  Group: Ab
  Data: This paper focuses on challenging applications that can be expressed as an iterative pipeline of multiple 3d stencil stages and explores their optimization space on GPUs. For this study, we selected a representative example from the field of digital signal processing, the Anisotropic Nonlinear Diffusion algorithm. An open issue to these applications is to determine the optimal fission/fusion level of the involved stages and whether that combination benefits from data tiling. This implies exploring a large space of all the possible fission/fusion combinations with and without tiling, thus making the process non-trivial. This study provides insights to reduce the optimization tuning space and programming effort of iterative multiple 3d stencils. Our results demonstrate that all combinations that fuse the bottleneck stencil with high halos update cost (>25%, this percentage can be measured or estimated experimentally for each single stencil) and high registers and shared memory accesses must not be considered in the exploration process. The optimal fission/fusion combination is up to 1.65× faster than the case in which we fully decompose our stencil without tiling and 5.3× faster with respect to the fully fused version on the NVIDIA GPUs. [ABSTRACT FROM AUTHOR]
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=128656642
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1007/s11227-017-2184-6
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 29
        StartPage: 1580
    Subjects:
      – SubjectFull: Graphics processing units
        Type: general
      – SubjectFull: OpenCL (Computer program language)
        Type: general
      – SubjectFull: Distributed shared memory
        Type: general
      – SubjectFull: Partial differential equations
        Type: general
      – SubjectFull: NVIDIA Corp.
        Type: general
    Titles:
      – TitleFull: A tuning approach for iterative multiple 3d stencil pipeline on GPUs: Anisotropic Nonlinear Diffusion algorithm as case study.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Tabik, S.
      – PersonEntity:
          Name:
            NameFull: Peemen, M.
      – PersonEntity:
          Name:
            NameFull: Romero, L. F.
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 04
              Text: Apr2018
              Type: published
              Y: 2018
          Identifiers:
            – Type: issn-print
              Value: 09208542
          Numbering:
            – Type: volume
              Value: 74
            – Type: issue
              Value: 4
          Titles:
            – TitleFull: Journal of Supercomputing
              Type: main