Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes.

Bibliographic Details
Title: Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes.
Authors: Jo, Gangwon (1); Nah, Jeongho (1); Lee, Jun (1); Kim, Jungwon (2); Lee, Jaejin (1)
Source: IEEE Transactions on Parallel & Distributed Systems. Jul2015, Vol. 26 Issue 7, p1814-1825. 12p.
Subjects: LINPACK (Computer system), Graphics processing units, Heterogeneous computing, Communication models, Benchmark testing (Engineering)
Abstract: OpenCL is an open standard to write parallel applications for heterogeneous computing systems. Since its usage is restricted to a single operating system instance, programmers need to use a mix of OpenCL and MPI to program a heterogeneous cluster. In this paper, we introduce an MPI-OpenCL implementation of the LINPACK benchmark for a cluster with multi-GPU nodes. The LINPACK benchmark is one of the most widely used benchmark applications for evaluating high performance computing systems. Our implementation is based on High Performance LINPACK (HPL) and uses the blocked LU decomposition algorithm. We address that optimizations aimed at reducing the overhead of CPUs are necessary to overcome the performance gap between the CPUs and the multiple GPUs. Our LINPACK implementation achieves 93.69 Tflops (46 percent of the theoretical peak) on the target cluster with 49 nodes, each node containing two eight-core CPUs and four GPUs. [ABSTRACT FROM AUTHOR]
Copyright of IEEE Transactions on Parallel & Distributed Systems is the property of IEEE and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
FullText Text:
  Availability: 0
Header DbId: egs
DbLabel: Engineering Source
An: 103222732
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=103222732
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1109/TPDS.2014.2321742
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 12
        StartPage: 1814
    Subjects:
      – SubjectFull: LINPACK (Computer system)
        Type: general
      – SubjectFull: Graphics processing units
        Type: general
      – SubjectFull: Heterogeneous computing
        Type: general
      – SubjectFull: Communication models
        Type: general
      – SubjectFull: Benchmark testing (Engineering)
        Type: general
    Titles:
      – TitleFull: Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Jo, Gangwon
      – PersonEntity:
          Name:
            NameFull: Nah, Jeongho
      – PersonEntity:
          Name:
            NameFull: Lee, Jun
      – PersonEntity:
          Name:
            NameFull: Kim, Jungwon
      – PersonEntity:
          Name:
            NameFull: Lee, Jaejin
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 07
              Text: Jul2015
              Type: published
              Y: 2015
          Identifiers:
            – Type: issn-print
              Value: 10459219
          Numbering:
            – Type: volume
              Value: 26
            – Type: issue
              Value: 7
          Titles:
            – TitleFull: IEEE Transactions on Parallel & Distributed Systems
              Type: main