Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer.

Bibliographic Details
Title: Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer.
Authors: Wang, Feng1 fengwang@nudt.edu.cn, Yang, Can-Qun1 canqun@nudt.edu.cn, Du, Yun-Fei1 duyunfei@nudt.edu.cn, Chen, Juan1 juanchen@nudt.edu.cn, Yi, Hui-Zhan1 huizhanyi@nudt.edu.cn, Xu, Wei-Xia1 xuwx@nudt.edu.cn
Source: Journal of Computer Science & Technology (10009000). Sep2011, Vol. 26 Issue 5, p854-865. 12p.
Subjects: LINPACK (Computer system), Benchmarking (Management), Graphics processing units, Supercomputers, Program transformation, Heterogeneous computing
Geographic Terms: China
Abstract: In this paper we present the programming of the Linpack benchmark on TianHe-1 system, the first petascale supercomputer system of China, and the largest GPU-accelerated heterogeneous system ever attempted before. A hybrid programming model consisting of MPI, OpenMP and streaming computing is described to explore the task parallel, thread parallel and data parallel of the Linpack. We explain how we optimized the load distribution across the CPUs and GPUs using the two-level adaptive method and describe the implementation in details. To overcome the low-bandwidth between the CPU and GPU communication, we present a software pipelining technique to hide the communication overhead. Combined with other traditional optimizations, the Linpack we developed achieved 196.7 GFLOPS on a single compute element of TianHe-1. This result is 70.1% of the peak compute capability, 3.3 times faster than the result by using the vendor's library. On the full configuration of TianHe-1 our optimizations resulted in a Linpack performance of 0.563 PFLOPS, which made TianHe-1 the 5th fastest supercomputer on the Top500 list in November, 2009. [ABSTRACT FROM AUTHOR]
Copyright of Journal of Computer Science & Technology (10009000) is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
FullText Text:
  Availability: 0
Header DbId: egs
DbLabel: Engineering Source
An: 65796946
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 0
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer.
– Name: Author
  Label: Authors
  Group: Au
  Data: <searchLink fieldCode="AR" term="%22Wang%2C+Feng%22">Wang, Feng</searchLink><relatesTo>1</relatesTo><i> fengwang@nudt.edu.cn</i><br /><searchLink fieldCode="AR" term="%22Yang%2C+Can-Qun%22">Yang, Can-Qun</searchLink><relatesTo>1</relatesTo><i> canqun@nudt.edu.cn</i><br /><searchLink fieldCode="AR" term="%22Du%2C+Yun-Fei%22">Du, Yun-Fei</searchLink><relatesTo>1</relatesTo><i> duyunfei@nudt.edu.cn</i><br /><searchLink fieldCode="AR" term="%22Chen%2C+Juan%22">Chen, Juan</searchLink><relatesTo>1</relatesTo><i> juanchen@nudt.edu.cn</i><br /><searchLink fieldCode="AR" term="%22Yi%2C+Hui-Zhan%22">Yi, Hui-Zhan</searchLink><relatesTo>1</relatesTo><i> huizhanyi@nudt.edu.cn</i><br /><searchLink fieldCode="AR" term="%22Xu%2C+Wei-Xia%22">Xu, Wei-Xia</searchLink><relatesTo>1</relatesTo><i> xuwx@nudt.edu.cn</i>
– Name: TitleSource
  Label: Source
  Group: Src
  Data: <searchLink fieldCode="JN" term="%22Journal+of+Computer+Science+%26+Technology+%2810009000%29%22">Journal of Computer Science & Technology (10009000)</searchLink>. Sep2011, Vol. 26 Issue 5, p854-865. 12p.
– Name: Subject
  Label: Subjects
  Group: Su
  Data: <searchLink fieldCode="DE" term="%22LINPACK+%28Computer+system%29%22">LINPACK (Computer system)</searchLink><br /><searchLink fieldCode="DE" term="%22Benchmarking+%28Management%29%22">Benchmarking (Management)</searchLink><br /><searchLink fieldCode="DE" term="%22Graphics+processing+units%22">Graphics processing units</searchLink><br /><searchLink fieldCode="DE" term="%22Supercomputers%22">Supercomputers</searchLink><br /><searchLink fieldCode="DE" term="%22Program+transformation%22">Program transformation</searchLink><br /><searchLink fieldCode="DE" term="%22Heterogeneous+computing%22">Heterogeneous computing</searchLink>
– Name: SubjectGeographic
  Label: Geographic Terms
  Group: Su
  Data: <searchLink fieldCode="DE" term="%22China%22">China</searchLink>
– Name: Abstract
  Label: Abstract
  Group: Ab
  Data: In this paper we present the programming of the Linpack benchmark on TianHe-1 system, the first petascale supercomputer system of China, and the largest GPU-accelerated heterogeneous system ever attempted before. A hybrid programming model consisting of MPI, OpenMP and streaming computing is described to explore the task parallel, thread parallel and data parallel of the Linpack. We explain how we optimized the load distribution across the CPUs and GPUs using the two-level adaptive method and describe the implementation in details. To overcome the low-bandwidth between the CPU and GPU communication, we present a software pipelining technique to hide the communication overhead. Combined with other traditional optimizations, the Linpack we developed achieved 196.7 GFLOPS on a single compute element of TianHe-1. This result is 70.1% of the peak compute capability, 3.3 times faster than the result by using the vendor's library. On the full configuration of TianHe-1 our optimizations resulted in a Linpack performance of 0.563 PFLOPS, which made TianHe-1 the 5th fastest supercomputer on the Top500 list in November, 2009. [ABSTRACT FROM AUTHOR]
– Name: AbstractSuppliedCopyright
  Label:
  Group: Ab
  Data: <i>Copyright of Journal of Computer Science & Technology (10009000) is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract.</i> (Copyright applies to all Abstracts.)
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=65796946
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1007/s11390-011-0184-1
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 12
        StartPage: 854
    Subjects:
      – SubjectFull: LINPACK (Computer system)
        Type: general
      – SubjectFull: Benchmarking (Management)
        Type: general
      – SubjectFull: Graphics processing units
        Type: general
      – SubjectFull: Supercomputers
        Type: general
      – SubjectFull: Program transformation
        Type: general
      – SubjectFull: Heterogeneous computing
        Type: general
      – SubjectFull: China
        Type: general
    Titles:
      – TitleFull: Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Wang, Feng
      – PersonEntity:
          Name:
            NameFull: Yang, Can-Qun
      – PersonEntity:
          Name:
            NameFull: Du, Yun-Fei
      – PersonEntity:
          Name:
            NameFull: Chen, Juan
      – PersonEntity:
          Name:
            NameFull: Yi, Hui-Zhan
      – PersonEntity:
          Name:
            NameFull: Xu, Wei-Xia
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 09
              Text: Sep2011
              Type: published
              Y: 2011
          Identifiers:
            – Type: issn-print
              Value: 10009000
          Numbering:
            – Type: volume
              Value: 26
            – Type: issue
              Value: 5
          Titles:
            – TitleFull: Journal of Computer Science & Technology (10009000)
              Type: main
ResultId 1