Data race avoidance and replay scheme for developing and debugging parallel programs on distributed shared memory systems

Bibliographic Details
Title: Data race avoidance and replay scheme for developing and debugging parallel programs on distributed shared memory systems
Authors: Chiu, Yung-Chang [1] (qson@hpds.ee.ncku.edu.tw); Shieh, Ce-Kuen [1] (shieh@hpds.ee.ncku.edu.tw); Huang, Tzu-Chi [2] (tzuchi.phd@gmail.com); Liang, Tyng-Yeu [3] (lty@mail.ee.kuas.edu.tw); Chu, Kuo-Chih [2] (kcchu@mail.lhu.edu.tw)
Source: Parallel Computing. Jan2011, Vol. 37 Issue 1, p11-25. 15p.
Subjects: Debugging, Parallel programs (Computer programs), Distributed shared memory, Distributed computing, Computer network protocols, Threads (Computer programs)
Abstract: Distributed shared memory (DSM) allows parallel programs to run on distributed computers by simulating a global virtual shared memory, but data racing bugs may easily occur when the threads of a multi-threaded process concurrently access the physically distributed memory. Earlier tools to help programmers locate data racing bugs in non-DSM parallel programs are not easily applied to DSM systems. This study presents the data race avoidance and replay scheme (DRARS) to assist debugging parallel programs on DSM or multi-core systems. DRARS is a novel tool which controls the consistency protocol of the target program, automatically preventing a large class of data racing bugs when the parallel program is subsequently run, obviating much of the need for manual debugging. For data racing bugs that cannot be avoided automatically, DRARS performs a deterministic replay-type function on DSM systems, faithfully reproducing the behavior of the parallel program during run time. Because one class of data racing bugs has already been eliminated, the remaining manual debugging task is greatly simplified. Unlike previous debugging methods, DRARS does not require that the parallel program be written in a specific style or programming language. Moreover, DRARS can be implemented in most consistency protocols. In this paper, DRARS is realized and verified in real experiments using the eager release consistency protocol on a DSM system with various applications. [Copyright Elsevier]
Copyright of Parallel Computing is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Header DbId: egs
DbLabel: Engineering Source
An: 56495919
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=56495919
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1016/j.parco.2010.09.002
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 15
        StartPage: 11
    Subjects:
      – SubjectFull: Debugging
        Type: general
      – SubjectFull: Parallel programs (Computer programs)
        Type: general
      – SubjectFull: Distributed shared memory
        Type: general
      – SubjectFull: Distributed computing
        Type: general
      – SubjectFull: Computer network protocols
        Type: general
      – SubjectFull: Threads (Computer programs)
        Type: general
    Titles:
      – TitleFull: Data race avoidance and replay scheme for developing and debugging parallel programs on distributed shared memory systems
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Chiu, Yung-Chang
      – PersonEntity:
          Name:
            NameFull: Shieh, Ce-Kuen
      – PersonEntity:
          Name:
            NameFull: Huang, Tzu-Chi
      – PersonEntity:
          Name:
            NameFull: Liang, Tyng-Yeu
      – PersonEntity:
          Name:
            NameFull: Chu, Kuo-Chih
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 01
              Text: Jan2011
              Type: published
              Y: 2011
          Identifiers:
            – Type: issn-print
              Value: 01678191
          Numbering:
            – Type: volume
              Value: 37
            – Type: issue
              Value: 1
          Titles:
            – TitleFull: Parallel Computing
              Type: main