Runtime Vectorization Transformations of Binary Code.

Bibliographic Details
Title: Runtime Vectorization Transformations of Binary Code.
Authors: Hallou, Nabil¹ (nabil.hallou@inria.fr); Rohou, Erven¹ (erven.rohou@inria.fr); Clauss, Philippe² (philippe.clauss@inria.fr)
Source: International Journal of Parallel Programming, Dec 2017, Vol. 45, Issue 6, pp. 1536-1565 (30 pp.).
Subjects: Binary codes, Vector processing (Computer science), Central processing units, Virtual machine systems, Compilers (Computer programs)
Abstract: In many cases, applications are not optimized for the hardware on which they run. Several factors contribute to this unsatisfying situation, such as legacy code, commercial code distributed in binary form, or deployment on compute farms. In fact, backward compatibility of the ISA guarantees only functionality, not the best exploitation of the hardware. In this work, we focus on maximizing CPU efficiency for the SIMD extensions. The first contribution was originally published in the International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS XV), July 2015, Agios Konstantinos, Greece. It is a binary-to-binary optimization framework in which loops vectorized for an older version of the processor's SIMD extension are automatically converted to a newer one. It is a lightweight mechanism that does not include a vectorizer, but instead leverages what a static vectorizer previously did. We show that many loops compiled for x86 SSE can be dynamically converted to the more recent and more powerful AVX, and we show how correctness is maintained with regard to challenges such as data dependencies and reductions. We obtain speedups in line with those of a native compiler targeting AVX. The second contribution is the runtime vectorization of loops in binary codes that were not originally vectorized. For this purpose, we use open-source frameworks that we have tuned and integrated to (1) dynamically lift the x86 binary into the Intermediate Representation form of the LLVM compiler, (2) abstract hot loops in the polyhedral model, (3) use the power of this mathematical framework to vectorize them, and (4) finally compile them back into executable form using the LLVM Just-In-Time compiler. In most cases, the obtained speedups are close to the number of elements that can be simultaneously processed by the SIMD unit.
The re-vectorizer and auto-vectorizer are implemented inside a dynamic optimization platform; it is completely transparent to the user, does not require any rewriting of the binaries, and operates during program execution. [ABSTRACT FROM AUTHOR]
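Why widening SSE code to AVX needs correctness checks for data dependencies and reductions can be illustrated with a small lane-level simulation. This is a minimal sketch, not the authors' implementation: it is plain Python that emulates SIMD lanes, and the names `widening_is_legal`, `run_blocked` and `lane_sum` are hypothetical. It assumes 32-bit floats (4 lanes per 128-bit SSE register, 8 per 256-bit AVX register, which is also the ideal speedup bound the abstract alludes to) and a loop of the form a[i + d] = a[i] + 1 with a loop-carried dependence distance d.

```python
# Assumption: 32-bit floats. SSE registers are 128 bits, AVX registers 256 bits.
SSE_LANES = 128 // 32  # 4 floats per register
AVX_LANES = 256 // 32  # 8 floats per register


def widening_is_legal(dep_distance: int, lanes: int) -> bool:
    """Hypothetical check: a loop a[i + d] = f(a[i]) with loop-carried
    dependence distance d (d == 0 means no dependence) may execute
    `lanes` iterations at a time only if d >= lanes."""
    return dep_distance == 0 or dep_distance >= lanes


def run_blocked(a: list[float], n: int, d: int, lanes: int) -> list[float]:
    """Emulate a[i + d] = a[i] + 1 for i in range(n), `lanes` iterations
    per step: all loads of a step happen before any store, mimicking a
    SIMD load/add/store sequence. lanes == 1 is the scalar loop.
    Assumes n % lanes == 0 and len(a) >= n + d."""
    a = list(a)  # work on a copy so the caller's input is untouched
    for i in range(0, n, lanes):
        tmp = [a[i + lane] + 1.0 for lane in range(lanes)]  # vector load + add
        for lane in range(lanes):
            a[i + lane + d] = tmp[lane]  # vector store
    return a


def lane_sum(x: list[float], lanes: int) -> float:
    """Sum reduction with one partial accumulator per lane, combined
    after the loop (a horizontal add). Assumes len(x) % lanes == 0."""
    acc = [0.0] * lanes
    for i in range(0, len(x), lanes):
        for lane in range(lanes):
            acc[lane] += x[i + lane]
    return sum(acc)


if __name__ == "__main__":
    n, d = 16, 4  # dependence distance equal to the SSE width
    a0 = [0.0] * (n + d)
    scalar = run_blocked(a0, n, d, 1)
    print(widening_is_legal(d, SSE_LANES), run_blocked(a0, n, d, SSE_LANES) == scalar)
    print(widening_is_legal(d, AVX_LANES), run_blocked(a0, n, d, AVX_LANES) == scalar)
    data = [float(k) for k in range(1, 17)]
    print(lane_sum(data, SSE_LANES), lane_sum(data, AVX_LANES))
```

With d = 4, the 4-lane (SSE-width) execution matches the scalar loop while the 8-lane (AVX-width) execution does not, so a loop that was legally vectorized for SSE cannot always be blindly widened. The reduction keeps one partial sum per lane; on integer-valued data the result is the same for 4 and 8 lanes, but on general floating-point data changing the number of partial sums reorders the additions and can change rounding.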
Copyright of International Journal of Parallel Programming is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Publication Type: Academic Journal
Accession Number: 125257203
Permalink: https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=125257203
DOI: 10.1007/s10766-016-0480-z
ISSN (print): 0885-7458
Language: English