Language-based vectorization and parallelization using intrinsics, OpenMP, TBB and Cilk Plus.
| Title: | Language-based vectorization and parallelization using intrinsics, OpenMP, TBB and Cilk Plus. |
|---|---|
| Authors: | Stpiczyński, Przemysław (przem@hektor.umcs.lublin.pl) |
| Source: | Journal of Supercomputing. Apr. 2018, Vol. 74, Issue 4, pp. 1461-1472 (12 pages). |
| Subjects: | Parallel programs (Computer programs), SIMD (Computer architecture), Multicore processors, Recursive programming, Intel computers |
| Copyright: | Journal of Supercomputing is the property of Springer Nature. |
| Database: | Engineering Source |
| Abstract: | The aim of this paper is to evaluate OpenMP, TBB and Cilk Plus as basic language-based tools for simple and efficient parallelization of recursively defined computational problems and other problems that need both task and data parallelization techniques. We show how to use these models of parallel programming to transform the source code of *Adaptive Simpson’s Integration* into programs that can utilize multiple cores of modern processors. Using the example of the *Bellman-Ford algorithm* for solving single-source shortest path problems, we advise how to improve the performance of data-parallel algorithms by tuning data structures for better utilization of the vector extensions of modern processors. Manual vectorization techniques based on Cilk array notation and intrinsics are presented. We also show how to simplify such optimization using Intel SIMD Data Layout Template containers. [ABSTRACT FROM AUTHOR] |
| ISSN: | 0920-8542 |
| DOI: | 10.1007/s11227-017-2231-3 |
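
The abstract mentions task-based parallelization of adaptive Simpson integration. As a rough illustration of that idea only, and not the article's own code, the following C++ sketch spawns OpenMP tasks for the left half of each subdivided interval. The names `simpson`, `adaptive`, `integrate`, `sine`, `kTaskLevels` and `kMaxLevel`, as well as the cutoff values, are assumptions made for this example. Without OpenMP support enabled at compile time the pragmas are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical tuning constants, chosen only for illustration.
const int kTaskLevels = 8;   // spawn deferred tasks only near the top of the recursion
const int kMaxLevel   = 50;  // hard limit on recursion depth

// Simpson's rule on [a, b] from the endpoint and midpoint function values.
double simpson(double fa, double fm, double fb, double a, double b) {
    return (b - a) / 6.0 * (fa + 4.0 * fm + fb);
}

// Recursive adaptive step: split [a, b] at the midpoint and refine each half
// until the two-half estimate agrees with the whole-interval estimate.
double adaptive(double (*f)(double), double a, double b,
                double fa, double fm, double fb,
                double whole, double eps, int level) {
    const double m   = 0.5 * (a + b);
    const double flm = f(0.5 * (a + m));
    const double frm = f(0.5 * (m + b));
    const double left  = simpson(fa, flm, fm, a, m);
    const double right = simpson(fm, frm, fb, m, b);
    const double delta = left + right - whole;

    // Standard acceptance test with Richardson correction.
    if (level >= kMaxLevel || std::fabs(delta) <= 15.0 * eps)
        return left + right + delta / 15.0;

    double left_sum = 0.0, right_sum = 0.0;
    // The left half may run as a separate task; the right half runs here.
    #pragma omp task shared(left_sum) if(level < kTaskLevels)
    left_sum = adaptive(f, a, m, fa, flm, fm, left, 0.5 * eps, level + 1);
    right_sum = adaptive(f, m, b, fm, frm, fb, right, 0.5 * eps, level + 1);
    #pragma omp taskwait  // wait for the child task before combining results
    return left_sum + right_sum;
}

// Driver: one thread starts the recursion, the team executes spawned tasks.
double integrate(double (*f)(double), double a, double b, double eps) {
    assert(a < b && eps > 0.0);
    const double fa = f(a), fm = f(0.5 * (a + b)), fb = f(b);
    const double whole = simpson(fa, fm, fb, a, b);
    double result = 0.0;
    #pragma omp parallel
    #pragma omp single
    result = adaptive(f, a, b, fa, fm, fb, whole, eps, 0);
    return result;
}

// Example integrand: the integral of sin(x) over [0, pi] is exactly 2.
double sine(double x) { return std::sin(x); }
```

Calling `integrate(sine, 0.0, std::acos(-1.0), 1e-9)` should return a value close to 2. The `if(level < kTaskLevels)` clause is one simple way to limit task creation overhead deep in the recursion; the right cutoff depends on the integrand and the machine.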