Study on Data Placement Strategies in Distributed RDF Stores
| Title: | Study on Data Placement Strategies in Distributed RDF Stores |
|---|---|
| Description: | The distributed setting of RDF stores in the cloud poses many challenges, including how to optimize data placement on the compute nodes to improve query performance. In this book, a novel benchmarking methodology is developed for data placement strategies, one that overcomes these limitations by using a data-placement-strategy-independent distributed RDF store to analyze the effect of the data placement strategies on query performance. Frequently used data placement strategies have been evaluated, and this evaluation challenges the commonly held belief that data placement strategies which emphasize local computation lead to faster query executions. Indeed, results indicate that queries with a high workload can be executed faster on hash-based data placement strategies than on, for example, minimal edge-cut covers. The analysis of additional measurements indicates that vertical parallelization (i.e., a well-distributed workload) may be more important than horizontal containment (i.e., minimal data transport) for efficient query processing. Two such data placement strategies are proposed: the first, found in the literature, is entitled overpartitioned minimal edge-cut cover, and the second is the newly developed molecule hash cover. Evaluation revealed a balanced query workload and a high horizontal containment, which lead to a high vertical parallelization. As a result, these strategies demonstrated better query performance than other frequently used data placement strategies. The book also tests the hypothesis that collocating small connected triple sets on the same compute node while balancing the number of triples stored on the different compute nodes leads to a high vertical parallelization. |
| Authors: | Daniel Dominik Janke |
| Resource Type: | eBook. |
| Subjects: | Electronic data processing--Distributed processing |
| Categories: | COMPUTERS / Artificial Intelligence / General |
| Database: | eBook Collection (EBSCOhost) |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=2401169 |
| Language: | English |
| Classification: | DDC 004.36 |
| Published: | 2020-01-01 |
| ISBN: | 9781643680682 (print); 9781643680699 (electronic) |