Study on Data Placement Strategies in Distributed RDF Stores

Bibliographic Details
Title: Study on Data Placement Strategies in Distributed RDF Stores
Description: The distributed setting of RDF stores in the cloud poses many challenges, including how to optimize data placement on the compute nodes to improve query performance. In this book, a novel benchmarking methodology for data placement strategies is developed, one that uses a data-placement-strategy-independent distributed RDF store to analyze the effect of the data placement strategies on query performance. Frequently used data placement strategies are evaluated, and this evaluation challenges the commonly held belief that data placement strategies which emphasize local computation lead to faster query execution. Indeed, the results indicate that queries with a high workload can be executed faster on hash-based data placement strategies than on, for example, minimal edge-cut covers. The analysis of additional measurements indicates that vertical parallelization (i.e., a well-distributed workload) may be more important than horizontal containment (i.e., minimal data transport) for efficient query processing. Two such data placement strategies are proposed: the first, found in the literature, is entitled overpartitioned minimal edge-cut cover, and the second is the newly developed molecule hash cover. Evaluation revealed a balanced query workload and a high horizontal containment, which lead to a high vertical parallelization. As a result, these strategies demonstrated better query performance than other frequently used data placement strategies. The book also tests the hypothesis that collocating small connected triple sets on the same compute node while balancing the number of triples stored on the different compute nodes leads to a high vertical parallelization.
Authors: Daniel Dominik Janke
Resource Type: eBook.
Subjects: Electronic data processing--Distributed processing
Categories: COMPUTERS / Artificial Intelligence / General
Database: eBook Collection (EBSCOhost)
Header DbId: nlebk
DbLabel: eBook Collection (EBSCOhost)
An: 2401169
AccessLevel: 6
PubType: eBook
PubTypeId: ebook
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Study on Data Placement Strategies in Distributed RDF Stores
– Name: Author
  Label: Authors
  Group: Au
  Data: Daniel Dominik Janke
– Name: TypePub
  Label: Resource Type
  Group: TypPub
  Data: eBook.
– Name: Subject
  Label: Subjects
  Group: Su
  Data: Electronic data processing--Distributed processing
– Name: SubjectBISAC
  Label: Categories
  Group: Su
  Data: COMPUTERS / Artificial Intelligence / General
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=2401169
RecordInfo BibRecord:
  BibEntity:
    Classifications:
      – Code: 004.36
        Scheme: ddc
        Type: prePub
    Languages:
      – Code: eng
        Text: English
    Subjects:
      – SubjectFull: Electronic data processing--Distributed processing
        Type: general
    Titles:
      – TitleFull: Study on Data Placement Strategies in Distributed RDF Stores
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Daniel Dominik Janke
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 01
              Type: published
              Y: 2020
            – D: 26
              M: 03
              Type: profile
              Y: 2020
          Identifiers:
            – Type: isbn-print
              Value: 9781643680682
            – Type: isbn-electronic
              Value: 9781643680699
          Titles:
            – TitleFull: Study on Data Placement Strategies in Distributed RDF Stores
              Type: main