ISODAC: A high performance solution for indexing and searching heterogeneous data.
| Title: | ISODAC: A high performance solution for indexing and searching heterogeneous data. |
|---|---|
| Authors: | Totaro, G.¹ (totaro@di.uniroma1.it); Bernaschi, M.² (massimo.bernaschi@cnr.it); Carbone, G.¹ (giancarlo.carbone@uniroma1.it); Cianfriglia, M.² (m.cianfriglia@iac.cnr.it); Di Marco, A.² (a.dimarco@iac.cnr.it) |
| Source: | Journal of Systems & Software. Aug 2016, Vol. 118, pp. 115-133 (19 pages). |
| Subjects: | High performance computing, Indexing, Database searching, Computer crimes, Criminal investigation, Computer storage devices, Data extraction |
| Abstract: | Searching for words or sentences within large sets of textual documents can be very challenging unless an index of the data has been created in advance. However, indexing can be very time-consuming, especially if the text is not readily available and has to be extracted from files stored in different formats. Several solutions, based on the MapReduce paradigm, have been proposed to accelerate the process of index creation. These solutions perform well when data are already distributed across the hosts involved in the elaboration. On the other hand, the cost of distributing data can introduce noticeable overhead. We propose ISODAC, a new approach aimed at improving efficiency without sacrificing reliability. Our solution reduces the number of I/O operations to the bare minimum by using a stream of in-memory operations to extract and index text. We further improve the performance by using GPUs for the most computationally intensive tasks of the indexing procedure. ISODAC indexes heterogeneous documents up to 10.6x faster than other widely adopted solutions, such as Apache Spark. As a proof of concept, we developed a tool to index forensic disk images that can easily be used by investigators through a web interface. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Journal of Systems & Software is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Engineering Source |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=115943755 |
| DOI: | 10.1016/j.jss.2015.11.043 |
| ISSN: | 0164-1212 (print) |
| Language: | English |