A MapReduce solution for associative classification of big data.
| Title: | A MapReduce solution for associative classification of big data. |
|---|---|
| Authors: | Bechini, Alessio; Marcelloni, Francesco (f.marcelloni@iet.unipi.it); Segatori, Armando |
| Source: | Information Sciences. Mar2016, Vol. 332, p33-55. 23p. |
| Subjects: | Big data, Algorithms, Data mining, Computer workstation clusters, Accuracy |
| Abstract: | Associative classifiers have proven to be very effective in classification problems. Unfortunately, the algorithms used for learning these classifiers are not able to adequately manage big data because of time complexity and memory constraints. To overcome such drawbacks, we propose a distributed association rule-based classification scheme shaped according to the MapReduce programming model. The scheme mines classification association rules (CARs) using a properly enhanced, distributed version of the well-known FP-Growth algorithm. Once CARs have been mined, the proposed scheme performs a distributed rule pruning. The set of surviving CARs is used to classify unlabeled patterns. The memory usage and time complexity for each phase of the learning process are discussed, and the scheme is evaluated on seven real-world big datasets on the Hadoop framework, characterizing its scalability and achievable speedup on small computer clusters. The proposed solution for associative classifiers turns out to be suitable to practically address big datasets even with modest hardware support. Comparisons with two state-of-the-art distributed learning algorithms are also discussed in terms of accuracy, model complexity, and computation time. [ABSTRACT FROM AUTHOR] |
| Copyright of Information Sciences is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) | |
| Database: | Engineering Source |
| DOI: | 10.1016/j.ins.2015.10.041 |
| ISSN: | 0020-0255 (print) |
| Language: | English |
| Accession Number: | 111321489 |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=111321489 |
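
The abstract describes mining classification association rules (CARs) under the MapReduce model and then classifying with the surviving rules. The following is only a minimal single-process sketch of that general idea, not the authors' algorithm: it replaces the distributed FP-Growth step with brute-force enumeration of small itemsets, and it replaces the rule-pruning phase with plain support and confidence thresholds. All function names, parameters, and threshold values are hypothetical, and class labels are assumed to be non-`None` values.

```python
from collections import Counter
from itertools import combinations


def map_phase(record, max_len=2):
    """Map step: emit a count of 1 for each small itemset, with and without its label.

    `record` is an (items, label) pair. The key (itemset, None) tracks how often
    the itemset occurs overall, which is needed later to compute confidence.
    """
    items, label = record
    for k in range(1, max_len + 1):
        for itemset in combinations(sorted(items), k):
            yield (itemset, label), 1
            yield (itemset, None), 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def mine_cars(records, min_support=0.4, min_confidence=0.7):
    """Keep rules itemset -> label that meet both (assumed) thresholds."""
    n = len(records)
    counts = reduce_phase(p for r in records for p in map_phase(r))
    rules = []
    for (itemset, label), joint in counts.items():
        if label is None:
            continue
        support = joint / n
        confidence = joint / counts[(itemset, None)]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))
    # Rank by confidence, then support, then itemset for a stable order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0]))


def classify(rules, items, default=None):
    """Return the label of the highest-ranked rule whose itemset matches."""
    for itemset, label, _, _ in rules:
        if set(itemset) <= set(items):
            return label
    return default
```

In a real Hadoop job the map and reduce functions would run on separate nodes over partitions of the dataset; here they are ordinary Python functions chained in one process purely to show the data flow.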