A MapReduce solution for associative classification of big data.

Bibliographic Details
Title: A MapReduce solution for associative classification of big data.
Authors: Bechini, Alessio; Marcelloni, Francesco (f.marcelloni@iet.unipi.it); Segatori, Armando
Source: Information Sciences. Mar2016, Vol. 332, p33-55. 23p.
Subjects: Big data, Algorithms, Data mining, Computer workstation clusters, Accuracy
Abstract: Associative classifiers have proven to be very effective in classification problems. Unfortunately, the algorithms used for learning these classifiers are not able to adequately manage big data because of time complexity and memory constraints. To overcome such drawbacks, we propose a distributed association rule-based classification scheme shaped according to the MapReduce programming model. The scheme mines classification association rules (CARs) using a properly enhanced, distributed version of the well-known FP-Growth algorithm. Once CARs have been mined, the proposed scheme performs a distributed rule pruning. The set of surviving CARs is used to classify unlabeled patterns. The memory usage and time complexity for each phase of the learning process are discussed, and the scheme is evaluated on seven real-world big datasets on the Hadoop framework, characterizing its scalability and achievable speedup on small computer clusters. The proposed solution for associative classifiers turns out to be suitable for practically addressing big datasets even with modest hardware support. Comparisons with two state-of-the-art distributed learning algorithms are also discussed in terms of accuracy, model complexity, and computation time. [ABSTRACT FROM AUTHOR]
Copyright of Information Sciences is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
FullText Text:
  Availability: 0
Header DbId: egs
DbLabel: Engineering Source
An: 111321489
AccessLevel: 6
PubType: Periodical
PubTypeId: serialPeriodical
PreciseRelevancyScore: 0
IllustrationInfo
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=111321489
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1016/j.ins.2015.10.041
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 23
        StartPage: 33
    Subjects:
      – SubjectFull: Big data
        Type: general
      – SubjectFull: Algorithms
        Type: general
      – SubjectFull: Data mining
        Type: general
      – SubjectFull: Computer workstation clusters
        Type: general
      – SubjectFull: Accuracy
        Type: general
    Titles:
      – TitleFull: A MapReduce solution for associative classification of big data.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Bechini, Alessio
      – PersonEntity:
          Name:
            NameFull: Marcelloni, Francesco
      – PersonEntity:
          Name:
            NameFull: Segatori, Armando
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 03
              Text: Mar2016
              Type: published
              Y: 2016
          Identifiers:
            – Type: issn-print
              Value: 00200255
          Numbering:
            – Type: volume
              Value: 332
          Titles:
            – TitleFull: Information Sciences
              Type: main
ResultId 1