Scheduling of Big Data Workflows in the Hadoop Framework with Heterogeneous Computing Cluster.

Bibliographic Details
Title: Scheduling of Big Data Workflows in the Hadoop Framework with Heterogeneous Computing Cluster.
Authors: Rahmani, Amir Masoud1 (AUTHOR) rahmania@yuntech.edu.tw, Chamzini, Ehsan Yazdani2,3 (AUTHOR) Ehsan.yazdani@sco.iaun.ac.ir, Pourshaban, Mohsen2,3 (AUTHOR) Pourshaban@sco.iaun.ac.ir, Hosseinzadeh, Mehdi4,5 (AUTHOR) mehdihosseinzadeh@duytan.edu.vn
Source: Arabian Journal for Science & Engineering (Springer Science & Business Media B.V.). Aug 2025, Vol. 50, Issue 15, p12449-12461. 13p.
Subjects: Big data, Heterogeneous computing, Cloud computing, Workflow management systems, Load balancing (Computer networks), Resource allocation, Scheduling
Abstract: Recently, resource allocation in cloud computing has become a popular research topic. Hi-WAY is a scientific workflow management system that facilitates workflows involving large-scale inputs such as big data. Hadoop, a framework designed to implement distributed systems, allows Hi-WAY to be run on thousands of computing nodes with desirable fault tolerance. Task scheduling is not difficult in a homogeneous Hadoop system, where computing nodes have identical specifications. However, task scheduling could be problematic in heterogeneous systems, where specifications such as processor power, memory, and bandwidth may vary from node to node. This paper introduces a workflow scheduler on the Hadoop framework (WSH), accounting for system heterogeneity when scheduling computing- and IO-intensive jobs. WSH uses a training task to collect information before distributing jobs. The results demonstrate effective job allocation and load balancing improvement in Hadoop, leading to increased resource efficiency and reduced makespan. Based on various experiments and the use of different workflows, the proposed method improves the scheduling length ratio by 42%, reduces makespan by 20%, and enhances speedup by approximately 37% compared to the algorithm. [ABSTRACT FROM AUTHOR]
Copyright of Arabian Journal for Science & Engineering (Springer Science & Business Media B.V. ) is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
FullText Links:
  – Type: pdflink
Text:
  Availability: 1
Header DbId: egs
DbLabel: Engineering Source
An: 187091477
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 0
IllustrationInfo
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=187091477
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1007/s13369-024-09779-9
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 13
        StartPage: 12449
    Subjects:
      – SubjectFull: Big data
        Type: general
      – SubjectFull: Heterogeneous computing
        Type: general
      – SubjectFull: Cloud computing
        Type: general
      – SubjectFull: Workflow management systems
        Type: general
      – SubjectFull: Load balancing (Computer networks)
        Type: general
      – SubjectFull: Resource allocation
        Type: general
      – SubjectFull: Scheduling
        Type: general
    Titles:
      – TitleFull: Scheduling of Big Data Workflows in the Hadoop Framework with Heterogeneous Computing Cluster.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Rahmani, Amir Masoud
      – PersonEntity:
          Name:
            NameFull: Chamzini, Ehsan Yazdani
      – PersonEntity:
          Name:
            NameFull: Pourshaban, Mohsen
      – PersonEntity:
          Name:
            NameFull: Hosseinzadeh, Mehdi
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 08
              Text: Aug2025
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 2193567X
          Numbering:
            – Type: volume
              Value: 50
            – Type: issue
              Value: 15
          Titles:
            – TitleFull: Arabian Journal for Science & Engineering (Springer Science & Business Media B.V. )
              Type: main
ResultId 1