Scheduling of Big Data Workflows in the Hadoop Framework with Heterogeneous Computing Cluster.

Bibliographic Details
Title:	Scheduling of Big Data Workflows in the Hadoop Framework with Heterogeneous Computing Cluster.
Authors:	Rahmani, Amir Masoud¹ (AUTHOR) rahmania@yuntech.edu.tw; Chamzini, Ehsan Yazdani²,³ (AUTHOR) Ehsan.yazdani@sco.iaun.ac.ir; Pourshaban, Mohsen²,³ (AUTHOR) Pourshaban@sco.iaun.ac.ir; Hosseinzadeh, Mehdi⁴,⁵ (AUTHOR) mehdihosseinzadeh@duytan.edu.vn
Source:	Arabian Journal for Science & Engineering (Springer Science & Business Media B.V.). Aug 2025, Vol. 50 Issue 15, p12449-12461. 13p.
Subjects:	Big data, Heterogeneous computing, Cloud computing, Workflow management systems, Load balancing (Computer networks), Resource allocation, Scheduling
Abstract:	Recently, resource allocation in cloud computing has become a popular research topic. Hi-WAY is a scientific workflow management system that facilitates workflows involving large-scale inputs such as big data. Hadoop, a framework designed to implement distributed systems, allows Hi-WAY to be run on thousands of computing nodes with desirable fault tolerance. Task scheduling is not difficult in a homogeneous Hadoop system, where computing nodes have identical specifications. However, task scheduling could be problematic in heterogeneous systems, where specifications such as processor power, memory, and bandwidth may vary from node to node. This paper introduces a workflow scheduler on the Hadoop framework (WSH), accounting for system heterogeneity when scheduling computing- and IO-intensive jobs. WSH uses a training task to collect information before distributing jobs. The results demonstrate effective job allocation and load balancing improvement in Hadoop, leading to increased resource efficiency and reduced makespan. Based on various experiments and the use of different workflows, the proposed method improves the scheduling length ratio by 42%, reduces makespan by 20%, and enhances speedup by approximately 37% compared to the algorithm. [ABSTRACT FROM AUTHOR]
	Copyright of Arabian Journal for Science & Engineering (Springer Science & Business Media B.V.) is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database:	Engineering Source

ISSN:	2193-567X
DOI:	10.1007/s13369-024-09779-9