Predicting GPU Training Energy Consumption in Data Centers Using Task Metadata via Symbolic Regression.

Bibliographic Details
Title: Predicting GPU Training Energy Consumption in Data Centers Using Task Metadata via Symbolic Regression.
Authors: Liao, Xiao (1); Li, Yiqian (1, 2); Zhang, Shaofeng (1); Wei, Xianzheng (2); Hu, Jinlong (2), jlhu@scut.edu.cn
Source: Energies (19961073). Jan2026, Vol. 19 Issue 2, p448. 25p.
Subject Terms: *Energy consumption, *Graphics processing units, *Energy management, *Nonlinear regression, *Data centers, *Deep learning, *Artificial neural networks
Abstract: With the rapid advancement of artificial intelligence (AI) technology, training deep neural networks has become a core computational task that consumes significant energy in data centers. Researchers often employ various methods to estimate the energy usage of data center clusters or servers to enhance energy management and conservation efforts. However, accurately predicting the energy consumption and carbon footprint of a specific AI task throughout its entire lifecycle before execution remains challenging. In this paper, we explore the energy consumption characteristics of AI model training tasks and propose a simple yet effective method for predicting neural network training energy consumption. This approach leverages training task metadata and applies genetic programming-based symbolic regression to forecast energy consumption prior to executing training tasks, distinguishing it from time series forecasting of data center energy consumption. We have developed an AI training energy consumption environment using the A800 GPU and models from the ResNet{18, 34, 50, 101}, VGG16, MobileNet, ViT, and BERT families to collect data for experimentation and analysis. The experimental analysis of energy consumption reveals that the consumption curve exhibits waveform characteristics resembling square waves, with distinct peaks and valleys. The prediction experiments demonstrate that the proposed method performs well, achieving mean relative errors (MRE) of 2.67% for valley energy, 8.42% for valley duration, 5.16% for peak power, and 3.64% for peak duration. Our findings indicate that, within a specific data center, the energy consumption of AI training tasks follows a predictable pattern. Furthermore, our proposed method enables accurate prediction and calculation of power load before model training begins, without requiring extensive historical energy consumption data. This capability facilitates optimized energy-saving scheduling in data centers in advance, thereby advancing the vision of green AI. [ABSTRACT FROM AUTHOR]
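Note: The abstract reports accuracy as mean relative error (MRE) but does not define it in this record. The sketch below assumes the common definition, the average of |predicted - actual| / |actual| over all samples; the function name and the sample values are hypothetical and are not taken from the paper.

```python
def mean_relative_error(actual, predicted):
    """Return the mean of |predicted - actual| / |actual| as a fraction.

    Assumed definition; the paper may normalize differently.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined for a zero actual value")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return total / len(actual)


# Hypothetical measured vs. predicted peak power readings, in watts.
measured = [100.0, 200.0]
forecast = [110.0, 190.0]
mre = mean_relative_error(measured, forecast)
print(f"MRE = {mre:.2%}")
```

Under this definition, a reported MRE of 5.16% for peak power would mean the predicted peak power differs from the measured value by about 5% on average.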
Database: Energy & Power Source
ISSN: 1996-1073
DOI: 10.3390/en19020448