MFITN-E2NetGA: a transformer-based multi-level fusion framework for multi-class respiratory disease classification from lung sounds.

Bibliographic Details
Title:	MFITN-E2NetGA: a transformer-based multi-level fusion framework for multi-class respiratory disease classification from lung sounds.
Authors:	Jagadish, Maddirla (AUTHOR), Mohanty, Sachi Nandan (AUTHOR)
Source:	Connection Science. Dec 2025, Vol. 37 Issue 1, p1-35. 35p.
Subjects:	Respiratory diseases, Deep learning, Feature extraction, Physical acoustics, Respiratory organ sounds, Reliability in engineering, Automatic classification, Transformer models
Abstract:	Respiratory diseases are among the most prevalent health challenges worldwide, and timely detection is critical for improving patient outcomes. Auscultation of lung sounds is often the first diagnostic step but heavily relies on the clinician's expertise, which may lead to variability in assessments. Automating this process can enhance diagnostic efficiency and reliability. This study introduces an advanced Artificial Intelligence (AI) approach to improve lung sound classification by extracting relevant acoustic features and learning their relationships with various respiratory conditions. We propose a novel deep-learning pipeline using pulmonary sound data to classify multiple respiratory disorders. Three complementary audio representations, namely Mel-Frequency Cepstral Coefficients (MFCCs), Mel Spectrograms, and Cochleograms, are employed to capture time-frequency and perceptual characteristics of lung sounds. A Multi-level Feature Integration Transformer Network (MFITN) is developed to efficiently integrate these heterogeneous features through transformer-based attention mechanisms across abstraction layers. The fused representation is processed by our customized classifier, E2Net-GA, an enhanced EfficientNetV2 model augmented with a Global Attention Mechanism (GAM) and Lightweight Attention Network (LAN) modules. On benchmark datasets, the MFITN-E2Net-GA framework achieved superior performance: for the ICBHI-2017 dataset, accuracy of 98.75%, F1-score of 98.35%, precision of 98.10%, specificity of 97.45%, and recall of 98.75%; for another lung sound dataset, accuracy of 98.95%, F1-score of 98.48%, precision of 98.16%, specificity of 99.36%, and recall of 98.90%. By effectively capturing diverse acoustic features, the proposed multimodal approach enhances diagnostic accuracy, supporting early identification of lung diseases and contributing to improved clinical decision-making and patient care. [ABSTRACT FROM AUTHOR]
	Copyright of Connection Science is the property of Taylor & Francis Ltd and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database:	Psychology and Behavioral Sciences Collection
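
Note on the reported metrics: the abstract lists accuracy, F1-score, precision, specificity, and recall for a multi-class task but does not say how the per-class values were averaged. The sketch below shows how these five figures are conventionally derived from a multi-class confusion matrix using one-vs-rest counts and macro averaging. The macro averaging, the function name `macro_metrics`, and the example matrix are illustrative assumptions, not details taken from the article.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro-averaged metrics from a confusion matrix.

    cm[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Macro averaging is an assumption;
    the abstract does not state the averaging scheme.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls, specificities, f1s = [], [], [], []
    for k in range(n):
        # One-vs-rest counts for class k.
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp

        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        s = tn / (tn + fp) if tn + fp else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0

        precisions.append(p)
        recalls.append(r)
        specificities.append(s)
        f1s.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "specificity": sum(specificities) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical two-class example (not data from the article).
example = [[3, 1],
           [2, 4]]
print(macro_metrics(example))
```

With other averaging schemes (micro or support-weighted), recall coincides with accuracy in single-label classification, so the choice of scheme matters when comparing reported figures across papers.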

ISSN:	09540091
DOI:	10.1080/09540091.2025.2587472