Multi-Level Attention and Scale-Aware Fusion for Remote Sensing Scene Object Detection.

Bibliographic Details
Title:	Multi-Level Attention and Scale-Aware Fusion for Remote Sensing Scene Object Detection.
Authors:	Li, Yanling¹ liyanling@xynu.edu.cn, Li, Jiaman¹ jiaman0813@163.com, Yang, Zhipeng¹ yangzp@xynu.edu.cn, Chen, Chongyang¹ cychen@xynu.edu.cn
Source:	Engineering Letters. May 2026, Vol. 34 Issue 5, p1506-1523. 18p.
Subjects:	Remote sensing, Object recognition (Computer vision), Data fusion (Statistics), Image analysis, Computer vision, Deep learning
Abstract:	Object detection plays a pivotal role in intelligent remote sensing image interpretation, with critical applications spanning national defense, security, and smart city development. However, two fundamental challenges persist: complex background interference and significant object scale variations, both severely degrading detection performance. A novel remote sensing object detection method, denoted as MASF, is proposed in this work. The framework consists of three core components: a backbone network, a neck network, and a detection head. To address background interference, we incorporated a Dynamic Bottleneck Module (DBM) into the backbone network. The DBM's core component is a Star Attention Block. This module significantly improves the model's target localization capability in complex scenes by modeling long-range dependencies across regions. A Multi-Kernel Feature Diffusion Pyramid Network is proposed in the neck network to handle multi-scale objects. This architecture utilizes hierarchical feature interactions and a dedicated FocusFeature module to adaptively aggregate features, thereby improving multi-scale detection accuracy. While preserving high-resolution details and semantic consistency, this module enhances the model's recognition accuracy for objects of varying scales by vertically diffusing information across resolutions and hierarchical levels. The detection head is responsible for final object classification and localization. Comprehensive evaluations conducted on the demanding remote sensing benchmark datasets, VisDrone2019-DET and NWPU VHR-10, confirm the efficacy of the proposed methodology. [ABSTRACT FROM AUTHOR]
Database:	Engineering Source

ISSN:	1816093X