ATS-YOLO: A Multi-Scale Small Object Detection Model for Aerial Imagery Based on Context Enhancement and Task Collaboration Mechanisms.


Bibliographic Details
Title:	ATS-YOLO: A Multi-Scale Small Object Detection Model for Aerial Imagery Based on Context Enhancement and Task Collaboration Mechanisms.
Authors:	Zhang, Guoyu¹ 609341993@qq.com, Xu, Yang² 1981@aliyun.com
Source:	IAENG International Journal of Computer Science. Jun2026, Vol. 53 Issue 6, p2284-2294. 11p.
Subjects:	Drone aircraft, Object recognition (Computer vision), Deep learning, Computer vision, Aerial photographs
Abstract:	Unmanned Aerial Vehicles (UAVs) are widely used for applications such as traffic monitoring, smart city management, and disaster response. However, detecting small objects in aerial imagery presents significant challenges, including scale variations, cluttered backgrounds, high object density, and strict computational constraints for onboard deployment. To tackle these issues, we propose ATS-YOLO, a novel multi-scale small object detection framework for aerial imagery that leverages context enhancement and adaptive task collaboration. Our approach includes a lightweight Context-Guided Bilateral Downsampling (CGBD) module that replaces traditional strided convolutions in both the backbone and neck networks. This design minimizes information loss during spatial reduction, preserving essential contextual cues for small object localization. We also introduce a Complementary Multi-Kernel Fusion Module (CMKFM) in the backbone, which uses a Feature Complementary Mapping (FCM) unit and a Multi-Kernel Perception (MKP) block to enhance feature integration and multi-scale representation learning. By eliminating the P5 detection pyramid level, which provides limited benefits for tiny objects, we streamline the architecture and reduce computational redundancy without sacrificing accuracy. Additionally, our Adaptive Task-Collaborative Detection Head (ATSHead) dynamically balances classification and localization tasks through shared attention mechanisms, enhancing robustness in complex scenarios. Extensive experiments on the VisDrone2019 benchmark show that ATS-YOLO significantly outperforms the baseline YOLOv12s, achieving improvements of 6.0% and 4.7% in mAP0.5 and mAP0.5:0.95, respectively, while reducing the number of model parameters by 40%. On the public UAVDT dataset, we observe gains of 2.8% in mAP0.5 and 1.4% in mAP0.5:0.95, demonstrating substantial performance enhancements. [ABSTRACT FROM AUTHOR]
	Copyright of IAENG International Journal of Computer Science is the property of International Association of Engineers (IAENG) and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
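The abstract's rationale for dropping the P5 pyramid level can be illustrated with a little arithmetic. The sketch below assumes a 640 x 640 input and the usual YOLO strides of 8, 16 and 32 for P3, P4 and P5; neither value is stated in the abstract, so treat them as illustrative assumptions rather than the paper's configuration. The helper names are hypothetical.

```python
# Hypothetical illustration: INPUT_SIZE and STRIDES are assumed typical YOLO
# defaults, not values taken from the ATS-YOLO paper.
INPUT_SIZE = 640
STRIDES = {"P3": 8, "P4": 16, "P5": 32}


def grid_cells(input_size: int, stride: int) -> int:
    """Number of prediction locations on a square feature map at this stride."""
    side = input_size // stride
    return side * side


def total_locations(levels) -> int:
    """Sum of prediction locations over the chosen pyramid levels."""
    return sum(grid_cells(INPUT_SIZE, STRIDES[name]) for name in levels)


def cells_covered(object_px: float, stride: int) -> float:
    """Approximate number of feature-map cells an object spans along one side."""
    return object_px / stride


with_p5 = total_locations(["P3", "P4", "P5"])  # 6400 + 1600 + 400
without_p5 = total_locations(["P3", "P4"])     # 6400 + 1600

print(f"locations with P5:    {with_p5}")
print(f"locations without P5: {without_p5}")

# How much of a single cell a 16 x 16-pixel object occupies at each level.
for name, stride in STRIDES.items():
    span = cells_covered(16, stride)
    print(f"{name} (stride {stride}): 16 px object spans {span:.2f} cells per side")
```

Under these assumptions, removing P5 drops only 400 of 8,400 prediction locations, and a 16-pixel object spans just half a cell at stride 32, which is why that level contributes little for tiny objects. Any parameter savings from removing P5 would therefore come mainly from the deep, wide layers that feed that level in many YOLO variants rather than from the grid itself; the 40% reduction reported in the abstract applies to the full ATS-YOLO design, not to P5 removal alone.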
Database:	Engineering Source

ISSN:	1819-656X