A Learnable Feature Processing Front-End Based Multimodal Fusion Network for SAR Ship Classification.

Bibliographic Details
Title: A Learnable Feature Processing Front-End Based Multimodal Fusion Network for SAR Ship Classification.
Authors: Wang, Bowen (AUTHOR); Liu, Liguo (AUTHOR), 1309021015@nue.edu.cn; Zhang, Qingyi (AUTHOR)
Source: Remote Sensing. May 2026, Vol. 18, Issue 10, p1610. 23p.
Subjects: Feature extraction, Texture analysis (Image processing), Multisensor data fusion, Geospatial data
Abstract: Highlights:

What are the main findings?
A learnable feature preprocessing front-end is proposed to adaptively integrate scattering and texture features from dual-polarization SAR images into an end-to-end trainable framework.
A bidirectional cross-attention mechanism is designed to deeply fuse SAR visual features with AIS geometric information, enabling effective cross-modal guidance and complementarity.
An improved texture feature extraction method is introduced, which models local pixel differences and similarities to capture more discriminative texture representations.

What is the implication of the main finding?
The method achieves state-of-the-art classification accuracy on the OpenSARShip 2.0 dataset (89.03% for three-class and 71.43% for six-class), demonstrating its superiority over existing approaches.
The learnable preprocessing front-end reduces reliance on handcrafted feature engineering and improves adaptability to different data distributions, while the enhanced texture extraction further enriches fine-grained details, collectively offering a more generalizable solution for SAR ship classification.

Ship classification in synthetic aperture radar (SAR) imagery is essential for maritime surveillance but remains challenging due to limited resolution, insufficient textural details, and difficulties in effectively fusing multimodal information. Existing methods either rely on handcrafted features with limited adaptability or employ simplistic fusion strategies that fail to fully exploit the complementary guidance across modalities. To address these issues, we propose a multimodal fusion network based on a learnable feature preprocessing front-end (LFPF-MFN), which integrates polarimetric, textural, and geometric information in an end-to-end learnable manner. Specifically, LFPF-MFN introduces a learnable preprocessing front-end to embed scattering and enhanced textural features. Meanwhile, geometric information from the Automatic Identification System (AIS) is incorporated through textual embedding, and effective multimodal fusion is achieved via a bidirectional cross-attention mechanism. Extensive experiments on the OpenSARShip 2.0 dataset demonstrate that the proposed method achieves state-of-the-art performance in both three-class and six-class classification tasks, validating the effectiveness of each designed module and the superiority of the multimodal fusion strategy. [ABSTRACT FROM AUTHOR]
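
Illustrative note: the abstract names a bidirectional cross-attention mechanism for fusing SAR visual features with AIS geometric information but gives no implementation details. The following is a minimal NumPy sketch of the general technique only, not the authors' LFPF-MFN architecture. The single attention head, the residual connections, the mean pooling, the token counts, and all names (cross_attention, bidirectional_cross_attention, make_params) are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: queries attend to context.

    queries: (n_q, dim), context: (n_c, dim); each weight matrix is (dim, dim).
    Returns the attended features (n_q, dim) and the weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def make_params(dim, rng):
    """Hypothetical random projections, one (w_q, w_k, w_v) triple per direction."""
    def triple():
        return tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
    return {"sar_from_ais": triple(), "ais_from_sar": triple()}


def bidirectional_cross_attention(sar_tokens, ais_tokens, params):
    """Fuse two modalities by letting each one attend to the other.

    Assumed design: residual update per modality, then mean-pool each token
    set and concatenate into one fused vector of length 2 * dim.
    """
    sar_att, _ = cross_attention(sar_tokens, ais_tokens, *params["sar_from_ais"])
    ais_att, _ = cross_attention(ais_tokens, sar_tokens, *params["ais_from_sar"])
    sar_out = sar_tokens + sar_att  # SAR features guided by AIS
    ais_out = ais_tokens + ais_att  # AIS features guided by SAR
    return np.concatenate([sar_out.mean(axis=0), ais_out.mean(axis=0)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    sar = rng.normal(size=(16, dim))  # e.g. 16 image-patch embeddings (assumed)
    ais = rng.normal(size=(4, dim))   # e.g. 4 embedded AIS attributes (assumed)
    fused = bidirectional_cross_attention(sar, ais, make_params(dim, rng))
    print(fused.shape)
```

In this sketch each modality serves once as the query and once as the key/value source, which is what makes the attention bidirectional; a classifier head would then consume the fused vector.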
Copyright of Remote Sensing is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Description
ISSN: 2072-4292
DOI: 10.3390/rs18101610