An end-to-end visual multi-task learning model for task prompts fusion.

Bibliographic Details
Title:	任务提示融合的端到端视觉多任务学习模型*.
Alternate Title:	An end-to-end visual multi-task learning model for task prompts fusion.
Authors:	耿焕同¹ htgeng@nuist.edu.cn, 范子辰² 202212210032@nuist.edu.cn, 蒋骏¹, 刘振宇¹, 李嘉兴¹
Source:	Computer Engineering & Science / Jisuanji Gongcheng yu Kexue. Mar2026, Vol. 48 Issue 3, p456-466. 11p.
Subjects:	Encoding, Feature extraction, Artificial intelligence
Abstract (English):	To address the separated network structures and inter-task interference found in existing visual multi-task learning models, an end-to-end visual multi-task learning model based on triple feature embedding and task prompt fusion is proposed. In the image embedding and encoding phase, three distinct encoding modules capture three types of raw image features, preserving the global, local, and contour features of the image. This enriches the structural and semantic information of the embedding vectors and gives the model access to image information across different feature dimensions. In the feature extraction phase, to unify task-general learning, task-specific learning, and cross-task interaction in a single end-to-end model, spatial-channel prompt learning modules and prompt fusion modules extract salient features, trends, and raw information from both the image and the task prompts. This strengthens the expressiveness and guiding capability of the task prompts and allows more complete extraction of global and local features from both the image and the task prompts. Experimental results show that, compared with single-task state-of-the-art (SOTA) models, the mDS and RMSE metrics improve by 3.36 and 2.41 percentage points, respectively. Compared with multi-task SOTA models, these two metrics improve by 1.69 and 0.32 percentage points, respectively, and mIOU improves by 0.99 percentage points, providing a new approach to multi-task learning. [ABSTRACT FROM AUTHOR]
Abstract (Chinese):	针对现有视觉多任务学习模型中网络结构分离和任务间相互干扰的问题,提出了一种基于三重特征嵌入和任务提示融合的端到端多任务学习模型。在图像嵌入编码阶段,通过采用3组不同的编码模块以捕获图像原始的3种特征,充分保留图像的全局、局部以及轮廓特征,丰富嵌入编码向量结构和语义信息,使得模型可以获取不同特征维度的图像信息。在特征提取阶段,为实现端到端统一的任务通用学习、任务特定学习以及跨任务交互,使用空间-通道提示学习模块和提示融合模块提取图像和任务提示的显著特征、趋势以及原始信息,增强任务提示的表达能力和提示能力,更充分地提取图像和任务提示的全局和局部特征。实验结果表明,与单任务SOTA 模型相比,mDS 以及RMSE 指标分别提高了3.36个百分点和2.41个百分点;而与多任务SOTA 模型相比,以上2个指标分别提高了1.69个百分点和0.32个百分点,mIOU 提高了0.99个百分点,为多任务学习提供了新的解决方法. [ABSTRACT FROM AUTHOR]
	Copyright of Computer Engineering & Science / Jisuanji Gongcheng yu Kexue is the property of Computer Engineering & Science and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database:	Engineering Source
