任务提示融合的端到端视觉多任务学习模型*.

Bibliographic Details
Title: 任务提示融合的端到端视觉多任务学习模型*.
Alternate Title: An end-to-end visual multi-task learning model for task prompts fusion.
Authors: 耿焕同1 htgeng@nuist.edu.cn, 范子辰2 202212210032@nuist.edu.cn, 蒋 骏1, 刘振宇1, 李嘉兴1
Source: Computer Engineering & Science / Jisuanji Gongcheng yu Kexue. Mar2026, Vol. 48 Issue 3, p456-466. 11p.
Subjects: Encoding, Feature extraction, Artificial intelligence
Abstract (English): To address the separated network structures and inter-task interference found in existing visual multi-task learning models, an end-to-end visual multi-task learning model based on triple feature embedding and task prompt fusion is proposed. In the image embedding and encoding phase, three distinct encoding modules capture three types of raw image features, fully preserving the global, local, and contour features. This enriches the structural and semantic information of the embedding vectors and gives the model access to image information across different feature dimensions. In the feature extraction phase, to unify task-general learning, task-specific learning, and cross-task interaction in a single end-to-end model, spatial-channel prompt learning modules and prompt fusion modules extract salient features, trends, and raw information from both the image and the task prompts. This strengthens the expressive and guiding capabilities of the task prompts and allows more thorough extraction of global and local features from both the image and the task prompts. Experimental results show that, compared with single-task state-of-the-art (SOTA) models, the mDS and RMSE metrics improve by 3.36 and 2.41 percentage points, respectively. Compared with multi-task SOTA models, these two metrics improve by 1.69 and 0.32 percentage points, respectively, and mIOU improves by 0.99 percentage points, providing a new solution for multi-task learning. [ABSTRACT FROM AUTHOR]
Abstract (Chinese): 针对现有视觉多任务学习模型中网络结构分离和任务间相互干扰的问题,提出了一种基于三重特征嵌入和任务提示融合的端到端多任务学习模型。在图像嵌入编码阶段,通过采用3组不同的编码模块以捕获图像原始的3种特征,充分保留图像的全局、局部以及轮廓特征,丰富嵌入编码向量结构和语义信息,使得模型可以获取不同特征维度的图像信息。在特征提取阶段,为实现端到端统一的任务通用学习、任务特定学习以及跨任务交互,使用空间-通道提示学习模块和提示融合模块提取图像和任务提示的显著特征、趋势以及原始信息,增强任务提示的表达能力和提示能力,更充分地提取图像和任务提示的全局和局部特征。实验结果表明,与单任务SOTA模型相比,mDS以及RMSE指标分别提高了3.36个百分点和2.41个百分点;而与多任务SOTA模型相比,以上2个指标分别提高了1.69个百分点和0.32个百分点,mIOU提高了0.99个百分点,为多任务学习提供了新的解决方法。 [ABSTRACT FROM AUTHOR]
Copyright of Computer Engineering & Science / Jisuanji Gongcheng yu Kexue is the property of Computer Engineering & Science and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Header DbId: egs
DbLabel: Engineering Source
An: 192760409
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=192760409
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.3969/j.issn.1007-130X.2026.03.008
    Languages:
      – Code: chi
        Text: Chinese
    PhysicalDescription:
      Pagination:
        PageCount: 11
        StartPage: 456
    Subjects:
      – SubjectFull: Encoding
        Type: general
      – SubjectFull: Feature extraction
        Type: general
      – SubjectFull: Artificial intelligence
        Type: general
    Titles:
      – TitleFull: 任务提示融合的端到端视觉多任务学习模型*.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: 耿焕同
      – PersonEntity:
          Name:
            NameFull: 范子辰
      – PersonEntity:
          Name:
            NameFull: 蒋 骏
      – PersonEntity:
          Name:
            NameFull: 刘振宇
      – PersonEntity:
          Name:
            NameFull: 李嘉兴
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 03
              Text: Mar2026
              Type: published
              Y: 2026
          Identifiers:
            – Type: issn-print
              Value: 1007130X
          Numbering:
            – Type: volume
              Value: 48
            – Type: issue
              Value: 3
          Titles:
            – TitleFull: Computer Engineering & Science / Jisuanji Gongcheng yu Kexue
              Type: main