MLFormer: Linear-Attention Transformer with Multi-Scale Feature Enhancement for Real-Time Monocular 3D Human Pose Estimation.

Bibliographic Details
Title: MLFormer: Linear-Attention Transformer with Multi-Scale Feature Enhancement for Real-Time Monocular 3D Human Pose Estimation.
Authors: Wang, Bowen (1) 670144302@qq.com; Wang, Shiwen (2) 2042798189@qq.com; Zhou, Ziwei (3) 381431970@qq.com
Source: Engineering Letters. Apr2026, Vol. 34 Issue 4, p1385-1394. 10p.
Subjects: Transformer models, Real-time computing, Feature extraction, Computer vision, Artificial intelligence
Abstract: With the rapid development of artificial intelligence, monocular human pose estimation has become increasingly prominent in the field of computer vision. It holds broad application prospects in areas such as intelligent sports, medical rehabilitation, and human-computer interaction. Nevertheless, existing monocular approaches still suffer from limited accuracy, real-time performance, and adaptability. To address these issues, we propose MLFormer, a Transformer-based architecture that integrates a linear attention mechanism with a Multi-scale Feature Enhancement Module (MFEM). This design significantly reduces computational complexity while improving both accuracy and inference speed. Evaluated on Human3.6M, MLFormer achieves an MPJPE of 42.1 mm; on MPI-INF-3DHP it attains 94.6% PCK, 67.1% AUC, and 53.8 mm MPJPE, surpassing state-of-the-art methods on all metrics. Extensive experiments demonstrate that MLFormer retains high precision, offers stronger real-time capability, and exhibits superior adaptability to human poses at varying scales, together with robustness and generalizability. Overall, the proposed model delivers an efficient solution for monocular human pose estimation, providing notable improvements in accuracy, real-time performance, and adaptability. [ABSTRACT FROM AUTHOR]
Copyright of Engineering Letters is the property of International Association of Engineers (IAENG) and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Header DbId: egs
DbLabel: Engineering Source
An: 192720699
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=192720699
RecordInfo BibRecord:
  BibEntity:
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 10
        StartPage: 1385
    Subjects:
      – SubjectFull: Transformer models
        Type: general
      – SubjectFull: Real-time computing
        Type: general
      – SubjectFull: Feature extraction
        Type: general
      – SubjectFull: Computer vision
        Type: general
      – SubjectFull: Artificial intelligence
        Type: general
    Titles:
      – TitleFull: MLFormer: Linear-Attention Transformer with Multi-Scale Feature Enhancement for Real-Time Monocular 3D Human Pose Estimation.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Wang, Bowen
      – PersonEntity:
          Name:
            NameFull: Wang, Shiwen
      – PersonEntity:
          Name:
            NameFull: Zhou, Ziwei
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 04
              Text: Apr2026
              Type: published
              Y: 2026
          Identifiers:
            – Type: issn-print
              Value: 1816093X
          Numbering:
            – Type: volume
              Value: 34
            – Type: issue
              Value: 4
          Titles:
            – TitleFull: Engineering Letters
              Type: main