MLFormer: Linear-Attention Transformer with Multi-Scale Feature Enhancement for Real-Time Monocular 3D Human Pose Estimation.

Bibliographic Details
Title:	MLFormer: Linear-Attention Transformer with Multi-Scale Feature Enhancement for Real-Time Monocular 3D Human Pose Estimation.
Authors:	Wang, Bowen¹ 670144302@qq.com, Wang, Shiwen² 2042798189@qq.com, Zhou, Ziwei³ 381431970@qq.com
Source:	Engineering Letters. Apr2026, Vol. 34 Issue 4, p1385-1394. 10p.
Subjects:	Transformer models, Real-time computing, Feature extraction, Computer vision, Artificial intelligence
Abstract:	With the rapid development of artificial intelligence, monocular human pose estimation has become increasingly prominent in the field of computer vision. It holds broad application prospects in areas such as intelligent sports, medical rehabilitation, and human-computer interaction. Nevertheless, existing monocular approaches still suffer from limited accuracy, real-time performance, and adaptability. To address these issues, we propose MLFormer, a Transformer-based architecture that integrates a linear attention mechanism with a Multi-scale Feature Enhancement Module (MFEM). This design significantly reduces computational complexity while improving both accuracy and inference speed. Evaluated on Human3.6M, MLFormer achieves an MPJPE of 42.1 mm; on MPI-INF-3DHP it attains 94.6% PCK, 67.1% AUC, and 53.8 mm MPJPE, surpassing state-of-the-art methods on all metrics. Extensive experiments demonstrate that MLFormer retains high precision, offers stronger real-time capability, and exhibits superior adaptability to human poses at varying scales, together with robustness and generalizability. Overall, the proposed model delivers an efficient solution for monocular human pose estimation, providing notable improvements in accuracy, real-time performance, and adaptability. [ABSTRACT FROM AUTHOR]
	Copyright of Engineering Letters is the property of International Association of Engineers (IAENG) and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database:	Engineering Source

Permalink:	https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=192720699
ISSN:	1816-093X (print)
Language:	English