Zero-Shot Skeleton-Based Action Recognition With Prototype-Guided Feature Alignment.
| Title: | Zero-Shot Skeleton-Based Action Recognition With Prototype-Guided Feature Alignment. |
|---|---|
| Authors: | Zhou, Kai [1] (kayjoe0723@gmail.com); Zhang, Shuhai [1] (shuhaizhangshz@gmail.com); You, Zeng [2] (zengyou.yz@gmail.com); Hu, Jinwu [3] (fhujinwu@gmail.com); Tan, Mingkui [1] (mingkuitan@scut.edu.cn); Liu, Fei [1] (feiliu@scut.edu.cn) |
| Source: | IEEE Transactions on Image Processing. 2025, Vol. 34, p4602-4617. 16p. |
| Subjects: | Image recognition (Computer vision), Optical pattern recognition, Image registration, Object recognition (Computer vision), Computer vision |
| Abstract: | Zero-shot skeleton-based action recognition aims to classify unseen skeleton-based human actions without prior exposure to such categories during training. This task is extremely challenging due to the difficulty in generalizing from known to unknown actions. Previous studies typically use two-stage training: pre-training skeleton encoders on seen action categories using cross-entropy loss and then aligning pre-extracted skeleton and text features, enabling knowledge transfer to unseen classes through skeleton-text alignment and language models’ generalization. However, their efficacy is hindered by 1) insufficient discrimination for skeleton features, as the fixed skeleton encoder fails to capture necessary alignment information for effective skeleton-text alignment; 2) the neglect of alignment bias between skeleton and unseen text features during testing. To this end, we propose a prototype-guided feature alignment paradigm for zero-shot skeleton-based action recognition, termed PGFA. Specifically, we develop an end-to-end cross-modal contrastive training framework to improve skeleton-text alignment, ensuring sufficient discrimination for skeleton features. Additionally, we introduce a prototype-guided text feature alignment strategy to mitigate the adverse impact of the distribution discrepancy during testing. We provide a theoretical analysis to support our prototype-guided text feature alignment strategy and empirically evaluate our overall PGFA on three well-known datasets. Compared with the top competitor SMIE method, our PGFA achieves absolute accuracy improvements of 22.96%, 12.53%, and 18.54% on the NTU-60, NTU-120, and PKU-MMD datasets, respectively. [ABSTRACT FROM AUTHOR] |
| Copyright of IEEE Transactions on Image Processing is the property of IEEE and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) | |
| Database: | Engineering Source |
| Publication Type: | Academic Journal |
| Accession Number: | 191897157 |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=191897157 |
| DOI: | 10.1109/TIP.2025.3586487 |
| ISSN: | 1057-7149 (print) |
| Language: | English |
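
The abstract describes two ideas that a small example may make more concrete: cross-modal contrastive training that pulls each skeleton feature toward its matching text feature, and zero-shot prediction by comparing a skeleton feature against text features of unseen classes. The sketch below is only an illustration under stated assumptions, not the PGFA implementation: it uses a generic symmetric InfoNCE-style loss over cosine similarities, and every function name, the temperature value, and the toy vectors are hypothetical. The prototype-guided text feature alignment step is not sketched, since the abstract gives no detail on how it is computed.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(skeleton_feats, text_feats, temperature):
    """Pairwise cosine similarities (skeleton rows x text columns), divided by temperature."""
    s = [l2_normalize(v) for v in skeleton_feats]
    t = [l2_normalize(v) for v in text_feats]
    return [[sum(a * b for a, b in zip(si, tj)) / temperature for tj in t] for si in s]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(skeleton_feats, text_feats, temperature=0.1):
    """Generic InfoNCE-style loss averaged over both matching directions.

    Assumption: row i of skeleton_feats is paired with row i of text_feats.
    The temperature of 0.1 is an arbitrary illustrative value.
    """
    logits = cosine_logits(skeleton_feats, text_feats, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


def zero_shot_predict(skeleton_feat, unseen_text_feats):
    """Return the index of the unseen-class text feature most similar to the skeleton feature."""
    sims = cosine_logits([skeleton_feat], unseen_text_feats, 1.0)[0]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy 2-D features (hypothetical): correctly paired vs. deliberately mismatched.
skeleton_batch = [[1.0, 0.1], [0.0, 1.0]]
text_batch = [[0.9, 0.0], [0.1, 1.0]]
aligned = symmetric_contrastive_loss(skeleton_batch, text_batch)
shuffled = symmetric_contrastive_loss(skeleton_batch, text_batch[::-1])
```

In this toy setup the correctly paired batch yields a lower loss than the mismatched one, which is the signal a contrastive objective would use to shape the skeleton encoder during end-to-end training.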