Style-Content progressive aggregation network with stable diffusion.

Bibliographic Details
Title: Style-Content progressive aggregation network with stable diffusion.
Authors: Yuan, Tiebiao (AUTHOR); Yu, Yangyang (AUTHOR), yuantb@tjrac.edu.cn; Ji, Ning (AUTHOR)
Source: Applied Intelligence. Aug 2025, Vol. 55, Issue 12, pp. 1-16. 16 pp.
Abstract: The task of text-to-image generation has matured significantly with the advancement of diffusion models; however, achieving precise control over the details and style of generated images remains a challenge. Existing methods often rely on complex text prompts to describe details while attempting to integrate style information. Nevertheless, the single-stage attention mechanism in diffusion models struggles to effectively capture multi-scale features and the relationship between style and content, resulting in feature amalgamation that compromises the quality of generated images. To address this issue, we propose a Style-Content Progressive Aggregation (SCPA) network, which integrates and aggregates multi-scale features from style images and text prompts through the coordinated design of two complementary modules. Specifically, the Style-Content Decoupling (SCD) module disentangles the style and content features of the style image, and reconstructs a learnable content template based on the extracted style features, thereby preventing the original content features from interfering with text understanding. The Style-Content Coupling (SCC) module then extracts multi-scale pixel-level content features from the text prompt and progressively integrates style elements into the template, enabling fine-grained fusion of content and style. This progressive aggregation strategy effectively enhances the quality of prior guidance provided to the diffusion model. Extensive experimental results demonstrate that the SCPA network can generate more artistically appealing images and offers a new direction for the integration of text-to-image generation models with traditional style transfer techniques. [ABSTRACT FROM AUTHOR]
Copyright of Applied Intelligence is the property of Springer Nature.
Database: Engineering Source
ISSN: 0924-669X
DOI: 10.1007/s10489-025-06751-4