Seeded Topic Models in Digital Archives: Analyzing Interpretations of Immigration in Swedish Newspapers, 1945-2019

Bibliographic Details
Title: Seeded Topic Models in Digital Archives: Analyzing Interpretations of Immigration in Swedish Newspapers, 1945-2019
Language: English
Authors: Miriam Hurtado Bodell (ORCID 0000-0002-8467-1746), Måns Magnusson (ORCID 0000-0002-0296-2719), Marc Keuschnigg (ORCID 0000-0001-5774-1553)
Source: Sociological Methods & Research. 2026 55(1):120-156.
Availability: SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: https://sagepub.com
Peer Reviewed: Y
Page Count: 37
Publication Date: 2026
Document Type: Journal Articles; Reports - Research
Descriptors: Foreign Countries, Newspapers, Mass Media Role, Public Opinion, Immigrants, Content Analysis, Immigration, Social Science Research, Information Retrieval
Geographic Terms: Sweden
DOI: 10.1177/00491241241268453
ISSN: 0049-1241, 1552-8294
Abstract: Sociologists are discussing the need for more formal ways to extract meaning from digital text archives. We focus attention on the seeded topic model, a semi-supervised extension to the standard topic model that allows sociological knowledge to be infused into the computational learning of meaning structures. Seed words help crystallize topics around known concepts, while utilizing topic models' functionality to identify associations in text based on word co-occurrences. The method estimates a concept's shared interpretation (or framing) via its associations with other frequently co-occurring topics. In a case study, we extract longitudinal measures of media frames regarding immigration from a vast corpus of millions of Swedish newspaper articles from the period 1945-2019. We infer turning points that partition the immigration discourse into meaningful eras and locate Sweden's era of multicultural ideals that coined its tolerant reputation.
Abstractor: As Provided
Entry Date: 2026
Accession Number: EJ1496219
Database: ERIC
Text:
  Availability: 1
  Value: <anid>AN0190929150;som01feb.26;2026Jan20.00:54;v2.2.500</anid> <title id="AN0190929150-1">Seeded Topic Models in Digital Archives: Analyzing Interpretations of Immigration in Swedish Newspapers, 1945–2019 </title> <p>Sociologists are discussing the need for more formal ways to extract meaning from digital text archives. We focus attention on the seeded topic model, a semi-supervised extension to the standard topic model that allows sociological knowledge to be infused into the computational learning of meaning structures. Seed words help crystallize topics around known concepts, while utilizing topic models' functionality to identify associations in text based on word co-occurrences. The method estimates a concept's shared interpretation (or framing) via its associations with other frequently co-occurring topics. In a case study, we extract longitudinal measures of media frames regarding immigration from a vast corpus of millions of Swedish newspaper articles from the period 1945–2019. 
We infer turning points that partition the immigration discourse into meaningful eras and locate Sweden's era of multicultural ideals that coined its tolerant reputation.</p> <p>Keywords: media discourse; framing; immigration; computational text analysis; seeded topic model; natural language processing</p> <hd id="AN0190929150-2">Introduction</hd> <p>In recent years, an increasing number of sociologists have embraced machine learning algorithms to infer latent patterns in text data (e.g., [<reflink idref="bib44" id="ref1">44</reflink>]; [<reflink idref="bib102" id="ref2">102</reflink>]; [<reflink idref="bib115" id="ref3">115</reflink>]; [<reflink idref="bib11" id="ref4">11</reflink>]; [<reflink idref="bib104" id="ref5">104</reflink>], [<reflink idref="bib105" id="ref6">105</reflink>], [<reflink idref="bib106" id="ref7">106</reflink>]; [<reflink idref="bib87" id="ref8">87</reflink>]; [<reflink idref="bib66" id="ref9">66</reflink>]; [<reflink idref="bib13" id="ref10">13</reflink>]; [<reflink idref="bib82" id="ref11">82</reflink>]; [<reflink idref="bib143" id="ref12">143</reflink>]; [<reflink idref="bib25" id="ref13">25</reflink>]; [<reflink idref="bib131" id="ref14">131</reflink>]; [<reflink idref="bib126" id="ref15">126</reflink>]; [<reflink idref="bib7" id="ref16">7</reflink>]; [<reflink idref="bib27" id="ref17">27</reflink>]; [<reflink idref="bib31" id="ref18">31</reflink>]; [<reflink idref="bib20" id="ref19">20</reflink>]). One suite of algorithms, unsupervised topic models ([<reflink idref="bib22" id="ref20">22</reflink>]; [<reflink idref="bib71" id="ref21">71</reflink>]; [<reflink idref="bib21" id="ref22">21</reflink>]), infers linguistic themes based on word co-occurrences. 
Topic models have been found to resonate well with sociological ideas about how people create meaning and make sense of the social world by linking themes to other concepts and ideas ([<reflink idref="bib44" id="ref23">44</reflink>]; [<reflink idref="bib102" id="ref24">102</reflink>]; [<reflink idref="bib133" id="ref25">133</reflink>]; [<reflink idref="bib57" id="ref26">57</reflink>]; [<reflink idref="bib104" id="ref27">104</reflink>]). This article addresses a central limitation of topic models: while they are suited to inductive research that identifies emergent themes from document collections, they fare poorly at identifying, in transparent and replicable ways, specific concepts predefined by the researcher. Topic models, and unsupervised methods more generally, rely on post hoc analysis to make sense of the output in light of sociological theory, opening up an old rift between inductive and deductive research within the discipline. As computational text analysis has matured as a methodology in the sociological toolkit, calls have been made for an important next step: to move beyond the implementation of standard models and to strive to apply specialized models that are more transparent, replicable, theory-driven, and interpretable, and thus more attuned to the central demands of social science research ([<reflink idref="bib43" id="ref28">43</reflink>]; [<reflink idref="bib103" id="ref29">103</reflink>]; [<reflink idref="bib100" id="ref30">100</reflink>]; [<reflink idref="bib109" id="ref31">109</reflink>]; [<reflink idref="bib106" id="ref32">106</reflink>]; [<reflink idref="bib72" id="ref33">72</reflink>]; [<reflink idref="bib28" id="ref34">28</reflink>]).</p> <p>We contribute further to this debate and argue for the use of semi-supervised text analysis. 
We focus on the <emph>seeded</emph> (or <emph>constrained</emph>) <emph>topic model</emph> ([<reflink idref="bib6" id="ref35">6</reflink>]; [<reflink idref="bib79" id="ref36">79</reflink>]; [<reflink idref="bib138" id="ref37">138</reflink>]), which combines the original model's unsupervised nature with sociological domain knowledge.[<reflink idref="bib7" id="ref38">7</reflink>] In contrast to other topic model extensions commonly used by social scientists, such as the structural topic model ([<reflink idref="bib113" id="ref39">113</reflink>]) that utilizes document-level covariates to interpret model results in light of theory, the seeded topic model creates an informative dimension reduction of the corpus. In practice, scholars often want to take advantage of the exploratory capabilities of topic models, while also hoping that the models will capture themes that are presumed a priori to exist in a corpus. Our proposed approach makes it possible to achieve both objectives by seeding certain topics, while letting other topics emerge inductively, combining the inductive power of topic models with some degree of researcher supervision. Here lies an important advantage over the deterministic use of dictionary approaches to measure predefined concepts in that seeding helps crystallize topics of interest but it allows for imperfect knowledge of the topics before running the model.</p> <p>The seeding crystallizes topics around predefined words that describe themes of interest. We use the term "topics" to refer to model output, and we use "themes," "issues," and "frames" when referring to theoretical concepts. Seed words require researchers to be explicit about how a concept is operationalized, and seeding is one way to constrain the model to search for specific themes of interest. 
Seeding can also increase the robustness of computational text analysis to language change, an endemic challenge when analyzing text archives of historical timescales ([<reflink idref="bib17" id="ref40">17</reflink>]; [<reflink idref="bib115" id="ref41">115</reflink>]; [<reflink idref="bib135" id="ref42">135</reflink>]; [<reflink idref="bib27" id="ref43">27</reflink>]). By identifying associations between a focal topic and other topics with which it frequently co-occurs, the model can detect widely shared interpretations (or frames) associated with the theme in question. These model features provide an attractive complement to the mixed-methods approaches (e.g., [<reflink idref="bib43" id="ref44">43</reflink>]; [<reflink idref="bib82" id="ref45">82</reflink>]; [<reflink idref="bib103" id="ref46">103</reflink>], [<reflink idref="bib104" id="ref47">104</reflink>]) that are currently being discussed as a way of bringing computational text analysis into sociological research.</p> <p>One strength of the topic model approach is to allow for words' mixed memberships in topics. Our use of the seeded topic model, however, aims at measuring clearly defined and interpretable topics, which we will achieve by using seed words that we believe to have a single, very clear meaning. Seeding will work less well if one starts from polysemic words, i.e., words with multiple meanings, or if one tries to seed a polysemic topic altogether. While the words associated with the seeded words within a given topic are also allowed to emerge from the data, forced monosemy is a limitation of our approach that will hinder its applicability to certain use cases.</p> <p>Seeded topic models have been around for a decade and have more recently become available in general-purpose programming languages such as R ([<reflink idref="bib138" id="ref48">138</reflink>]) and Python ([<reflink idref="bib5" id="ref49">5</reflink>]). 
However, strong computational requirements and limitations in the scalability of off-the-shelf implementations ([<reflink idref="bib95" id="ref50">95</reflink>]; [<reflink idref="bib79" id="ref51">79</reflink>]; [<reflink idref="bib54" id="ref52">54</reflink>]; [<reflink idref="bib53" id="ref53">53</reflink>]; [<reflink idref="bib137" id="ref54">137</reflink>]) have hampered their application in sociology. We discuss a scalable implementation for big text data ([<reflink idref="bib97" id="ref55">97</reflink>]) that removes previous bottlenecks and that we hope will make the algorithm attractive to a broader sociological audience. We illustrate the method using an important case study that measures the ways the media have framed immigration in a Swedish newspaper corpus spanning 75 years. The corpus, one of the most extensive ever analyzed in the social sciences, contains 30 million text blocks from more than 100,000 editions of the country's four national newspapers from the period 1945–2019.</p> <p>Our study connects to a long tradition of sociological research studying newspaper discourses (e.g., [<reflink idref="bib60" id="ref56">60</reflink>]; [<reflink idref="bib99" id="ref57">99</reflink>]; [<reflink idref="bib85" id="ref58">85</reflink>]; [<reflink idref="bib56" id="ref59">56</reflink>]; [<reflink idref="bib80" id="ref60">80</reflink>]; [<reflink idref="bib9" id="ref61">9</reflink>]; [<reflink idref="bib123" id="ref62">123</reflink>]). Previous immigration-related research has relied on corpora comprising between a few thousand and 130,000 articles, which have typically been assembled using keyword searches, and which have spanned time frames of between 1 and 14 years ([<reflink idref="bib75" id="ref63">75</reflink>]; [<reflink idref="bib90" id="ref64">90</reflink>]; [<reflink idref="bib68" id="ref65">68</reflink>]; [<reflink idref="bib74" id="ref66">74</reflink>]; [<reflink idref="bib39" id="ref67">39</reflink>]). 
The largest studies to date have included 850,000 articles in six European languages ([<reflink idref="bib46" id="ref68">46</reflink>]) and 850,000 immigration-related headlines from UK newspapers ([<reflink idref="bib23" id="ref69">23</reflink>]). Compared to past snap-shot corpora, our data are vast and—in combination with a scalable algorithm—permit a fine-grained mapping of the newspaper discourse on immigration over 75 years.</p> <p>Using the corpus described above, we map how shared interpretations of immigration have evolved over time. We operationalize interpretative media frames as associations between a focal topic and other topics, estimating the co-occurrence patterns of predefined themes (combining "immigration" with, e.g., "the economy," "culture," or "security"). Issues that frequently co-occur with the focal topic represent prominent logics for the topic's interpretation. Through the ways journalists curate and present the news flow, the media frames that we measure in this study establish a shared context of meaning-making ([<reflink idref="bib118" id="ref70">118</reflink>]; [<reflink idref="bib56" id="ref71">56</reflink>]; [<reflink idref="bib38" id="ref72">38</reflink>]; [<reflink idref="bib94" id="ref73">94</reflink>]), placing events, people, and ideas into a wider context of interpretability ([<reflink idref="bib128" id="ref74">128</reflink>]; [<reflink idref="bib42" id="ref75">42</reflink>]; [<reflink idref="bib34" id="ref76">34</reflink>]; [<reflink idref="bib8" id="ref77">8</reflink>]).</p> <p>Since we estimate changes in cultural associations and delineate periods during which associations measurably differed, our computational approach adds scale to the qualitative analysis of "turning points" in collective meaning-making ([<reflink idref="bib122" id="ref78">122</reflink>]; [<reflink idref="bib1" id="ref79">1</reflink>], [<reflink idref="bib2" id="ref80">2</reflink>]; [<reflink idref="bib136" id="ref81">136</reflink>]). 
It further lends a broader empirical foundation to the casing of timelines than the narrative accounts usually heralded in the historical social sciences ([<reflink idref="bib52" id="ref82">52</reflink>]; [<reflink idref="bib70" id="ref83">70</reflink>]; [<reflink idref="bib18" id="ref84">18</reflink>]).</p> <p>In the following, we provide a brief primer on frames of interpretation and turning points in media discourse, and we introduce the Swedish case study in relation to earlier large-scale studies of newspaper content. We then turn to the method itself and describe its implementation as a means of estimating predefined topics and their relations to one another over time. We present results for the Swedish newspaper corpus that highlight the interpretability of model outputs. In the concluding section, we discuss our insights into the Swedish media coverage of immigration over the past 75 years, and we ponder the degree to which text measures, drawn for example from the mainstream media as in our case, provide social sensors that can help us learn about trends in contemporary societies.</p> <hd id="AN0190929150-3">Frames and Turning Points</hd> <p>Frames concern how information is conveyed in communication, and how specific interpretations are promoted by relating one concept to other concepts, thereby linking new information to existing ideas and previous experiences ([<reflink idref="bib60" id="ref85">60</reflink>]; [<reflink idref="bib50" id="ref86">50</reflink>]; [<reflink idref="bib119" id="ref87">119</reflink>]; [<reflink idref="bib112" id="ref88">112</reflink>]). As such, frames are "interpretive packages" ([<reflink idref="bib60" id="ref89">60</reflink>]) that evoke particular perspectives and problem definitions through which objects in the social world can be seen and understood ([<reflink idref="bib140" id="ref90">140</reflink>]; [<reflink idref="bib59" id="ref91">59</reflink>]; [<reflink idref="bib19" id="ref92">19</reflink>]). 
Immigration, for example, might be interpreted, among others, through a security frame or an economic frame. Individuals may have opposing opinions on immigration (e.g., "immigrants provide necessary labor" and "immigrants take our jobs"), but they can still agree to interpret immigration through a similar lens (e.g., the economy). Taken together, frames provide the cognitive contexts that speak to and activate the learned categories of individuals' cognition ([<reflink idref="bib93" id="ref93">93</reflink>]; [<reflink idref="bib142" id="ref94">142</reflink>]; [<reflink idref="bib76" id="ref95">76</reflink>]; [<reflink idref="bib34" id="ref96">34</reflink>]), and they organize cognition at a higher order of abstraction than do opinions, attitudes, or values ([<reflink idref="bib42" id="ref97">42</reflink>]; [<reflink idref="bib65" id="ref98">65</reflink>]; [<reflink idref="bib100" id="ref99">100</reflink>]).</p> <p>In our application, we focus on how immigration has been framed in national news media, exploring the interpretations of immigration formulated by journalists and editors. In line with the idea that an interpretative frame can be viewed as an associative pattern, we operationalize media frames as associations between a focal theme and other topics. Media frames that frequently co-occur with the focal issue represent prominent logics for the issue's interpretation. For example, one frame may connect immigration with issues of religion in order to highlight the cultural differences between natives and migrants, while another may connect immigration with party politics in order to promote a politicized perspective on immigration. 
The composition of salient frames at a certain time point aggregates into what we refer to as the shared interpretation of immigration communicated by the media.</p> <p>In contests over sovereignty in interpretation ([<reflink idref="bib130" id="ref100">130</reflink>]; [<reflink idref="bib59" id="ref101">59</reflink>]; [<reflink idref="bib19" id="ref102">19</reflink>]), entrepreneurs of meaning—such as governments, political parties, advocacy groups, and media outlets themselves—are keen to obtain ownership of salient issues and to influence their shared interpretations ([<reflink idref="bib4" id="ref103">4</reflink>]; [<reflink idref="bib111" id="ref104">111</reflink>]; [<reflink idref="bib134" id="ref105">134</reflink>]; [<reflink idref="bib55" id="ref106">55</reflink>]; [<reflink idref="bib13" id="ref107">13</reflink>]). But how do publicly available interpretations change? Influential social science theorizing refers to "turning points" that constitute breaks with routine practices of meaning-making ([<reflink idref="bib122" id="ref108">122</reflink>]; [<reflink idref="bib1" id="ref109">1</reflink>]; [<reflink idref="bib136" id="ref110">136</reflink>]). Turning points take shape in "unsettled times" ([<reflink idref="bib130" id="ref111">130</reflink>]) or "periods of rupture" ([<reflink idref="bib136" id="ref112">136</reflink>]) in which sequences of events occur that imply thresholds and shifts that are recognizable to contemporaries. In retrospect, we give names to these ruptures because they bring with them a series of occurrences that challenge established interpretations and "durably transforms previous structures and practices" ([<reflink idref="bib122" id="ref113">122</reflink>]).</p> <p>We use the concept of turning points that are grounded in, and operative on, publicly available interpretations to partition Sweden's immigration discourse into recognizable eras. 
We estimate annual salience shifts in the composition of dominant frames over time to identify breakpoints in the media's framing of immigration and to parse discursive periods during which meaning-making measurably differed.</p> <hd id="AN0190929150-4">The Swedish Newspaper Corpus in Context</hd> <p>The Swedish Newspaper Corpus 1945–2019, digitized by the National Library of Sweden ([<reflink idref="bib29" id="ref114">29</reflink>]), contains 75 years of journalistic content from the country's four largest newspapers <emph>Aftonbladet</emph>, <emph>Dagens Nyheter</emph>, <emph>Expressen</emph>, and <emph>Svenska Dagbladet</emph>. The corpus allows for a macroscopic analysis of the Swedish migration discourse as reflected in the mainstream media, dating back to the time when mass immigration to Sweden started. Sweden entered Europe's post-war reconstruction period as a neutral country without an influential colonial history and with an ethnically homogeneous population of 6.6 million. In the decades that followed, Sweden received labor migrants and, increasingly, refugees at an average annual rate of 0.6% of the population ([<reflink idref="bib125" id="ref115">125</reflink>]). Figure 1A shows the number of immigrants arriving in Sweden during the observation period. Today, 20% of the 10.3 million Swedes are foreign born ([<reflink idref="bib124" id="ref116">124</reflink>]).</p> <p>The news articles we study represent a broad mixture of different formats and political orientations (see Table 1). Newspapers divide their content into multiple stand-alone sections, e.g., op-eds, domestic politics, world news, culture, sports, and TV listings. We restrict our analysis to the front sections of each newspaper. We believe these sections contribute most to meaning-making in newspapers. Using the front sections leaves us with 29.3 million documents and 1.6 billion words after removing rare words and documents shorter than 15 words. 
The corpus consists of text blocks, i.e., units of cohesive text identified in the segmentation procedure during digitization. The segmentation relies on a rule-based approach curated by the Swedish National Library (using the software Zissor with ABBYY as the optical character recognition engine); there are different segmentation rules for each newspaper that are updated when newspaper layouts change ([<reflink idref="bib41" id="ref117">41</reflink>]). We use each text block as a document. Previous research ([<reflink idref="bib78" id="ref118">78</reflink>]) has shown that an article is commonly captured by multiple text blocks and, importantly, that only 16% of text blocks contain content from more than one article. See Supplemental Material Section S1 in the Appendix for more details on corpus creation.</p> <p>Table 1. Corpus Description, 1945–2019.</p> <p> <ephtml> <table><colgroup><col align="left" /><col align="left" /><col align="left" /><col align="left" /><col align="left" /></colgroup><thead><tr><th align="left" /><th align="left">Aftonbladet</th><th align="left">Dagens Nyheter</th><th align="left">Expressen</th><th align="left">Svenska Dagbladet</th></tr></thead><tbody><tr><td>Newspaper type</td><td>Tabloid</td><td>Broadsheet</td><td>Tabloid</td><td>Broadsheet</td></tr><tr><td>Political leaning</td><td>Left</td><td>Moderate</td><td>Moderate</td><td>Right</td></tr><tr><td>Founding year</td><td>1830</td><td>1864</td><td>1944</td><td>1884</td></tr><tr><td>Avg. daily paid circulation</td><td>343,595</td><td>377,870</td><td>417,653</td><td>166,426</td></tr><tr><td># documents (in millions)</td><td>7.20</td><td>6.86</td><td>7.89</td><td>7.36</td></tr><tr><td>Tokens (in millions)</td><td>338.5</td><td>427.9</td><td>338.8</td><td>455.7</td></tr><tr><td>Avg. 
# tokens per doc</td><td>47.3</td><td>44.0</td><td>61.1</td><td>44.1</td></tr><tr><td># Immigration-rich docs</td><td>86,070</td><td>117,876</td><td>90,261</td><td>112,844</td></tr></tbody></table> </ephtml> </p> <p><emph>Note:</emph> Average daily paid circulation refers to 1945–2018, tokens refers to number of words, and we classified documents as immigration-rich if at least 2.5% of their tokens belong to the estimated immigration topic.</p> <p>By comparison with earlier computational studies of archival text that have described national conversations based on sets of <emph>political speeches</emph> ([<reflink idref="bib115" id="ref119">115</reflink>]; [<reflink idref="bib14" id="ref120">14</reflink>]; [<reflink idref="bib58" id="ref121">58</reflink>]; [<reflink idref="bib33" id="ref122">33</reflink>]), the extreme breadth of the newspaper archive (106,000 daily issues in total) permits us to focus on the national conversation about one particular issue, immigration, with high granularity. In relation to the <emph>newspaper corpora</emph> studied in prior immigration-related research, our data set is much larger. As a comparison, [<reflink idref="bib46" id="ref124">46</reflink>] searched for manually selected keywords and found an increase in the attention focused on immigration during the period of the 2015 European "refugee crisis" in Germany, Hungary, Poland, Spain, Sweden, and the UK (102,000 articles, 2003–2017); with regard to Sweden, the study reported that most found keywords centered around security and welfare issues. [<reflink idref="bib68" id="ref126">68</reflink>] used principal component analysis to analyze data based on 89 predefined immigration-related words in Austrian newspapers (10,000 articles, 2015) and found media frames focused on security and economic issues. 
A lexicon-based sentiment analysis of immigration-related headlines in 850,000 articles from UK newspapers (2001–2012) found negative connotations and problem frames, particularly in news reporting on Muslim immigrants ([<reflink idref="bib23" id="ref128">23</reflink>]). Based on the original topic modeling framework, [<reflink idref="bib74" id="ref129">74</reflink>] explored framing during the "refugee crisis" in 24 newspapers from Germany, Hungary, Spain, Sweden, and the UK (130,000 articles, 2015–2016), and found a stronger humanitarian framing of immigration in Sweden than in the other European countries. Using a structural topic model, [<reflink idref="bib39" id="ref131">39</reflink>] showed that, in Germany, print reporting perpetuated a more diverse set of frames of the "refugee crisis" than online reporting (32,000 articles, 2015–2017).</p> <p>While they have been innovative and carefully implemented, previous topic-model studies have relied exclusively on an inductive operationalization of meaningful frames that were detected as topics in articles identified as having a focus on immigration based on a keyword search. The inferred topics, and the sociological concepts they may represent, have been interpreted post hoc, after seeing the model outputs. In this article, we argue that this practice invites researchers to adapt the boundaries of theoretical constructs on the basis of model outputs rather than on what is suggested by theory. Because topics inferred by unsupervised topic models differ each time a model is estimated, this could create a situation in which the conceptualization of a theoretical construct changes with each model run. In our use case, a topic model may capture different aspects of the "immigration discourse" with each re-run. The use of seed words to anchor an immigration topic stabilizes inferences across model estimations. 
As we explain in the next section, the seeded topic model improves both replicability and interpretability and combines improvements in transparency with a more theoretically informed approach to detecting topics and topical associations.</p> <hd id="AN0190929150-5">Methods</hd> <p>For many in the social sciences, computational text analysis comes in two variants: supervised or unsupervised. Supervised methods rest on the researcher's access to labels for meaning structures in text data, such as categories and a coding scheme, and then extrapolate these labels to unseen texts ([<reflink idref="bib107" id="ref133">107</reflink>]; [<reflink idref="bib37" id="ref134">37</reflink>]; [<reflink idref="bib92" id="ref135">92</reflink>]; [<reflink idref="bib45" id="ref136">45</reflink>]). By contrast, unsupervised methods infer information about language patterns, such as co-occurrences of words in documents, without drawing on predefined categories or coding schemes. A growing number of studies are using unsupervised methods to describe the cultural meanings of sociological concepts—such as class ([<reflink idref="bib87" id="ref137">87</reflink>]), gender ([<reflink idref="bib61" id="ref138">61</reflink>]), race ([<reflink idref="bib106" id="ref139">106</reflink>]), stigma ([<reflink idref="bib20" id="ref140">20</reflink>]), and art ([<reflink idref="bib44" id="ref141">44</reflink>]). 
Unsupervised methods rely on algorithms that either trace the meaning of individual words—for word embedding models in recent sociological research see [<reflink idref="bib87" id="ref142">87</reflink>]; [<reflink idref="bib107" id="ref143">107</reflink>]; [<reflink idref="bib27" id="ref144">27</reflink>]; [<reflink idref="bib135" id="ref145">135</reflink>]; [<reflink idref="bib20" id="ref146">20</reflink>]—or on algorithms that identify thematic structures in ensembles of text—for topic models see, e.g., [<reflink idref="bib44" id="ref147">44</reflink>]; [<reflink idref="bib82" id="ref148">82</reflink>]; [<reflink idref="bib25" id="ref149">25</reflink>]; [<reflink idref="bib69" id="ref150">69</reflink>].</p> <p>Topic models or, more specifically, models based on Latent Dirichlet Allocation (LDA, [<reflink idref="bib22" id="ref151">22</reflink>]) represent an important class of unsupervised methods that inductively detect themes by learning the topics that are present in a document and the words that best describe them. LDA represents a generative probabilistic process that treats each document as a bag of words from which each word (token) is randomly drawn from a mixture of topics present in the document. The model then assigns each word in a document to a topic, allowing the same word to belong to various topics to a differing degree. Each topic, in turn, is a low entropy distribution over words that tend to co-occur. This graded membership property aligns closely with our analytical aim of determining which co-occurring topics are most relevant for describing the shared interpretation (or framing) of an issue.</p> <p>As was mentioned above, unsupervised methods quantify what would otherwise be inaccessible, making the interpretive process that is always an important part of text analysis more transparent and systematic. However, unsupervised methods require post hoc operations to connect the model output to meaningful sociological concepts. 
Word embedding models, such as the one used by [<reflink idref="bib87" id="ref152">87</reflink>], rely on vector algebra and focus on a set of manually selected keywords in order to identify interpretable dimensions of a concept. In applications that use LDA models, the standard practice employed to achieve interpretability involves qualitatively inspecting each inferred topic and making iterative decisions as to which topics are meaningful and relevant for inclusion in the final analysis (e.g., [<reflink idref="bib133" id="ref153">133</reflink>]; [<reflink idref="bib82" id="ref154">82</reflink>]; [<reflink idref="bib104" id="ref155">104</reflink>]; [<reflink idref="bib39" id="ref156">39</reflink>]). As a consequence, "sociologists using text as data must make a dizzying number of decisions about what information to extract and how to answer their research question" ([<reflink idref="bib103" id="ref157">103</reflink>]: 139). While they are important as a result of their exploratory potential and for their links to existing qualitative methodologies, iterative mixed-method approaches such as "computational grounded theory" ([<reflink idref="bib16" id="ref158">16</reflink>]; [<reflink idref="bib104" id="ref159">104</reflink>]), or "computational hermeneutics" ([<reflink idref="bib101" id="ref160">101</reflink>]) remain reliant on making sense of the output after a model is learned ([<reflink idref="bib66" id="ref161">66</reflink>]; [<reflink idref="bib103" id="ref162">103</reflink>]; [<reflink idref="bib109" id="ref163">109</reflink>]). 
Because the inductive finding of relevant sociological concepts places researchers at risk of also finding seemingly meaningful interpretations where none actually exist, calls have been made for the development and use of intrinsically interpretable models ([<reflink idref="bib77" id="ref164">77</reflink>]; [<reflink idref="bib114" id="ref165">114</reflink>]; [<reflink idref="bib96" id="ref166">96</reflink>]).</p> <hd id="AN0190929150-6">Seeded Topic Model</hd> <p>We suggest an extension to the original topic model, the <emph>seeded topic model</emph> ([<reflink idref="bib95" id="ref167">95</reflink>]; [<reflink idref="bib6" id="ref168">6</reflink>]; [<reflink idref="bib79" id="ref169">79</reflink>]; [<reflink idref="bib98" id="ref170">98</reflink>]; [<reflink idref="bib54" id="ref171">54</reflink>]; [<reflink idref="bib53" id="ref172">53</reflink>]; [<reflink idref="bib139" id="ref173">139</reflink>]; [<reflink idref="bib137" id="ref174">137</reflink>]), as a middle ground between supervised and unsupervised approaches. The fully unsupervised nature of the original topic model does not guarantee that the topics identified will meaningfully reflect concepts of interest. By applying a simple extension of the original LDA framework, we aim to measure specific topics that we believe a priori to exist in a corpus. Seed words—a collection of words that the researcher believes represent topics of interest prior to seeing model outputs—guide the model toward the topics of interest. 
This extension makes the decisions that must be made during the topic definition procedure more transparent and reproducible.</p> <p>Allowing researchers to seed topics on the basis of existing domain knowledge constitutes an important step toward a more deductive, insight-oriented approach to modeling that is both less reliant on post hoc interpretations of model outputs (as are required in the unsupervised approach) and not restricted to a priori manually annotated categories or manually selected keywords (as are required in the supervised approach). Instead, the seed words help form topics around predefined concepts, names, or ideas, while at the same time utilizing the functionality of LDA to find new associations in text data based on word co-occurrences.</p> <p>It is important to note that there is a crucial difference between the seed word strategy used here and the use of keyword searches to identify meaningful topics and identify documents that "belong" to or are most salient in relation to specific topics. Keyword search involves a deterministic procedure that requires detailed knowledge of the configuration of topics before models are run. Previous research shows that even domain experts perform poorly in identifying the keywords that are most relevant for capturing specific concepts ([<reflink idref="bib84" id="ref175">84</reflink>]). This results in biased text measures and differences in substantive conclusions. In contrast, seed words are only the starting point from which a model proceeds to learn which words go together. The unsupervised part of the algorithm will expand upon the original list of seed words in crystallizing topics of interest. 
We discuss the model and its implementation in detail in Supplemental Material Sections S2 and S4.</p> <p>Previous contributions that have introduced seeded topic models using informative priors on preselected seed words ([<reflink idref="bib95" id="ref176">95</reflink>]; [<reflink idref="bib79" id="ref177">79</reflink>]; [<reflink idref="bib54" id="ref178">54</reflink>]; [<reflink idref="bib53" id="ref179">53</reflink>]; [<reflink idref="bib137" id="ref180">137</reflink>]) relied on the standard collapsed Gibbs sampler as described in [<reflink idref="bib71" id="ref181">71</reflink>], limiting their applicability to large-scale data. By increasing scalability, and by using the model as a method for measuring sociological concepts, our implementation extends the existing methodological literature in important ways. Seeded topic models that are implemented via highly scalable parallelizable sampling ([<reflink idref="bib97" id="ref182">97</reflink>]) permit the extraction of predefined topics and their associations with other themes from massive text data. Even though we have used this highly specialized algorithm, the model estimation process based on our vast corpus took 4.5 days using a machine with 360 GB RAM and 32 cores.[<reflink idref="bib8" id="ref183">8</reflink>] Without the specialized algorithm, our analysis would not have been possible. See Authors' Note for information about the code and data that reproduce our analysis.</p> <hd id="AN0190929150-7">Seeding the Immigration Topic</hd> <p>Seeded topic models rely on Bayesian informative priors to decide which topics the algorithm should identify. In practice, informative priors are placed on the topic-word distribution such that a word used to guide the model has a zero probability of belonging to any other topic than the one for which it is a seed word. The seed words one uses to guide the model should be highly unlikely to occur in contexts outside the topic of interest (in our case, immigration). 
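</p> <p>As a minimal sketch of this seeding idea, and not the authors' implementation (which is described in Supplemental Material Section S2), the following Python function builds a topic-word prior in which a seed word receives zero prior weight in every topic it does not seed. The function name <emph>build_topic_word_prior</emph>, the prior weight of 0.01, the three-topic setup, and the non-seed vocabulary items are illustrative assumptions; the two agency names are taken from the text.</p>

```python
def build_topic_word_prior(vocab, num_topics, seeds, beta=0.01):
    """Return a num_topics x len(vocab) matrix of prior weights.

    `seeds` maps a topic index to the set of seed words for that topic.
    `beta` is an assumed symmetric prior weight, chosen for illustration.
    """
    # Map each seed word to the set of topics it is allowed to belong to.
    allowed = {}
    for topic, words in seeds.items():
        for word in words:
            allowed.setdefault(word, set()).add(topic)

    prior = []
    for topic in range(num_topics):
        row = []
        for word in vocab:
            if word in allowed and topic not in allowed[word]:
                row.append(0.0)   # seed word is excluded from this topic
            else:
                row.append(beta)  # ordinary prior weight
        prior.append(row)
    return prior


# Hypothetical toy vocabulary; topic 0 plays the role of the immigration topic.
vocab = ["migrationsverket", "invandrarverk", "polis", "budget"]
seeds = {0: {"migrationsverket", "invandrarverk"}}
prior = build_topic_word_prior(vocab, num_topics=3, seeds=seeds)
```

<p>In this toy setup, the two agency names keep positive prior weight only in topic 0, while the unseeded words remain free to load on any topic.</p> <p>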
We use five types of words that are highly unlikely to be used in texts that do not relate to immigration: (i) names of immigration laws, (ii) titles of ministers responsible for immigration, (iii) names of agencies responsible for immigration, (iv) terms referring to related policy areas (e.g., integration policy), and (v) terms referring to different types of immigration (e.g., labor migration). Moving beyond the predefined seed words, the model learns other meaningful words that define the topic of interest. Among these, we find words that relate, for example, to race and ethnicity, such as names and slurs associated with minorities in Sweden (see Supplemental Material Section S7 for details). Our choice of seed words allows us to capture different dimensions of the immigration issue including, for example, discourses on different types of migrants such as refugees, asylum seekers, and labor migrants.</p> <p>Seeding also allows the model to be infused with a priori knowledge of language change. Conceptually, actors, meanings, and contexts change over time, which implies that no single measure of discourse may be appropriate over long timescales. Lexical shifts and the changing meanings of social categorizations are critical challenges to the computational analysis of historical text ([<reflink idref="bib10" id="ref184">10</reflink>]; [<reflink idref="bib115" id="ref185">115</reflink>]; [<reflink idref="bib27" id="ref186">27</reflink>]; [<reflink idref="bib135" id="ref187">135</reflink>]). The word "immigrant," for example, was rarely used prior to the 1970s ("foreigner" was the term of the day), and concepts such as "family reunification" and "unaccompanied minor" first appeared in the 1970s and 1990s, respectively. We implement the semi-supervised seeded topic model using domain knowledge to guide the model estimation over language changes that introduce new words to discuss the same topic. 
Topic seeding is best equipped to handle this type of language change that, in a standard modeling approach, would lead to the splitting of a theme into various topics. A previous name of the current Migration Agency (<emph>Migrationsverket</emph>), for example, was <emph>Statens Invandrarverk</emph>, and—by placing a prior on multiple words to inform the model that they belong to the same topic—we allow the immigration topic to crystallize around both these names (see Supplemental Material Section S2 for details on the seeding procedure and S3 for a full list of the seed words employed).</p> <p>We measure the salience of the immigration topic (Figure 1B) by calculating the proportion of words in all documents that are estimated to belong to the seeded immigration topic each week.</p> <hd id="AN0190929150-8">Co-occurring Topics as Interpretative Frames</hd> <p>The seeding strategy also permits us to define a set of additional topics that meaningfully co-occur with immigration and that we wish to flesh out from the media discourse as potential interpretations of immigration. We operationalize prominent media frames via the focal topic's associations with other frequently co-occurring topics, and we interpret these relationships as culturally shared associations between concepts. This implies that we abstract away from word-level analyses, such as keyword in context, and instead, focus on how topics (rather than words) co-occur. 
In our analysis, it is not crucial whether the word "immigrant" is discussed alongside words such as "workplace" or "murder"; what matters instead is the association of the immigration topic with the economy topic and the crime topic, respectively.</p> <p>We have predefined co-occurring topics on the basis of existing research on the common themes found in European news reporting on immigration ([<reflink idref="bib86" id="ref188">86</reflink>]; [<reflink idref="bib68" id="ref189">68</reflink>]; [<reflink idref="bib47" id="ref190">47</reflink>]; [<reflink idref="bib74" id="ref191">74</reflink>]) and research documenting Sweden's immigration history ([<reflink idref="bib62" id="ref192">62</reflink>]; [<reflink idref="bib32" id="ref193">32</reflink>]; [<reflink idref="bib88" id="ref194">88</reflink>]; [<reflink idref="bib3" id="ref195">3</reflink>]). Based on this research, we expect five dominant frames—"culture," "economy," "human rights," "politics," and "security"—to co-occur with discussions of immigration. We capture each frame that represents a known interpretation of immigration by seeding several topics (Table 2). We seed multiple topics to capture each frame such that an interpretative frame can be viewed as a "supratopic" covering different dimensions of a related issue. For example, "crime," which constitutes part of the security frame, is a highly diverse issue that includes a focus on offenses such as burglary, narcotics, murder, and sexual assault, to name only a few. To capture the many different crime-related aspects, we seed four different topics using the same set of seed words (see Supplemental Material Sections S2 and S3 for details). By seeding different topics with the same words we allow the model to crystallize around particular dimensions of a broader theme of interest in separate topics without explicitly having to choose these dimensions a priori. 
For example, while we know that "crime" is a multi-dimensional theme in our corpus (e.g., news covering different types of crimes at different phases in an investigation will be defined by different vocabularies), we let the model inductively find which type and aspect of crime should form a particular topic. One seeded topic then becomes a drug topic, for example, one becomes a homicide topic, and so on, and these are then combined into the larger topic of crime. This procedure allows the model to identify more specialized topics which, depending on the research question, can then be combined into a well-defined larger topic. We set the number of topics to 1,000, allowing for a combination of seeded and unseeded topics in the model.</p> <p>Table 2. Seeded Topics Reflecting Frames of Immigration.</p> <p> <ephtml> <table><colgroup><col align="left" /><col align="left" /></colgroup><thead><tr><th align="left">Interpretative frame</th><th align="left">Seeded topics</th></tr></thead><tbody><tr><td>Culture</td><td>Diversity perspectives, language, national identity, religion</td></tr><tr><td>Economy</td><td>Labor market, public finance, health care, housing, education</td></tr><tr><td>Human rights</td><td>Discrimination, family, human rights, racism</td></tr><tr><td>Politics</td><td>Political parties, European Union</td></tr><tr><td>Security</td><td>Crime, terrorism</td></tr></tbody></table> </ephtml> </p> <p><emph>Note:</emph> We capture each interpretative frame as a supratopic composed of several specialized seeded topics that frequently co-occur with the immigration topic.</p> <p>Unlike previous research, we quantify interpretative frames using co-occurrence frequencies for different topics that are inferred from the same topic model that simultaneously measures the focal topic of interest. 
We measure the importance of each frame (Figure 2) in terms of the proportion of words that belong to the respective seeded topics in immigration-rich documents printed in the newspapers (see Supplemental Material Section S5).</p> <hd id="AN0190929150-9">Document Inclusion, Sensitivity, and Validation</hd> <p>The analysis includes all "immigration-rich" documents. We classified a document as immigration-rich if at least 2.5% of its tokens were estimated to belong to the immigration topic (i.e., <ephtml> <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mo>≥</mo></math> </ephtml> 25 times the a priori expected proportion of 1/1,000 or 0.1%, where 1,000 is the number of topics used in the model). One could argue that a news item should belong to the immigration topic even if it contains only a single token related to immigration. However, immigration or immigrants are often mentioned only once in an article, for example, as one of many policy areas. To establish a useful threshold for document inclusion for the entire observation period, we read samples of the material at different threshold values and evaluated when the topic of immigration was indeed central to the news items; we settled on what we considered a good trade-off between keeping a reasonable number of documents and keeping the analysis centered around the focal theme of interest. 
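</p> <p>The inclusion rule and the frame measure can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: each document is assumed to be a list of per-token topic assignments, and the topic indices, the frame-to-topic mapping, and all function names are hypothetical. Only the 2.5% threshold, the 1,000 topics, and the normalization of frame shares to sum to 1 come from the text.</p>

```python
NUM_TOPICS = 1000
BASELINE_SHARE = 1 / NUM_TOPICS   # a priori expected share per topic (0.1%)
INCLUSION_THRESHOLD = 0.025       # 2.5% of tokens, about 25x the baseline


def topic_share(assignments, topics):
    """Share of a document's tokens assigned to any topic in `topics`."""
    if not assignments:
        return 0.0
    return sum(1 for t in assignments if t in topics) / len(assignments)


def is_immigration_rich(assignments, immigration_topic,
                        threshold=INCLUSION_THRESHOLD):
    """Apply the document inclusion rule to one document."""
    return topic_share(assignments, {immigration_topic}) >= threshold


def frame_proportions(docs, frames):
    """Token counts per frame, normalized so the frame shares sum to 1."""
    counts = {name: 0 for name in frames}
    for assignments in docs:
        for t in assignments:
            for name, topics in frames.items():
                if t in topics:
                    counts[name] += 1
    total = sum(counts.values())
    return {name: (c / total if total else 0.0) for name, c in counts.items()}


# Hypothetical topic indices: 0 = immigration, 5 = crime, 9 = labor market.
IMMIGRATION = 0
frames = {"security": {5}, "economy": {9}}

doc_a = [0] * 3 + [5] * 4 + [9] * 2 + [500] * 91   # 3% immigration tokens
doc_b = [0] * 1 + [5] * 2 + [500] * 97             # 1% immigration tokens

rich_docs = [d for d in (doc_a, doc_b) if is_immigration_rich(d, IMMIGRATION)]
shares = frame_proportions(rich_docs, frames)
```

<p>In this toy example only the first document passes the threshold, so the frame shares are computed from its tokens alone.</p> <p>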
Our main results are robust to threshold choice (see Supplemental Material Section S6).</p> <p>We report on model diagnostics and sensitivity analyses in Supplemental Material Section S6, including (i) a test for model convergence as well as model re-runs (ii) using alternative numbers of topics (950, 1500), (iii) using each newspaper corpus separately, (iv) using alternative thresholds for document inclusion (1%, 4%, and 5%), and (v) using random subsets of 90%, 80%, and 70% of the original set of seed words.</p> <p>In Supplemental Material Section S7, we report on validation strategies for topic definition that evaluate the degree to which a seeded topic captures the concept of interest. Those strategies include (i) a comparison of documents classified as being about immigration with a manual annotation of a sample of documents, (ii) an inspection of the tokens that the algorithm learned to belong to the topic, and (iii) an analysis of influential immigration-related events based on high temporal resolution data. The latter analysis tests whether the model picks up on immediate changes in newspapers' framing following such events. We focus on events for which clear theoretical expectations exist about their likely impact on the salience of a particular seeded frame. An Islamist terrorist attack, for example, may serve to re-frame Islam as a violent ideology, leading to revisions of the current security-related interpretations of immigration ([<reflink idref="bib67" id="ref197">67</reflink>]; [<reflink idref="bib91" id="ref198">91</reflink>]; [<reflink idref="bib121" id="ref199">121</reflink>]). 
In this case, we would expect the relative salience of the security-related frame to increase in the weeks following the attack—indicating valid topic seeding.</p> <hd id="AN0190929150-10">Parsing Discursive Eras</hd> <p>We use a Bayesian Gaussian change-point model ([<reflink idref="bib15" id="ref200">15</reflink>]; [<reflink idref="bib51" id="ref201">51</reflink>]) to detect shifts over time in the salience of single frames as well as in the relative composition of salient frames. We interpret salience shifts as breakpoints in the media's framing of immigration. The model assumes that a time series of frame salience can be partitioned into an unknown number of periods, with each period having a constant mean reflecting a "new probability regime" ([<reflink idref="bib2" id="ref202">2</reflink>]). We estimate two kinds of specifications of the change-point model: (i) A univariate specification that tests for breakpoints in the salience of each of the five seeded frames separately, and (ii) a combined multivariate specification that tests for breakpoints in the relative composition of all five seeded frames. We are particularly interested in the multivariate model results. The composition of salient frames at a certain point in time aggregates into what we refer to as the shared interpretation of immigration communicated by the media. A shared interpretation describes a set of frames that are available to the public at a given point in time to make sense of an issue. The estimates of the change-point model provide an empirical foundation for the parsing of discursive periods ([<reflink idref="bib115" id="ref203">115</reflink>]) in which meaning-making measurably differed.</p> <p>The model, regardless of its specification as univariate or multivariate, estimates the posterior probability that each year constitutes a change point, delimiting sharp differences in the means of the respective time series in adjacent periods. 
That is, the model estimates the likelihood that a significant shift has occurred in the way the newspapers frame immigration in each one of the 75 years included in the data. We use a standard implementation of the model ([<reflink idref="bib51" id="ref204">51</reflink>]), and we set the model's hyperparameter <ephtml> <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mi>γ</mi></math> </ephtml> to its default value 0.3, which reflects the absence of a priori knowledge as to how many change points the model should identify. We interpret years that have a multivariate posterior change-point probability equal to or larger than 50% as consequential turning points that mark the beginning of a new era of discourse. For most years, the estimated change-point probabilities are close to 0 (see Figure 2B). Our choice of a <ephtml> <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mo>≥</mo></math> </ephtml> 50%-threshold is non-exclusive and merely requires a change-point year to have a higher likelihood of representing a turning point than of not doing so.</p> <hd id="AN0190929150-11">Results</hd> <p>Figure 1B traces the relative salience of the seeded immigration topic in Sweden's newspaper corpus from 1945 to 2019. The blue line represents the annual average salience of immigration and shows how important this issue was in the media. Prior to the first major peak in the number of immigrants in 1970, the level of media attention focused on immigration was low. On average, 0.05% of tokens in the newspapers referred to it. By contrast, from 2015 to 2019, the salience of immigration as a news issue reached 0.37%, a 7.4-fold increase vis-à-vis the first period.[<reflink idref="bib9" id="ref205">9</reflink>] Both the actual number of immigrants arriving in Sweden (Figure 1A) and the importance of the immigration topic in newspaper coverage (Figure 1B) reached unprecedented heights in 2015. 
The year of the European "refugee crisis" represents a clear disruption in terms of the salience of immigration. Salience also spiked during 1969–1970, which were years of high labor migration, and during the armed conflicts in Iraq (1990–1991, 2003–2011) and Bosnia (1992–1995), which resulted in many refugees arriving in Sweden. The linear correlation between the annual number of newly arrived immigrants and the salience of the immigration topic is 0.82 for the entire period examined; this correlation increases to 0.93 from 2010 to 2019. These results show that the attention of the media shifts to immigration in periods of peak influx, particularly if immigrant numbers increase rapidly.</p> <p>Graph: Figure 1. (A) Annual number of immigrants (in thousands) arriving in Sweden. (B) Annual average salience of the immigration topic in Sweden's four major newspapers (blue line). Data points represent the percentage of all words in a given week's news articles that are estimated to belong to the immigration topic.</p> <p>Graph: Figure 2. (A) The evolution of media frames of immigration. The Y-axis represents the salience proportion of the five seeded topics that frequently and meaningfully co-occur with the "immigration" topic. The salience proportions of these five frames sum to 1 in each year, and trajectories represent 5-year moving averages. The dashed vertical lines indicate the beginning and ending of inferred eras. (B) The likely turning points in the framing of immigration. Colored trajectories represent the univariate posterior distribution of potential change points per media frame. The black trajectory represents the multivariate posterior distribution of potential change points in the composition of frames, which constitutes our measure of the shared interpretation of immigration. 
The background colors highlight the seven periods implied by the model.</p> <p>Relative topic salience provides an important measure of <emph>what</emph> has been discussed at different times. It does not reveal, however, <emph>how</emph> issues have been covered and thought about. To answer the second question, we trace the salience of the different immigration frames shared by the media. Figure 2A maps the co-evolution of different interpretations of immigration, plotting the salience proportion of each of the five seeded frames over 75 years. Figure 2B provides estimates of likely change points for each frame (colored lines for univariate models) and in the composition of the different frames (black line for the multivariate model). When parsing discursive eras, our primary interest lies in the detection of measurable shifts in the composition of frames. From the multivariate change-point model, we infer seven recognizable eras (with an average length of 10.7 <ephtml> <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mo>±</mo></math> </ephtml> 2.6 years) between which the media's interpretation of immigration measurably differed.</p> <p></p> <hd1 id="AN0190929150-12"> • Period 1, 1945–1954. </hd1> <p></p> <ulist> <item> Immediately following the war, the media discourse portrayed immigration mainly from a humanitarian perspective (Figure 2A). As this association became less prominent, we find likely univariate change points in the humanitarian interpretation and, to a lesser degree, in the cultural interpretation of immigration during the late 1940s and early 1950s (Figure 2B).</item> <p></p> </ulist> <hd1 id="AN0190929150-13"> • Period 2, 1955–1964. </hd1> <p></p> <ulist> <item> We estimate the first turning point, with a 96% posterior probability in the multivariate model, as occurring in 1955. This year was characterized by a surge of labor migration to Sweden. 
At the end of the second period, in the mid-1960s, the association between immigration and the economy had caught up with the humanitarian perspective. Both inferred periods 1 and 2 of post-war immigration align with historical accounts that partition Sweden's immigration history on the basis of immigration flows and policy changes ([<reflink idref="bib62" id="ref206">62</reflink>]; [<reflink idref="bib32" id="ref207">32</reflink>]; [<reflink idref="bib88" id="ref208">88</reflink>]; [<reflink idref="bib89" id="ref209">89</reflink>]; [<reflink idref="bib3" id="ref210">3</reflink>]; [<reflink idref="bib129" id="ref211">129</reflink>]).</item> <p></p> </ulist> <hd1 id="AN0190929150-14"> • Period 3, 1965–1973. </hd1> <p></p> <ulist> <item> Our model identifies a period of rupture in the mid-1960s—which coincides with the first discussions of multiculturalism (1964) and investigations into the costs of immigration for the expanding welfare state (1965). In the immediate aftermath of these discussions and investigations, the dominant interpretation of immigration became economic, and a cultural framing gained importance. These ruptures, with multivariate change-point probabilities of 95% in 1964 and 70% in 1966, mark the beginning of a long era of relative stability in the associative patterns. Rapid economic growth and the political hegemony of the Social Democratic party resulted in the roll-out of the welfare state, which was extended in 1968 to cover migrant workers, and a newly established migration board was tasked with overseeing their employability. Again, the inferred period is largely in alignment with the narrative presented by historical social science ([<reflink idref="bib32" id="ref212">32</reflink>]; [<reflink idref="bib88" id="ref213">88</reflink>]).</item> <p></p> </ulist> <hd1 id="AN0190929150-15"> • Period 4, 1974–1985. </hd1> <p></p> <ulist> <item> We infer turning points in 1974 (70%) and 1986 (77%). 
Labor migration declined during the economic crises of the 1970s and was increasingly replaced by immigration involving non-European refugees. The univariate breakpoint for culture in 1984 coincides with the arrival of increasing numbers of non-Western refugees, discussions of legislation against ethnic discrimination, and increased efforts focused on integration, including family reunification ([<reflink idref="bib32" id="ref214">32</reflink>]; [<reflink idref="bib3" id="ref215">3</reflink>]).</item> <p></p> </ulist> <hd1 id="AN0190929150-16"> • Period 5, 1986–1999. </hd1> <p></p> <ulist> <item> 1986 marks the year in which the Swedish Prime Minister, Olof Palme, was murdered. Spearheaded by Palme's governments (1969–1976, 1982–1986), immigration law had embraced multicultural ideals, affirming diversity and the protection of immigrants' cultural identities. Despite the turning point identified in 1986, the media framing of immigration remained remarkably stable across periods 4 and 5, and we interpret the interval 1974–1999 as representing Sweden's famed era of tolerance ([<reflink idref="bib120" id="ref216">120</reflink>]; [<reflink idref="bib116" id="ref217">116</reflink>]), during which an inert mix of economic, humanitarian, and security-related frames shaped the interpretation of migration for almost a generation. This interpretation weathered economic downturns, peaks in immigration, and Sweden's accession to the EU in 1995, and remained dominant until the end of the 1990s—which is much longer than the historical narrative suggests ([<reflink idref="bib40" id="ref218">40</reflink>]; [<reflink idref="bib32" id="ref219">32</reflink>]; [<reflink idref="bib129" id="ref220">129</reflink>]). At the same time, the turning points we identify in this era are disproportionately driven by an increase in a new, politically polarized understanding of immigration. 
Notably, this upward trend in the politicization of immigration precedes the electoral success of populist far-right parties and the decline in the Social Democratic consensus that have characterized Swedish policy debates in recent decades ([<reflink idref="bib40" id="ref221">40</reflink>]; [<reflink idref="bib32" id="ref222">32</reflink>]).</item> <p></p> </ulist> <hd1 id="AN0190929150-17"> • Period 6, 2000–2012. </hd1> <p></p> <ulist> <item> Our analysis identifies the year 2000 as a consequential turning point (84%) driven by politicization. This was a year of revisions to immigration law, when the EU started to harmonize its immigration policies in the lead-up to the Schengen agreement (2001), which led to an increase in the number of migrant workers arriving in Sweden from the eastern countries of the EU. We find that a further convergence of media frames and, ultimately, their gradual replacement by politics as the dominant lens through which immigration is viewed, coincided with the populist right Sweden Democrats' entry into parliament in 2010. The Sweden Democrats have since become the country's second-largest party in national elections. Several years are associated with non-zero change-point probabilities for specific frames, but none of these are particularly pronounced and we do not find them to be sufficiently consequential to register in the model as having altered the interpretation of immigration. Throughout this period, and despite the September 11 attacks and the subsequent US-led "war on terror," the association between Swedish immigration and security issues remained flat.</item> <p></p> </ulist> <hd1 id="AN0190929150-18"> • Period 7, 2013–today. </hd1> <p></p> <ulist> <item> The final turning point that we estimate to lie above the 50%-threshold (51%) occurred in 2013. This disruption, which is less clear than those described above, marks the beginning of the most recent discursive era. This period included generous revisions of asylum law. 
At the same time, the consensual migration politics of past decades, which some have argued cemented an "opinion corridor" of views perceived as socially acceptable ([<reflink idref="bib49" id="ref223">49</reflink>]), were increasingly being criticized in society at large. This period reflects a further politicization of the immigration discourse, a surge in a security-related interpretation, and probably also the end of Sweden's "exceptionalism" ([<reflink idref="bib120" id="ref224">120</reflink>]; [<reflink idref="bib116" id="ref225">116</reflink>]) as regards the country's tolerant approach to immigration. Our results indicate that this reinterpretation of immigration started well before the 2014 general election (in which the Sweden Democrats doubled their number of seats in parliament) and, most importantly, before the 2015 "refugee crisis." Neither of these years was sufficiently consequential to register in our change-point model. Strikingly, we instead see that the 2015 "refugee crisis," which many observers have classified as a watershed in European immigration history, was of little consequence for the ways in which the Swedish media have portrayed immigration.</item> </ulist> <p>In Supplemental Material Section S6, we report these results separately per newspaper. We find that the framing of immigration over time varies little between newspapers of different political orientations or between highbrow broadsheets (<emph>Dagens Nyheter</emph>, <emph>Svenska Dagbladet</emph>) and lowbrow tabloids (<emph>Aftonbladet</emph>, <emph>Expressen</emph>). These separate analyses closely reproduce the findings of the main analysis presented here.</p> <hd id="AN0190929150-19">Discussion</hd> <p>We have argued that the seeded (or constrained) topic model constitutes a promising semi-supervised method—combining both inductive and deductive reasoning—that provides a more replicable and transparent means of measuring meaning in digital text. 
Semi-supervised methods can improve transparency and replicability by decreasing the number of idiosyncratic decisions made during model implementation. Importantly, the seeded topic model permits a theoretical grounding of the topic definition procedure, because seed words require researchers to be explicit about how concepts are operationalized, and these constraints ensure that the model will identify the same concepts in each model run. This approach represents an advance in relation to concerns about whether computationally identified patterns can provide replicable and interpretable empirical evidence that is relevant to social science research. The seeding procedure allows researchers to tame the unsupervised nature of the topic model by guiding the model in its detection of topics, but without predetermining the full vocabulary associated with the topics identified. We have demonstrated the applicability of one specific algorithm to the task of identifying predefined, sociologically relevant concepts in texts and inferring the associations that exist between these concepts.</p> <p>Model performance should be validated to ensure that the seeded topics represent the concepts of interest, and model validation still requires subjective interpretations of topic quality. To be sure, choosing seed words may be an iterative process, based on interpretations of model outputs and allowing previously unknown patterns to arise from the data. Such iterative processes are essential in most research that employs computational text analysis ([<reflink idref="bib72" id="ref226">72</reflink>]), and as Mohr and colleagues have noted, "there can be no measurement of culture without interpretation" ([<reflink idref="bib100" id="ref227">100</reflink>]: 4). Against this backdrop, we have taken important steps toward a more principled interpretation of topic models. 
First, identifying both a focal concept and its neighboring topics in a single estimation—instead of first identifying the relevant documents that contain the focal concept and then searching for other concepts within these documents—ensures that the analysis is less reliant on early operationalization decisions. One-step procedures are particularly important for producing reliable measures of meaning-making over long timescales, where they may be affected by language change.</p> <p>Second, seeding facilitates diagnostics of model performance, something that is typically difficult in purely unsupervised settings ([<reflink idref="bib36" id="ref228">36</reflink>]; [<reflink idref="bib144" id="ref229">144</reflink>]). The semi-supervised nature of the model allows us to restrict validation efforts to the seeded topics. This is particularly important because there are currently no standards regarding how topic models should best be evaluated when used in sociological research. In the Appendix (Supplemental Material Section S7), we suggest various measures that will assist in inspecting the quality of seeded topics, and we found a high level of correspondence when we compared a manually coded sample of documents with documents inferred by the model to belong to a seeded topic. Additionally, we have checked the sensitivity of our results regarding the number of topics, seed word selection, and different thresholds for document inclusion (Supplemental Material Section S6).</p> <p>In a supplementary analysis also reported in Supplemental Material Section S7, we provide suggestive evidence that unforeseen and widely recognized events have the capacity to measurably shift the salience of certain media frames. These results illustrate another validation strategy that tests whether the model picks up on shifts in the salience of the frame most closely related to the event in question. 
The results lend support to the validity of our semi-supervised inference of interpretative frames, and they provide pointers to the immediate response of newspapers to disruptive events. The event-focused analysis of high temporal resolution data also illustrates how—under certain assumptions—latent features of text data can be used as the outcome variable when estimating causal effects ([<reflink idref="bib48" id="ref230">48</reflink>]; [<reflink idref="bib63" id="ref231">63</reflink>]).</p> <p>Of course, seeded topic models also have their own limitations. Current applications of the original topic model focus on discovering previously unknown patterns in text data ([<reflink idref="bib72" id="ref232">72</reflink>]). The seeding of topics places bounds on an open discovery process. One solution (which we followed in our case study) involves allowing for a combination of seeded and unseeded topics in the model such that unexpected signals in the data can still be detected and explored. The applicability of the seeded topic model depends on how well researchers can operationalize a theoretical concept via one or more topics. A seeded topic model can easily identify some concepts, depending on the availability of unique words associated with the theme of interest. Other concepts are nearly impossible to pin down, however. For example, the model will struggle to capture a topic that is mostly defined by polysemic words, i.e., words with different possible meanings. To tackle issues with polysemy, researchers can seed multiple topics with the same words—as we did, for example, for the multifaceted crime topic—and thereby rely on the model to inductively capture their different meanings. While this may solve issues related to polysemy, it also decreases the replicability of the model. 
Therefore, finding non-polysemic words to crystallize interpretable topics of interest poses an important scope condition and, in some potential use cases, a roadblock to making full use of the seeded topic model. At the same time, however, vague and multifaceted themes that are difficult to identify using a seeded topic model may also present challenges to supervised methods that require human annotation.</p> <p>Large language models (LLMs), which increasingly find their way into social science publications, also blur the line between supervised and unsupervised learning. LLMs have shown great capacity in a vast array of classification tasks ([<reflink idref="bib45" id="ref233">45</reflink>]; [<reflink idref="bib141" id="ref234">141</reflink>]; [<reflink idref="bib27" id="ref235">27</reflink>]; [<reflink idref="bib35" id="ref236">35</reflink>]; [<reflink idref="bib64" id="ref237">64</reflink>]; [<reflink idref="bib132" id="ref238">132</reflink>]), although current models' performance is still under debate (e.g., [<reflink idref="bib108" id="ref239">108</reflink>]; [<reflink idref="bib12" id="ref240">12</reflink>]), especially in classification tasks that require cross-document reasoning as in topic modeling and when texts pertain to a particular place and time as in historical corpora ([<reflink idref="bib145" id="ref241">145</reflink>]). The development of LLMs proceeds at an extremely fast pace. Decreasing costs will open them up for analyses of very large corpora, and ideas of identifying, in principled ways, concepts predefined by the researcher will hopefully guide some of the modeling advances. 
If researchers find ways to gain more control over labeling, replicability, and transparency ([<reflink idref="bib73" id="ref242">73</reflink>]), this transformative brand of text modeling will be in a good position to develop important alternatives to the seeded topic model.</p> <p>We have applied the seeded topic model to a vast newspaper archive to learn how the issue of immigration has been framed in Swedish newspapers from 1945 to 2019. The storytelling of journalists—their use of interpretative frames to make news events understandable to their audiences—makes newspaper archives a treasure trove for the study of meaning-making over historical timescales. We have operationalized frames as themes that frequently co-occur with the issue of interest, and we have interpreted these relationships as culturally relevant associations between concepts. Hence, we have also studied newspaper coverage as a social sensor of discursive processes ([<reflink idref="bib56" id="ref243">56</reflink>]; [<reflink idref="bib60" id="ref244">60</reflink>]) in which broader interpretations of societal developments and events are generated, negotiated, and revised ([<reflink idref="bib130" id="ref245">130</reflink>]; [<reflink idref="bib30" id="ref246">30</reflink>]; [<reflink idref="bib128" id="ref247">128</reflink>]). Viewing text as a social sensor involves the use of large repositories of digital text to uncover latent observations about the social world and trends in contemporary societies in particular.</p> <p>Some have argued that media content reflects elite discourses and that a media sensor can capture "common cultural patterns, but it cannot observe what is never articulated" ([<reflink idref="bib26" id="ref248">26</reflink>]). We recognize that media-generated perceptions of current events do not equate to the perceptions of the whole population, especially not with regard to polarized "hot" topics and in the age of social media. 
We have not measured meaning at the individual level, and we have not delineated different "thought communities," although they no doubt exist, particularly in a politicized domain such as immigration. One example would be that different segments of society may have different groups in mind when they think about immigrants ([<reflink idref="bib24" id="ref249">24</reflink>]; [<reflink idref="bib47" id="ref250">47</reflink>]). Still, our case study has demonstrated that vast corpora of the type and scale studied here are likely to contain important evidence of the dominant interpretative frames—in the sense of "common cultural patterns"—that have been used to make sense of societal issues at a certain point in time. We believe that using such sensors may have general implications for sociological research in light of the increasing availability of "found" online data (e.g., [<reflink idref="bib83" id="ref251">83</reflink>]; [<reflink idref="bib117" id="ref252">117</reflink>]; [<reflink idref="bib81" id="ref253">81</reflink>]).</p> <p>We have highlighted the induction of different eras of meaning-making as a potential means of analyzing the output of seeded topic models, offering a refined empirical foundation for the parsing of "discursive periods" during which specific interpretations of an issue are widely shared. Historians often define "eras" of social change on the basis of policy shifts ([<reflink idref="bib52" id="ref254">52</reflink>]), and—for immigration history—many have viewed key revisions of immigration law as turning points demarcating different eras ([<reflink idref="bib3" id="ref255">3</reflink>]; [<reflink idref="bib62" id="ref256">62</reflink>]). 
However, historical narratives that partition the flow of events into coherent, meaningful sequences ([<reflink idref="bib127" id="ref257">127</reflink>]; [<reflink idref="bib122" id="ref258">122</reflink>]) have been criticized for their lack of explanatory depth and, in particular, for involving a risk that spurious events will be identified as marking the beginning and end of posited periods ([<reflink idref="bib110" id="ref259">110</reflink>]; [<reflink idref="bib70" id="ref260">70</reflink>]). Our study exemplifies that digital archives offer new opportunities for the identification of turning points and for delineating discursive periods on the basis of the ideas expressed by contemporaries ([<reflink idref="bib17" id="ref261">17</reflink>]; [<reflink idref="bib115" id="ref262">115</reflink>]; [<reflink idref="bib61" id="ref263">61</reflink>]).</p> <p>Our measures of media framing are in close alignment with the type of immigration experienced in post-war Sweden until the mid-1970s. The inferred discursive periods match those implied by historical accounts that have partitioned Sweden's immigration history on the basis of policy changes ([<reflink idref="bib3" id="ref264">3</reflink>]; [<reflink idref="bib62" id="ref265">62</reflink>]; [<reflink idref="bib89" id="ref266">89</reflink>]). We found that the texts from the late 1970s and early 1980s best describe the country's signature era of multiculturalism and tolerance toward immigration. Different frames achieved similar salience, indicating a new pluralism in how immigration has been discussed. Weathering economic downturns and peaks in immigration, this era lasted until the end of the 1990s—and thus much longer than historical accounts have suggested ([<reflink idref="bib40" id="ref267">40</reflink>]; [<reflink idref="bib129" id="ref268">129</reflink>]). 
At the same time, we found that the media began framing immigration as a political issue as early as the mid-1970s, long before anti-immigration platforms started attracting larger audiences and before the parliamentary consensus on immigration eroded in the mid to late 1980s ([<reflink idref="bib32" id="ref269">32</reflink>]). As the political framing of immigration gained momentum, we were once again able to see a more unidimensional discussion of migration, now as a strongly politicized issue.</p> <p>We have also found that seemingly obvious turning points (such as the economic downturns of the 1970s and 1990s, and the "refugee crisis" of 2015) had few consequences for the frames used by the news media to portray immigration in Sweden. However, the public might frame things differently from the mainstream media, and future research is therefore needed to examine how broader segments of society, e.g., the online public, react to highly publicized events.</p> <p>To conclude, seeded topic modeling provides a means whereby researchers can rely on sociological knowledge when implementing and validating replicable models that make inferences beyond the words on the page. Semi-supervised approaches of this kind could become an important next step toward further improving the work of social scientists in their computational analysis of social data.</p> <hd id="AN0190929150-20">Supplemental Material</hd> <p>Supplemental material, sj-pdf-1-smr-10.1177_00491241241268453, for Seeded Topic Models in Digital Archives: Analyzing Interpretations of Immigration in Swedish Newspapers, 1945–2019 by Miriam Hurtado Bodell, Måns Magnusson and Marc Keuschnigg in Sociological Methods & Research</p> <hd id="AN0190929150-21">Acknowledgments</hd> <p>We thank Maria Brandén, Jacob Habinek, Peter Hedström, Leif Jonsson, Friedolin Merhout, Étienne Ollion, and Sarah Valdez for discussions, and our editor Justin Grimmer and three anonymous reviewers for their valuable comments. 
We are indebted to the National Library of Sweden for granting access to their digitized newspaper archive.</p> <ref id="AN0190929150-22"> <title> Footnotes </title> <blist> <bibl id="bib1" idref="ref79" type="bt">1</bibl> <bibtext> To facilitate the running of seeded topic models on very large text data, we developed an R package, available on GitHub: https://github.com/mhbodell/seeded%5ftopic%5fmodels%5fdigital%5farchives. The pre-processed data and code, which can be used to recreate our main analyses, can be found under the same link.</bibtext> </blist> <blist> <bibl id="bib2" idref="ref80" type="bt">2</bibl> <bibtext> The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.</bibtext> </blist> <blist> <bibl id="bib3" idref="ref195" type="bt">3</bibl> <bibtext> The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research has been funded by the Swedish Research Council (2018-05170, 2018-06063). Resources provided by the Swedish National Infrastructure for Computing (2020/5-145, 2021/5-161, 2023-22-360) enabled computations.</bibtext> </blist> <blist> <bibl id="bib4" idref="ref103" type="bt">4</bibl> <bibtext> Miriam Hurtado Bodell https://orcid.org/0000-0002-8467-1746 Måns Magnusson https://orcid.org/0000-0002-0296-2719 Marc Keuschnigg https://orcid.org/0000-0001-5774-1553</bibtext> </blist> <blist> <bibl id="bib5" idref="ref49" type="bt">5</bibl> <bibtext> The pre-processed data are available on GitHub: https://github.com/mhbodell/seeded%5ftopic%5fmodels%5fdigital%5farchives. 
The original data are available from the authors upon request but, for copyright reasons, can only be accessed on-site at the National Library of Sweden.</bibtext> </blist> <blist> <bibl id="bib6" idref="ref35" type="bt">6</bibl> <bibtext> The supplemental material for this article is available online.</bibtext> </blist> <blist> <bibl id="bib7" idref="ref16" type="bt">7</bibl> <bibtext> In the machine learning literature, the term "semi-supervised" typically refers to modeling in a context where only a few labeled observations are available for training. Following the empirical social science literature ([139]), we also use the term for the type of "constrained" or "guided" modeling approach presented in this paper. While we use the term seeded topic model, similar models have also been referred to as guided or anchored topic models ([6]; [79]).</bibtext> </blist> <blist> <bibl id="bib8" idref="ref77" type="bt">8</bibl> <bibtext> For details on runtime and compute requirements, see the formal evaluation of parallel performance in [97].</bibtext> </blist> <blist> <bibl id="bib9" idref="ref61" type="bt">9</bibl> <bibtext> Note that absolute values depend on the number of topics <ephtml> <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mi>K</mi></math> </ephtml> used in the topic model. Relative interpretations, however, are robust to changes in <ephtml> <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mi>K</mi></math> </ephtml>.</bibtext> </blist> </ref> <ref id="AN0190929150-23"> <title> References </title> <blist> <bibtext> Abbott A. 1997. "On the Concept of Turning Point." Comparative Social Research. 16: 85–106.</bibtext> </blist> <blist> <bibtext> Abbott A. 2001. Time Matters: On Theory and Method. 
Chicago, IL, USA: University of Chicago Press.</bibtext> </blist> <blist> <bibtext> Andersson R., Dhalmann H., Holmqvist E., Kauppinen T.M., Magnusson Turner L., Skifter Andersen H., Søholt S., Vaattovaara M., Vilkama K., Wessel T., et al. 2010. Immigration, Housing and Segregation in the Nordic Welfare States. Helsinki University, Finland: Department of Geosciences and Geography.</bibtext> </blist> <blist> <bibtext> Andrews K.T., Caren N. 2010. "Making the News: Movement Organizations, Media Attention, and the Public Agenda." American Sociological Review. 75(6): 841–866.</bibtext> </blist> <blist> <bibtext> Anoop V.S., Asharaf S. 2017. "A Topic Modeling Guided Approach for Semantic Knowledge Discovery in E-Commerce." International Journal of Interactive Multimedia and Artificial Intelligence. 4(6): 1–8.</bibtext> </blist> <blist> <bibtext> Arora S., Ge R., Moitra A. 2012. "Learning Topic Models–Going Beyond SVD." Pp. 1–10 in Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science. New Brunswick, New Jersey: IEEE.</bibtext> </blist> <blist> <bibtext> Arseniev-Koehler A., Cochran S.D., Mays V.M., Chang K.-W., Foster J.G. 2022. "Integrating Topic Modeling and Word Embedding to Characterize Violent Deaths." Proceedings of the National Academy of Sciences. 119(10): 1–6.</bibtext> </blist> <blist> <bibtext> Arseniev-Koehler A., Foster J.G. 2022. "Machine Learning as a Model for Cultural Learning: Teaching an Algorithm What It Means to Be Fat." Sociological Methods & Research. 51(4): 1484–1539.</bibtext> </blist> <blist> <bibtext> Bail C.A. 2012. "The Fringe Effect: Civil Society Organizations and the Evolution of Media Discourse About Islam Since the September 11th Attacks." American Sociological Review. 77(6): 855–879.</bibtext> </blist> <blist> <bibtext> Bail C.A. 2014. "The Cultural Environment: Measuring Culture With Big Data." Theory and Society. 43(3-4): 465–482.</bibtext> </blist> <blist> <bibtext> Bail C.A. 2016. 
"Cultural Carrying Capacity: Organ Donation Advocacy, Discursive Framing, and Social Media Engagement." Social Science & Medicine. 165: 280–288.</bibtext> </blist> <blist> <bibtext> Bail C.A. 2024. "Can Generative AI Improve Social Science?" Proceedings of the National Academy of Sciences. 121(21): e2314021121.</bibtext> </blist> <blist> <bibtext> Bail C.A., Brown T.W., Mann M. 2017. "Channeling Hearts and Minds: Advocacy Organizations, Cognitive-Emotional Currents, and Public Conversation." American Sociological Review. 82(6): 1188–1213.</bibtext> </blist> <blist> <bibtext> Barron A.T.J., Huang J., Spang R.L., DeDeo S. 2018. "Individuals, Institutions, and Innovation in the Debates of the French Revolution." Proceedings of the National Academy of Sciences. 115(18): 4607–4612.</bibtext> </blist> <blist> <bibtext> Barry D., Hartigan J.A. 1993. "A Bayesian Analysis for Change Point Problems." Journal of the American Statistical Association. 88(421): 309–319.</bibtext> </blist> <blist> <bibtext> Baumer E., Mimno D., Guha S., Quan E., Gay G.K. 2017. "Comparing Grounded Theory and Topic Modeling: Extreme Divergence or Unlikely Convergence?" Journal of the Association for Information Science and Technology. 68(6): 1397–1410.</bibtext> </blist> <blist> <bibtext> Bearman P. 2015. "Big Data and Historical Social Science." Big Data & Society. 2(2): 1–5.</bibtext> </blist> <blist> <bibtext> Bearman P., Faris R., Moody J. 1999. "Blocking the Future: New Solutions for Old Problems in Historical Social Science." Social Science History. 23(4): 501–533.</bibtext> </blist> <blist> <bibtext> Benford R.D., Snow D.A. 2000. "Framing Processes and Social Movements: An Overview and Assessment." Annual Review of Sociology. 26(1): 611–639.</bibtext> </blist> <blist> <bibtext> Best R.K., Arseniev-Koehler A. 2023. "The Stigma of Diseases: Unequal Burden, Uneven Decline." American Sociological Review. 88(5): 938–969.</bibtext> </blist> <blist> <bibtext> Blei D.M. 2012. 
"Probabilistic Topic Models." Communications of the ACM. 55(4): 77–84.</bibtext> </blist> <blist> <bibtext> Blei D.M., Ng A.Y., Jordan M.I. 2003. "Latent Dirichlet Allocation." Journal of Machine Learning Research. 3: 993–1022.</bibtext> </blist> <blist> <bibtext> Bleich E., van der Veen A.M. 2021. "Media Portrayals of Muslims: A Comparative Sentiment Analysis of American Newspapers, 1996–2015." Politics, Groups, and Identities. 9(1): 20–39.</bibtext> </blist> <blist> <bibtext> Blinder S. 2015. "Imagined Immigration: The Impact of Different Meanings of 'Immigrants' in Public Opinion and Policy Debates in Britain." Political Studies. 63(1): 80–100.</bibtext> </blist> <blist> <bibtext> Bohr J. 2020. "Reporting on Climate Change: A Computational Analysis of U.S. Newspapers and Sources of Bias, 1997–2017." Global Environmental Change. 61: 1–12.</bibtext> </blist> <blist> <bibtext> Bonikowski B. 2016. "Nationalism in Settled Times." Annual Review of Sociology. 42: 427–449.</bibtext> </blist> <blist> <bibtext> Bonikowski B., Luo Y., Stuhler O. 2022. "Politics as Usual? Measuring Populism, Nationalism, and Authoritarianism in US Presidential Campaigns (1952–2020) with Neural Language Models." Sociological Methods & Research. 51(4): 1721–1787.</bibtext> </blist> <blist> <bibtext> Bonikowski B., Nelson L.K. 2022. "From Ends to Means: The Promise of Computational Text Analysis for Theoretically Driven Sociological Research." Sociological Methods & Research. 51(4): 1469–1483.</bibtext> </blist> <blist> <bibtext> Börjeson L., Haffenden C., Malmsten M., Klingwall F., Rende E., Kurtz R., Rekathati F., Hägglöf H., Sikora J. 2023. "Transfiguring the Library as Digital Research Infrastructure: Making KBLab at the National Library of Sweden." Retrieved March 3, 2024 (https://osf.io/preprints/socarxiv/w48rf).</bibtext> </blist> <blist> <bibtext> Bourdieu P. 1991. Language and Symbolic Power. 
Cambridge, MA: Harvard University Press.</bibtext> </blist> <blist> <bibtext> Boutyline A., Arseniev-Koehler A., Cornell D.J. 2023. "School, Studying, and Smarts: Gender Stereotypes and Education Across 80 Years of American Print Media, 1930–2009." Social Forces. 102(1): 263–286.</bibtext> </blist> <blist> <bibtext> Byström M., Frohnert P. 2017. Invandringens Historia: Från "Folkhemmet" till Dagens Sverige. Elanders Sverige AB, Stockholm, Sweden: Delegationen för migrationsstudier.</bibtext> </blist> <blist> <bibtext> Card D., Chang S., Becker C., Mendelsohn J., Voigt R., Boustan L., Abramitzky R., Jurafsky D. 2022. "Computational Analysis of 140 Years of US Political Speeches Reveals More Positive but Increasingly Polarized Framing of Immigration." Proceedings of the National Academy of Sciences. 119(31): 1–9.</bibtext> </blist> <blist> <bibtext> Cerulo K.A., Leschziner V., Shepherd H. 2021. "Rethinking Culture and Cognition." Annual Review of Sociology. 47: 63–85.</bibtext> </blist> <blist> <bibtext> Chae Y., Davidson T. 2023. "Large Language Models for Text Classification: From Zero-Shot Learning to Fine-Tuning." Retrieved March 5, 2024 (https://osf.io/preprints/socarxiv/sthwk).</bibtext> </blist> <blist> <bibtext> Chang J., Gerrish S., Wang C., Boyd-Graber J., Blei D. 2009. "Reading Tea Leaves: How Humans Interpret Topic Models." Advances in Neural Information Processing Systems. 22: 288–296.</bibtext> </blist> <blist> <bibtext> Chen N.-C., Drouhard M., Kocielnik R., Suh J., Aragon C.R. 2018. "Using Machine Learning to Support Qualitative Coding in Social Science: Shifting the Focus to Ambiguity." ACM Transactions on Interactive Intelligent Systems. 8(2): 1–20.</bibtext> </blist> <blist> <bibtext> Chong D., Druckman J.N. 2017. "Framing Theory." Annual Review of Political Science. 10: 103–126.</bibtext> </blist> <blist> <bibtext> Czymara C.S., van Klingeren M. 2022. "New Perspective? 
Comparing Frame Occurrence in Online and Traditional News Media Reporting on Europe's 'Migration Crisis'." Communications. 47(1): 136–162.</bibtext> </blist> <blist> <bibtext> Dahlström C. 2004. "Rhetoric, Practice and the Dynamics of Institutional Change: Immigrant Policy in Sweden, 1964–2000." Scandinavian Political Studies. 27(3): 287–310.</bibtext> </blist> <blist> <bibtext> Dannélls D., Johansson T., Björk L. 2019. "Evaluation and Refinement of an Enhanced OCR Process for Mass Digitisation." Pp. 112–123 in Digital Humanities in the Nordic Countries, Vol. 2364, edited by Costanza Navarretta, Manex Agirrezabal, Bente Maegaard. Copenhagen, Denmark: University of Copenhagen.</bibtext> </blist> <blist> <bibtext> DiMaggio P. 1997. "Culture and Cognition." Annual Review of Sociology. 23(1): 263–287.</bibtext> </blist> <blist> <bibtext> DiMaggio P. 2015. "Adapting Computational Text Analysis to Social Science (and Vice Versa)." Big Data & Society. 2(2): 1–5.</bibtext> </blist> <blist> <bibtext> DiMaggio P., Nag M., Blei D. 2013. "Exploiting Affinities Between Topic Modeling and the Sociological Perspective on Culture: Application to Newspaper Coverage of US Government Arts Funding." Poetics. 41(6): 570–606.</bibtext> </blist> <blist> <bibtext> Do S., Ollion E., Shen R. 2022. "The Augmented Social Scientist: Using Sequential Transfer Learning to Annotate Millions of Texts with Human-Level Accuracy." Sociological Methods & Research. 53(3): 1167–1200.</bibtext> </blist> <blist> <bibtext> Eberl J.-M., Galyga S. 2021. "Mapping Media Coverage of Migration Within and Into Europe." Pp. 105–122 in Media and Public Attitudes Toward Migration in Europe, edited by Strömbäck J., Meltzer C.E., Eberl J.-M., et al. Oxfordshire, England, UK: Routledge.</bibtext> </blist> <blist> <bibtext> Eberl J.-M., Meltzer C.E., Heidenreich T., Herrero B., Theorin N., Lind F., Berganza R., Boomgaarden H.G., Schemer C., Strömbäck J. 2018. 
"The European Media Discourse on Immigration and its Affects: A Literature Review." Annals of the International Communication Association. 42(3): 207–223.</bibtext> </blist> <blist> <bibtext> Egami N., Fong C.J., Grimmer J., Roberts M.E., Stewart B.M. 2022. "How to Make Causal Inferences Using Texts." Science Advances. 8(42): 1–13.</bibtext> </blist> <blist> <bibtext> Ekengren Oscarsson H. 2013. Väljare är inga dumbommar. Retrieved May 21, 2024 (https://tinyurl.com/juyntbk2).</bibtext> </blist> <blist> <bibtext> Entman R.M. 1993. "Framing: Toward Clarification of a Fractured Paradigm." Journal of Communication. 43(4): 51–58.</bibtext> </blist> <blist> <bibtext> Erdman C., Emerson J.W. 2007. "bcp: An R Package for Performing a Bayesian Analysis of Change Point Problems." Journal of Statistical Software. 23(3): 1–13.</bibtext> </blist> <blist> <bibtext> Ermakoff I. 2019. "Causality and History: Modes of Causal Investigation in Historical Social Sciences." Annual Review of Sociology. 45: 581–606.</bibtext> </blist> <blist> <bibtext> Eshima S., Imai K., Sasaki T. 2024. "Keyword-Assisted Topic Models." American Journal of Political Science. 68(2): 730–750.</bibtext> </blist> <blist> <bibtext> Fan A., Doshi-Velez F., Miratrix L. 2019. "Assessing Topic Model Relevance: Evaluation and Informative Priors." Statistical Analysis and Data Mining: The ASA Data Science Journal. 12(3): 210–222.</bibtext> </blist> <blist> <bibtext> Farrell J. 2016. "Corporate Funding and Ideological Polarization About Climate Change." Proceedings of the National Academy of Sciences. 113(1): 92–97.</bibtext> </blist> <blist> <bibtext> Fiss P.C., Hirsch P.M. 2005. "The Discourse of Globalization: Framing and Sensemaking of an Emerging Concept." American Sociological Review. 70(1): 29–52.</bibtext> </blist> <blist> <bibtext> Fligstein N., Stuart Brundage J., Schultz M. 2017. "Seeing Like the Fed: Culture, Cognition, and Framing in the Failure to Anticipate the Financial Crisis of 2008." 
American Sociological Review. 82(5): 879–909.</bibtext> </blist> <blist> <bibtext> Fuhse J., Stuhler O., Riebling J., Martin J.L. 2020. "Relating Social and Symbolic Relations in Quantitative Text Analysis: A Study of Parliamentary Discourse in the Weimar Republic." Poetics. 78: 1–17.</bibtext> </blist> <blist> <bibtext> Gamson W.A. 1992. Talking Politics. Cambridge, UK: Cambridge University Press.</bibtext> </blist> <blist> <bibtext> Gamson W.A., Modigliani A. 1989. "Media Discourse and Public Opinion on Nuclear Power: A Constructionist Approach." American Journal of Sociology. 95(1): 1–37.</bibtext> </blist> <blist> <bibtext> Garg N., Schiebinger L., Jurafsky D., Zou J. 2018. "Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes." Proceedings of the National Academy of Sciences. 115(16): E3635–E3644.</bibtext> </blist> <blist> <bibtext> Geddes A., Scholten P. 2016. The Politics of Migration and Immigration in Europe. London, UK: Sage.</bibtext> </blist> <blist> <bibtext> Gencoglu O., Gruber M. 2020. "Causal Modeling of Twitter Activity During Covid-19." Computation. 8(4): 1–14.</bibtext> </blist> <blist> <bibtext> Gilardi F., Alizadeh M., Kubli M. 2023. "ChatGPT Outperforms Crowd Workers for Text-Annotation Tasks." Proceedings of the National Academy of Sciences. 120(30): 1–3.</bibtext> </blist> <blist> <bibtext> Goldberg A. 2011. "Mapping Shared Understandings Using Relational Class Analysis: The Case of the Cultural Omnivore Reexamined." American Journal of Sociology. 116(5): 1397–1436.</bibtext> </blist> <blist> <bibtext> Goldenstein J., Poschmann P. 2019. "Analyzing Meaning in Big Data: Performing a Map Analysis Using Grammatical Parsing and Topic Modeling." Sociological Methodology. 49(1): 83–131.</bibtext> </blist> <blist> <bibtext> Greenberg J., Pyszczynski T., Solomon S. 1986. "The Causes and Consequences of a Need for Self-Esteem: A Terror Management Theory." Pp. 189–212 in Public Self and Private Self, edited by Baumeister, R.F. 
New York: Springer.</bibtext> </blist> <blist> <bibtext> Greussing E., Boomgaarden H.G. 2017. "Shifting the Refugee Narrative? An Automated Frame Analysis of Europe's 2015 Refugee Crisis." Journal of Ethnic and Migration Studies. 43(11): 1749–1774.</bibtext> </blist> <blist> <bibtext> Greve H.R., Rao H., Vicinanza P., Zhou E.Y. 2022. "Online Conspiracy Groups: Micro-Bloggers, Bots, and Coronavirus Conspiracy Talk on Twitter." American Sociological Review. 87(6): 919–949.</bibtext> </blist> <blist> <bibtext> Griffin L.J. 1992. "Temporality, Events, and Explanation in Historical Sociology: An Introduction." Sociological Methods & Research. 20(4): 403–427.</bibtext> </blist> <blist> <bibtext> Griffiths T.L., Steyvers M. 2004. "Finding Scientific Topics." Proceedings of the National Academy of Sciences. 101(1): 5228–5235.</bibtext> </blist> <blist> <bibtext> Grimmer J., Roberts M.E., Stewart B.M. 2022. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton, NJ: Princeton University Press.</bibtext> </blist> <blist> <bibtext> Grossmann I., Feinberg M., Parker D., Christakis N., Tetlock P., Cunningham W. 2023. "AI and the Transformation of Social Science Research." Science (New York, NY). 380(6650): 1108–1109.</bibtext> </blist> <blist> <bibtext> Heidenreich T., Lind F., Eberl J.-M., Boomgaarden H.G. 2019. "Media Framing Dynamics of the 'European Refugee Crisis': A Comparative Topic Modelling Approach." Journal of Refugee Studies. 32(SI 1): 172–182.</bibtext> </blist> <blist> <bibtext> Helbling M. 2014. "Framing Immigration in Western Europe." Journal of Ethnic and Migration Studies. 40(1): 21–41.</bibtext> </blist> <blist> <bibtext> Hunzaker M.B.F., Valentino L. 2019. "Mapping Cultural Schemas: From Theory to Method." American Sociological Review. 84(5): 950–981.</bibtext> </blist> <blist> <bibtext> Hurtado Bodell M., Arvidsson M., Magnusson M. 2019. "Interpretable Word Embeddings via Informative Priors." Pp. 
6323–6329 in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), edited by Padó, S., Huang, R. Hong Kong, China: Association for Computational Linguistics.</bibtext> </blist> <blist> <bibtext> Hurtado Bodell M., Magnusson M., Mützel S. 2022. "From Documents to Data: A Framework for Total Corpus Quality." Socius. 8: 1–15.</bibtext> </blist> <blist> <bibtext> Jagarlamudi J., Daumé III H., Udupa R. 2012. "Incorporating Lexical Priors Into Topic Models." Pp. 204–213 in Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, edited by Daelemans, W. Avignon, France: Association for Computational Linguistics.</bibtext> </blist> <blist> <bibtext> Janssen S., Kuipers G., Verboord M. 2008. "Cultural Globalization and Arts Journalism: The International Orientation of Arts and Culture Coverage in Dutch, French, German, and US Newspapers, 1955 to 2005." American Sociological Review. 73(5): 719–740.</bibtext> </blist> <blist> <bibtext> Jarvis B.F., Keuschnigg M., Hedström P. 2021. "Analytical Sociology Amidst a Computational Social Science Revolution." Pp. 33–52 in Handbook of Computational Social Science, edited by U. Engel, A. Quan-Haase, S. X. Liu, and L. Lyberg. Oxfordshire, England, UK: Routledge.</bibtext> </blist> <blist> <bibtext> Karell D., Freedman M. 2019. "Rhetorics of Radicalism." American Sociological Review. 84(4): 726–753.</bibtext> </blist> <blist> <bibtext> Keuschnigg M., Lovsjö N., Hedström P. 2018. "Analytical Sociology and Computational Social Science." Journal of Computational Social Science. 1(1): 3–14.</bibtext> </blist> <blist> <bibtext> King G., Lam P., Roberts M.E. 2017. "Computer-Assisted Keyword and Document Set Discovery from Unstructured Text." American Journal of Political Science. 61(4): 971–988.</bibtext> </blist> <blist> <bibtext> Koopmans R., Olzak S. 2004.
"Discursive Opportunities and the Evolution of Right-Wing Violence in Germany." American Journal of Sociology. 110(1): 198–230.</bibtext> </blist> <blist> <bibtext> Korkut U., Bucken-Knapp G., McGarry A., Hinnfors J., Drake H. 2013. The Discourses and Politics of Migration in Europe. New York: Springer.</bibtext> </blist> <blist> <bibtext> Kozlowski A.C., Taddy M., Evans J.A. 2019. "The Geometry of Culture: Analyzing the Meanings of Class Through Word Embeddings." American Sociological Review. 84(5): 905–949.</bibtext> </blist> <blist> <bibtext> Krzyżanowski M. 2018. "'We Are a Small Country That Has Done Enormously Lot': The 'Refugee Crisis' and the Hybrid Discourse of Politicizing Immigration in Sweden." Journal of Immigrant & Refugee Studies. 16(1-2): 97–117.</bibtext> </blist> <blist> <bibtext> Kupskỳ A. 2017. "History and Changes of Swedish Migration Policy." Journal of Geography, Politics and Society. 7(3): 50–56.</bibtext> </blist> <blist> <bibtext> Lawlor A., Tolley E. 2017. "Deciding Who's Legitimate: News Media Framing of Immigrants and Refugees." International Journal of Communication. 11: 967–991.</bibtext> </blist> <blist> <bibtext> Legewie J. 2013. "Terrorist Events and Attitudes Toward Immigrants: A Natural Experiment." American Journal of Sociology. 118(5): 1199–1245.</bibtext> </blist> <blist> <bibtext> Lichtenstein M., Rucks-Ahidiana Z. 2023. "Contextual Text Coding: A Mixed-Methods Approach for Large-Scale Textual Data." Sociological Methods & Research. 52(2): 606–641.</bibtext> </blist> <blist> <bibtext> Lizardo O. 2017. "Improving Cultural Analysis: Considering Personal Culture in its Declarative and Nondeclarative Modes." American Sociological Review. 82(1): 88–115.</bibtext> </blist> <blist> <bibtext> Lizardo O. 2021. "Culture, Cognition, and Internalization." Sociological Forum. 36(S1): 1177–1206.</bibtext> </blist> <blist> <bibtext> Lu B., Ott M., Cardie C., Tsou B. 2011. "Multi-Aspect Sentiment Analysis with Topic Models." Pp.
81–88 in 2011 IEEE 11th International Conference on Data Mining Workshops, edited by Spiliopoulou, M., Wang, H., Cook, D. et al. Vancouver, Canada: IEEE.</bibtext> </blist> <blist> <bibtext> Madsen A., Reddy S., Chandar S. 2021. "Post-Hoc Interpretability for Neural NLP: A Survey." ACM Computing Surveys. 55(8): 1–41.</bibtext> </blist> <blist> <bibtext> Magnusson M., Jonsson L., Villani M., Broman D. 2018. "Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models." Journal of Computational and Graphical Statistics. 27(2): 449–463.</bibtext> </blist> <blist> <bibtext> Magnusson M., Öhrvall R., Barrling K., Mimno D. 2018. "Voices from the Far Right: A Text Analysis of Swedish Parliamentary Debates." Retrieved May 21, 2024 (https://osf.io/preprints/socarxiv/jdsqc).</bibtext> </blist> <blist> <bibtext> Marx Ferree M. 2003. "Resonance and Radicalism: Feminist Framing in the Abortion Debates of the United States and Germany." American Journal of Sociology. 109(2): 304–344.</bibtext> </blist> <blist> <bibtext> Mohr J.W., Bail C.A., Frye M., Lena J.C., Lizardo O., McDonnell T.E., Mische A., Tavory I., Wherry F.F. 2020. Measuring Culture. New York: Columbia University Press.</bibtext> </blist> <blist> <bibtext> Mohr J.W., Wagner-Pacifici R., Breiger R.L. 2015. "Toward a Computational Hermeneutics." Big Data & Society. 2(2): 1–8.</bibtext> </blist> <blist> <bibtext> Mohr J.W., Wagner-Pacifici R., Breiger R.L., Bogdanov P. 2013. "Graphing the Grammar of Motives in National Security Strategies: Cultural Interpretation, Automated Text Analysis and the Drama of Global Politics." Poetics. 41(6): 670–700.</bibtext> </blist> <blist> <bibtext> Nelson L.K. 2019. "To Measure Meaning in Big Data, Don't Give Me a Map, Give Me Transparency and Reproducibility." Sociological Methodology. 49(1): 139–143.</bibtext> </blist> <blist> <bibtext> Nelson L.K. 2020. "Computational Grounded Theory: A Methodological Framework." Sociological Methods & Research.
49(1): 3–42.</bibtext> </blist> <blist> <bibtext> Nelson L.K. 2021a. "Cycles of Conflict, a Century of Continuity: The Impact of Persistent Place-Based Political Logics on Social Movement Strategy." American Journal of Sociology. 127(1): 1–59.</bibtext> </blist> <blist> <bibtext> Nelson L.K. 2021b. "Leveraging the Alignment Between Machine Learning and Intersectionality: Using Word Embeddings to Measure Intersectional Experiences of the Nineteenth Century US South." Poetics. 88: 1–14.</bibtext> </blist> <blist> <bibtext> Nelson L.K., Burk D., Knudsen M., McCall L. 2021. "The Future of Coding: A Comparison of Hand-Coding and Three Types of Computer-Assisted Text Analysis Methods." Sociological Methods & Research. 50(1): 202–237.</bibtext> </blist> <blist> <bibtext> Ollion É., Shen R., Macanovic A., Chatelain A. 2024. "The Dangers of Using Proprietary LLMs for Research." Nature Machine Intelligence. 6: 4–5.</bibtext> </blist> <blist> <bibtext> Pääkkönen J., Ylikoski P. 2021. "Humanistic Interpretation and Machine Learning." Synthese. 199(1): 1461–1497.</bibtext> </blist> <blist> <bibtext> Popper K.R. 1957. The Poverty of Historicism. Oxfordshire, UK: Routledge.</bibtext> </blist> <blist> <bibtext> Quinsaat S. 2014. "Competing News Frames and Hegemonic Discourses in the Construction of Contemporary Immigration and Immigrants in the United States." Mass Communication and Society. 17(4): 573–596.</bibtext> </blist> <blist> <bibtext> Rawlings C.M., Childress C. 2021. "Schemas, Interactions, and Objects in Meaning-Making." Sociological Forum. 36: 1446–1477.</bibtext> </blist> <blist> <bibtext> Roberts M.E., Stewart B.M., Tingley D., Lucas C., Leder-Luis J., Gadarian S.K., Albertson B., Rand D.G. 2014. "Structural Topic Models for Open-Ended Survey Responses." American Journal of Political Science. 58(4): 1064–1082.</bibtext> </blist> <blist> <bibtext> Rudin C. 2019.
"Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead." Nature Machine Intelligence. 1(5): 206–215.</bibtext> </blist> <blist> <bibtext> Rule A., Cointet J.-P., Bearman P.S. 2015. "Lexical Shifts, Substantive Changes, and Continuity in State of the Union Discourse, 1790–2014." Proceedings of the National Academy of Sciences. 112(35): 10837–10844.</bibtext> </blist> <blist> <bibtext> Rydgren J., van der Meiden S. 2019. "The Radical Right and the End of Swedish Exceptionalism." European Political Science. 18: 439–455.</bibtext> </blist> <blist> <bibtext> Salganik M.J. 2018. Bit by Bit: Social Research in the Digital Age. Princeton, NJ: Princeton University Press.</bibtext> </blist> <blist> <bibtext> Scheufele D.A. 1999. "Framing as a Theory of Media Effects." Journal of Communication. 49(1): 103–122.</bibtext> </blist> <blist> <bibtext> Scheufele D.A. 2000. "Agenda-Setting, Priming, and Framing Revisited: Another Look at Cognitive Effects of Political Communication." Mass Communication and Society. 3(2-3): 297–316.</bibtext> </blist> <blist> <bibtext> Schierup C.-U., Ålund A. 2011. "The End of Swedish Exceptionalism? Citizenship, Neoliberalism and the Politics of Exclusion." Race & Class. 53(1): 45–64.</bibtext> </blist> <blist> <bibtext> Schmidt-Catran A., Czymara C.S. 2020. "'Did You Read About Berlin?' Terrorist Attacks, Online Media Reporting and Support for Refugees in Germany." Soziale Welt. 71(2-3): 305–337.</bibtext> </blist> <blist> <bibtext> Sewell W.H. 1996. "Historical Events as Transformations of Structures: Inventing Revolution at the Bastille." Theory and Society. 25(6): 841–881.</bibtext> </blist> <blist> <bibtext> Shor E., Van De Rijt A., Miltsov A., Kulkarni V., Skiena S. 2015. "A Paper Ceiling: Explaining the Persistent Underrepresentation of Women in Printed News." American Sociological Review. 80(5): 960–984.</bibtext> </blist> <blist> <bibtext> Statistics Sweden. 2022.
"Utrikes Födda i Sverige." Retrieved May 21, 2024 (https://tinyurl.com/4hxukurf).</bibtext> </blist> <blist> <bibtext> Statistics Sweden. 2024. "Population and Population Changes 1749–2023." Retrieved May 21, 2024 (https://tinyurl.com/ms9767nu).</bibtext> </blist> <blist> <bibtext> Stoltz D.S., Taylor M.A. 2021. "Cultural Cartography with Word Embeddings." Poetics. 88: 1–14.</bibtext> </blist> <blist> <bibtext> Stone L. 1979. "The Revival of Narrative: Reflections On a New Old History." Past & Present. 85: 3–24.</bibtext> </blist> <blist> <bibtext> Strauss C., Quinn N. 1997. A Cognitive Theory of Cultural Meaning. Cambridge, UK: Cambridge University Press.</bibtext> </blist> <blist> <bibtext> Svanberg I., Tydén M. 1998. Tusen år av Invandring. En Svensk Kulturhistoria. Stockholm, Sweden: Arena.</bibtext> </blist> <blist> <bibtext> Swidler A. 1986. "Culture in Action: Symbols and Strategies." American Sociological Review. 51(2): 273–286.</bibtext> </blist> <blist> <bibtext> Taylor M.A., Stoltz D.S. 2020. "Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts." Sociological Science. 7: 544–569.</bibtext> </blist> <blist> <bibtext> Törnberg P. 2023. "ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning." Retrieved March 5, 2024 (https://arxiv.org/abs/2304.06588).</bibtext> </blist> <blist> <bibtext> Törnberg A., Törnberg P. 2016. "Muslims in Social Media Discourse: Combining Topic Modeling and Critical Discourse Analysis." Discourse, Context & Media. 13: 132–142.</bibtext> </blist> <blist> <bibtext> Tsur O., Calacci D., Lazer D. 2015. "A Frame of Mind: Using Statistical Models for Detection of Framing and Agenda Setting Campaigns." Pp. 1629–1638 in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), edited by Zong, C., Strube, M.
Beijing, China: Association for Computational Linguistics.</bibtext> </blist> <blist> <bibtext> Voyer A., Kline Z.D., Danton M., Volkova T. 2022. "From Strange to Normal: Computational Approaches to Examining Immigrant Incorporation Through Shifts in the Mainstream." Sociological Methods & Research. 51(4): 1540–1579.</bibtext> </blist> <blist> <bibtext> Wagner-Pacifici R. 2017. What Is an Event? Chicago, IL: University of Chicago Press.</bibtext> </blist> <blist> <bibtext> Watanabe K., Baturo A. 2024. "Seeded Sequential LDA: A Semi-Supervised Algorithm for Topic-Specific Analysis of Sentences." Social Science Computer Review. 42(1): 224–248.</bibtext> </blist> <blist> <bibtext> Watanabe K., Xuan-Hieu P., Watanabe M.K. 2022. "Package 'seededlda'." Retrieved February 2, 2023 (https://cran.irsn.fr/web/packages/seededlda/seededlda.pdf).</bibtext> </blist> <blist> <bibtext> Watanabe K., Zhou Y. 2022. "Theory-Driven Analysis of Large Corpora: Semisupervised Topic Classification of the UN Speeches." Social Science Computer Review. 40(2): 346–366.</bibtext> </blist> <blist> <bibtext> Weaver D.H. 2007. "Thoughts on Agenda Setting, Framing, and Priming." Journal of Communication. 57(1): 142–147.</bibtext> </blist> <blist> <bibtext> Widmann T., Wich M. 2023. "Creating and Comparing Dictionary, Word Embedding, and Transformer-Based Models to Measure Discrete Emotions in German Political Text." Political Analysis. 31(4): 626–641.</bibtext> </blist> <blist> <bibtext> Wood M.L., Stoltz D.S., Van Ness J., Taylor M.A. 2018. "Schemas and Frames." Sociological Theory. 36(3): 244–261.</bibtext> </blist> <blist> <bibtext> Wu L., Wang D., Evans J.A. 2019. "Large Teams Develop and Small Teams Disrupt Science and Technology." Nature. 566(7744): 378–382.</bibtext> </blist> <blist> <bibtext> Ying L., Montgomery J.M., Stewart B.M. 2022. "Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures." Political Analysis.
30(4): 570–589.</bibtext> </blist> <blist> <bibtext> Ziems C., Held W., Shaikh O., Chen J., Zhang Z., Yang D. 2024. "Can Large Language Models Transform Computational Social Science?" Computational Linguistics. 50(1): 237–291.</bibtext> </blist> </ref> <aug> <p>By Miriam Hurtado Bodell, Måns Magnusson, and Marc Keuschnigg</p> <p>Miriam Hurtado Bodell is a PhD candidate in analytical sociology at Linköping University, Sweden. Her research focuses on theoretically driven computational text analysis and social inquiries about meaning-making processes.</p> <p>Måns Magnusson is an assistant professor of statistics at Uppsala University, Sweden, and is affiliated with the Institute for Analytical Sociology at Linköping University, Sweden. His primary research interests are Bayesian inference, probabilistic machine learning, and statistical inference from textual data.</p> <p>Marc Keuschnigg is Professor of Sociology at Leipzig University, Germany, and Associate Professor at the Institute for Analytical Sociology at Linköping University, Sweden. His research interests include cultural dynamics, inequality, and normative change.
Recent work has appeared in the European Sociological Review, Nature Human Behaviour, Science Advances, and PNAS.</p> </aug>
Header DbId: eric
DbLabel: ERIC
An: EJ1496219
AccessLevel: 3
PubType: Academic Journal
PubTypeId: academicJournal
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Seeded Topic Models in Digital Archives: Analyzing Interpretations of Immigration in Swedish Newspapers, 1945-2019
– Name: Language
  Label: Language
  Group: Lang
  Data: English
– Name: Author
  Label: Authors
  Group: Au
  Data: Miriam Hurtado Bodell (ORCID 0000-0002-8467-1746), Måns Magnusson (ORCID 0000-0002-0296-2719), Marc Keuschnigg (ORCID 0000-0001-5774-1553)
– Name: TitleSource
  Label: Source
  Group: Src
  Data: Sociological Methods & Research. 2026 55(1):120-156.
– Name: Avail
  Label: Availability
  Group: Avail
  Data: SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: https://sagepub.com
– Name: PeerReviewed
  Label: Peer Reviewed
  Group: SrcInfo
  Data: Y
– Name: Pages
  Label: Page Count
  Group: Src
  Data: 37
– Name: DatePubCY
  Label: Publication Date
  Group: Date
  Data: 2026
– Name: TypeDocument
  Label: Document Type
  Group: TypDoc
  Data: Journal Articles, Reports - Research
– Name: Subject
  Label: Descriptors
  Group: Su
  Data: Foreign Countries, Newspapers, Mass Media Role, Public Opinion, Immigrants, Content Analysis, Immigration, Social Science Research, Information Retrieval
– Name: Subject
  Label: Geographic Terms
  Group: Su
  Data: Sweden
– Name: DOI
  Label: DOI
  Group: ID
  Data: 10.1177/00491241241268453
– Name: ISSN
  Label: ISSN
  Group: ISSN
  Data: 0049-1241, 1552-8294
– Name: Abstract
  Label: Abstract
  Group: Ab
  Data: Sociologists are discussing the need for more formal ways to extract meaning from digital text archives. We focus attention on the seeded topic model, a semi-supervised extension to the standard topic model that allows sociological knowledge to be infused into the computational learning of meaning structures. Seed words help crystallize topics around known concepts, while utilizing topic models' functionality to identify associations in text based on word co-occurrences. The method estimates a concept's shared interpretation (or framing) via its associations with other frequently co-occurring topics. In a case study, we extract longitudinal measures of media frames regarding immigration from a vast corpus of millions of Swedish newspaper articles from the period 1945-2019. We infer turning points that partition the immigration discourse into meaningful eras and locate Sweden's era of multicultural ideals that coined its tolerant reputation.
– Name: AbstractInfo
  Label: Abstractor
  Group: Ab
  Data: As Provided
– Name: DateEntry
  Label: Entry Date
  Group: Date
  Data: 2026
– Name: AN
  Label: Accession Number
  Group: ID
  Data: EJ1496219
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1496219
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1177/00491241241268453
    Languages:
      – Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 37
        StartPage: 120
    Subjects:
      – SubjectFull: Foreign Countries
        Type: general
      – SubjectFull: Newspapers
        Type: general
      – SubjectFull: Mass Media Role
        Type: general
      – SubjectFull: Public Opinion
        Type: general
      – SubjectFull: Immigrants
        Type: general
      – SubjectFull: Content Analysis
        Type: general
      – SubjectFull: Immigration
        Type: general
      – SubjectFull: Social Science Research
        Type: general
      – SubjectFull: Information Retrieval
        Type: general
      – SubjectFull: Sweden
        Type: general
    Titles:
      – TitleFull: Seeded Topic Models in Digital Archives: Analyzing Interpretations of Immigration in Swedish Newspapers, 1945-2019
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Miriam Hurtado Bodell
      – PersonEntity:
          Name:
            NameFull: Måns Magnusson
      – PersonEntity:
          Name:
            NameFull: Marc Keuschnigg
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 02
              Type: published
              Y: 2026
          Identifiers:
            – Type: issn-print
              Value: 0049-1241
            – Type: issn-electronic
              Value: 1552-8294
          Numbering:
            – Type: volume
              Value: 55
            – Type: issue
              Value: 1
          Titles:
            – TitleFull: Sociological Methods & Research
              Type: main