Bayesian Estimation and Testing of a Linear Logistic Test Model for Learning during the Test

Bibliographic Details
Title: Bayesian Estimation and Testing of a Linear Logistic Test Model for Learning during the Test
Language: English
Authors: Lozano, José H., Revuelta, Javier (ORCID 0000-0003-4705-6282)
Source: Applied Measurement in Education. 2021 34(3):223-235.
Availability: Routledge. Available from: Taylor & Francis, Ltd. 530 Walnut Street Suite 850, Philadelphia, PA 19106. Tel: 800-354-1420; Tel: 215-625-8900; Fax: 215-207-0050; Web site: http://www.tandf.co.uk/journals
Peer Reviewed: Y
Page Count: 13
Publication Date: 2021
Document Type: Journal Articles
Reports - Research
Descriptors: Bayesian Statistics, Computation, Learning, Testing, Statistical Analysis, Models, Test Items, Difficulty Level, Item Response Theory, Logical Thinking
DOI: 10.1080/08957347.2021.1933982
ISSN: 0895-7347
Abstract: The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework compared to the traditional frequentist approach are discussed. The application of the model is illustrated with real data from a logical ability test. The results show how the incorporation of previous practice into the linear logistic model improves the fit of the model as well as the prediction of the Rasch item difficulty estimates. The model provides evidence of learning associated with two of the logic operations involved in the items, which supports the hypothesis of practice effects in deductive reasoning tasks.
Abstractor: As Provided
Entry Date: 2021
Accession Number: EJ1312427
Database: ERIC
FullText Links:
  – Type: pdflink
    Url: https://content.ebscohost.com/cds/retrieve?content=AQICAHj0k_4E0hTGH8RJwT4gCJyBsGNe_WN95AvKlDbXJGqwxwHDR_kZjaR2n1ern1rrv1GgAAAA4zCB4AYJKoZIhvcNAQcGoIHSMIHPAgEAMIHJBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEEDCW7-NAfaN7hCBt94gIBEICBmzKXLyyseEFLKkI0H96Ht7VoKoZrG_ynx0lU-5THlXGzPcaZzI-QOm1EFfS9dlI9prsELL05_4IOIHQ8Cb9hXgnLaH3CTS_1yNNIqVmLmwygFAfK0y1UAK8CKABARTiAoNKQjnIyZoUnMfG4JH0IUtWoKDUP-OkbdO4Xst9dk14PUbiLNQi1MeyQzXmqxZXf_eUv1AEPaZ4kA3zW
Text:
  Availability: 1
  Value: <anid>AN0152947607;7lg01jul.21;2021Oct13.02:12;v2.2.500</anid> <title id="AN0152947607-1">Bayesian Estimation and Testing of a Linear Logistic Test Model for Learning during the Test </title> <sbt id="AN0152947607-2">1. Introduction</sbt> <p>The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework compared to the traditional frequentist approach are discussed. The application of the model is illustrated with real data from a logical ability test. The results show how the incorporation of previous practice into the linear logistic model improves the fit of the model as well as the prediction of the Rasch item difficulty estimates. The model provides evidence of learning associated with two of the logic operations involved in the items, which supports the hypothesis of practice effects in deductive reasoning tasks.</p> <p>When the principles involved in solving the items of an educational or psychological test can be extrapolated from one item to another, the respondents may learn to respond more effectively during the test. In such cases, incorporating previous practice into the item response model allows for an account of the learning effect as well as for the obtaining of unbiased estimates of person and item parameters. In the present paper, a Bayesian approach is presented to detect and measure the learning that occurs during a test due to the repeated use of the cognitive operations common to the items. The proposed approach is based on the linear logistic test model (LLTM), an extension of the Rasch model (Rasch, [<reflink idref="bib27" id="ref1">27</reflink>]) developed by Fischer ([<reflink idref="bib8" id="ref2">8</reflink>]).</p> <hd id="AN0152947607-3">1.1. 
The Linear Logistic Approach to Item Response Modeling</hd> <p>The LLTM (Fischer, [<reflink idref="bib8" id="ref3">8</reflink>], [<reflink idref="bib9" id="ref4">9</reflink>], [<reflink idref="bib10" id="ref5">10</reflink>]) may be considered the first explanatory item response model (De Boeck & Wilson, [<reflink idref="bib5" id="ref6">5</reflink>]). Its origin dates back to the early 1970s; however, the model has received the most attention in the last two decades, as reflected in the large volume of related publications, ranging from substantive applications to new methodological developments (see Janssen, [<reflink idref="bib17" id="ref7">17</reflink>]).</p> <p>The LLTM is a member of the family of Rasch models (Rasch, [<reflink idref="bib27" id="ref8">27</reflink>]). The Rasch model establishes that the logit of a correct response for person <emph>i</emph> (<emph>i</emph> = 1, 2, ... , <emph>I</emph>) to item <emph>j</emph> (<emph>j</emph> = 1, 2, ... , <emph>J</emph>) is:</p> <p>(<reflink idref="bib1" id="ref9">1</reflink>)</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mrow><mi mathvariant="normal">l</mi><mi mathvariant="normal">o</mi><mi mathvariant="normal">g</mi><mi mathvariant="normal">i</mi><mi mathvariant="normal">t</mi></mrow></mrow><mfenced open="[" close="]"><mrow><mrow><msub><mi>X</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub></mrow><mo>=</mo><mn>1</mn></mrow></mfenced><mo>=</mo><mrow><msub><mi>θ</mi><mi>i</mi></msub></mrow><mo>−</mo><mrow><msub><mi>β</mi><mi>j</mi></msub></mrow><mrow><mo>,</mo></mrow></math> </ephtml> </p> <p>where <emph>θ<subs>i</subs></emph> is the ability of person <emph>i</emph>, and <emph>β<subs>j</subs></emph> is the difficulty of item <emph>j</emph>. 
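</p> <p>As a minimal illustration of Equation 1, the following Python sketch converts the logit <emph>θ<subs>i</subs></emph> − <emph>β<subs>j</subs></emph> into a response probability. The function name and the numeric values are hypothetical and are not taken from the study.</p>

```python
import math


def rasch_probability(theta, beta):
    """Probability of a correct response under the Rasch model.

    Equation 1 states logit P(X_ij = 1) = theta_i - beta_j, so the
    probability is the inverse logit of that difference.
    """
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


# Hypothetical values: a person of average ability (theta = 0)
# facing an easy, a medium, and a hard item.
for beta in (-1.0, 0.0, 1.0):
    print(beta, round(rasch_probability(0.0, beta), 3))
```

<p>When ability equals difficulty the logit is zero and the probability of success is .5; harder items (larger <emph>β<subs>j</subs></emph>) lower that probability.</p> <p>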
The idea underlying the LLTM was first introduced by Scheiblechner ([<reflink idref="bib30" id="ref10">30</reflink>]), who regressed the Rasch item parameter estimates on item properties derived from the hypothetical cognitive structure of the test. Subsequently, Fischer ([<reflink idref="bib8" id="ref11">8</reflink>]) incorporated the item properties into the model and developed a set of conditional maximum likelihood (CML) estimating equations. Specifically, the LLTM decomposes the Rasch difficulty parameter, <emph>β<subs>j</subs></emph>, into a linear combination that represents the weighted sum of the difficulties of the cognitive operations involved in solving the item:</p> <p>(<reflink idref="bib2" id="ref12">2</reflink>)</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msub><mi>β</mi><mi>j</mi></msub></mrow><mo>=</mo><mrow><munderover><mo movablelimits="false">∑</mo><mrow><mi>m</mi><mo>=</mo><mn>1</mn></mrow><mi>M</mi></munderover></mrow><mrow><mrow><msub><mi>w</mi><mrow><mi>j</mi><mi>m</mi></mrow></msub></mrow><mrow><msub><mi>α</mi><mi>m</mi></msub></mrow></mrow><mrow><mo>,</mo></mrow></math> </ephtml> </p> <p>where <emph>α<subs>m</subs></emph> is a basic parameter that represents the difficulty of operation <emph>m</emph> (<emph>m</emph> = 1, 2, ... , <emph>M</emph>), and <emph>w<subs>jm</subs></emph> is the weight of item <emph>j</emph> on operation <emph>m</emph>. The model is completed by <bold>W</bold>, a structural <emph>J</emph> × <emph>M</emph> matrix that contains the weights (<emph>w<subs>jm</subs></emph>) of each of the <emph>J</emph> items on each of the <emph>M</emph> operations. Each weight is given by the number of times operation <emph>m</emph> is involved in the solution of item <emph>j</emph>. As can be appreciated, the LLTM is a Rasch model with linear constraints on the item parameters and, therefore, shares many of the statistical properties of the Rasch model. 
For instance, the total score of a person on the items and the item sum score across persons constitute sufficient statistics for the incidental (<emph>θ<subs>i</subs></emph>) and structural (<emph>α<subs>m</subs></emph>) parameters, respectively. Consequently, the structural parameters can be estimated by CML equations that condition the incidental parameters out of the likelihood. Therefore, the estimates of the structural parameters are sample-free; that is, they can be estimated independently of the true values of the person parameters. For the structural parameters to be estimated by means of CML, the matrix <bold>W</bold><sups>+</sups> = (<bold>W</bold>; <bold>1</bold>) (i.e., <bold>W</bold> supplemented with a column vector of ones) must have full column rank; that is, <emph>rank</emph>(<bold>W</bold><sups>+</sups>) = <emph>M</emph> + 1 (Fischer, [<reflink idref="bib9" id="ref13">9</reflink>]). As a result, the number of operations is restricted to <emph>M</emph> ≤ <emph>J</emph> – 1. The fit of the Rasch model to the data constitutes a prerequisite for using the LLTM.</p> <hd id="AN0152947607-4">1.2. The Linear Logistic Approach for the Detection of Learning during the Test</hd> <p>Based on the idea underlying the LLTM, Scheiblechner ([<reflink idref="bib30" id="ref14">30</reflink>]) proposed a variant of the model that allows for the measurement of the learning that takes place during the test due to the repeated presentation of the operations involved in the items (see also Fischer & Formann, [<reflink idref="bib11" id="ref15">11</reflink>]; Spada, [<reflink idref="bib33" id="ref16">33</reflink>]; Spada & McGaw, [<reflink idref="bib34" id="ref17">34</reflink>]). Following Spada ([<reflink idref="bib33" id="ref18">33</reflink>]), this variant will be referred to here as the operation-specific learning model (OSLM). The OSLM includes a practice parameter (<emph>δ<subs>m</subs></emph>) specific to each cognitive operation <emph>m</emph>. 
This parameter is weighted by the number of times operation <emph>m</emph> is involved in the current item <emph>j</emph> (<emph>w<subs>jm</subs></emph>) as well as by the number of times the person has practiced operation <emph>m</emph> in previous items (<emph>w<subs>km</subs></emph>, where <emph>k</emph> = 1, 2, ... , <emph>j</emph> – 1):</p> <p>(<reflink idref="bib3" id="ref19">3</reflink>)</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msub><mi>β</mi><mi>j</mi></msub></mrow><mo>=</mo><mrow><munderover><mo movablelimits="false">∑</mo><mrow><mi>m</mi><mo>=</mo><mn>1</mn></mrow><mi>M</mi></munderover></mrow><mrow><mrow><msub><mi>w</mi><mrow><mi>j</mi><mi>m</mi></mrow></msub></mrow></mrow><mfenced open="(" close=")"><mrow><mrow><msub><mi>α</mi><mi>m</mi></msub></mrow><mo>−</mo><mrow><msub><mi>δ</mi><mi>m</mi></msub></mrow><mrow><munderover><mo movablelimits="false">∑</mo><mrow><mi>k</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>j</mi><mo>−</mo><mn>1</mn></mrow></munderover></mrow><mrow><mrow><msub><mi>w</mi><mrow><mi>k</mi><mi>m</mi></mrow></msub></mrow></mrow></mrow></mfenced><mrow><mo>,</mo></mrow></math> </ephtml> </p> <p>where <emph>α<subs>m</subs></emph> represents the initial difficulty of operation <emph>m</emph>, independently of the practice effect; <emph>w<subs>km</subs></emph> is the weight of the previous item <emph>k</emph> on operation <emph>m</emph>; and <emph>δ<subs>m</subs></emph> is the practice parameter, which represents the change in the difficulty of operation <emph>m</emph> that results from practicing the operation. A positive sign for the <emph>δ<subs>m</subs></emph> parameter implies a decrease in difficulty associated with operation <emph>m</emph> throughout the test as a function of practice, which may be interpreted as a learning effect. 
A negative sign, on the other hand, implies an increase in difficulty associated with operation <emph>m</emph> as a function of practice, which may be interpreted as fatigue or loss of attention (Debeer & Janssen, [<reflink idref="bib6" id="ref20">6</reflink>]; Kubinger, [<reflink idref="bib19" id="ref21">19</reflink>]). Consequently, the Rasch difficulty parameter is decomposed into an initial-difficulty component (Σ<emph>w<subs>jm</subs>α<subs>m</subs></emph>), derived from the cognitive operations involved in solving the item, and a practice component (ΣΣ<emph>w<subs>jm</subs>w<subs>km</subs>δ<subs>m</subs></emph>), derived from practicing said operations in previous items. Note that the practice effect is assumed to be the same for all subjects, and, therefore, the OSLM does not model local dependencies between items. As with the LLTM, the Rasch model must fit the data for the OSLM to be used. Similarly, the OSLM can be estimated by the CML algorithm developed by Fischer ([<reflink idref="bib8" id="ref22">8</reflink>]). Mathematically, the OSLM is an LLTM with weight matrix <bold>Q</bold> = (<bold>W</bold>; <bold>V</bold>), where <bold>V</bold> is a <emph>J</emph> × <emph>M</emph> matrix whose elements represent previous practice. More specifically, the elements in <bold>V</bold> are given by</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msub><mi>v</mi><mrow><mi>j</mi><mi>m</mi></mrow></msub></mrow><mo>=</mo><mrow><msub><mi>w</mi><mrow><mi>j</mi><mi>m</mi></mrow></msub></mrow><mrow><munderover><mo movablelimits="false">∑</mo><mrow><mi>k</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>j</mi><mo>−</mo><mn>1</mn></mrow></munderover></mrow><mrow><mrow><msub><mi>w</mi><mrow><mi>k</mi><mi>m</mi></mrow></msub></mrow></mrow></math> </ephtml> . 
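</p> <p>The construction of <bold>V</bold> and the resulting item difficulties of Equation 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy 3 × 2 weight matrix, and the parameter values are hypothetical, and the items are assumed to be listed in order of administration.</p>

```python
def practice_matrix(W):
    """Return V, where v_jm is w_jm times the summed weights of
    operation m over all earlier items (rows of W in test order)."""
    n_ops = len(W[0])
    cumulative = [0] * n_ops  # practice accumulated before the current item
    V = []
    for row in W:
        V.append([w * c for w, c in zip(row, cumulative)])
        cumulative = [c + w for c, w in zip(cumulative, row)]
    return V


def oslm_difficulties(W, alpha, delta):
    """Item difficulties from Equation 3, written with V:
    beta_j = sum over m of (w_jm * alpha_m - v_jm * delta_m)."""
    V = practice_matrix(W)
    return [
        sum(w * a - v * d for w, v, a, d in zip(w_row, v_row, alpha, delta))
        for w_row, v_row in zip(W, V)
    ]


# Hypothetical toy example: 3 items, 2 operations.
W = [[1, 0], [1, 1], [2, 1]]
alpha = [1.0, 0.5]   # illustrative initial difficulties
delta = [0.25, 0.0]  # illustrative practice effects (positive = learning)

print(practice_matrix(W))                  # [[0, 0], [1, 0], [4, 1]]
print(oslm_difficulties(W, alpha, delta))  # [1.0, 1.25, 1.5]
```

<p>In this toy case the third item uses the first operation twice after two earlier applications, so its practice weight is 2 × 2 = 4 and a positive <emph>δ</emph> lowers its difficulty accordingly.</p> <p>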
Therefore, in the OSLM, the full column rank condition is <emph>rank</emph>(<bold>Q</bold><sups>+</sups>) = 2<emph>M</emph> + 1, and the number of operations is restricted to <emph>M</emph> ≤ (<emph>J</emph> − 1) / 2.</p> <p>In contrast to other learning models (e.g., Kempf, [<reflink idref="bib18" id="ref23">18</reflink>]; Verhelst & Glas, [<reflink idref="bib37" id="ref24">37</reflink>]), the OSLM is an explanatory model in that it aims not only to measure the learning effect but also to account for it in terms of item properties (De Boeck, Cho, & Wilson, [<reflink idref="bib3" id="ref25">3</reflink>]; De Boeck & Wilson, [<reflink idref="bib4" id="ref26">4</reflink>], [<reflink idref="bib3" id="ref27">3</reflink>]). Spada ([<reflink idref="bib33" id="ref28">33</reflink>]; see also Spada & McGaw, [<reflink idref="bib34" id="ref29">34</reflink>]) fitted the model to data obtained from the administration of tasks from elementary mechanics, where it outperformed other learning models such as the deterministic model of Scandura, the probabilistic automata model of Suppes, and his own logistic automata model. Although the OSLM fitted the data only approximately, Spada found a satisfactory correspondence between the parameter estimates of the OSLM and those of the Rasch model, a result that was corroborated by cross-validation across different samples of subjects and tasks.</p> <p>The OSLM is of particular interest when there is reason to suspect learning effects during a test due to the repeated use of cognitive operations across items and the researcher is interested in adopting an explanatory approach. In such circumstances, the basic parameter estimates of the LLTM may be biased, confounding the difficulty of the operations with the practice effects taking place during the test. 
This scenario is not uncommon in the fields of psychology and education, where tests whose items are based on a limited set of cognitive operations are often used for the assessment of abilities and competencies. In the present study, the applicability of the OSLM was illustrated by using a logical ability test whose items are based on various logic operations that are repeatedly applied throughout the test.</p> <hd id="AN0152947607-5">1.3. The DA5 Logical Ability Test</hd> <p>The test used in this study was the DA5 (SHL, [<reflink idref="bib31" id="ref30">31</reflink>]), a logical analysis test based on 10 logic operations that may be combined in different ways. The test consists of 50 items ordered by increasing difficulty. Each item includes a sequence of 2 to 4 figures. A symbol is presented together with each figure, representing the logic operation that must be applied to transform the figure. The subject must choose, from among five response alternatives, the one that represents the result of applying the logic operations. For simplicity, only 18 items were used in this study, which require applying seven logic operations: (<reflink idref="bib1" id="ref31">1</reflink>) rotate the figure from top to bottom; (<reflink idref="bib2" id="ref32">2</reflink>) rotate the figure from left to right; (<reflink idref="bib3" id="ref33">3</reflink>) erase the previous figure; (<reflink idref="bib4" id="ref34">4</reflink>) erase the next figure; (<reflink idref="bib5" id="ref35">5</reflink>) interchange the figure with the previous one; (<reflink idref="bib6" id="ref36">6</reflink>) ignore the previous operator; and (<reflink idref="bib7" id="ref37">7</reflink>) ignore the next operator. Figure 1 illustrates a hypothetical example of a DA5 item. The item involves operations 1, 2, 4, and 5, in that order, and the correct response is C. The transposed matrix <bold>W</bold> corresponding to the 18 items is represented in Table 1.</p> <p>Table 1. 
Transposed matrix <bold>W</bold> for the DA5 items</p> <p> <ephtml> <table><thead><tr><td /><td>Item</td></tr><tr><td>Operation</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td><td>8</td><td>9</td><td>10</td><td>11</td><td>12</td><td>13</td><td>14</td><td>15</td><td>16</td><td>17</td><td>18</td></tr></thead><tbody><tr><td>1</td><td>1</td><td>0</td><td>2</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td><td>1</td><td>0</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0</td><td>1</td><td>1</td><td>1</td></tr><tr><td>2</td><td>0</td><td>1</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0</td><td>2</td><td>0</td><td>3</td><td>0</td></tr><tr><td>3</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td></tr><tr><td>4</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td></tr><tr><td>5</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td><td>1</td><td>2</td><td>0</td><td>0</td></tr><tr><td>6</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>7</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td></tr></tbody></table> </ephtml> </p> <p>Graph: Figure 1. 
Hypothetical example of DA5 item</p> <p>A cognitive model of the way respondents solve the DA5 items was adopted (Revuelta, [<reflink idref="bib28" id="ref38">28</reflink>]; Revuelta & Ponsoda, [<reflink idref="bib29" id="ref39">29</reflink>]). The model is based on two assumptions: (<reflink idref="bib1" id="ref40">1</reflink>) the item operations are applied sequentially, starting at the top row; and (<reflink idref="bib2" id="ref41">2</reflink>) processing is exhaustive, that is, all operations are applied. On the basis of these assumptions, the model establishes three steps, which are repeated until all operations have been applied: (<reflink idref="bib1" id="ref42">1</reflink>) coding the information about the figure and the operation; (<reflink idref="bib2" id="ref43">2</reflink>) applying the operation on the figure; and (<reflink idref="bib3" id="ref44">3</reflink>) selecting a response alternative that matches the result of applying the operation. The first and third steps, coding and selection, are assumed not to be the source of the individual differences in performance. Instead, these differences are assumed to be caused exclusively by the second step, applying the operation. Consequently, it was hypothesized that item difficulty can be decomposed into the difficulties of the operations involved in the item. A second hypothesis was formulated based on Sternberg's ([<reflink idref="bib36" id="ref45">36</reflink>]) classification of the cognitive processes used on common reasoning tasks, that is: (<reflink idref="bib1" id="ref46">1</reflink>) selective encoding (i.e., distinguishing relevant from irrelevant information); (<reflink idref="bib2" id="ref47">2</reflink>) selective comparison (i.e., retrieving and comparing a subset of the potentially relevant information from long-term memory); and (<reflink idref="bib3" id="ref48">3</reflink>) selective combination (i.e., strategic combination of information in working memory). 
While the first and second processes, selective encoding and selective comparison, are known to play a more significant role in inductive reasoning tasks, the third process, selective combination, is considered to be more influential in deductive reasoning tasks such as the DA5. Specifically, the second step of the DA5 performance model, applying the operation on the figure, corresponds to the selective combination process of Sternberg's classification, which, interestingly, is assumed to be susceptible to training (Lohman & Lakin, [<reflink idref="bib20" id="ref49">20</reflink>]). Therefore, it was also hypothesized that the difficulty of the DA5 operations may be reduced throughout the test as a result of practice.</p> <hd id="AN0152947607-6">1.4. Bayesian Framework</hd> <p>The purpose of the present study was to propose a Bayesian framework for estimating and testing the OSLM. One advantage of using a Bayesian framework in the present context is the possibility of using the OSLM when the conditions for CML estimation are not met, as, for instance, when the matrix <bold>Q</bold><sups>+</sups> does not satisfy the full rank condition. In CML estimation, the full column rank condition of <bold>Q</bold><sups>+</sups> ensures that the Rasch item parameters (<emph>β<subs>j</subs></emph>) can be decomposed uniquely into the OSLM difficulty and practice parameters (<emph>α<subs>m</subs></emph> and <emph>δ<subs>m</subs></emph>) while fixing the scale of the latent variable (<emph>θ<subs>i</subs></emph>). In Bayesian inference, by contrast, the <emph>θ</emph> scale is fixed by specifying the prior distribution of the parameter, so the looser condition of full column rank of <bold>Q</bold>, <emph>rank</emph>(<bold>Q</bold>) = 2<emph>M</emph>, is enough to ensure the uniqueness of the relation between the parameters of the two models. As a consequence, the original restriction <emph>M</emph> ≤ (<emph>J</emph> − 1) / 2 is relaxed to <emph>M</emph> ≤ <emph>J</emph> / 2. 
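</p> <p>The two rank conditions discussed above can be checked directly once <bold>W</bold> is available. The sketch below is an illustration only, not the authors' procedure: it transcribes the transposed matrix of Table 1, builds <bold>V</bold> and <bold>Q</bold> = (<bold>W</bold>; <bold>V</bold>) as defined in Section 1.2, and computes the ranks with exact rational arithmetic. All function names are hypothetical, and the items are assumed to have been administered in the order 1 to 18.</p>

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank by exact Gaussian elimination on Fractions (no rounding error)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Look for a pivot at or below the current rank position.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def practice_matrix(W):
    """v_jm = w_jm times the summed weights of operation m in earlier items."""
    cumulative = [0] * len(W[0])
    V = []
    for row in W:
        V.append([w * c for w, c in zip(row, cumulative)])
        cumulative = [c + w for c, w in zip(cumulative, row)]
    return V


def with_ones(rows):
    """Append a column of ones (the '+' matrices used for CML)."""
    return [list(row) + [1] for row in rows]


# Table 1 gives W transposed (7 operations x 18 items), transcribed row by row.
W_T = [
    [1, 0, 2, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 2, 0, 3, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 2, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
]
W = [list(col) for col in zip(*W_T)]            # 18 items x 7 operations
Q = [w + v for w, v in zip(W, practice_matrix(W))]
M = len(W_T)

print("rank(Q)  =", matrix_rank(Q), "; Bayesian condition requires", 2 * M)
print("rank(Q+) =", matrix_rank(with_ones(Q)), "; CML condition requires", 2 * M + 1)
```

<p>Exact fractions are used here instead of floating-point elimination so that the rank decision does not depend on a numerical tolerance.</p> <p>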
The Bayesian approach also allows for the use of assessment statistics specifically focused on certain aspects of model-data fit. This can be done without the need for adopting distributional assumptions, given that the reference distribution of the statistic is generated by using replicated data. This feature is of particular interest in the present context, where the distribution of the statistics appropriate to assess the fit of the models is not known. In this regard, since practice effects during the test may elicit local dependencies between items that cannot be accounted for by Rasch models, a discrepancy measure useful for capturing violations of local independence, the odds-ratio (Sinharay, [<reflink idref="bib32" id="ref50">32</reflink>]), was used in combination with a global measure of fit, the chi-square statistic (Béguin & Glas, [<reflink idref="bib1" id="ref51">1</reflink>]). The fact that these statistics do not have well-specified distributional assumptions (see Chen & Thissen, [<reflink idref="bib2" id="ref52">2</reflink>]) makes the Bayesian approach particularly useful for estimating and testing the OSLM. A further advantage of the Bayesian framework arises because, depending on the size of the matrix <bold>W</bold>, the OSLM might be heavily parameterized (note that the number of parameters to estimate in the OSLM is twice that in the LLTM). In such a case, the Bayesian approach facilitates inference by incorporating additional information through prior distributions, which may help to reduce the standard errors of parameter estimates and to avoid potential non-convergent cases that may arise in combination with small samples. 
Finally, the Bayesian framework also provides the possibility of extending the model to variants able to account for contingent learning or individual differences in learning (see Lozano & Revuelta, [<reflink idref="bib21" id="ref53">21</reflink>]), for which there are no estimation and evaluation procedures available in the frequentist framework.</p> <hd id="AN0152947607-7">2. Method</hd> <p></p> <hd id="AN0152947607-8">2.1. Analysis</hd> <p>The Rasch model, the LLTM, and the OSLM were fitted to a sample of responses from 621 individuals to the 18 items of the DA5. The estimation was conducted with R version 3.6.1 (R Development Core Team, [<reflink idref="bib26" id="ref54">26</reflink>]) and the RStan package version 2.19.2 (Stan Development Team, [<reflink idref="bib35" id="ref55">35</reflink>]), which interfaces with Stan. Stan is a probabilistic programming language that implements the no-U-turn sampler (NUTS; Hoffman & Gelman, [<reflink idref="bib16" id="ref56">16</reflink>]), an extension of the Hamiltonian Monte Carlo (HMC; Duane, Kennedy, Pendleton, & Roweth, [<reflink idref="bib7" id="ref57">7</reflink>]; Neal, [<reflink idref="bib24" id="ref58">24</reflink>], [<reflink idref="bib25" id="ref59">25</reflink>]) algorithm. HMC overcomes some of the limitations of the traditional Gibbs sampler (Geman & Geman, [<reflink idref="bib15" id="ref60">15</reflink>]) and the Metropolis algorithm (Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, [<reflink idref="bib23" id="ref61">23</reflink>]), particularly in terms of computational efficiency in exploring the posterior parameter space (Gelman et al., [<reflink idref="bib12" id="ref62">12</reflink>]). Four Markov chains of 2,000 samples each were run. The first half of the samples of each chain were discarded as burn-in. 
The potential scale reduction statistic (</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mover><mi>R</mi><mo stretchy="false">ˆ</mo></mover></math> </ephtml> ; Gelman & Rubin, [<reflink idref="bib14" id="ref63">14</reflink>]) was used to evaluate the convergence of the chains. The prior distributions for the parameters were:</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msub><mi>α</mi><mi>m</mi></msub></mrow><mo>∼</mo><mi>N</mi><mo stretchy="false">(</mo><mn>0</mn><mo>,</mo><mrow><msub><mi>σ</mi><mi>α</mi></msub></mrow><mo stretchy="false">)</mo><mo>,</mo></math> </ephtml> </p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msub><mi>δ</mi><mi>m</mi></msub></mrow><mo>∼</mo><mi>N</mi><mo stretchy="false">(</mo><mn>0</mn><mo>,</mo><mrow><msub><mi>σ</mi><mi>δ</mi></msub></mrow><mo stretchy="false">)</mo><mo>,</mo></math> </ephtml> </p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msub><mi>β</mi><mi>j</mi></msub></mrow><mo>∼</mo><mi>N</mi><mo stretchy="false">(</mo><mn>0</mn><mo>,</mo><mrow><msub><mi>σ</mi><mi>β</mi></msub></mrow><mo stretchy="false">)</mo><mo>,</mo></math> </ephtml> </p> <p>and</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msub><mi>θ</mi><mi>i</mi></msub></mrow><mo>∼</mo><mi>N</mi><mfenced open="(" close=")"><mrow><mn>0</mn><mo>,</mo><mn>1</mn></mrow></mfenced><mo>,</mo></math> </ephtml> </p> <p>whereas the hyper-prior distribution for the <emph>σ<subs>α</subs>, σ<subs>δ</subs></emph>, and <emph>σ<subs>β</subs></emph> hyper-parameters was <emph>Γ</emph>(<reflink idref="bib1" id="ref64">1</reflink>, 1).</p> <p>The fit of the models to the data was assessed by means of posterior predictive model checking (PPMC; Gelman, Meng, & Stern, [<reflink idref="bib13" id="ref65">13</reflink>]). 
A sample of predicted responses was generated for each sample of simulated parameters, and the posterior predictive <emph>p</emph> (PPP) value (Gelman et al., [<reflink idref="bib13" id="ref66">13</reflink>]; Meng, [<reflink idref="bib22" id="ref67">22</reflink>]) was computed based on two discrepancy measures: the chi-square statistic (Béguin & Glas, [<reflink idref="bib1" id="ref68">1</reflink>]) and the odds-ratio (Sinharay, [<reflink idref="bib32" id="ref69">32</reflink>]). The chi-square statistic is a measure of global fit that captures the extent to which the observed score distribution is reproduced by the model:</p> <p>(<reflink idref="bib4" id="ref70">4</reflink>)</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msup><mi>χ</mi><mn>2</mn></msup></mrow><mo>=</mo><mrow><munderover><mo movablelimits="false">∑</mo><mrow><mi>g</mi><mo>=</mo><mn>0</mn></mrow><mi>G</mi></munderover></mrow><mrow><mrow><mfrac><mrow><mrow><msup><mrow><mfenced open="[" close="]"><mrow><mi>N</mi><mrow><msub><mi>C</mi><mi>g</mi></msub></mrow><mo>−</mo><mi>E</mi><mfenced open="(" close=")"><mrow><mi>N</mi><mrow><msub><mi>C</mi><mi>g</mi></msub></mrow></mrow></mfenced></mrow></mfenced></mrow><mn>2</mn></msup></mrow></mrow><mrow><mi>E</mi><mfenced open="(" close=")"><mrow><mi>N</mi><mrow><msub><mi>C</mi><mi>g</mi></msub></mrow></mrow></mfenced></mrow></mfrac></mrow></mrow><mrow><mo>,</mo></mrow></math> </ephtml> </p> <p>where <emph>g</emph> (<emph>g</emph> = 0, 1, ... , <emph>G</emph>) represents the raw score group; <emph>NC<subs>g</subs></emph> represents the observed frequency of number-correct scores in group <emph>g</emph>; and <emph>E</emph>(<emph>NC<subs>g</subs></emph>) represents the frequency of number-correct scores in group <emph>g</emph> expected by the model, calculated as the mean of all the replicated samples. 
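</p> <p>A minimal Python sketch of the chi-square discrepancy in Equation 4 follows. The function names and the toy frequencies are hypothetical, and skipping score groups whose expected frequency is zero is an assumption of this sketch rather than something stated in the text.</p>

```python
def score_frequencies(responses, n_items):
    """Count how many persons obtained each number-correct score 0..n_items."""
    counts = [0] * (n_items + 1)
    for person in responses:
        counts[sum(person)] += 1
    return counts


def expected_frequencies(replicated_counts):
    """E(NC_g): mean frequency of each score group across replicated samples."""
    n_rep = len(replicated_counts)
    return [sum(group) / n_rep for group in zip(*replicated_counts)]


def chi_square(observed, expected):
    """Equation 4; groups with zero expected frequency are skipped
    (an assumption made here to avoid division by zero)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e != 0)


# Hypothetical toy example: 2 items, hence score groups 0, 1, and 2.
observed = score_frequencies([[1, 0], [0, 0], [0, 0], [1, 1]], n_items=2)
expected = expected_frequencies([[2, 2, 0], [0, 2, 2]])
print(observed, expected, chi_square(observed, expected))
```

<p>In a posterior predictive check the same statistic would be evaluated for the observed data and for each replicated data set, and the two sets of values compared.</p> <p>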
The odds-ratio is a measure of association between pairs of items that is computationally simple and does not depend on the fitted model. The odds-ratio for items <emph>j</emph> and <emph>j'</emph> is defined as:</p> <p>(<reflink idref="bib5" id="ref71">5</reflink>)</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mrow><mi mathvariant="normal">O</mi></mrow></mrow><mrow><msub><mrow><mrow><mi mathvariant="normal">R</mi></mrow></mrow><mrow><mi>j</mi><mi>j</mi><mrow><msup><mi /><mrow><mi mathvariant="normal">′</mi></mrow></msup></mrow></mrow></msub></mrow><mo>=</mo><mrow><mfrac><mrow><mrow><msub><mi>n</mi><mrow><mn>11</mn></mrow></msub></mrow><mrow><msub><mi>n</mi><mrow><mn>00</mn></mrow></msub></mrow></mrow><mrow><mrow><msub><mi>n</mi><mrow><mn>10</mn></mrow></msub></mrow><mrow><msub><mi>n</mi><mrow><mn>01</mn></mrow></msub></mrow></mrow></mfrac></mrow><mrow><mo>,</mo></mrow></math> </ephtml> </p> <p>where <emph>n<subs>xx'</subs></emph> is the number of individuals scoring <emph>x</emph> on item <emph>j</emph> and <emph>x'</emph> on item <emph>j'</emph>. The odds-ratio allows for the identification of inter-item associations beyond those explained by the model (Chen & Thissen, [<reflink idref="bib2" id="ref72">2</reflink>]). This is of particular interest in the present context, since practice effects during the test may elicit local dependencies between items that cannot be accounted for by the OSLM. 
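</p> <p>Equation 5 can be computed from the raw response matrix as in the following Python sketch. The function name and the toy data are hypothetical; returning <emph>None</emph> when the denominator is zero is a choice made for this illustration, since the text does not say how empty cells were handled.</p>

```python
def odds_ratio(responses, j, j_prime):
    """Odds-ratio of Equation 5 for items j and j_prime (0-based indices).

    responses is a list of persons, each a list of 0/1 item scores.
    """
    n = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for person in responses:
        n[(person[j], person[j_prime])] += 1
    denominator = n[(1, 0)] * n[(0, 1)]
    if denominator == 0:
        return None  # undefined for this pair (assumption of this sketch)
    return n[(1, 1)] * n[(0, 0)] / denominator


# Hypothetical toy data: 6 persons, 2 items.
toy = [[1, 1], [1, 1], [0, 0], [0, 0], [1, 0], [0, 1]]
print(odds_ratio(toy, 0, 1))  # (2 * 2) / (1 * 1) = 4.0
```

<p>Values well above those produced by data replicated from the model would point to a positive association between the two items beyond what the model explains.</p> <p>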
Measures of inter-item associations at the item level and at the test level are obtained by summing the odds-ratios over the pairs of items.</p> <p>The PPP value is the proportion of draws in which the posterior predictive value of the discrepancy measure,</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>D</mi><mfenced open="(" close=")"><mrow><mrow><msup><mrow><mrow><mi mathvariant="bold">X</mi></mrow></mrow><mrow><mrow><mrow><mi mathvariant="normal">r</mi><mi mathvariant="normal">e</mi><mi mathvariant="normal">p</mi></mrow></mrow></mrow></msup></mrow><mo>;</mo><mrow><mrow><mi>θ</mi></mrow></mrow><mo>,</mo><mrow><mrow><mi>ω</mi></mrow></mrow></mrow></mfenced></math> </ephtml> , is equal to or higher than the realized value,</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>D</mi><mfenced open="(" close=")"><mrow><mrow><mrow><mi mathvariant="bold">X</mi></mrow></mrow><mo>;</mo><mrow><mrow><mi>θ</mi></mrow></mrow><mo>,</mo><mrow><mrow><mi>ω</mi></mrow></mrow></mrow></mfenced></math> </ephtml> :</p> <p>(<reflink idref="bib6" id="ref73">6</reflink>)</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mrow><mi mathvariant="normal">P</mi><mi mathvariant="normal">P</mi><mi mathvariant="normal">P</mi></mrow></mrow><mo>=</mo><mi>P</mi><mfenced open="[" close="]"><mrow><mi>D</mi><mfenced open="(" close=")"><mrow><mrow><msup><mrow><mrow><mi mathvariant="bold">X</mi></mrow></mrow><mrow><mrow><mrow><mi mathvariant="normal">r</mi><mi mathvariant="normal">e</mi><mi mathvariant="normal">p</mi></mrow></mrow></mrow></msup></mrow><mo>;</mo><mrow><mrow><mi>θ</mi></mrow></mrow><mo>,</mo><mrow><mrow><mi>ω</mi></mrow></mrow></mrow></mfenced><mo>≥</mo><mi>D</mi><mfenced open="(" close=")"><mrow><mrow><mrow><mi 
mathvariant="bold">X</mi></mrow></mrow><mo>;</mo><mrow><mrow><mi>θ</mi></mrow></mrow><mo>,</mo><mrow><mrow><mi>ω</mi></mrow></mrow></mrow></mfenced><mrow><mo>|</mo></mrow><mrow><mrow><mi mathvariant="bold">X</mi></mrow></mrow></mrow></mfenced><mrow><mo>,</mo></mrow></math> </ephtml> </p> <p>where <emph>rep</emph> stands for replicated data, and <bold>θ</bold> and <bold>ω</bold> represent the vectors of person and item parameters, respectively. The null hypothesis that the model fits the data was rejected when PPP < .05 or PPP > .95. For the interested reader, the R and RStan scripts used in this study are available as supplementary material to this paper.</p> <hd id="AN0152947607-9">3. Results</hd> <p>Table 2 shows the goodness-of-fit estimates at the test level for the Rasch model, the LLTM, and the OSLM. The PPP value of the chi-square statistic indicated that the Rasch model fitted the data well (.05 < PPP < .95), which constitutes a precondition for estimating both the LLTM and the OSLM. In contrast, the LLTM did not fit the data (PPP < .05), indicating that the difficulty parameters of the hypothesized components included in the structural matrix were not enough to explain the variability in the responses. The OSLM, however, showed a good fit (again, .05 < PPP < .95), suggesting that the incorporation of previous practice into the model provided a better account of the data. The PPP value of the odds-ratio supported the fit of the Rasch model and the OSLM to the data (both .05 < PPP < .95), suggesting the absence of local dependencies between items derived from practice effects. Furthermore, the analysis of fit at the item level based on the odds-ratio statistic (Table 3) showed 4 (22%) misfitting items (PPP < .05 or PPP > .95) in terms of local dependency for the Rasch model, 12 (67%) for the LLTM, and 6 (33%) for the OSLM, which again indicated a substantial improvement in model fit for the OSLM in comparison with the LLTM.</p> <p>Table 2. 
Mean observed (Obs), simulated (Sim), and posterior predictive <emph>p</emph> (PPP) values of the chi-square (<emph>χ</emph><sups>2</sups>) and the odds-ratio (OR) statistics at the test level for the Rasch model, the linear logistic test model (LLTM), and the operation-specific learning model (OSLM)</p> <p> <ephtml> <table><thead><tr><td /><td><italic>χ</italic><sup>2</sup></td><td>OR</td></tr><tr><td /><td>Obs</td><td>Sim(SD)</td><td>PPP</td><td>Obs</td><td>Sim(SD)</td><td>PPP</td></tr></thead><tbody><tr><td>Rasch</td><td>26.495</td><td>17.295(17.081)</td><td>.083</td><td>410.37</td><td>399.04(48.93)</td><td>.360</td></tr><tr><td>LLTM</td><td>39.529</td><td>17.140(10.441)</td><td>.021</td><td>410.37</td><td>319.79(24.53)</td><td>.002</td></tr><tr><td>OSLM</td><td>25.589</td><td>17.187(15.783)</td><td>.099</td><td>410.37</td><td>375.54(40.21)</td><td>.173</td></tr></tbody></table> </ephtml> </p> <p>Table 3. Mean observed (Obs), simulated (Sim), and posterior predictive <emph>p</emph> (PPP) values of the odds-ratio statistic at the item level for the Rasch model, the linear logistic test model (LLTM), and the operation-specific learning model (OSLM)</p> <p> <ephtml> <table><thead><tr><td /><td 
/><td>Rasch</td><td>LLTM</td><td>OSLM</td></tr><tr><td>Item</td><td>Obs</td><td>Sim(SD)</td><td>PPP</td><td>Sim(SD)</td><td>PPP</td><td>Sim(SD)</td><td>PPP</td></tr></thead><tbody><tr><td>1</td><td>26.03</td><td>44.23(9.17)</td><td>.994</td><td>34.27(3.93)</td><td>.991</td><td>40.80(7.00)</td><td>.995</td></tr><tr><td>2</td><td>36.60</td><td>55.62(27.32)</td><td>.771</td><td>33.78(3.62)</td><td>.192</td><td>43.07(9.66)</td><td>.747</td></tr><tr><td>3</td><td>42.60</td><td>43.71(8.96)</td><td>.502</td><td>36.58(6.34)</td><td>.161</td><td>54.22(25.09)</td><td>.630</td></tr><tr><td>4</td><td>55.58</td><td>42.05(7.27)</td><td>.043</td><td>34.24(3.92)</td><td>.000</td><td>38.49(5.22)</td><td>.007</td></tr><tr><td>5</td><td>41.64</td><td>40.00(5.60)</td><td>.331</td><td>33.91(3.85)</td><td>.031</td><td>39.99(6.13)</td><td>.356</td></tr><tr><td>6</td><td>49.10</td><td>52.62(21.47)</td><td>.500</td><td>35.13(4.69)</td><td>.006</td><td>49.13(17.79)</td><td>.421</td></tr><tr><td>7</td><td>48.41</td><td>46.46(11.98)</td><td>.380</td><td>36.60(5.96)</td><td>.034</td><td>44.06(10.70)</td><td>.292</td></tr><tr><td>8</td><td>29.99</td><td>40.31(5.66)</td><td>.991</td><td>33.88(3.73)</td><td>.872</td><td>38.62(5.28)</td><td>.982</td></tr><tr><td>9</td><td>62.93</td><td>44.19(9.27)</td><td>.037</td><td>35.52(5.22)</td><td>.000</td><td>42.15(8.24)</td><td>.017</td></tr><tr><td>10</td><td>52.12</td><td>44.82(10.12)</td><td>.201</td><td>33.68(3.67)</td><td>.002</td><td>40.27(6.84)</td><td>.046</td></tr><tr><td>11</td><td>69.54</td><td>53.28(22.04)</td><td>.178</td><td>35.98(5.39)</td><td>.000</td><td>43.39(9.98)</td><td>.014</td></tr><tr><td>12</td><td>41.24</td><td>40.92(6.37)</td><td>.429</td><td>37.08(6.63)</td><td>.246</td><td>39.33(5.80)</td><td>.330</td></tr><tr><td>13</td><td>42.28</td><td>39.71(5.61)</td><td>.268</td><td>34.01(3.80)</td><td>.024</td><td>38.21(4.89)</td><td>.178</td></tr><tr><td>14</td><td>45.22</td><td>40.28(5.85)</td><td>.175</td><td>34.05(3.77)</td><td>.007</
td><td>38.43(5.21)</td><td>.086</td></tr><tr><td>15</td><td>42.37</td><td>43.95(8.85)</td><td>.535</td><td>35.05(4.65)</td><td>.064</td><td>40.54(6.92)</td><td>.352</td></tr><tr><td>16</td><td>44.71</td><td>41.06(6.43)</td><td>.242</td><td>34.36(4.02)</td><td>.012</td><td>39.09(5.68)</td><td>.145</td></tr><tr><td>17</td><td>34.31</td><td>40.54(6.10)</td><td>.858</td><td>43.23(15.40)</td><td>.711</td><td>39.07(5.79)</td><td>.809</td></tr><tr><td>18</td><td>56.07</td><td>44.33(9.39)</td><td>.103</td><td>38.22(7.98)</td><td>.026</td><td>42.20(8.40)</td><td>.054</td></tr></tbody></table> </ephtml> </p> <p>The residual analysis (Table 4) substantiated the results of the goodness-of-fit assessment. The 95% posterior probability intervals of the residuals obtained by subtracting the LLTM item difficulties from the Rasch item difficulties did not include the value zero, except those for items 8 and 18 (11%). By contrast, in the case of the residuals obtained by subtracting the OSLM item difficulties from the Rasch item difficulties, 12 intervals (67%) included the value zero (i.e., items 1, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, and 18), again revealing a relative improvement for the OSLM in comparison with the LLTM in terms of the prediction of item difficulty.</p> <p>Table 4. 
Expected a posteriori (EAP) estimates, posterior standard deviations (SD), and posterior probability intervals (2.5%–97.5%) of the residuals obtained by subtracting the LLTM and the OSLM item difficulties from the Rasch item difficulties</p> <p> <ephtml> <table><thead><tr><td /><td>LLTM</td><td>OSLM</td></tr><tr><td>Item</td><td>EAP</td><td>SD</td><td>2.5%</td><td>97.5%</td><td>EAP</td><td>SD</td><td>2.5%</td><td>97.5%</td></tr></thead><tbody><tr><td>1</td><td>–1.660</td><td>0.174</td><td>–2.010</td><td>–1.323</td><td>–0.387</td><td>0.201</td><td>–0.787</td><td>0.004</td></tr><tr><td>2</td><td>–3.822</td><td>0.346</td><td>–4.541</td><td>–3.190</td><td>–1.473</td><td>0.363</td><td>–2.227</td><td>–0.795</td></tr><tr><td>3</td><td>–0.416</td><td>0.194</td><td>–0.804</td><td>–0.047</td><td>1.849</td><td>0.257</td><td>1.332</td><td>2.356</td></tr><tr><td>4</td><td>–1.000</td><td>0.147</td><td>–1.298</td><td>–0.719</td><td>–0.931</td><td>0.166</td><td>–1.259</td><td>–0.611</td></tr><tr><td>5</td><td>–0.381</td><td>0.122</td><td>–0.621</td><td>–0.147</td><td>0.950</td><td>0.149</td><td>0.654</td><td>1.237</td></tr><tr><td>6</td><td>–2.553</td><td>0.298</td><td>–3.151</td><td>–1.997</td><td>–0.208</td><td>0.399</td><td>–0.985</td><td>0.576</td></tr><tr><td>7</td><td>–1.002</td><td>0.233</td><td>–1.459</td><td>–0.555</td><td>–0.071</td><td>0.282</td><td>–0.608</td><td>0.486</td></tr><tr><td>8</td><td>–0.032</td><td>0.119</td><td>–0.263</td><td>0.204</td><td>0.105</td><td>0.138</td><td>–0.163</td><td>0.377</td></tr><tr><td>9</td><td>–0.895</td><td>0.200</td><td>–1.289</td><td>–0.510</td><td>–0.011</td><td>0.236</td><td>–0.474</td><td>0.442</td></tr><tr><td>10</td><td>–2.499</td><td>0.189</td><td>–2.883</td><td>–2.140</td><td>–0.771</td><td>0.206</td><td>–1.178</td><td>–0.375</td></tr><tr><td>11</td><td>–2.159</td><td>0.296</td><td>–2.761</td><td>–1.605</td><td>–1.006</td><td>0.305</td><td>–1.631</td><td>–0.455</td></tr><tr><td>12</td><td>0.722</td><td>0.161</td><td>0.407</td>
<td>1.032</td><td>0.021</td><td>0.172</td><td>–0.320</td><td>0.350</td></tr><tr><td>13</td><td>0.270</td><td>0.128</td><td>0.018</td><td>0.516</td><td>0.125</td><td>0.142</td><td>–0.153</td><td>0.411</td></tr><tr><td>14</td><td>–0.427</td><td>0.150</td><td>–0.720</td><td>–0.127</td><td>–0.053</td><td>0.158</td><td>–0.361</td><td>0.257</td></tr><tr><td>15</td><td>–1.039</td><td>0.183</td><td>–1.409</td><td>–0.688</td><td>–0.394</td><td>0.209</td><td>–0.817</td><td>0.007</td></tr><tr><td>16</td><td>–0.604</td><td>0.152</td><td>–0.906</td><td>–0.313</td><td>–0.078</td><td>0.169</td><td>–0.415</td><td>0.262</td></tr><tr><td>17</td><td>2.119</td><td>0.189</td><td>1.748</td><td>2.490</td><td>0.082</td><td>0.169</td><td>–0.248</td><td>0.413</td></tr><tr><td>18</td><td>–0.013</td><td>0.217</td><td>–0.440</td><td>0.395</td><td>–0.018</td><td>0.234</td><td>–0.475</td><td>0.435</td></tr></tbody></table> </ephtml> </p> <p>The posterior mean of the standardized regression coefficient of the Rasch item difficulties on the LLTM item difficulties was .282, revealing poor explanatory and predictive power for the LLTM basic parameter estimates. By contrast, the posterior mean of the standardized regression coefficient of the Rasch item difficulties on the OSLM item difficulties was .811, which reflected a substantial increase in the explanatory and predictive power of the model after the incorporation of previous practice. Figure 2 shows the posterior mean standardized regression lines of the Rasch item difficulties on the LLTM (left) and on the OSLM (right) item difficulties.</p> <p>Graph: Figure 2. 
Scatterplots with the posterior mean standardized regression lines with 95% posterior probability intervals of the Rasch model item difficulty estimates on the linear logistic test model (LLTM) item difficulty estimates (left) and on the operation-specific learning model (OSLM) item difficulty estimates (right)</p> <p>Tables 5 to 7 show the expected a posteriori (EAP) estimates of the parameters and hyper-parameters of the Rasch model, the LLTM, and the OSLM, respectively. According to the LLTM basic parameter estimates (Table 6), the order of difficulty of the cognitive operations, from highest to lowest, was 4, 5, 6, 3, 2, 7, and 1. These estimates represent the marginal difficulty of each cognitive operation; that is, its difficulty confounded with the practice effect. According to the OSLM difficulty parameter estimates (Table 7, left), however, the order of difficulty of the cognitive operations was 5, 4, 3, 7, 6, 1, and 2. This order differs from the one obtained with the LLTM estimates because the OSLM difficulty parameters represent the initial difficulty of each cognitive operation, independently of the practice effect. The OSLM difficulty estimates indicated that the most difficult operation was the fifth operation, <emph>interchange the figure with the previous one</emph>, whereas the easiest operations were the second and first operations, <emph>rotate the figure from left to right</emph> and <emph>rotate the figure from top to bottom</emph>, respectively.</p> <p>Table 5. 
Expected a posteriori (EAP) estimates, posterior standard deviations (SD), and posterior probability intervals (2.5%–97.5%) of the item parameters (<emph>β<subs>j</subs></emph>) and the hyper-parameter (<emph>σ<subs>β</subs></emph>) for the Rasch model</p> <p> <ephtml> <table><thead><tr><td /><td>EAP</td><td>SD</td><td>2.5%</td><td>97.5%</td><td /><td>EAP</td><td>SD</td><td>2.5%</td><td>97.5%</td></tr></thead><tbody><tr><td><italic>β</italic><sub>1</sub></td><td>–2.885</td><td>0.167</td><td>–3.225</td><td>–2.560</td><td><italic>β</italic><sub>11</sub></td><td>–4.299</td><td>0.285</td><td>–4.897</td><td>–3.769</td></tr><tr><td><italic>β</italic><sub>2</sub></td><td>–4.682</td><td>0.334</td><td>–5.406</td><td>–4.060</td><td><italic>β</italic><sub>12</sub></td><td>–1.884</td><td>0.119</td><td>–2.121</td><td>–1.656</td></tr><tr><td><italic>β</italic><sub>3</sub></td><td>–2.866</td><td>0.163</td><td>–3.192</td><td>–2.558</td><td><italic>β</italic><sub>13</sub></td><td>–0.721</td><td>0.101</td><td>–0.923</td><td>–0.527</td></tr><tr><td><italic>β</italic><sub>4</sub></td><td>–2.279</td><td>0.136</td><td>–2.553</td><td>–2.019</td><td><italic>β</italic><sub>14</sub></td><td>–1.449</td><td>0.113</td><td>–1.676</td><td>–1.229</td></tr><tr><td><italic>β</italic><sub>5</sub></td><td>–1.296</td><td>0.106</td><td>–1.503</td><td>–1.092</td><td><italic>β</italic><sub>15</sub></td><td>–2.814</td><td>0.158</td><td>–3.133</td><td>–2.507</td></tr><tr><td><italic>β</italic><sub>6</sub></td><td>–4.300</td><td>0.284</td><td>–4.867</td><td>–3.776</td><td><italic>β</italic><sub>16</sub></td><td>–1.938</td><td>0.122</td><td>–2.186</td><td>–1.707</td></tr><tr><td><italic>β</italic><sub>7</sub></td><td>–3.401</td><td>0.194</td><td>–3.795</td><td>–3.031</td><td><italic>β</italic><sub>17</sub></td><td>–1.687</td><td>0.117</td><td>–1.922</td><td>–1.465</td></tr><tr><td><italic>β</italic><sub>8</sub></td><td>0.203</td><td>0.099</td><td>0.015</td><td>0.399</td><td><italic>β</italic><sub>18</sub></td
><td>–2.914</td><td>0.162</td><td>–3.235</td><td>–2.609</td></tr><tr><td><italic>β</italic><sub>9</sub></td><td>–2.910</td><td>0.166</td><td>–3.247</td><td>–2.598</td><td /><td /><td /><td /><td /></tr><tr><td><italic>β</italic><sub>10</sub></td><td>–3.071</td><td>0.174</td><td>–3.425</td><td>–2.733</td><td><italic>σ<sub>β</sub></italic></td><td>2.770</td><td>0.453</td><td>2.034</td><td>3.820</td></tr></tbody></table> </ephtml> </p> <p>Table 6. Expected a posteriori (EAP) estimates, posterior standard deviations (SD), and posterior probability intervals (2.5%–97.5%) of the basic parameters (<emph>α<subs>m</subs></emph>) and the hyper-parameter (<emph>σ<subs>α</subs></emph>) for the linear logistic test model</p> <p> <ephtml> <table><thead><tr><td /><td>EAP</td><td>SD</td><td>2.5%</td><td>97.5%</td></tr></thead><tbody><tr><td><italic>α</italic><sub>1</sub></td><td>–1.225</td><td>0.052</td><td>–1.328</td><td>–1.125</td></tr><tr><td><italic>α</italic><sub>2</sub></td><td>–0.860</td><td>0.043</td><td>–0.944</td><td>–0.777</td></tr><tr><td><italic>α</italic><sub>3</sub></td><td>–0.790</td><td>0.098</td><td>–0.984</td><td>–0.592</td></tr><tr><td><italic>α</italic><sub>4</sub></td><td>0.289</td><td>0.060</td><td>0.173</td><td>0.408</td></tr><tr><td><italic>α</italic><sub>5</sub></td><td>–0.054</td><td>0.048</td><td>–0.147</td><td>0.039</td></tr><tr><td><italic>α</italic><sub>6</sub></td><td>–0.521</td><td>0.091</td><td>–0.701</td><td>–0.344</td></tr><tr><td><italic>α</italic><sub>7</sub></td><td>–1.174</td><td>0.135</td><td>–1.440</td><td>–0.917</td></tr><tr><td><italic>σ<sub>α</sub></italic></td><td>0.915</td><td>0.280</td><td>0.537</td><td>1.608</td></tr></tbody></table> </ephtml> </p> <p>Table 7. 
Expected a posteriori (EAP) estimates, posterior standard deviations (SD), and posterior probability intervals (2.5%–97.5%) of the difficulty parameters (<emph>α<subs>m</subs></emph>), practice parameters (<emph>δ<subs>m</subs></emph>), and hyper-parameters (<emph>σ<subs>α</subs></emph> and <emph>σ<subs>δ</subs></emph>) for the operation-specific learning model</p> <p> <ephtml> <table><thead><tr><td /><td>EAP</td><td>SD</td><td>2.5%</td><td>97.5%</td><td /><td>EAP</td><td>SD</td><td>2.5%</td><td>97.5%</td></tr></thead><tbody><tr><td><italic>α</italic><sub>1</sub></td><td>–2.498</td><td>0.109</td><td>–2.709</td><td>–2.285</td><td><italic>δ</italic><sub>1</sub></td><td>–0.140</td><td>0.022</td><td>–0.183</td><td>–0.098</td></tr><tr><td><italic>α</italic><sub>2</sub></td><td>–3.209</td><td>0.120</td><td>–3.443</td><td>–2.976</td><td><italic>δ</italic><sub>2</sub></td><td>–0.420</td><td>0.023</td><td>–0.465</td><td>–0.375</td></tr><tr><td><italic>α</italic><sub>3</sub></td><td>–1.244</td><td>0.197</td><td>–1.639</td><td>–0.875</td><td><italic>δ</italic><sub>3</sub></td><td>5.039</td><td>0.671</td><td>3.681</td><td>6.345</td></tr><tr><td><italic>α</italic><sub>4</sub></td><td>–0.261</td><td>0.115</td><td>–0.484</td><td>–0.034</td><td><italic>δ</italic><sub>4</sub></td><td>–0.331</td><td>0.100</td><td>–0.522</td><td>–0.139</td></tr><tr><td><italic>α</italic><sub>5</sub></td><td>0.729</td><td>0.108</td><td>0.520</td><td>0.939</td><td><italic>δ</italic><sub>5</sub></td><td>0.185</td><td>0.021</td><td>0.145</td><td>0.226</td></tr><tr><td><italic>α</italic><sub>6</sub></td><td>–2.156</td><td>0.281</td><td>–2.742</td><td>–1.619</td><td><italic>δ</italic><sub>6</sub></td><td>–3.155</td><td>0.312</td><td>–3.810</td><td>–2.573</td></tr><tr><td><italic>α</italic><sub>7</sub></td><td>–1.533</td><td>0.217</td><td>–1.960</td><td>–1.131</td><td><italic>δ</italic><sub>7</sub></td><td>–9.712</td><td>1.143</td><td>–11.936</td><td>–7.344</td></tr><tr><td><italic>σ<sub>α</sub></italic></td
><td>1.971</td><td>0.532</td><td>1.218</td><td>3.249</td><td><italic>σ<sub>δ</sub></italic></td><td>3.864</td><td>0.928</td><td>2.454</td><td>6.014</td></tr></tbody></table> </ephtml> </p> <p>The positive sign of the OSLM practice parameter estimates (Table 7, right) associated with operations 3 and 5, together with the absence of zero in the corresponding posterior probability intervals, indicated that these operations showed a decrease in difficulty during the test as a function of practice, which may be attributed to learning. The interpretation of the OSLM practice parameter estimates is straightforward. For instance, each time the respondents were presented with operation 5, there was a decrease of 0.185 in the difficulty of this operation in subsequent presentations. On the contrary, the negative sign associated with operations 1, 2, 4, 6, and 7, together with the absence of zero in the posterior probability intervals, revealed an increase in difficulty during the test for these operations as a function of practice, which may be attributed to fatigue or loss of attention. The inspection of the matrix <bold>W</bold>, nevertheless, suggests that such an increase may also be explained by interaction effects between logic operations. In this regard, the DA5 items involve a progressively greater number of operations throughout the test, some of them requiring interactions with other operations. Given the high number of operations involved in the test, the inclusion of the interaction effects in the matrix <bold>W</bold> was prohibitive. As a consequence, the results do not allow us to dismiss the possibility of learning effects associated with operations 1, 2, 4, 6, and 7, which may be masked by the increase in difficulty due to the items' structural properties. 
However, the obtained results support the existence of genuine learning effects in the DA5 derived from the practice of operations 3 and 5, since these operations showed a decrease in difficulty during the test despite the fact that the items in which they were involved included a progressively greater number of operations and interactions.</p> <p>Figure 3 shows the estimated difficulty of the DA5 operations as a function of previous practice, that is:[<reflink idref="bib1" id="ref74">1</reflink>]</p> <p>(<reflink idref="bib7" id="ref75">7</reflink>)</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msub><mover><mi>α</mi><mo stretchy="false">ˆ</mo></mover><mi>m</mi></msub></mrow><mo>−</mo><mrow><msub><mi>p</mi><mi>m</mi></msub></mrow><mrow><msub><mover><mi>δ</mi><mo stretchy="false">ˆ</mo></mover><mi>m</mi></msub></mrow><mrow><mo>,</mo></mrow></math> </ephtml> </p> <p>Graph: Figure 3. Estimated difficulty of the DA5 operations as a function of previous practice</p> <p>where <emph>p<subs>m</subs></emph> (<emph>p<subs>m</subs></emph> = 0, 1, ... , <emph>P<subs>m</subs></emph> – 1) is the previous practice in operation <emph>m</emph>, and <emph>P<subs>m</subs></emph> is the number of times operation <emph>m</emph> was required in the test. The figure illustrates the decrease in difficulty during the test in operations 3 and 5. For instance, the initial difficulty of operation 5 (i.e., when <emph>p</emph><subs>5</subs> = 0) was</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msub><mover><mi>α</mi><mo stretchy="false">ˆ</mo></mover><mn>5</mn></msub></mrow><mo>=</mo><mn>0.729</mn></math> </ephtml> , and each time this operation was practiced its difficulty decreased by</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msub><mover><mi>δ</mi><mo stretchy="false">ˆ</mo></mover><mn>5</mn></msub></mrow><mo>=</mo><mn>0.185</mn></math> </ephtml> . 
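</p> <p>The linear practice effect of Equation 7 can be traced numerically with a short Python sketch that uses the operation 5 estimates quoted above. The function name is hypothetical, and the range of practice levels (0 to 6) is an assumption chosen for illustration, not the documented number of presentations of this operation.</p>

```python
def operation_difficulty(alpha: float, delta: float, practice: int) -> float:
    """Equation 7: difficulty of an operation after `practice` previous uses."""
    return alpha - practice * delta


# EAP estimates for operation 5 taken from Table 7.
ALPHA_5 = 0.729
DELTA_5 = 0.185

# Assumption: practice levels p = 0, ..., 6 are shown for illustration only.
trajectory = [operation_difficulty(ALPHA_5, DELTA_5, p) for p in range(7)]

for p, difficulty in enumerate(trajectory):
    print(f"p = {p}: difficulty = {difficulty:.3f}")
```

<p>With a positive practice parameter, each additional use lowers the difficulty by the same amount on the logit scale; a negative practice parameter would raise it by the same rule.</p> <p>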
Thus, in item 16 (see Table 1), after six previous presentations (i.e., when <emph>p</emph><subs>5</subs> = 6), the difficulty associated with operation 5 was 0.729 – 6(0.185) = –0.381. Since item 16 involved operation 5 twice and also involved operation 1 after ten previous presentations, its difficulty, according to the OSLM, was:</p> <p>(<reflink idref="bib8" id="ref76">8</reflink>)</p> <p>Graph</p> <p> <ephtml> <math xmlns="http://www.w3.org/1998/Math/MathML"><mtable columnalign="right left"><mtr><mtd><msub><mover><mi>β</mi><mo stretchy="false">ˆ</mo></mover><mn>16</mn></msub></mtd><mtd><mo>=</mo><msub><mover><mi>α</mi><mo stretchy="false">ˆ</mo></mover><mn>1</mn></msub><mo>−</mo><msub><mi>p</mi><mn>1</mn></msub><msub><mover><mi>δ</mi><mo stretchy="false">ˆ</mo></mover><mn>1</mn></msub><mo>+</mo><mn>2</mn><mfenced open="(" close=")"><mrow><msub><mover><mi>α</mi><mo stretchy="false">ˆ</mo></mover><mn>5</mn></msub><mo>−</mo><msub><mi>p</mi><mn>5</mn></msub><msub><mover><mi>δ</mi><mo stretchy="false">ˆ</mo></mover><mn>5</mn></msub></mrow></mfenced></mtd></mtr><mtr><mtd /><mtd><mo>=</mo><mo>−</mo><mn>2.498</mn><mo>−</mo><mn>10</mn><mo stretchy="false">(</mo><mo>−</mo><mn>0.140</mn><mo stretchy="false">)</mo><mo>+</mo><mn>2</mn><mfenced open="[" close="]"><mrow><mn>0.729</mn><mo>−</mo><mn>6</mn><mo stretchy="false">(</mo><mn>0.185</mn><mo stretchy="false">)</mo></mrow></mfenced></mtd></mtr><mtr><mtd /><mtd><mo>=</mo><mo>−</mo><mn>1.86</mn><mo>.</mo></mtd></mtr></mtable></math> </ephtml> </p> <hd id="AN0152947607-10">4. Discussion</hd> <p>This study aimed to illustrate the usefulness of a Bayesian framework for estimating and testing the OSLM (Fischer & Formann, [<reflink idref="bib11" id="ref77">11</reflink>]; Scheiblechner, [<reflink idref="bib30" id="ref78">30</reflink>]; Spada, [<reflink idref="bib33" id="ref79">33</reflink>]; Spada & McGaw, [<reflink idref="bib34" id="ref80">34</reflink>]). To that end, operation-specific learning effects were examined in a logical ability test. The goodness-of-fit estimates substantiated both the fit of the Rasch model to the data and the virtual absence of local dependencies in the data derived from practice effects. The results show that incorporating previous practice into the linear logistic model allowed learning effects during the test to be detected and improved both the fit of the model and the prediction of the Rasch item difficulty estimates.</p> <p>According to the OSLM, the most difficult logic operation was the fifth operation, <emph>interchange the figure with the previous one</emph>, whereas the easiest operations were the second and first operations, <emph>rotate the figure from left to right</emph> and <emph>rotate the figure from top to bottom</emph>, respectively. These results have a clear substantive interpretation. 
The first two logic operations were the only operations in the test that did not refer to other figures or operations within the item and, therefore, were less demanding in terms of working memory load. Moreover, their corresponding symbols were the easiest to recognize. As a result, they were less prone to errors than the other operations. In contrast, the fifth logic operation was the only operation that involved a rearrangement of the figures, which, at the same time, entailed an interaction with another operation. As a result, the fifth operation was the most demanding in terms of working memory load and the most prone to errors. See Figure 1 for an illustration. Interestingly, the order of difficulty of the logic operations according to the LLTM was somewhat different from that observed with the OSLM due to the confounding effects of practice during the test.</p> <p>The OSLM identified the existence of learning effects in the third and fifth logic operations, <emph>erase the previous figure</emph> and <emph>interchange the figure with the previous one</emph>, respectively. Although the difficulty parameter estimates indicated that these operations were two of the most difficult operations at the beginning of the test, their corresponding practice parameter estimates suggested that their difficulties were significantly reduced throughout the test as a result of practice. The possibility that the increase in difficulty associated with operations 1, 2, 4, 6, and 7 was due to the items' structural properties did not allow us to properly examine the presence of learning effects associated with these operations. 
However, the results obtained are sufficient to support the existence of learning in the DA5, which is in line with the hypothesis that the selective combination category identified by Sternberg ([<reflink idref="bib36" id="ref81">36</reflink>]) for the resolution of reasoning tasks is susceptible to practice effects (Lohman & Lakin, [<reflink idref="bib20" id="ref82">20</reflink>]).</p> <p>The present study illustrates the usefulness of the OSLM for the substantive analysis of the learning processes underlying the responses to a set of items. However, there are more research and applied settings where the detection and measurement of learning effects during a test may be of interest. For instance, a potential application of the OSLM is the study of differences in learning ability between populations (e.g., normal vs. impaired, children at different developmental stages, etc.). This can be easily achieved within the Bayesian framework by examining the posterior distribution of the differences in the learning parameter between populations. On this basis, the model allows for the identification of groups with learning difficulties in specific cognitive operations, which not only provides an explanation for their low performance on tests but also allows for the design of interventions focused on specific operations in order to improve performance. There are also interesting methodological applications of the OSLM such as the identification of items not suitable for computerized adaptive testing. In this regard, practice effects during the test make item properties depend on the item position in the test, which compromises their use in computerized adaptive tests. At the same time, the OSLM may open the door to the use of this kind of item in adaptive testing. 
Based on a prior assessment of the difficulty and practice effects associated with each operation, and provided that the model fits the data, the OSLM allows for on-the-fly estimation of the difficulty that an item would have in any position within the test as a function of the operations involved in the item and the practice the participant has had with these operations in previous items. In this vein, future research may explore the applicability of the OSLM to deal with practice effects in computerized adaptive testing.</p> <p>Despite its wide range of applications, the OSLM inherits some restrictive assumptions of the Rasch model that are necessary to sustain the sufficiency of the raw score. For instance, the model assumes equality of slopes for all items and a zero lower asymptote. Although the relaxation of these assumptions would entail certain costs in terms of model properties, future extensions of the model based on the three-parameter logistic model might account for learning effects in a wider variety of item response contexts as a result of the incorporation of discrimination and guessing parameters into the model. Likewise, the model assumes that the learning effects are non-contingent and are the same for all participants, which may be too restrictive an assumption for particular sets of data. 
In this regard, future studies may be aimed at testing contingent learning formulations of the model as well as the possibility of estimating individual differences in learning.</p> <hd id="AN0152947607-11">Acknowledgement</hd> <p>The computations were run with the support of the Scientific Computing Centre at Universidad Autónoma de Madrid (CCC-UAM).</p> <hd id="AN0152947607-12">Disclosure conflicts of interest statement</hd> <p>No potential conflict of interest was reported by the author(s).</p> <hd id="AN0152947607-13">Supplemental Material</hd> <p>Supplemental data for this article can be accessed at https://doi.org/10.1080/08957347.2021.1933982</p> <ref id="AN0152947607-14"> <title> References </title> <blist> <bibl id="bib1" idref="ref9" type="bt">1</bibl> <bibtext> Béguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66, 541 – 562. doi: 10.1007/BF02296195</bibtext> </blist> <blist> <bibl id="bib2" idref="ref12" type="bt">2</bibl> <bibtext> Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22, 265 – 289. doi: 10.3102/10769986022003265</bibtext> </blist> <blist> <bibl id="bib3" idref="ref19" type="bt">3</bibl> <bibtext> De Boeck, P., Cho, S. J., & Wilson, M. (2016). Explanatory item response models. In A. A. Rupp & J. P. Leighton (Eds.), The Wiley handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 249 – 268). Malden, MA : John Wiley & Sons.</bibtext> </blist> <blist> <bibl id="bib4" idref="ref26" type="bt">4</bibl> <bibtext> De Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York, NY : Springer.</bibtext> </blist> <blist> <bibl id="bib5" idref="ref6" type="bt">5</bibl> <bibtext> De Boeck, P., & Wilson, M. (2016). Explanatory response models. In W. J. 
van der Linden (Ed.), Handbook of item response theory. Volume one: Models (pp. 565 – 580). Boca Raton, FL : CRC Press.</bibtext> </blist> <blist> <bibl id="bib6" idref="ref20" type="bt">6</bibl> <bibtext> Debeer, D., & Janssen, R. (2013). Modeling item‐position effects within an IRT framework. Journal of Educational Measurement, 50, 164 – 185. doi: 10.1111/jedm.12009</bibtext> </blist> <blist> <bibl id="bib7" idref="ref37" type="bt">7</bibl> <bibtext> Duane, S., Kennedy, A. D., Pendleton, B. J., & Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195, 216 – 222. doi: 10.1016/0370-2693(87)91197-X</bibtext> </blist> <blist> <bibl id="bib8" idref="ref2" type="bt">8</bibl> <bibtext> Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359 – 374. doi: 10.1016/0001-6918(73)90003-6</bibtext> </blist> <blist> <bibl id="bib9" idref="ref4" type="bt">9</bibl> <bibtext> Fischer, G. H. (1983). Logistic latent trait models with linear constraints. Psychometrika, 48, 3 – 26. doi: 10.1007/BF02314674</bibtext> </blist> <blist> <bibtext> Fischer, G. H. (1995). The linear logistic test model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 131 – 155). New York, NY : Springer.</bibtext> </blist> <blist> <bibtext> Fischer, G. H., & Formann, A. K. (1982). Some applications of logistic latent trait models with linear constraints on the parameters. Applied Psychological Measurement, 6, 397 – 416. doi: 10.1177/014662168200600403</bibtext> </blist> <blist> <bibtext> Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). Boca Raton, FL : Chapman & Hall/CRC Press.</bibtext> </blist> <blist> <bibtext> Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. 
Statistica Sinica, 6, 733 – 807.</bibtext> </blist> <blist> <bibtext> Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457 – 472. doi: 10.1214/ss/1177011136</bibtext> </blist> <blist> <bibtext> Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 721 – 741. doi: 10.1109/TPAMI.1984.4767596</bibtext> </blist> <blist> <bibtext> Hoffman, M. D., & Gelman, A. (2014). The No-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15, 1593 – 1623.</bibtext> </blist> <blist> <bibtext> Janssen, R. (2016). Linear logistic models. In W. J. van der Linden (Ed.), Handbook of item response theory. Volume one: Models (pp. 211 – 225). Boca Raton, FL : CRC Press.</bibtext> </blist> <blist> <bibtext> Kempf, W. F. (1977). Dynamic models for the measurement of traits in social behavior. In W. F. Kempf & B. H. Repp (Eds.), Mathematical models for social psychology (pp. 14 – 58). London, UK : Wiley.</bibtext> </blist> <blist> <bibtext> Kubinger, K. D. (2008). On the revival of the Rasch model-based LLTM: From constructing tests using item generating rules to measuring item administration effects. Psychology Science Quarterly, 50, 311 – 327.</bibtext> </blist> <blist> <bibtext> Lohman, D. F., & Lakin, J. M. (2011). Intelligence and reasoning. In R. J. Sternberg & S. B. Kaufman (Eds.), The Cambridge handbook of intelligence (pp. 419 – 441). New York, NY : Cambridge University Press.</bibtext> </blist> <blist> <bibtext> Lozano, J. H., & Revuelta, J. (2020). Investigating operation-specific learning effects in the Raven's Advanced Progressive Matrices : A linear logistic test modeling approach. Intelligence, 82, 101468. doi: 10.1016/j.intell.2020.101468</bibtext> </blist> <blist> <bibtext> Meng, X.-L. (1994). 
Posterior predictive p-values. The Annals of Statistics, 22, 1142 – 1160. doi: 10.1214/aos/1176325622</bibtext> </blist> <blist> <bibtext> Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21, 1087 – 1092. doi: 10.1063/1.1699114</bibtext> </blist> <blist> <bibtext> Neal, R. M. (1994). An improved acceptance procedure for the hybrid Monte Carlo algorithm. Journal of Computational Physics, 111, 194 – 203. doi: 10.1006/jcph.1994.1054</bibtext> </blist> <blist> <bibtext> Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. L. Jones, & X.-L. Meng (Eds.), Handbook of Markov chain Monte Carlo (pp. 116 – 162). Boca Raton, FL : Chapman and Hall/CRC.</bibtext> </blist> <blist> <bibtext> R Development Core Team. (2019). R: A language and environment for statistical computing. Vienna, Austria : R Foundation for Statistical Computing.</bibtext> </blist> <blist> <bibtext> Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark : Danish Institute for Educational Research.</bibtext> </blist> <blist> <bibtext> Revuelta, J. (2010). Estimating difficulty from polytomous categorical data. Psychometrika, 75, 331 – 350. doi: 10.1007/s11336-009-9145-9</bibtext> </blist> <blist> <bibtext> Revuelta, J., & Ponsoda, V. (1998). Un test adaptativo informatizado de análisis lógico basado en la generación automática de ítems (A logical analysis computerized adaptive test based on automatic item generation). Psicothema, 10, 709 – 716.</bibtext> </blist> <blist> <bibtext> Scheiblechner, H. (1972). Das lernen und lösen komplexer denkaufgaben (The learning and solving of complex cognitive tasks). Zeitschrift für Experimentelle und Angewandte Psychologie, 19, 476 – 506.</bibtext> </blist> <blist> <bibtext> SHL. (1996). DA5: Diagramas codificados (DA5: Coded diagrams). 
Madrid, Spain : Author.</bibtext> </blist> <blist> <bibtext> Sinharay, S. (2005). Assessing fit of unidimensional item response theory models using a Bayesian approach. Journal of Educational Measurement, 42, 375 – 394. doi: 10.1111/j.1745-3984.2005.00021.x</bibtext> </blist> <blist> <bibtext> Spada, H. (1977). Logistic models of learning and thought. In H. Spada & W. F. Kempf (Eds.), Structural models of thinking and learning (pp. 227 – 262). Bern, Switzerland : Huber.</bibtext> </blist> <blist> <bibtext> Spada, H., & McGaw, B. (1985). The assessment of learning effects with linear logistic test models. In S. Embretson (Ed.), Test design: New directions in psychology and psychometrics (pp. 169 – 193). New York, NY : Academic Press.</bibtext> </blist> <blist> <bibtext> Stan Development Team. (2019). Stan modeling language: user's guide and reference manual. Version 2.20.0. Retrieved from <ulink href="http://mc-stan.org">http://mc-stan.org</ulink></bibtext> </blist> <blist> <bibtext> Sternberg, R. J. (1986). Toward a unified theory of human reasoning. Intelligence, 10, 281 – 314. doi: 10.1016/0160-2896(86)90001-2</bibtext> </blist> <blist> <bibtext> Verhelst, N. D., & Glas, C. A. W. (1993). A dynamic generalization of the Rasch model. Psychometrika, 58, 395 – 415. doi: 10.1007/BF02294648</bibtext> </blist> </ref> <ref id="AN0152947607-15"> <title> Footnotes </title> <blist> <bibtext> The circumflex represents the expected a posteriori (EAP) estimate.</bibtext> </blist> </ref> <aug> <p>By José H. 
Lozano and Javier Revuelta</p> <p>Reported by Author; Author</p> </aug>