Interpolated Testing and Content Pretesting as Interventions to Reduce Task-Unrelated Thoughts during a Video Lecture

Bibliographic Details
Title: Interpolated Testing and Content Pretesting as Interventions to Reduce Task-Unrelated Thoughts during a Video Lecture
Language: English
Authors: Welhaf, Matthew S., Phillips, Natalie E., Smeekens, Bridget A., Miyake, Akira, Kane, Michael J.
Source: Cognitive Research: Principles and Implications. 2022 7.
Availability: Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Peer Reviewed: Y
Page Count: 22
Publication Date: 2022
Sponsoring Agency: National Science Foundation (NSF)
Contract Number: DRL1252333; DRL1252385
Document Type: Journal Articles; Reports - Research
Education Level: Higher Education; Postsecondary Education
Descriptors: Testing, Pretesting, Attention Control, Undergraduate Students, Lecture Method, Video Technology, Introductory Courses, Statistics Education, Situated Learning, Student Interests, Effect Size
DOI: 10.1186/s41235-022-00372-y
ISSN: 2365-7464
Abstract: Considerable research has examined the prevalence and apparent consequences of task-unrelated thoughts (TUTs) in both laboratory and authentic educational settings. Few studies, however, have explored methods to reduce TUTs during learning; those few studies tested small samples or used unvalidated TUT assessments. The present experimental study attempted to conceptually replicate or extend previous findings of interpolated testing and pretesting effects on TUT and learning. In a study of 195 U.S. undergraduates, we investigated whether interpolated testing (compared to interpolated restudy) and pretesting on lecture-relevant materials (compared to pretesting on conceptually related but lecture-irrelevant materials) would reduce TUTs during a video lecture on introductory statistics. Subjects completed either a content-matched or content-mismatched pretest on statistics concepts and then watched a narrated lecture slideshow. During the lecture, half of the sample completed interpolated tests on the lecture material and half completed interpolated restudy of that material. All subjects responded to unpredictably presented thought probes during the video to assess their immediately preceding thoughts, including TUTs. Following the lecture, students reported on their situational interest elicited by the lecture and then completed a posttest. Interpolated testing significantly reduced TUT rates during the lecture compared to restudying, conceptually replicating previous findings--but with a small effect size and no supporting Bayes-factor evidence. We found statistical evidence for neither an interpolated testing effect on learning, nor an effect of matched-content pretesting on TUT rates or learning. Interpolated testing might have limited utility to support students' attention, but varying effect sizes across studies warrants further work.
Abstractor: As Provided
Notes: https://osf.io/6ujsg
Entry Date: 2022
Accession Number: EJ1331359
Database: ERIC
FullText Links:
  – Type: pdflink
    Url: https://content.ebscohost.com/cds/retrieve?content=AQICAHj0k_4E0hTGH8RJwT4gCJyBsGNe_WN95AvKlDbXJGqwxwFgLlSTFzTWF7_VlUKQnrcvAAAA4jCB3wYJKoZIhvcNAQcGoIHRMIHOAgEAMIHIBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEEDJ8F6Ym-SH9zebUTQAIBEICBmiD9vNvRdFH9RVKKDtFmRELn33vwDTWc3nDRJv1ugPwCro6RXu0Roq9zS9bbB-NkC-99_ZxiApybxmsFZR1Z3XaNbEB0-0-js5wiquRBgKbK5orAQkTROSCjdYLXakkk57AF0D41G0CFPVmNWmLg-EduBwk2rxfFk0yJgPG9keiLziADFAJ393p96kMfxWy0hCpIsOzeNAMAymE=
Text:
  Availability: 1
  Value: <anid>AN0156024414;[k1e6]26mar.22;2022Apr01.05:10;v2.2.500</anid> <title id="AN0156024414-1">Interpolated testing and content pretesting as interventions to reduce task-unrelated thoughts during a video lecture </title> <p>Considerable research has examined the prevalence and apparent consequences of task-unrelated thoughts (TUTs) in both laboratory and authentic educational settings. Few studies, however, have explored methods to reduce TUTs during learning; those few studies tested small samples or used unvalidated TUT assessments. The present experimental study attempted to conceptually replicate or extend previous findings of interpolated testing and pretesting effects on TUT and learning. In a study of 195 U.S. undergraduates, we investigated whether interpolated testing (compared to interpolated restudy) and pretesting on lecture-relevant materials (compared to pretesting on conceptually related but lecture-irrelevant materials) would reduce TUTs during a video lecture on introductory statistics. Subjects completed either a content-matched or content-mismatched pretest on statistics concepts and then watched a narrated lecture slideshow. During the lecture, half of the sample completed interpolated tests on the lecture material and half completed interpolated restudy of that material. All subjects responded to unpredictably presented thought probes during the video to assess their immediately preceding thoughts, including TUTs. Following the lecture, students reported on their situational interest elicited by the lecture and then completed a posttest. Interpolated testing significantly reduced TUT rates during the lecture compared to restudying, conceptually replicating previous findings—but with a small effect size and no supporting Bayes-factor evidence. We found statistical evidence for neither an interpolated testing effect on learning, nor an effect of matched-content pretesting on TUT rates or learning. 
Interpolated testing might have limited utility to support students' attention, but varying effect sizes across studies warrants further work.</p> <p>Keywords: Mind wandering; Attention; Education; Testing; Pretesting</p> <hd id="AN0156024414-2">Background</hd> <p>Students often lose focus and fail to attend to material presented during class, on video recordings, or in textbooks. Given the prevalence of such task-unrelated thoughts (TUTs), and the potential costs of chronic inattention to academic success, the science of learning has begun focusing its attention on distraction and mind wandering (for reviews, see Immordino-Yang et al., [<reflink idref="bib24" id="ref1">24</reflink>]; Lang, [<reflink idref="bib35" id="ref2">35</reflink>]; Pachai et al., [<reflink idref="bib46" id="ref3">46</reflink>]; Smallwood et al., [<reflink idref="bib72" id="ref4">72</reflink>]; Szpunar, Moulton, et al., [<reflink idref="bib75" id="ref5">75</reflink>]).</p> <p>Most studies on TUTs during learning rely on experience-sampling methods that randomly interrupt students during a scholastic activity to report on their immediately preceding thoughts, particularly on whether their thoughts were focused on the learning task. Considerable research—in both laboratory and authentic educational settings—has documented TUT rates' association with comprehension and learning outcomes, with students reporting more TUTs also demonstrating poorer comprehension and learning (e.g., Hollis & Was, [<reflink idref="bib23" id="ref6">23</reflink>]; Kane et al., [<reflink idref="bib29" id="ref7">29</reflink>]; Lindquist & McLean, [<reflink idref="bib38" id="ref8">38</reflink>]; Loh et al., [<reflink idref="bib41" id="ref9">41</reflink>]; Varao-Sousa & Kingstone, [<reflink idref="bib77" id="ref10">77</reflink>]; Wammes, Seli, et al., [<reflink idref="bib80" id="ref11">80</reflink>]). 
Empirical studies have also focused on identifying contextual and individual-difference predictors of TUTs during learning (e.g., Bixler & D'Mello, [<reflink idref="bib4" id="ref12">4</reflink>]; Forrin et al., [<reflink idref="bib17" id="ref13">17</reflink>]; Hollis & Was, [<reflink idref="bib23" id="ref14">23</reflink>]; Kane, Carruth, et al., [<reflink idref="bib27" id="ref15">27</reflink>]; Lindquist & McLean, [<reflink idref="bib38" id="ref16">38</reflink>]; Locke & Jensen, [<reflink idref="bib40" id="ref17">40</reflink>]; Pham & Wang, [<reflink idref="bib53" id="ref18">53</reflink>]; Ralph et al., [<reflink idref="bib56" id="ref19">56</reflink>]; Risko et al., [<reflink idref="bib58" id="ref20">58</reflink>], [<reflink idref="bib59" id="ref21">59</reflink>]; Schoen, [<reflink idref="bib66" id="ref22">66</reflink>]; Wammes et al., [<reflink idref="bib78" id="ref23">78</reflink>]).</p> <p>Much less research has targeted methods by which educators might limit TUTs, but there are some promising leads. High-tech methods might someday be widely available to help teachers or learners catch mind wandering on the fly and interrupt it, by analyzing subtle student behaviors that betray off-task thought, such as eye movements (e.g., Faber et al., [<reflink idref="bib14" id="ref24">14</reflink>]; Mills et al., [<reflink idref="bib44" id="ref25">44</reflink>]) and electroencephalography (e.g., Dhindsa et al., [<reflink idref="bib13" id="ref26">13</reflink>]). 
Until then, however, several common and easily implementable pedagogical practices, along the lines of "small teaching" (Lang, [<reflink idref="bib36" id="ref27">36</reflink>]), might be helpful.</p> <p>For example, limited experimental evidence suggests that encouraging notetaking (versus not permitting it) reduced TUTs during a video lecture, at least for students with less prior knowledge of the topic (Kane et al., [<reflink idref="bib29" id="ref28">29</reflink>]); correlational evidence also indicates that students who take better notes during lectures report fewer TUTs (Kane et al., [<reflink idref="bib29" id="ref29">29</reflink>]; Lindquist & McLean, [<reflink idref="bib38" id="ref30">38</reflink>]). As well, students sitting toward the back of lecture halls report more TUTs than do those toward the front (Lindquist & McLean, [<reflink idref="bib38" id="ref31">38</reflink>]), even after statistically controlling for other academic traits and habits (Kane, Carruth, et al., [<reflink idref="bib27" id="ref32">27</reflink>]; but see Wammes et al., [<reflink idref="bib79" id="ref33">79</reflink>]); these correlational findings suggest that sitting closer to the instructor might reduce TUTs, but experiments that randomly assign students to seats are needed to establish causality.</p> <p>The primary goal of the present study was to assess whether two interventions that prototypically benefit memory (interpolated testing and pretesting) may also facilitate focused attention during learning. As described below, several small but promising studies suggest that (a) periodically testing students on material they've recently encountered during a lecture, or (b) pretesting them on material they are about to encounter, reduces their TUT rates substantially compared to control conditions. 
The present study crossed both these interventions using video-learning materials previously demonstrated to yield high TUT rates and to produce individual differences in TUT rates that predict learning from, and situational interest evoked by, the lecture (Kane et al., [<reflink idref="bib29" id="ref34">29</reflink>]).</p> <hd id="AN0156024414-3">Effects of interpolated testing and pretesting on TUTs</hd> <p>Among the few experimental intervention studies, the best replicated findings are that testing students on lecture-relevant information, either before or periodically during the lecture, reduces TUTs. Testing and pretesting effects are typically explored and evident in subsequent memory for learned material (for reviews, see Adesope et al., [<reflink idref="bib1" id="ref35">1</reflink>]; Carpenter & Toftness, [<reflink idref="bib8" id="ref36">8</reflink>]; Kornell & Vaughn, [<reflink idref="bib32" id="ref37">32</reflink>]; Metcalfe, [<reflink idref="bib43" id="ref38">43</reflink>]; Pan & Rickard, [<reflink idref="bib47" id="ref39">47</reflink>]; Roediger & Butler, [<reflink idref="bib60" id="ref40">60</reflink>]), but findings of "test-potentiated learning" (Chan et al., [<reflink idref="bib9" id="ref41">9</reflink>]) indicate that testing previously learned material can also benefit the subsequent learning of new material (e.g., Pastötter & Bauml, [<reflink idref="bib50" id="ref42">50</reflink>]; Wissman et al., [<reflink idref="bib84" id="ref43">84</reflink>]). 
Moreover, several recent laboratory studies using video lectures have found that either <emph>interpolated testing</emph> (where subjects are periodically tested <emph>during</emph> the lecture on material they've recently encountered) or <emph>pretesting</emph> (where subjects are tested on material <emph>before</emph> they've encountered it) also subsequently reduces TUTs during the lecture (Jing et al., [<reflink idref="bib25" id="ref44">25</reflink>]; Pan et al., [<reflink idref="bib48" id="ref45">48</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref46">74</reflink>]).</p> <hd id="AN0156024414-4">Interpolated testing and TUTs</hd> <p>Two articles, each reporting two studies, have examined the impact of interpolated testing on TUTs and learning from a video lecture (Jing et al., [<reflink idref="bib25" id="ref47">25</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref48">74</reflink>]). Their logic is that in-lecture testing might motivate students to better attend to subsequent study materials. The findings are mostly supportive, but with some inconsistencies and ambiguities.</p> <p>The Szpunar, Khan, et al. ([<reflink idref="bib74" id="ref49">74</reflink>]) study presented subjects with a 21-min video about statistics divided into four segments, with post-segment activities varying between groups (<emph>n</emph> = 16 in each). In Experiment 1, each segment was followed by either a six-item test of the segment material or no test (two groups); in Experiment 2, each segment was followed by a six-item test, no test, or a presentation of six test items with their answers provided for restudy, which is a more typical and appropriate control for studies of testing benefits (three groups). TUTs were assessed differently in each experiment. 
Experiment 1 measured TUTs at the end of the lecture via a 1–7 rating scale about the extent of mind wandering; such retrospective ratings, however, are vulnerable to memory and aggregation errors, as well as response biases, that may reduce their validity compared to in-the-moment thought reports (Kane, Smeekens, et al., [<reflink idref="bib28" id="ref50">28</reflink>]). Experiment 2 measured TUTs in the moment, with an experience-sampling probe inserted into each of the four lecture segments that asked whether subjects were just mind wandering.</p> <p>Interpolated-testing groups reported less off-task thinking than did controls in both experiments. Subjects in Experiment 1 rated their attention as significantly less off-task during the lecture in the interpolated-testing condition (<emph>Mdn</emph> = 4) than in the no-testing condition (<emph>Mdn</emph> = 5). Similarly, in Experiment 2, subjects reported TUTs at significantly fewer probes in the interpolated-testing condition (<emph>M</emph> = 19%) than in the no-test (<emph>M</emph> = 41%) and restudy (<emph>M</emph> = 39%) conditions (<emph>d</emph> = 1.05 for the testing vs. restudy comparison). Although these findings suggest that interpolated testing reduced TUTs, both studies also allowed notetaking during the lecture, and subjects in the interpolated-testing group took more notes than did those in the other groups. It is possible, then, that in-lecture testing only indirectly affected TUTs by encouraging notetaking (Kane et al., [<reflink idref="bib29" id="ref51">29</reflink>]; Lindquist & McLean, [<reflink idref="bib38" id="ref52">38</reflink>]).</p> <p>A follow-up study by Jing et al. ([<reflink idref="bib25" id="ref53">25</reflink>]) compared interpolated testing and restudy groups (<emph>n</emph> = 18 in each) in two experiments, both of which also allowed notetaking. Here, eight thought probes were presented during a 40-min video lecture on public health. 
Experiment 1 assessed TUTs with "yes/no" mind-wandering thought probes and did not find a significant TUT-rate difference between interpolated-testing and restudy groups (<emph>M</emph>s = 21% and 24%, respectively; <emph>d</emph> = 0.15), thus failing to replicate prior findings.</p> <p>Experiment 2 from Jing et al. ([<reflink idref="bib25" id="ref54">25</reflink>]) modified the yes/no probes to assess five thought types, including thoughts related to the lecture topic but not about the here-and-now of the lecture (i.e., lecture-<emph>related</emph> off-task thought, such as reflecting on something mentioned earlier), in addition to lecture-<emph>unrelated</emph> off-task thought and "zoning out" without thought content. Here, the interpolated-testing group reported significantly lower TUT rates (lecture-unrelated plus zoning out; <emph>M</emph> = 3%) than did the restudy group (<emph>M</emph> = 15%), with <emph>d</emph> = 0.90. Lecture-related off-task thoughts showed the opposite pattern, with interpolated-testing subjects reporting significantly higher rates (<emph>M</emph> = 20%) than restudy subjects (<emph>M</emph> = 10%). Moreover, rates of lecture-related off-task thought correlated positively with posttest scores, <emph>r</emph>(25) = 0.45. Although the small sample size urges caution regarding these individual-differences results, they are directionally consistent with those reported by Kane et al. ([<reflink idref="bib29" id="ref56">29</reflink>]) in a larger sample, <emph>r</emph>(180) = 0.26. In-lecture testing may therefore discourage potentially harmful off-topic thoughts while boosting potentially helpful on-task and lecture-related thoughts.</p> <p>Note, however, that the first experiment by Jing et al. 
([<reflink idref="bib25" id="ref58">25</reflink>]) did not replicate the effect of interpolated testing on TUTs, so its benefits may not be robust across methodological variations. Alternatively, perhaps the benefits of in-lecture testing are reasonably robust, but small sample sizes (<emph>n</emph>s = 16 or 18 per group) made these studies vulnerable to false-negative errors and inflated estimates of effect size (e.g., Perugini et al., [<reflink idref="bib52" id="ref59">52</reflink>]; Schäfer & Schwarz, [<reflink idref="bib64" id="ref60">64</reflink>]). Finally, Jing et al. ([<reflink idref="bib25" id="ref61">25</reflink>]; Experiment 1) and both studies reported in Szpunar, Khan, et al. ([<reflink idref="bib74" id="ref62">74</reflink>]) showed increased notetaking with testing, which makes it difficult to establish a potential causal chain from testing to TUTs from the published studies.</p> <p>Why might interpolated testing reduce TUTs? Chan et al. ([<reflink idref="bib9" id="ref63">9</reflink>]) presented four theoretical frameworks for explaining how interpolated testing might potentiate future learning. Here, we discuss two of these frameworks, the "Resource" and the "Metacognitive" accounts, as they have suggested a possible role for interpolated testing in reducing TUTs. Resource accounts argue that testing may increase the available cognitive resources necessary for future learning, specifically because testing may redirect attention to the learning task and away from mind wandering (Jing et al., [<reflink idref="bib25" id="ref64">25</reflink>]; Pastötter et al., [<reflink idref="bib49" id="ref65">49</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref66">74</reflink>]). With fewer resources dedicated to TUTs following testing, more will be available for the encoding of target information. The resource view does not explain, however, why testing episodes should redirect attention to the lecture more strongly than restudy episodes should. 
Alternatively, metacognitive accounts suggest that testing enhances learning beyond restudy because only testing alerts learners that they have not yet mastered the material. By this view, as learners become aware of their underperformance, they may use this feedback to refocus attention and put more effort into learning the material (Cho et al., [<reflink idref="bib10" id="ref67">10</reflink>]; Lee & Ahn, [<reflink idref="bib37" id="ref68">37</reflink>]).</p> <hd id="AN0156024414-5">Pretesting and TUTs</hd> <p>Only one study (in two experiments) has examined whether pretesting on information before it is presented, rather than testing on information after it is presented, also reduces TUT reports (Pan et al., [<reflink idref="bib48" id="ref69">48</reflink>]). The logic behind this approach is that, like interpolated testing, pretesting might increase attention to, or curiosity about, lecture-relevant information (Bull & Dizney, [<reflink idref="bib7" id="ref70">7</reflink>]; Hannafin & Hughes, [<reflink idref="bib21" id="ref71">21</reflink>]; Peeck, [<reflink idref="bib51" id="ref72">51</reflink>]; Pressley et al., [<reflink idref="bib54" id="ref73">54</reflink>]), or it might provide feedback to students that they have much to learn about the topic and so should pay close attention to the upcoming material (Bjork et al., [<reflink idref="bib5" id="ref74">5</reflink>]; Finn & Tauber, [<reflink idref="bib16" id="ref75">16</reflink>]). As well, and in contrast to testing after material is presented, pretesting might help highlight for students which specific aspects of the upcoming material are most critical, thereby scaffolding their attention allocation to relevant topics during the lecture (e.g., Peeck, [<reflink idref="bib51" id="ref76">51</reflink>]; Sagaria & Di Vesta, [<reflink idref="bib63" id="ref77">63</reflink>]).</p> <p>Subjects in the two Pan et al. 
([<reflink idref="bib48" id="ref78">48</reflink>]) experiments viewed a 26-min video lecture, without taking notes, on signal detection theory. Each of four lecture segments ended with a probe to rate (0–100) how focused subjects' attention had been on that entire video segment (again, such broad, retrospective judgments are vulnerable to validity-threatening errors of memory and aggregation). In Experiment 1, subjects either took an eight-item pretest on the upcoming segment's material or solved unrelated math problems before each segment (<emph>n</emph>s ≈ 50 per group). In Experiment 2, subjects either took a 32-item pretest before the video (prevideo-pretested), took an eight-item pretest before each video segment, or solved math problems before each segment (<emph>n</emph>s ≈ 50 per group).</p> <p>In both experiments, subjects who were pretested before each segment reported significantly higher attention ratings than did non-pretested subjects (Experiment 1 <emph>M</emph>s = 67 and 59, respectively, with <emph>d</emph> = 0.39; Experiment 2 <emph>M</emph>s = 67 and 50, respectively, with <emph>d</emph> = 0.74). In Experiment 2, the prevideo-pretested subjects showed similarly high attention ratings to the segment-pretested subjects (<emph>M</emph> = 71; <emph>d</emph> = 0.91 for contrast with non-pretested controls). Pretesting lecture material, either all at once or before each segment, thus appeared to reduce attention failures during learning. 
But, as in one of the studies showing that interpolated testing reduced mind-wandering (Szpunar, Khan, et al., [<reflink idref="bib74" id="ref79">74</reflink>]), attention was assessed with a retrospective-report measure of questionable construct validity (Kane, Smeekens, et al., [<reflink idref="bib28" id="ref80">28</reflink>]).</p> <hd id="AN0156024414-6">Goals and hypotheses</hd> <p>The present study examined two intriguing but understudied interventions—interpolated testing and pretesting—to foster sustained and focused attention during learning from video lectures. Specifically, in a 2 × 2 study design, we asked whether interpolated testing or matched-content pretesting of lecture material (or both) would reduce subjects' TUT reports during learning from a narrated-slideshow lecture on introductory statistics, a context previously established to yield valid measurement of TUTs and learning (Kane et al., [<reflink idref="bib29" id="ref81">29</reflink>]). Before these promising interventions can be applied to actual educational settings, the field must better establish their robustness and effect sizes.</p> <p>The present study addressed our concerns with prior studies noted earlier. For example, we addressed measurement concerns by assessing TUTs with validated thought probes of immediately preceding experience (that also allowed for the reporting of lecture-related as well as lecture-unrelated off-task thought; Jing et al., [<reflink idref="bib25" id="ref82">25</reflink>]; Kane et al., [<reflink idref="bib29" id="ref83">29</reflink>]). 
Like prior studies investigating the effect of interpolated testing on TUTs (Jing et al., [<reflink idref="bib25" id="ref84">25</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref85">74</reflink>]), we contrasted an in-lecture testing group to a restudy control group; we did not, however, allow notetaking, which in turn allowed us to assess whether interpolated testing decreases TUTs directly, without possibly doing so indirectly by increasing notetaking.</p> <p>The present study's control condition for the pretesting effect also isolated a different potential mechanism for reducing TUTs from that proposed to drive any effects of testing (i.e., motivating sustained attention based on learning feedback, per the metacognitive account of test-potentiated learning). Unlike Pan et al. ([<reflink idref="bib48" id="ref86">48</reflink>]), who, consistent with most of the pretesting literature, contrasted pretesting to no-pretesting groups, we compared a pretesting group to a control group that also took a pretest, but on statistics topics not covered in the lecture (i.e., mismatched content). Both pretests should similarly provide subjects with feedback that they have little knowledge about statistics and still have much to learn, and so both conditions should similarly engage metacognition and motivate sustained attention to the lecture. Only the matched-content pretest condition, however, highlighted for subjects the specific information from the lecture that would be most important for the final test, and so only the matched-content condition could scaffold attention to the most task-relevant material. 
Consistent with this possibility, some prior research has found that pretesting benefits for learning are found only for the specific topics that are pretested, rather than generalizing to related information in the learning material (e.g., Bull & Dizney, [<reflink idref="bib7" id="ref87">7</reflink>]; Pressley et al., [<reflink idref="bib54" id="ref88">54</reflink>]; Richland et al., [<reflink idref="bib57" id="ref89">57</reflink>]; Sagaria & Di Vesta, [<reflink idref="bib63" id="ref90">63</reflink>]; but see Carpenter & Toftness, [<reflink idref="bib8" id="ref91">8</reflink>]).</p> <p>Our primary hypotheses were that: (a) subjects who completed in-lecture tests for the lecture material would show decreased rates of TUTs, and possibly increased rates of lecture-related off-task thought, compared to subjects who restudied the information at matching intervals; (b) subjects who completed a pretest on the upcoming lecture material (i.e., matched content) would report fewer TUTs, and possibly more instances of lecture-related off-task thought, than would subjects who completed a lecture-unrelated pretest (i.e., mismatched content).</p> <p>As discussed above, if our study design elicited significant effects of both interpolated testing and matched-content pretesting on TUTs, it should do so via different mechanisms for each (learning feedback to facilitate metacognition from interpolated testing, versus highlighting key to-be-learned information from pretesting). Crossing these interventions, then, should most likely result in additive main effects. However, because over-additive effects of receiving both interventions were possible (although not specified by any prior testing or pretesting research), as a more exploratory exercise we also tested for an interaction of interpolated testing and pretesting content match on TUT rates.</p> <p>Our secondary hypotheses concerned outcome measures beyond TUT rate. 
As in our previous study of TUTs during learning from videos (Kane et al., [<reflink idref="bib29" id="ref92">29</reflink>]), subjects completed a posttest on the lecture material and reported their situational interest in statistics elicited by the video. The testing effect and pretesting literatures, as well as the studies of interpolated testing and pretesting on TUTs (Jing et al., [<reflink idref="bib25" id="ref93">25</reflink>]; Pan et al., [<reflink idref="bib48" id="ref94">48</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref95">74</reflink>]), suggest that both interpolated testing and pretesting should improve posttest performance in addition to reducing TUTs. Although one might expect that a side effect of decreasing TUTs would be to also increase situational interest in the learning material (Kane et al., [<reflink idref="bib29" id="ref96">29</reflink>]), the study by Jing et al. ([<reflink idref="bib25" id="ref97">25</reflink>]) found no effect of interpolated testing on interest stimulated by the lecture; we therefore did not have strong predictions for the potential effects of interpolated testing or pretesting on interest.</p> <hd id="AN0156024414-7">Method</hd> <p>Below we report how we determined our sample size and all data exclusion decisions, experimental manipulations, and measures for this study (Simmons et al., [<reflink idref="bib69" id="ref98">69</reflink>]). Some materials and procedures were identical to those from our study on notetaking and TUTs during a video lecture (Kane et al., [<reflink idref="bib29" id="ref99">29</reflink>]). The study received ethics approval from the Institutional Review Board of the University of North Carolina at Greensboro (UNCG). All materials for the current study are available at the OSF site, https://osf.io/6ujsg/. Video lecture materials are available from the Kane et al. 
([<reflink idref="bib29" id="ref100">29</reflink>]) OSF site, https://osf.io/u5bnw/.</p> <hd id="AN0156024414-8">Subjects and sample-size determination</hd> <p>We did not preregister a sample size based on power analyses, but we aimed to collect usable data from 200 subjects, yielding 100 subjects per group for each main effect (interpolated testing vs. restudy; matching vs. mismatching pretests). This sample size is five times as large as those in prior experiments on interpolated testing and TUTs (Jing et al., [<reflink idref="bib25" id="ref101">25</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref102">74</reflink>]), about twice as large as those in prior experiments on pretesting and TUTs (Pan et al., [<reflink idref="bib48" id="ref103">48</reflink>]), and similar to our prior study of TUTs with these materials (Kane et al., [<reflink idref="bib29" id="ref104">29</reflink>]). As noted above, our primary hypotheses were for main effects of interpolated testing and matched-content pretesting; we expected additive effects for these interventions when combined in our 2 × 2 design, but over-additive interactions were possible and of applied interest.</p> <p>We report sensitivity analyses for ANOVA main effects using G*Power (Faul et al., [<reflink idref="bib15" id="ref105">15</reflink>]) for 80%, 90%, and 95% power (α = 0.05); the curves are displayed in Fig. 1 (panel A). With <emph>N</emph> = 200, we could detect effects of <emph>f</emph> = 0.20 and 0.26 (with 80% and 95% power, respectively), which are conventionally "medium-sized" effects (for comparison with effect sizes in the literature based on <emph>t</emph>-tests, Cohen's <emph>d</emph> = 2 × Cohen's <emph>f</emph>, assuming equal sample sizes). 
As noted in the Results section, our final sample after data exclusions was <emph>N</emph> = 195; the corresponding sensitivity analyses (see panel B) also indicated 80% and 95% power to detect main effects of <emph>f</emph> = 0.20 and 0.26, respectively.</p> <p>Graph: Fig. 1 Sensitivity curves based on projected (Panel A) and achieved (Panel B) sample sizes. Effect sizes are Cohen's f</p> <p>For comparison, prior significant effects of testing on TUTs (Jing et al., [<reflink idref="bib25" id="ref106">25</reflink>], Experiment 2; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref107">74</reflink>]) yielded effect sizes in the range of <emph>f</emph> = 0.44–0.49 (but Jing et al., [<reflink idref="bib25" id="ref108">25</reflink>], Experiment 1, found a null effect, <emph>f</emph> = 0.075). Likewise, the Pan et al. ([<reflink idref="bib48" id="ref109">48</reflink>]) effects of pretesting (vs. no pretest) on TUTs yielded effect sizes of <emph>f</emph> = 0.20 (Experiment 1) and <emph>f</emph>s = 0.37 and 0.46 (Experiment 2, for interpolated and blocked pretests, respectively). With a sample size of 195, we would be able to detect an effect roughly half the size of the significant interpolated-testing effects, and about the size of the smallest pretesting effect, reported in previous studies. Therefore, our design was well powered for these main effects. Note, however, that any interaction effect of these variables would have to be unusually large to be detected, so conclusions about additivity must remain cautious.</p> <p>We obtained consent from 277 undergraduates at UNCG, a comprehensive state university and minority-serving institution for African American students. We tested more subjects than our target sample size because, following Kane et al. ([<reflink idref="bib29" id="ref110">29</reflink>]), we planned to drop data from subjects who indicated that they had previously taken a statistics course (see below). 
Eligible subjects were between 18 and 35 years of age and participated for either partial credit toward an Introductory Psychology requirement or $25.00. We randomly assigned subjects to one of four conditions based on our 2 (Interpolated Activity: Testing vs. Restudy) × 2 (Pretest Content: Match vs. Mismatch) factorial design, with the constraint that all subjects within a session were assigned to the same condition.</p> <p>Of our retained 195 subjects, 72% self-identified as female and 28% as male; mean age was 19.06 years (SD = 1.87). The self-reported racial breakdown of our final sample was 52% White (European or Middle Eastern descent), 37% Black (African or Caribbean descent), 7% Multiracial, 3% Asian, 1% Native Hawaiian or Pacific Islander, and 0% Native American or Alaskan Native (<emph>n</emph> = 2 missing). Finally, self-reported ethnicity, asked separately, was 6% Hispanic or Latino.</p> <hd id="AN0156024414-9">Procedure, materials, and equipment</hd> <p></p> <hd id="AN0156024414-10">Computers, software, and peripheral equipment</hd> <p>Each subject completed the study on a Mac Mini linked to an Acer 22-in LCD monitor. Audio for the video lecture was presented via Koss UR-20 headphones. For the pretest and posttest, we provided subjects with a calculator (Sharp EL243SB). We programmed all measures and the video lecture in E-prime 2.0 (Psychology Software Tools, Pittsburgh, PA).</p> <hd id="AN0156024414-11">Overall procedure</hd> <p>Subjects completed the study individually or in groups of up to four. The experimenter remained in the testing room during the study and read aloud all on-screen instructions. Following the completion of a given task, subjects in group sessions waited until everyone finished before moving on to the next task. Most sessions lasted 90–120 min. 
Following informed consent, subjects completed the following measures and tasks in the order described.</p> <hd id="AN0156024414-12">Questionnaires, measures, and stimuli</hd> <p></p> <hd id="AN0156024414-13">Statistics background questionnaire</hd> <p>A single-item questionnaire asked subjects to report, by clicking on a box located next to their answer, if they had taken a formal course on statistics (Kane et al., [<reflink idref="bib29" id="ref111">29</reflink>]). The response options were: (A) no statistics courses taken; (B) college statistics course in Psychology on this campus; (C) college statistics course(s) in other Departments on this campus; (D) college statistics course(s) in other institutions/universities; (E) high school statistics course(s); (F) online statistics courses (e.g., Khan Academy, iTunes-U). Data from subjects reporting any statistics coursework (responses B–F) were dropped from analyses.</p> <hd id="AN0156024414-14">Statistics pretest</hd> <p>Depending on pretest-content condition, subjects next completed one of two 10-item multiple-choice pretests with the aid of a calculator and no time limit. Each question was followed by 6 or 7 answer choices labeled A–F or A–G with a checkbox next to each answer choice. Subjects recorded their answer by mouse-clicking the box next to their answer choice. Subjects also provided a confidence report for each item: (a) had to guess and had little confidence; (b) had to guess but were still somewhat confident; (c) knew the answer and/or were highly confident.[<reflink idref="bib1" id="ref112">1</reflink>] The main dependent measure from the pretest, regardless of condition, was the proportion of 10 items answered correctly. 
Moreover, both pretests were designed such that subjects should answer few items correctly without having previously learned statistics.</p> <p>Subjects in the matched-content pretest condition completed items that reflected the upcoming video-lecture content, and that were identical to those to be presented as Part 1 of the posttest (as in Kane et al., [<reflink idref="bib29" id="ref113">29</reflink>], subjects were unaware that they would be tested on these same items after the lecture). Subjects in the mismatched-content pretest condition completed a set of items that were relevant to introductory statistics courses (and were inspired by several introductory statistics textbooks), but these topics were not covered in the upcoming video lecture and did not appear in the posttest.</p> <hd id="AN0156024414-15">Video lecture</hd> <p>We adapted the 52-min video lecture used by Kane et al. ([<reflink idref="bib29" id="ref114">29</reflink>]), which was a narrated PowerPoint presentation showing text and images that introduced basic statistical concepts (e.g., samples, populations, frequency distributions, central tendency), taught the steps to calculate the standard deviation of a set of scores, and demonstrated the utility of the mean and standard deviation in interpreting one's own SAT scores. This video consisted of 31 segments, the first of which lasted for 5 min, and the remaining 30 segments were between 1:08 and 1:51 min in length. The segments were organized in 5 blocks, each of which ended with either a set of interpolated-test or interpolated-restudy items (a between-subject manipulation).</p> <p>Each interpolated break presented six items: three multiple-choice questions with four response options each (e.g., <emph>If you knew a sample's standard deviation, how do you calculate its variance? 
a) take the square root of the number; b) square the number; c) divide it by N; d) add it to the sum of squares</emph>), and three short-answer questions (e.g., <emph>How would the median of the following sample of scores: 3, 4, 7, 8, 9 change if the largest value (9) changed to 49?</emph>). Subjects saw one item at a time for 20 s and either answered the question (in the testing condition) or studied the highlighted (italicized and underlined) answer (in the restudy condition) within that time. After 20 s, the next item appeared onscreen (89% of items were answered within 20 s; unanswered items were scored as incorrect). The lecture video resumed after completion of the final item. The interpolated items were related to the content of the immediately preceding lecture block, but they did not match any of the pretest or posttest items. Subjects in the interpolated-testing conditions received no accuracy feedback.</p> <hd id="AN0156024414-16">Video-embedded thought probes and instructions</hd> <p>Before beginning the video, we instructed subjects about the periodic thought probes that would appear throughout the lecture (see Kane et al., [<reflink idref="bib29" id="ref116">29</reflink>], for more details about instructions). Each probe presented a green screen with 7 response options listed, for subjects to report the content of their immediately preceding thoughts. 
These thought-report options appeared, and were explained, as follows (only the numbers and italicized labels here appeared on each probe screen):</p> <p></p> <ulist> <item>1. <emph>On-task on the lecture</emph>: Thoughts about the in-the-moment video-lecture content</item> <item>2. <emph>Lecture-related ideas</emph>: Thoughts about some aspect of the lecture topic, but not what was currently happening in the video</item> <item>3. <emph>How well I'm understanding the lecture</emph>: Evaluative thoughts about comprehending (or not) the lecture material</item> <item>4. <emph>Everyday personal concerns</emph>: Thoughts about normal everyday things, life concerns, or personal worries</item> <item>5. <emph>Daydreams</emph>: Fantasies or unrealistic thoughts</item> <item>6. <emph>Current state of being</emph>: Thoughts about one's current physical or mental state (e.g., sleepy, hungry, or fascinated)</item> <item>7. <emph>Other</emph>: Any thoughts not fitting into the other categories.</item> </ulist> <p>During the video, subjects saw 15 probes. As in Kane et al. ([<reflink idref="bib29" id="ref117">29</reflink>]), probes were presented between video segments with the constraint that probes could not appear after three consecutive video segments. We also incorporated an additional constraint that probes could not appear at the end of a block immediately preceding an interpolated test or restudy break. (Note that Kane et al. presented 20 probes, but here we replaced one probe per block with the interpolated activity.) 
We scored thought reports as follows (consistent with Kane et al., [<reflink idref="bib29" id="ref118">29</reflink>]): TUTs were defined as the proportion of thought reports with responses 4–7, lecture-related off-task thoughts were the proportion of reports with response 2, and comprehension-related off-task thoughts were the proportion of reports with response 3.</p> <hd id="AN0156024414-17">Situational interest questionnaire</hd> <p>As in Kane et al. ([<reflink idref="bib29" id="ref119">29</reflink>]; modified from Linnenbrink-Garcia et al., [<reflink idref="bib39" id="ref120">39</reflink>]), the video lecture was immediately followed by 10 items assessing interest in the video and in statistics (e.g., "<emph>I found the content of this video lecture personally meaningful</emph>," "<emph>To be honest, I just don't find statistics interesting</emph>"). Subjects rated each item on a 5-point scale with the following options: (1) strongly disagree, (2) somewhat disagree, (3) neither agree nor disagree, (4) somewhat agree, and (5) strongly agree. The dependent measure was the average score of all items, after reverse scoring appropriate items. Although the main analyses in Kane et al. ([<reflink idref="bib29" id="ref126">29</reflink>]) excluded the three items about interest in the field of statistics (as opposed to interest in the lecture, itself), those items behaved similarly to the rest of the scale, so we included all 10 items here.</p> <p>As in Kane et al. ([<reflink idref="bib29" id="ref127">29</reflink>]), the retention interval between the video-lecture and posttest was fixed by presenting each questionnaire item onscreen for 9.5 s. For the first 4.5 s, the item appeared against a white screen. 
For the final 5 s, the screen turned yellow to indicate that subjects should now type their numerical response. Regardless of when subjects responded, each item stayed onscreen for the full 9.5 s. The questionnaire included one attention-check item with the same response scale ("I saw this exact stats video lecture in my preschool art class."). Data from subjects who responded to this item with <emph>neither agree nor disagree, somewhat agree,</emph> or <emph>strongly agree</emph> were removed from analyses of situational interest (<emph>n</emph> = 14).</p> <hd id="AN0156024414-18">Statistics posttest</hd> <p>We used the same three-part, untimed posttest as Kane et al. ([<reflink idref="bib29" id="ref128">29</reflink>]). Specifically, Part 1 included 10 multiple-choice questions (the same as those appearing in the matched-content pretest); Part 2 required subjects to calculate the standard deviation of a set of four numbers; Part 3 required subjects to calculate the standard deviation of a new set of five numbers, but each of five calculation steps was labeled and completed in turn (i.e., first calculate the mean, then the deviation scores, then the sum of squares, then the variance, and then the standard deviation).</p> <p>For Part 1, subjects mouse-clicked on their answer on-screen, just as in the pretest. For Parts 2 and 3, subjects were provided with a packet to complete their calculations, with the aid of a calculator; for Part 2, subjects used one sheet of packet paper, and for Part 3, each of the five calculation steps was labeled and completed on a separate sheet of paper. Subjects completed their work on paper first and then typed in their answer on the computer and pressed ENTER to record it. As in Kane et al. 
([<reflink idref="bib29" id="ref129">29</reflink>]), the dependent measure for the posttest was calculated as the mean score across the three parts after z-scoring the raw score for each part across the whole sample (partial credit was granted in Parts 2 and 3, as in Kane et al., [<reflink idref="bib29" id="ref130">29</reflink>]).</p> <hd id="AN0156024414-19">Demographic questionnaire</hd> <p>Subjects completed a demographics questionnaire at the end of the session, reporting on their self-identified Sex/Gender (open-ended), age (open-ended), ethnicity (Hispanic or Latino vs. not Hispanic or Latino), race (Asian; Black: African or Caribbean descent; Native American or Alaskan Native; Native Hawaiian or Pacific Islander; Multiracial; White: European or Middle Eastern descent), and university major (open-ended; unanalyzed).</p> <hd id="AN0156024414-20">Results</hd> <p>All data aggregation and analyses were performed in R (R Core Team, [<reflink idref="bib55" id="ref131">55</reflink>]) using <emph>tidyverse</emph> (Wickham, [<reflink idref="bib83" id="ref132">83</reflink>]). ANOVAs and calculation of effect sizes were performed with the <emph>afex</emph> (Singmann et al., [<reflink idref="bib71" id="ref133">71</reflink>]) and <emph>effectsize</emph> (Ben-Shachar et al., [<reflink idref="bib3" id="ref134">3</reflink>]) packages; data visualizations were created using <emph>ggplot2</emph> (Wickham, [<reflink idref="bib82" id="ref135">82</reflink>]). Data and analysis scripts are available at the OSF site, https://osf.io/6ujsg/.</p> <hd id="AN0156024414-21">Data analysis plan</hd> <p>We adopted an α level of .05 for null hypothesis significance testing inferences from our 2 × 2 ANOVAs and report 95% confidence intervals where applicable. 
For experimental comparisons of interest (e.g., interpolated testing versus restudy), we also conducted <emph>t</emph>-tests with corresponding Bayes Factors (BFs) to compare predictive performance of competing models with a continuous measure of evidence (Kass & Raftery, [<reflink idref="bib30" id="ref136">30</reflink>]). Null models reflected a Cauchy distribution centered around 0 with a scaling parameter of 0.707. This corresponds to a probability that 50% of the distribution was between <emph>d</emph> = − 0.707 and 0.707 (Rouder et al., [<reflink idref="bib61" id="ref137">61</reflink>]). Given the combination of small sample sizes and mixed effect sizes in the prior testing–TUT literature (with some very large effects and one very small effect), this is a reasonable expectation of effect size (Schmalz et al., [<reflink idref="bib65" id="ref138">65</reflink>]). BFs were calculated using the <emph>BayesFactor</emph> package (Morey & Rouder, [<reflink idref="bib45" id="ref139">45</reflink>]). We interpreted BF<subs>10</subs> < 0.33 (1/3) as providing modest evidence for the null relative to the alternative hypothesis and BF<subs>10</subs> > 3.0 as providing modest evidence for the alternative hypothesis relative to the null, and BF<subs>10</subs> < 0.10 (1/10) as providing strong evidence for the null relative to the alternative hypothesis and BF<subs>10</subs> > 10 providing strong evidence for the alternative hypothesis relative to the null.</p> <hd id="AN0156024414-22">Data loss</hd> <p>We based initial data-exclusion decisions on experimenter session notes while blinded to subjects' performance, thought-report, and questionnaire data. We dropped data from two subjects for falling asleep multiple times, from six subjects for leaving the session early, from three subjects who were assigned to the wrong condition in the session, and from four subjects who were in a session that was significantly delayed and disrupted by one subject (total dropped = 15). 
Additionally, as in Kane et al. ([<reflink idref="bib29" id="ref140">29</reflink>]), we dropped data from 66 subjects who reported they had previously completed a statistics course. Although Kane et al. ([<reflink idref="bib29" id="ref141">29</reflink>]) also dropped data from subjects scoring ≥ 60% on the pretest, the only two subjects who did so here were already dropped for having completed a statistics course. Finally, we dropped data from one subject who reported an age that was outside our eligibility range of 18–35 years. The final sample consisted of 195 subjects (as noted above, we additionally dropped situational interest data from 14 subjects who failed an attention check embedded in the questionnaire).</p> <hd id="AN0156024414-23">Preliminary analyses of pretest performance</hd> <p>Table 1 presents descriptive statistics for all variables of interest, by interpolated activity (testing vs. restudy) and pretest content (matching vs. mismatching). Before assessing whether TUT rates or posttest performance benefitted from either intervention, we tested whether pretest scores suggested any preintervention group differences, despite randomization to conditions. Pretest scores for the four experimental conditions are shown in Fig. 2.</p> <p>Table 1 Descriptive statistics by pretest (content-match vs. mismatch) and interpolated activity (testing vs. 
restudy) conditions</p> <p> <ephtml> <table frame="hsides" rules="groups"><thead><tr><th align="left" rowspan="3"><p>Dependent variable</p></th><th align="left" colspan="16"><p>Experimental conditions</p></th></tr><tr><th align="left" colspan="4"><p>Matched testing (<italic>n</italic> = 48)</p></th><th align="left" colspan="4"><p>Matched restudy (<italic>n</italic> = 52)</p></th><th align="left" colspan="4"><p>Mismatched testing (<italic>n</italic> = 51)</p></th><th align="left" colspan="4"><p>Mismatched restudy (<italic>n</italic> = 44)</p></th></tr><tr><th align="left"><p>M</p></th><th align="left"><p>SD</p></th><th align="left"><p>Min</p></th><th align="left"><p>Max</p></th><th align="left"><p>M</p></th><th align="left"><p>SD</p></th><th align="left"><p>Min</p></th><th align="left"><p>Max</p></th><th align="left"><p>M</p></th><th align="left"><p>SD</p></th><th align="left"><p>Min</p></th><th align="left"><p>Max</p></th><th align="left"><p>M</p></th><th align="left"><p>SD</p></th><th align="left"><p>Min</p></th><th align="left"><p>Max</p></th></tr></thead><tbody><tr><td align="left"><p>Pretest</p></td><td char="." align="char"><p>2.17</p></td><td char="." align="char"><p>1.42</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td><td char="." align="char"><p>2.65</p></td><td char="." align="char"><p>1.10</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td><td char="." align="char"><p>1.94</p></td><td char="." align="char"><p>1.22</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td><td char="." align="char"><p>2.20</p></td><td char="." align="char"><p>1.11</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td></tr><tr><td align="left"><p>TUT Rate</p></td><td char="." align="char"><p>0.38</p></td><td char="." align="char"><p>0.25</p></td><td char="." align="char"><p>0.00</p></td><td char="." 
align="char"><p>0.87</p></td><td char="." align="char"><p>0.47</p></td><td char="." align="char"><p>0.24</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>1.00</p></td><td char="." align="char"><p>0.42</p></td><td char="." align="char"><p>0.25</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>1.00</p></td><td char="." align="char"><p>0.48</p></td><td char="." align="char"><p>0.26</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>1.00</p></td></tr><tr><td align="left"><p>Lecture-Related</p></td><td char="." align="char"><p>0.15</p></td><td char="." align="char"><p>0.11</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>0.47</p></td><td char="." align="char"><p>0.14</p></td><td char="." align="char"><p>0.12</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>0.53</p></td><td char="." align="char"><p>0.15</p></td><td char="." align="char"><p>0.14</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>0.53</p></td><td char="." align="char"><p>0.17</p></td><td char="." align="char"><p>0.11</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>0.40</p></td></tr><tr><td align="left"><p>Comp-Related</p></td><td char="." align="char"><p>0.14</p></td><td char="." align="char"><p>0.14</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>0.60</p></td><td char="." align="char"><p>0.12</p></td><td char="." align="char"><p>0.13</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>0.47</p></td><td char="." align="char"><p>0.15</p></td><td char="." align="char"><p>0.13</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>0.47</p></td><td char="." align="char"><p>0.15</p></td><td char="." align="char"><p>0.14</p></td><td char="." align="char"><p>0.00</p></td><td char="." 
align="char"><p>0.53</p></td></tr><tr><td align="left"><p>Posttest Part 1</p></td><td char="." align="char"><p>4.29</p></td><td char="." align="char"><p>2.09</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>8.00</p></td><td char="." align="char"><p>4.37</p></td><td char="." align="char"><p>2.39</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>9.00</p></td><td char="." align="char"><p>4.59</p></td><td char="." align="char"><p>2.29</p></td><td char="." align="char"><p>1.00</p></td><td char="." align="char"><p>10.00</p></td><td char="." align="char"><p>4.57</p></td><td char="." align="char"><p>2.06</p></td><td char="." align="char"><p>1.00</p></td><td char="." align="char"><p>8.00</p></td></tr><tr><td align="left"><p>Posttest Part 2</p></td><td char="." align="char"><p>2.17</p></td><td char="." align="char"><p>1.71</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td><td char="." align="char"><p>2.16</p></td><td char="." align="char"><p>1.63</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td><td char="." align="char"><p>2.25</p></td><td char="." align="char"><p>1.57</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td><td char="." align="char"><p>2.35</p></td><td char="." align="char"><p>1.70</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td></tr><tr><td align="left"><p>Posttest Part 3</p></td><td char="." align="char"><p>2.47</p></td><td char="." align="char"><p>1.68</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td><td char="." align="char"><p>2.75</p></td><td char="." align="char"><p>1.65</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td><td char="." align="char"><p>2.83</p></td><td char="." align="char"><p>1.80</p></td><td char="." 
align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td><td char="." align="char"><p>3.06</p></td><td char="." align="char"><p>1.84</p></td><td char="." align="char"><p>0.00</p></td><td char="." align="char"><p>5.00</p></td></tr><tr><td align="left"><p>Posttest Total</p></td><td char="." align="char"><p>−0.10</p></td><td char="." align="char"><p>0.78</p></td><td char="." align="char"><p>−1.68</p></td><td char="." align="char"><p>1.25</p></td><td char="." align="char"><p>−0.03</p></td><td char="." align="char"><p>0.89</p></td><td char="." align="char"><p>−1.52</p></td><td char="." align="char"><p>1.70</p></td><td char="." align="char"><p>0.04</p></td><td char="." align="char"><p>0.87</p></td><td char="." align="char"><p>−1.52</p></td><td char="." align="char"><p>1.70</p></td><td char="." align="char"><p>0.11</p></td><td char="." align="char"><p>0.81</p></td><td char="." align="char"><p>−1.37</p></td><td char="." align="char"><p>1.40</p></td></tr><tr><td align="left"><p>Sit. Interest</p></td><td char="." align="char"><p>2.74</p></td><td char="." align="char"><p>0.61</p></td><td char="." align="char"><p>1.50</p></td><td char="." align="char"><p>3.90</p></td><td char="." align="char"><p>2.75</p></td><td char="." align="char"><p>0.64</p></td><td char="." align="char"><p>1.60</p></td><td char="." align="char"><p>4.00</p></td><td char="." align="char"><p>2.63</p></td><td char="." align="char"><p>0.78</p></td><td char="." align="char"><p>1.00</p></td><td char="." align="char"><p>4.30</p></td><td char="." align="char"><p>2.89</p></td><td char="." align="char"><p>0.80</p></td><td char="." align="char"><p>0.90</p></td><td char="." 
 align="char"><p>4.44</p></td></tr></tbody></table> </ephtml> </p> <p>Matched = content-matched pretest; Mismatched = content-mismatched pretest; Pretest = number correct pretest items; TUT Rate = proportion of thought reports indicating task-unrelated thoughts; Lecture-Related = proportion lecture-related off-task thoughts; Comp-Related = proportion comprehension-related off-task thoughts. Posttest Parts 1–3 = number correct posttest items per part; Posttest Total = z-score average across all posttest parts; Sit. Interest = situational interest scale score. <emph>N</emph>s for Situational Interest outcome: Matched Testing = 44; Matched Restudy = 49; Mismatched Testing = 47; Mismatched Restudy = 41</p> <p>Graph: Fig. 2 Raincloud plots depicting differences in pretest scores between conditions. Dots represent individual subject means in each condition. The closed black dots represent group-level mean estimates for the Restudy conditions; open circles represent the group-level mean estimates for the Testing conditions. Error bars are 95% confidence intervals</p> <p>The results of the 2 (Pretest Content: Match vs. Mismatch) × 2 (Interpolated Activity: Testing vs. Restudy) ANOVA on pretest performance indicated neither a significant main effect of pretest-content match, <emph>F</emph>(1, 191) = 3.71, <emph>p</emph> = 0.056, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.019, nor a significant interaction with interpolated activity, <emph>F</emph>(1, 191) = 0.41, <emph>p</emph> = 0.524, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.002. We find no evidence, then, that the content-matched and content-mismatched pretests differed in difficulty (<emph>M</emph>s = 2.42 and 2.06, respectively). 
Unexpectedly, however, the ANOVA indicated an effect of interpolated activity, with subjects who would subsequently restudy at interpolation breaks scoring significantly higher on the pretest (<emph>M</emph> = 2.45) than did subjects who would subsequently be tested at interpolation breaks (<emph>M</emph> = 2.05), <emph>F</emph>(1, 191) = 4.59, <emph>p</emph> = 0.033, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.023.</p> <p>As noted earlier, to further explore all main effects of interest, we conducted follow-up <emph>t</emph>-tests to provide corresponding Bayes Factors (BF) and Cohen's <emph>d</emph> indicators of effect size. Table 2 presents these results for all key experimental contrasts in the study. The BF for the significant effect of interpolated activity here indicated only weak evidence that the data were more likely under the alternative than the null hypothesis.</p> <p>Table 2 Follow-up <emph>t</emph>-tests, Cohen's <emph>d</emph>, and Bayes Factors (BF<subs>10</subs>) for Primary Dependent Variables in Testing Versus Restudy Conditions and Content-Matched Versus Content-Mismatched Pretest Conditions</p> <p> <ephtml> <table frame="hsides" rules="groups"><thead><tr><th align="left" rowspan="3"><p>Dependent variables</p></th><th align="left" colspan="6"><p>Experimental comparisons</p></th></tr><tr><th align="left" colspan="3"><p>Testing vs. restudy</p></th><th align="left" colspan="3"><p>Matched vs. mismatched pretest</p></th></tr><tr><th align="left"><p><italic>t</italic>-test</p></th><th align="left"><p><italic>d</italic> [95% CI]</p></th><th align="left"><p>BF<sub>10</sub></p></th><th align="left"><p><italic>t</italic>-test</p></th><th align="left"><p><italic>d</italic> [95% CI]</p></th><th align="left"><p>BF<sub>10</sub></p></th></tr></thead><tbody><tr><td align="left"><p>Pretest</p></td><td char="." align="char"><p><italic>t</italic>(193) = − 2.26*</p></td><td char="." 
align="char"><p> − 0.32 [− 0.61, − 0.04]</p></td><td align="left"><p>1.67</p></td><td char="." align="char"><p><italic>t</italic>(193) = − 2.03*</p></td><td char="." align="char"><p> − 0.29 [− 0.57, − 0.01]</p></td><td align="left"><p>1.05</p></td></tr><tr><td align="left"><p>TUT Rate</p></td><td char="." align="char"><p><italic>t</italic>(193) = − 2.00*</p></td><td char="." align="char"><p> − 0.29 [− 0.57, − 0.00]</p></td><td align="left"><p>0.99</p></td><td char="." align="char"><p><italic>t</italic>(193) = 0.52</p></td><td char="." align="char"><p>0.08 [− 0.21, 0.37]</p></td><td align="left"><p>0.18</p></td></tr><tr><td align="left"><p>Lecture-Related</p></td><td char="." align="char"><p><italic>t</italic>(193) = − 0.26</p></td><td char="." align="char"><p> − 0.04 [− 0.32, 0.24]</p></td><td align="left"><p>0.16</p></td><td char="." align="char"><p><italic>t</italic>(193) = 0.67</p></td><td char="." align="char"><p>0.10 [− 0.19, 0.38]</p></td><td align="left"><p>0.19</p></td></tr><tr><td align="left"><p>Comp-Related</p></td><td char="." align="char"><p><italic>t</italic>(193) = 0.73</p></td><td char="." align="char"><p>0.11 [− 0.18, 0.39]</p></td><td align="left"><p>0.20</p></td><td char="." align="char"><p><italic>t</italic>(193) = 1.13</p></td><td char="." align="char"><p>0.16 [− 0.12, 0.44]</p></td><td align="left"><p>0.28</p></td></tr><tr><td align="left"><p>Posttest Total</p></td><td char="." align="char"><p><italic>t</italic>(193) = − 0.47</p></td><td char="." align="char"><p> − 0.07 [− 0.35, 0.21]</p></td><td align="left"><p>0.17</p></td><td char="." align="char"><p><italic>t</italic>(193) = 1.08</p></td><td char="." align="char"><p>0.16 [− 0.13, 0.44]</p></td><td align="left"><p>0.27</p></td></tr><tr><td align="left"><p>Sit. Interest</p></td><td char="." align="char"><p><italic>t</italic>(179) = − 1.24</p></td><td char="." align="char"><p> − 0.18 [− 0.48, 0.11]</p></td><td align="left"><p>0.33</p></td><td char="." 
align="char"><p><italic>t</italic>(179) = 0.06</p></td><td char="." align="char"><p>0.01 [− 0.28, 0.30]</p></td><td align="left"><p>0.16</p></td></tr></tbody></table> </ephtml> </p> <p>Matched = pretest content matched posttest; Mismatched = pretest content mismatched posttest; Pretest = number of pretest items answered correctly; TUT Rate = proportion of thought reports indicating task-unrelated thoughts; Lecture-Related = proportion lecture-related off-task thoughts; Comp-Related = proportion comprehension-related off-task thoughts; Posttest Total = z-score average across all parts; Sit. Interest = situational interest scale score. <sups>*</sups><emph>p</emph> < .05</p> <p>Despite the weak effect, the statistically significant pretest findings suggest that we should analyze posttest performance, and all other outcomes of interest, both with and without including pretest score as a covariate. For these supplemental ANCOVAs, we standardized pretest scores within each pretesting condition, given that the content-matching and content-mismatching conditions presented different pretest items. All ANCOVA results are reported in Appendix A; in no case did the ANCOVA results yield different conclusions than did the ANOVAs without the pretest score covariate.</p> <hd id="AN0156024414-24">Primary analyses of thought reports</hd> <p>Here, we analyze whether TUT rates, or other varieties of off-task thought, were affected by our experimental interventions: interpolated testing versus restudy, content-matched versus content-mismatched pretests, or both.</p> <hd id="AN0156024414-25">TUT rates</hd> <p>As seen in Table 1, subjects reported TUTs on about 40–50% of the probes during the video lecture, on average, consistent with our prior study using the same video content and thought probes (Kane et al., [<reflink idref="bib29" id="ref145">29</reflink>]). 
Also consistent with prior findings, there was considerable individual variability in TUT rates, with standard deviations of about 25% around those means.</p> <p>Our primary question focused on the potential effects of interpolated activity and pretest match on TUT rates. As suggested by Fig. 3, the 2 × 2 ANOVA indicated a just-significant main effect of interpolated activity, <emph>F</emph>(<reflink idref="bib1" id="ref146">1</reflink>, 191) = 4.05, <emph>p</emph> = 0.046, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.021, with lower TUT rates for subjects in the interpolated testing condition (<emph>M</emph> = 0.40) than in the restudy condition (<emph>M</emph> = 0.47). There was no significant effect of pretest-content match, <emph>F</emph>(<reflink idref="bib1" id="ref147">1</reflink>, 191) = 0.40, <emph>p</emph> = 0.526, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.002, and no interaction, <emph>F</emph>(<reflink idref="bib1" id="ref148">1</reflink>, 191) = 0.07, <emph>p</emph> = 0.793, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.000. Our findings therefore conceptually replicated the interpolated-testing benefits reported by Jing et al. ([<reflink idref="bib25" id="ref149">25</reflink>], Study 2) and Szpunar, Khan, et al. ([<reflink idref="bib74" id="ref150">74</reflink>]).</p> <p>Graph: Fig. 3 Raincloud plots depicting TUT rates (the proportions of thought reports indicating TUTs) in each condition. Dots represent individual subjects' TUT rates. The closed black dots represent group-level mean estimates for the Restudy conditions; open circles represent the group-level mean estimates for the Testing conditions. 
Error bars are 95% confidence intervals</p> <p>To contextualize the interpolated-testing effect size on TUTs, we conducted a <emph>t</emph>-test comparing testing and restudy groups (collapsed across pretest-match conditions); Table 2 indicates a corresponding BF that does not provide supporting evidence that the data were more likely under either the alternative or the null hypothesis, along with a conventionally small-to-medium effect size (Cohen's <emph>d</emph> = − 0.29).</p> <p>As further perspective on effect size (see Magnusson, [<reflink idref="bib42" id="ref151">42</reflink>]), the Cohen's <emph>d</emph> of − 0.29 corresponds to: (a) 61.4% of the restudy group having a higher TUT rate than the mean TUT rate for the testing group (Cohen's U<subs>3</subs>), (b) an 88.5% overlap between the TUT-rate distributions for the restudy and testing groups, and (c) a 58.1% chance that a randomly chosen subject from the restudy group would have a higher TUT rate than a randomly chosen subject from the testing group. Thus, although we replicated a significant testing effect on TUT rate, it was modest in magnitude and not compelling from a Bayesian perspective.</p> <p>As an exploratory follow-up analysis, we examined the time-course of mind wandering across the video lecture, to see (a) whether a stronger interpolated-testing effect might be evident later in the lecture, where TUT rates typically rise (as they did in Kane et al., [<reflink idref="bib29" id="ref152">29</reflink>]), or (b) whether a content-matched pretest effect might be evident only in early blocks, closest to the pretesting experience (where memory for pretested topics should be best). To do so, for each subject, we calculated a TUT rate for each of the 5 blocks and entered the values into a 2 (Interpolated Activity) × 2 (Pretest-Content Match) × 5 (Video Block) mixed ANOVA, with video block as a repeated measure. 
The ANOVA (conducted using the Greenhouse–Geisser correction for sphericity to account for within-subject manipulations) indicated a main effect of block, <emph>F</emph>(3.66, 698.32) = 18.85, <emph>p</emph> < 0.001, η<subs>p</subs><sups>2</sups> = 0.090, but no significant interactions involving testing or pretesting. Our experimental manipulations did not appear to affect TUT-rate trajectories across the lecture.</p> <p>Although no significant interaction with interpolated activity was indicated, we note that Fig. 4 shows no evidence of an interpolated-testing effect on TUTs in Block 1, before any test was presented. As would be expected if interpolated tests exerted a causal effect on mind wandering, TUT rates diverged between the testing and restudy groups only after the first interpolated test following Block 1. We therefore conducted an additional exploratory analysis to see whether we (and prior studies) underestimated the effect of interpolated testing on TUTs by including TUT rates from the first part of the video lecture, before any testing had occurred.</p> <p>Graph: Fig. 4 Raincloud plots depicting TUT rates (the proportions of thought reports indicating TUTs) by block in each interpolated activity condition. Dots represent individual subjects' TUT rates. The closed black dots represent group-level mean estimates for the Restudy conditions; open circles represent the group-level mean estimates for the Testing conditions. Error bars are 95% confidence intervals</p> <p>Here, we recalculated each subject's overall TUT rate by including thought-probe responses from only Blocks 2–5 and used these as the dependent measure in a 2 (interpolated activity) × 2 (pretest-content match) ANOVA. 
Of most importance here, the effect of interpolated activity was again significant, <emph>F</emph>(<reflink idref="bib1" id="ref153">1</reflink>, 191) = 5.31, <emph>p</emph> = 0.023, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.027, with lower TUT rates for subjects in the interpolated-testing condition (<emph>M</emph> = 0.42) than in the restudy condition (<emph>M</emph> = 0.51). The difference between groups was numerically somewhat larger, and the <emph>p</emph>-value somewhat smaller, than in our original analysis, but the effect-size estimates were similar.</p> <p>Indeed, a <emph>t</emph>-test comparing testing and restudy groups (collapsed across pretest-match conditions) yielded BF = 1.78, indicating only anecdotal evidence that the data were more likely under the alternative than the null hypothesis. It also indicated a Cohen's <emph>d</emph> = − 0.33 [− 0.61, − 0.05], which closely matches our originally calculated effect size (<emph>d</emph> = − 0.29) and is still considerably smaller than those reported in prior studies (Jing et al., [<reflink idref="bib25" id="ref154">25</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref155">74</reflink>]). In conclusion, then, we did not greatly underestimate the effect of interpolated testing on TUT rates by including Block 1 thought probes that occurred before the first interpolated test.</p> <hd id="AN0156024414-26">Rates of other off-task thought reports</hd> <p>We next examined whether interpolated activity or pretest-content matching affected rates of reported lecture-related off-task thoughts. As illustrated in Fig. 
5 (see also Table 1), the 2 × 2 ANOVA on lecture-related off-task thoughts indicated no significant effects of interpolated activity, <emph>F</emph>(<reflink idref="bib1" id="ref156">1</reflink>, 191) = 0.10, <emph>p</emph> = 0.751, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.000, or pretest-content match, <emph>F</emph>(<reflink idref="bib1" id="ref157">1</reflink>, 191) = 0.48, <emph>p</emph> = 0.489, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.003, and no interaction, <emph>F</emph>(<reflink idref="bib1" id="ref158">1</reflink>, 191) = 0.55, <emph>p</emph> = 0.460, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.003. Table 2 also shows BFs indicating modest evidence that the data were more likely under the null than the alternative model for both the interpolated-testing effect and the pretest-matching effect. We therefore failed to conceptually replicate the significant interpolated-testing effect on lecture-related off-task thoughts reported by Jing et al. ([<reflink idref="bib25" id="ref159">25</reflink>], Experiment 2), where <emph>M</emph> report rates were approximately 0.20 and 0.10 for interpolated testing and restudy groups, respectively.</p> <p>Graph: Fig. 5 Raincloud plots depicting the proportions of lecture-related off-task thought in each condition. Dots represent individual subjects' rates. The closed black dots represent group-level mean estimates for the Restudy conditions; open circles represent the group-level mean estimates for the Testing conditions. Error bars are 95% confidence intervals</p> <p>We also conducted a 2 × 2 ANOVA on comprehension-related thoughts (see Table 1). 
It indicated no significant effects of interpolated activity, <emph>F</emph>(<reflink idref="bib1" id="ref160">1</reflink>, 191) = 0.43, <emph>p</emph> = 0.515, η<subs>p</subs><sups>2</sups> = 0.002, pretest-content match, <emph>F</emph>(<reflink idref="bib1" id="ref161">1</reflink>, 191) = 1.21, <emph>p</emph> = 0.274, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.006, or their interaction, <emph>F</emph>(<reflink idref="bib1" id="ref162">1</reflink>, 191) = 0.35, <emph>p</emph> = 0.556, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.002; Table 2 presents BFs indicating modest evidence that the data were more likely under the null model than the alternative model for the effects of both interpolated activity and pretest-content match.</p> <hd id="AN0156024414-27">Secondary analysis of posttest performance</hd> <p>Figure 6 presents the posttest data. A 2 × 2 ANOVA did not indicate a main effect of interpolated activity (i.e., no interpolated-testing effect on posttest performance), <emph>F</emph>(<reflink idref="bib1" id="ref163">1</reflink>, 191) = 0.28, <emph>p</emph> = 0.595, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.001, or of pretest-content match, <emph>F</emph>(<reflink idref="bib1" id="ref164">1</reflink>, 191) = 1.22, <emph>p</emph> = 0.271, η<subs>p</subs><sups>2</sups> = 0.006, or an interaction between the two, <emph>F</emph>(<reflink idref="bib1" id="ref165">1</reflink>, 191) = 0.00, <emph>p</emph> = 0.982, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.000. As seen in Table 2, the BFs for the difference between interpolated testing and restudy groups indicated modest-to-strong evidence that the data were more likely under the null than the alternative model. In short, we did not find conventional benefits for either interpolated testing or content-matched pretesting on final test performance.[<reflink idref="bib2" id="ref166">2</reflink>]</p> <p>Graph: Fig. 
6 Raincloud plots depicting differences in posttest performance between conditions. Dots represent individual subject means in each condition. The closed black dots represent group-level mean estimates for the Restudy conditions; open circles represent the group-level mean estimates for the Testing conditions. Error bars are 95% confidence intervals</p> <p>As an additional way to assess a possible testing effect in our posttest data, we examined whether subjects in the interpolated-testing condition improved more from pretest to posttest than did subjects in the interpolated-restudy condition. To do this, we selected all subjects in the matched-pretest conditions (<emph>n</emph> = 100) and compared pretest scores to Part 1 of the posttest, which presented the identical 10 multiple-choice items. Scores increased significantly from pretest to posttest, indicating that subjects learned from the lecture, <emph>F</emph>(<reflink idref="bib1" id="ref167">1</reflink>, 98) = 84.14, <emph>p</emph> < 0.001, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.462. We did not find, however, a significant interpolated activity (testing vs. restudy) × pretest-to-posttest interaction, <emph>F</emph>(<reflink idref="bib1" id="ref168">1</reflink>, 98) = 0.98, <emph>p</emph> = 0.325, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.010, again providing no evidence for test-potentiated learning (i.e., no benefit of interpolated testing over restudy for subsequent learning).</p> <p>In Appendix B, we explore the possibility that performance levels on the interpolated tests affected the results here. Specifically, we asked whether interpolated testing produced limited benefits in TUT reduction or learning because subjects did not perform well enough on the interpolated tests. The findings are ambiguous, but we report them for archival purposes.</p> <hd id="AN0156024414-28">Secondary analysis of situational interest</hd> <p>Following Jing et al. 
([<reflink idref="bib25" id="ref169">25</reflink>]), here we tested whether interpolated activity or pretest-content matching affected self-reported post-video situational interest in the lecture or the broad topic of statistics (as noted previously, we dropped data from 14 subjects who failed the embedded attention check). As suggested by Fig. 7 (see also Table 1), the 2 × 2 ANOVA indicated no effect of interpolated activity (i.e., no interpolated-testing effect), <emph>F</emph>(<reflink idref="bib1" id="ref170">1</reflink>, 177) = 1.62, <emph>p</emph> = 0.204, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.009, consistent with findings from Jing et al. ([<reflink idref="bib25" id="ref171">25</reflink>]), or of pretest content match, <emph>F</emph>(<reflink idref="bib1" id="ref172">1</reflink>, 177) = 0.02, <emph>p</emph> = 0.881, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.000, and no interaction between the two, <emph>F</emph>(<reflink idref="bib1" id="ref173">1</reflink>, 177) = 1.34, <emph>p</emph> = 0.249, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.008. Table 2 presents BFs indicating modest-to-strong evidence for the data being more likely under the null than the alternative model for the effects of interpolated activity and pretest content matching.</p> <p>Graph: Fig. 7 Raincloud plots depicting differences in situational interest ratings between conditions. Dots represent individual subject means in each condition. The closed black dots represent group-level mean estimates for the Restudy conditions; open circles represent the group-level mean estimates for the Testing conditions. 
Error bars are 95% confidence intervals</p> <hd id="AN0156024414-29">Exploratory correlational analyses</hd> <p>Our goal for these analyses was to test whether we replicated previously reported significant associations between off-task thought types (i.e., TUT and lecture-related) and outcomes (i.e., posttest performance and situational interest in the lecture) using these video-lecture and assessment materials (Kane et al., [<reflink idref="bib29" id="ref174">29</reflink>]). We approached these analyses in two ways: (a) using the whole sample, collapsed across all manipulations, and (b) separately assessing correlations within the testing and restudy groups while collapsing across pretest-match conditions. We consider these analyses not only secondary but also "exploratory" (and we interpret them cautiously) because in both cases we collapsed over conditions that may have affected individual differences without demonstrating robust experimental effects, and in the latter case our samples were too small to allow precise estimates of correlational effect sizes (Schönbrodt & Perugini, [<reflink idref="bib67" id="ref175">67</reflink>]).</p> <p>Table 3 presents the relevant correlations from the present study and from Kane et al. ([<reflink idref="bib29" id="ref176">29</reflink>]). Although the correlations from Kane et al. ([<reflink idref="bib29" id="ref177">29</reflink>]) were generally stronger than those found here, the present <emph>r</emph> values from the full sample were all within 0.06–0.11 of the Kane et al. ([<reflink idref="bib29" id="ref178">29</reflink>]) values (and all within the originals' 95% confidence intervals). The correlations from the separate interpolated-testing and restudy groups were more variable, and some were not significant, but that is not surprising given their smaller sample sizes. We thus conclude that the present study replicated the primary correlational results from Kane et al. 
([<reflink idref="bib29" id="ref179">29</reflink>]): (a) the strong negative correlations between TUT rates and both posttest performance and situational interest; and (b) the modest positive correlations between lecture-related off-task thoughts and both posttest performance and situational interest.[<reflink idref="bib3" id="ref180">3</reflink>]</p> <p>Table 3 Correlations between off-task thought types and other variables, in the present study (both for the full sample and separately for each interpolated-activity condition, i.e., testing versus restudy) and in the methodologically similar Kane et al. ([<reflink idref="bib29" id="ref181">29</reflink>]) study</p> <p> <ephtml> <table frame="hsides" rules="groups"><thead><tr><th align="left"><p>Off-task thought</p></th><th align="left"><p>Correlate</p></th><th align="left"><p>Study/sample</p></th><th align="left"><p>Correlation [with 95% CI]</p></th></tr></thead><tbody><tr><td align="left"><p>TUT</p></td><td align="left"><p>Posttest composite</p></td><td align="left"><p>Present/full</p></td><td char="." align="char"><p><italic>r</italic>(193) = −.39 [−.50, −.26]*</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Present/testing</p></td><td char="." align="char"><p><italic>r</italic>(97) = −.40 [−.54, −.22]*</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Present/restudy</p></td><td char="." align="char"><p><italic>r</italic>(94) = −.40 [−.55, −.21]*</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Kane et al. (<xref ref-type="bibr" rid="bibr29">2017</xref>)</p></td><td char="." align="char"><p><italic>r</italic>(180) = −.48 [−.58, −.36]*</p></td></tr><tr><td align="left" /><td align="left"><p>Situational interest</p></td><td align="left"><p>Present/full</p></td><td char="." align="char"><p><italic>r</italic>(179) = −.47 [−.58, −.35]*</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Present/testing</p></td><td char="." 
align="char"><p><italic>r</italic>(89) = −.41 [−.57, −.22]*</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Present/restudy</p></td><td char="." align="char"><p><italic>r</italic>(88) = −.57 [−.70, −.41]*</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Kane et al. (<xref ref-type="bibr" rid="bibr29">2017</xref>)</p></td><td char="." align="char"><p><italic>r</italic>(180) = −.56 [−.65, −.45]*</p></td></tr><tr><td align="left"><p>Lecture-related</p></td><td align="left"><p>Posttest composite</p></td><td align="left"><p>Present/full</p></td><td char="." align="char"><p><italic>r</italic>(193) = .15 [.01, .29]*</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Present/testing</p></td><td char="." align="char"><p><italic>r</italic>(97) = .13 [−.07, .32]</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Present/restudy</p></td><td char="." align="char"><p><italic>r</italic>(94) = .18 [−.03, .36]</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Kane et al. (<xref ref-type="bibr" rid="bibr29">2017</xref>)</p></td><td char="." align="char"><p><italic>r</italic>(180) = .26 [.12, .39]*</p></td></tr><tr><td align="left" /><td align="left"><p>Situational interest</p></td><td align="left"><p>Present/full</p></td><td char="." align="char"><p><italic>r</italic>(179) = .20 [.06, .34]*</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Present/testing</p></td><td char="." align="char"><p><italic>r</italic>(89) = .04 [−.16, .25]</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Present/restudy</p></td><td char="." align="char"><p><italic>r</italic>(88) = .36 [.17, .53]*</p></td></tr><tr><td align="left" /><td align="left" /><td align="left"><p>Kane et al. (<xref ref-type="bibr" rid="bibr29">2017</xref>)</p></td><td char="." 
align="char"><p><italic>r</italic>(180) = .26 [.12, .39]*</p></td></tr></tbody></table> </ephtml> </p> <p>*<emph>p</emph> < .05</p> <hd id="AN0156024414-30">Discussion</hd> <p>Educationally relevant research (and its application to the classroom) has recently broadened its focus beyond memory and metacognition to pay more attention to failures of attention (for a popular review, see Lang, [<reflink idref="bib35" id="ref182">35</reflink>]), and particularly to mind wandering (e.g., Risko et al., [<reflink idref="bib58" id="ref183">58</reflink>]; Smallwood et al., [<reflink idref="bib72" id="ref184">72</reflink>]; Szpunar, Moulton, et al., [<reflink idref="bib75" id="ref185">75</reflink>]; Unsworth et al., [<reflink idref="bib76" id="ref186">76</reflink>]). Ample evidence from video and live lectures, in laboratory and classroom contexts, shows that TUTs during learning predict disruptions to encoding and comprehension (e.g., Hollis & Was, [<reflink idref="bib23" id="ref187">23</reflink>]; Kane, Carruth, et al., [<reflink idref="bib27" id="ref188">27</reflink>]; Kane et al., [<reflink idref="bib29" id="ref189">29</reflink>]; Risko et al., [<reflink idref="bib59" id="ref190">59</reflink>]; Varao-Sousa & Kingstone, [<reflink idref="bib77" id="ref191">77</reflink>]; Wammes et al., [<reflink idref="bib78" id="ref192">78</reflink>], [<reflink idref="bib80" id="ref193">80</reflink>]; Wammes & Smilek, [<reflink idref="bib79" id="ref194">79</reflink>]). 
Indeed, with our lengthy (~50 min) video lecture on statistics, we replicated prior findings of frequent mind wandering during video lectures (<emph>M</emph> TUT rate ≈ 0.40–0.50) and a negative correlation between TUT rate and posttest performance (<emph>r</emph> ≈ − 0.40); these replicated findings include the key correlations reported by our previous study using these same learning materials (Kane et al., [<reflink idref="bib29" id="ref195">29</reflink>]).</p> <p>The present laboratory study drew upon a smaller literature on behavioral interventions, such as interpolated testing and pretesting, that might reduce TUTs in learning contexts (Jing et al., [<reflink idref="bib25" id="ref196">25</reflink>]; Pan et al., [<reflink idref="bib48" id="ref197">48</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref198">74</reflink>]). If interpolated testing or pretesting reduces TUTs, that result not only presents a practical solution to an applied educational problem, but also suggests that basic theoretical work might profitably expand to consider how attentional mechanisms contribute to testing and pretest effects in learning and memory (e.g., Kornell & Vaughn, [<reflink idref="bib32" id="ref199">32</reflink>]; Metcalfe, [<reflink idref="bib43" id="ref200">43</reflink>]; Pan & Rickard, [<reflink idref="bib47" id="ref201">47</reflink>]), especially in ecologically valid contexts where subsequent learning builds on prior learning (e.g., Chan et al., [<reflink idref="bib9" id="ref202">9</reflink>]).</p> <p>We designed the present study to address concerns regarding sample sizes, measurement limitations, and potential confounds (e.g., effects of notetaking) in prior work in this area (Jing et al., [<reflink idref="bib25" id="ref203">25</reflink>]; Pan et al., [<reflink idref="bib48" id="ref204">48</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref205">74</reflink>]). 
The study was well powered to detect medium-sized main effects (of interpolated testing and content-matched pretesting). It used well validated thought probes to assess TUTs (and lecture-related off-task thought), and it prevented notetaking to clarify the mechanisms of any potential testing or pretesting benefits. It also contrasted a matched-pretest group to a mismatched-pretest group, to isolate the possible mechanism of any pretesting effect on TUTs found here (i.e., scaffolding attention to the foreshadowed, critical topics).</p> <hd id="AN0156024414-31">Interpolated testing and TUT rate</hd> <p>We conceptually replicated prior findings (Jing et al., [<reflink idref="bib25" id="ref206">25</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref207">74</reflink>]) that students given periodic tests within a lecture reported significantly fewer TUTs (<emph>M</emph> rate = 0.40) than did those who restudied the same information (<emph>M</emph> rate = 0.47); we also replicated the Jing et al. ([<reflink idref="bib25" id="ref208">25</reflink>]) finding that interpolated testing did not increase situational interest in the lecture, despite reducing TUTs. 
Consistent with the metacognition framework for explaining test-potentiated new learning (see Chan et al., [<reflink idref="bib9" id="ref209">9</reflink>]), the interpolated-testing benefit over restudying suggests that testing works by providing students with feedback on their learning from the prior portions of the lecture, which then motivates greater attention.</p> <p>The collective results across studies, however, suggest that this interpolated-testing effect on TUT rate yields highly variable standardized effect sizes: Two prior experiments reported Cohen's <emph>d</emph>s of approximately 1.0 (Jing et al., [<reflink idref="bib25" id="ref210">25</reflink>], Experiment 2, <emph>n</emph> = 18 per group; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref211">74</reflink>], <emph>n</emph> = 16 per group), one prior experiment reported a nonsignificant testing effect (<emph>d</emph> = 0.15; Jing et al., [<reflink idref="bib25" id="ref212">25</reflink>], Experiment 1, <emph>n</emph> = 18 per group), and the present study reported a just-significant effect with a modest <emph>d</emph> = 0.29 (<emph>n</emph> = 96–100 per group). Moreover, the BF for the present study's effect of interpolated testing on TUTs (BF<subs>10</subs> = 0.99; Table 2) was essentially equivocal, providing at most anecdotal evidence for the <emph>null</emph> hypothesis.</p> <p>Some of this effect-size variability across studies is likely due to small sample sizes, which produce noisy effect-size estimates (e.g., Perugini et al., [<reflink idref="bib52" id="ref213">52</reflink>]; Schäfer & Schwarz, [<reflink idref="bib64" id="ref214">64</reflink>]). As well, standardized effect sizes are products not only of intervention strength but also of the entire study design, including heterogeneity within the studied sample (e.g., Simpson, [<reflink idref="bib70" id="ref215">70</reflink>]). 
It is possible, then, that the larger effect sizes from prior studies arose from testing only Harvard University students (Szpunar, Khan, et al., [<reflink idref="bib74" id="ref216">74</reflink>]) or an unspecified mix of Harvard and Boston University students (Jing et al., [<reflink idref="bib25" id="ref217">25</reflink>]). Both samples were likely much more intellectually homogeneous than students at a comprehensive state university, such as UNCG, which should reduce the ratio of noise to signal and thus produce larger effect sizes.</p> <p>With that said, any generalizations from this small literature are challenging for many reasons: These few studies are so methodologically diverse that effect sizes might vary systematically with aspects of the study design, such as subject sample, video topic and length, number of thought probes, thought-probe format, number of interpolated tests and their format, interpolated-test difficulty, allowing or not allowing notetaking, posttest retention interval and difficulty, and extent of subjects' prior knowledge on the lecture topic. Future research on the effect of interpolated testing on TUTs should thus take designing-for-variation and meta-analytic approaches to estimating effect size and its robustness (e.g., Baribault et al., [<reflink idref="bib2" id="ref218">2</reflink>]; Brunswik, [<reflink idref="bib6" id="ref219">6</reflink>]; Fyfe et al., [<reflink idref="bib18" id="ref220">18</reflink>]; Greenwald et al., [<reflink idref="bib20" id="ref221">20</reflink>]; Harder, [<reflink idref="bib22" id="ref222">22</reflink>]; Landy et al., [<reflink idref="bib34" id="ref223">34</reflink>]).</p> <p>We were able to provisionally rule out one possible explanation for the small effect of interpolated testing on TUTs found here, however. 
Our lecture video was longer (52 min) than those used in prior studies (21 min in Szpunar, Khan, et al., [<reflink idref="bib74" id="ref224">74</reflink>]; 40 min in Jing et al., [<reflink idref="bib25" id="ref225">25</reflink>]), and most learning studies find that TUT rates increase substantially over the lecture period (e.g., Cohen et al., [<reflink idref="bib11" id="ref226">11</reflink>]; Kane, Carruth, et al., [<reflink idref="bib27" id="ref227">27</reflink>]; Kane et al., [<reflink idref="bib29" id="ref228">29</reflink>]; Lindquist & McLean, [<reflink idref="bib38" id="ref229">38</reflink>]). Perhaps, then, we underestimated the testing benefit on TUTs because the negative effects of time-on-task were stronger than the benefits of interpolated testing. Although we replicated prior findings of TUT rates increasing over the lecture here, we did not find an interaction of lecture block with interpolated activity. TUT rates increased similarly for the testing and restudy groups across the lecture, with no sign of an early benefit of interpolated testing over restudy that diminished with time.</p> <p>As a final interpretive point, we consider here that the present study produced a small but significant interpolated-testing effect on TUTs but no significant testing effect on subsequent posttest performance, unlike prior studies (Jing et al., [<reflink idref="bib25" id="ref230">25</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref231">74</reflink>]). This null posttest finding may appear peculiar on the surface, given that learning and TUT experiences are likely linked. One major difference between our study and the prior studies, however, is that our subjects were not allowed to take notes during the lecture. 
It is thus possible that notetaking—which increased significantly under interpolated testing—contributed to these prior findings of interpolated-testing effects on posttest performance (Jing et al., [<reflink idref="bib25" id="ref232">25</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref233">74</reflink>]).</p> <p>More generally, other aspects of our study design may have minimized the size of the interpolated testing effect on memory (i.e., on the lecture posttest), based on moderator results from recent meta-analyses (Adesope et al., [<reflink idref="bib1" id="ref234">1</reflink>]; Rowland, [<reflink idref="bib62" id="ref235">62</reflink>]). For example, our posttest contained recognition and free-response items, which produce weaker testing effects on memory than does cued recall (<emph>g</emph> = 0.29 vs. 0.61; Rowland, [<reflink idref="bib62" id="ref236">62</reflink>]).[<reflink idref="bib4" id="ref237">4</reflink>] Further, we had a brief retention interval, which reduces testing effects on memory relative to longer retention intervals (<emph>g</emph> = 0.56 vs. 0.82, Adesope et al., [<reflink idref="bib1" id="ref238">1</reflink>]; <emph>g</emph> = 0.41 vs. 0.69; Rowland, [<reflink idref="bib62" id="ref239">62</reflink>]). As well, our interpolated tests and the final test presented different items (sometimes on different lecture subtopics), which reduces testing effect sizes relative to matching items (<emph>g</emph> = 0.53 vs. 0.63; Adesope et al., [<reflink idref="bib1" id="ref240">1</reflink>]). Finally, we did not provide feedback about initial learning or following the interpolated tests, which one meta-analysis (Rowland, [<reflink idref="bib62" id="ref241">62</reflink>]) found to reduce testing effects on memory (no feedback: <emph>g</emph> = 0.39 vs. feedback: <emph>g</emph> = 0.73; but see Adesope et al., [<reflink idref="bib1" id="ref242">1</reflink>], with <emph>g</emph>s = 0.60 vs. 
0.63, respectively).[<reflink idref="bib5" id="ref243">5</reflink>]</p> <p>We might have found a larger testing effect on posttest recall if we had used a longer retention interval, if we had matched posttest items to interpolated-test items, or if we had focused our posttest on cued-recall items. None of these variables, however, could have retroactively affected mind wandering that had already occurred during the lecture. That is, because several mechanisms contribute to interpolated-testing effects on final recall but not on in-lecture TUTs, and because any interpolated-testing effects on TUTs should have some downstream consequences for learning (rather than vice versa), the finding of large, small, or null testing effects on final memory tests should not be considered diagnostic for evaluating the evidence for interpolated-testing effects on TUTs.</p> <hd id="AN0156024414-32">Content-matched pretesting and TUT rate</hd> <p>In contrast to Pan et al. ([<reflink idref="bib48" id="ref244">48</reflink>]), who found that pretesting, either before each video segment or before the entire video, reduced retrospective TUT ratings relative to no-pretest controls (<emph>d</emph>s = 0.39, 0.74, and 0.91), we found no effect of content-matched versus content-mismatched pretesting on TUT reports to in-the-moment thought probes (<emph>M</emph> TUT rates = .43 and .44 for matching and mismatching pretest groups, respectively; <emph>F</emph> < 1). Although these conflicting results may reflect sampling error, they were likely driven by the different control conditions across studies.</p> <p>Whereas Pan et al. ([<reflink idref="bib48" id="ref245">48</reflink>]) compared pretested subjects to those who completed an unrelated task (algebra problems), as is typical of the pretesting literature, we compared pretested subjects to those who also completed a pretest on lecture-<emph>related</emph> topics not appearing in the video or posttest. 
So, here, we found that subjects provided with advance warning of the topics to be covered in (and tested from) the lecture did not mind-wander less than did subjects who were uninformed about the <emph>specific</emph> topics to be covered in (and tested from) the lecture.</p> <p>If our null content-matched pretesting findings are replicable, they suggest that any pretesting benefit on TUTs does not arise from highlighting to subjects what specific information they should most closely attend to during the lecture. Pretesting benefits such as that reported by Pan et al. ([<reflink idref="bib48" id="ref246">48</reflink>]) might instead arise from the more general feedback that subjects receive from completing a challenging pretest that demonstrates their lack of knowledge. Although, as noted earlier, effects of pretesting <emph>on memory</emph> may sometimes be limited to material that matches what was included in the pretest (e.g., Pressley et al., [<reflink idref="bib54" id="ref247">54</reflink>]; Richland et al., [<reflink idref="bib57" id="ref248">57</reflink>]), any effects of pretesting (versus no pretesting) <emph>on TUTs</emph> may be due to pretesting increasing curiosity or the motivation to attend and reduce the knowledge deficit, that is, to the same mechanism that is likely responsible for any interpolated-testing effect on TUTs.[<reflink idref="bib6" id="ref249">6</reflink>]</p> <p>The lack of a pretesting-content-match effect on learning in the present study might be attributable to subjects' failure to remember the pretest items (or topics) during the lecture. That is, subjects who took the matched pretest might not have processed the items deeply enough to remember them (and any errors they made on those items) while watching the video or while taking the posttest. For example, St. 
Hilaire and Carpenter ([<reflink idref="bib73" id="ref250">73</reflink>]) reported a pretesting benefit for learning only in cases where subjects remembered the pretested items during the video lecture. Given that our video lecture was so lengthy (see Geller et al., [<reflink idref="bib19" id="ref251">19</reflink>]), at over twice the duration of the Pan et al. ([<reflink idref="bib48" id="ref252">48</reflink>]) lecture, and that our pretest material was unfamiliar to most subjects, they may not have been able to associate the ongoing lecture with the pretest effectively.</p> <hd id="AN0156024414-33">Interpolated testing, content-matched pretesting, and lecture-related off-task thought</hd> <p>Students in both classroom and laboratory studies sometimes report thoughts that are not about the here-and-now of a lecture but that are nonetheless conceptually related to the topic (e.g., thinking about earlier lecture points, or connecting lecture material to everyday life; Locke & Jensen, [<reflink idref="bib40" id="ref253">40</reflink>]; Schoen, [<reflink idref="bib66" id="ref254">66</reflink>]; Shukor, [<reflink idref="bib68" id="ref255">68</reflink>]). Such lecture-related mind-wandering might even be helpful (perhaps akin to elaboration effects in memory; Craik & Tulving, [<reflink idref="bib12" id="ref256">12</reflink>]), as it correlates positively with learning from that lecture (Jing et al., [<reflink idref="bib25" id="ref257">25</reflink>]; Kane et al., [<reflink idref="bib29" id="ref258">29</reflink>]).</p> <p>The present study replicated the modest positive correlation between lecture-related off-task thought and posttest performance (<emph>r</emph> = 0.15). However, whereas Jing et al. 
([<reflink idref="bib25" id="ref259">25</reflink>]) reported that interpolated testing both decreased TUTs and increased lecture-related off-task thoughts during a video lecture, we did not find either interpolated testing or the match in pretesting content to increase lecture-related off-task thoughts in a larger subject sample. Again, a designing-for-variation approach in future work, with well-powered studies and meta-analyses, might indicate whether any association between lecture-related off-task thought and learning depends on particular aspects of the learning or testing context.</p> <hd id="AN0156024414-34">Additional limitations and constraints on generality</hd> <p>While the present study arguably has some strengths compared to prior studies of testing and pretesting effects on TUTs, there are limitations worth noting that we have not yet discussed. First, like most studies of TUTs in educational contexts, the present investigation was limited by its use of a convenience sample of North American undergraduates from a single university (albeit a university with a relatively diverse student body).</p> <p>Second, our randomization process did not yield sufficiently similar groups across testing conditions after exclusions, as subjects in our interpolated-restudy groups scored higher on the pretest, on average, than did subjects in our interpolated-testing groups. Although we conducted all analyses both with and without pretest scores as a covariate, and although pretest scores did not significantly predict any of our thought-report outcomes, confidence in our conclusions would be stronger had our design produced better-matched groups.</p> <p>Third, like most investigations of TUTs during video lectures, our study design did not match some aspects of university students' real-world learning from video materials. Subjects were not able to pause or rewind the lengthy video, to take notes, or to ask questions about the lecture content. 
In controlling the flow of learning material and limiting typical learning aids (which was important to determining whether interpolated-testing effects on TUTs were independent of interpolated-testing effects on notetaking), we may have hampered subjects' efforts to build integrative mental models of the material. This may then have artificially inflated their tendency for TUTs and the disruptive influence of TUTs on learning.</p> <p>Finally, some of the current study's results may have been biased by the thought probes we used—content-focused probes that assessed not only TUTs but also lecture-related and comprehension-related thoughts. Prior work has found reports of these thought types with open-ended probes (Jordano, [<reflink idref="bib26" id="ref260">26</reflink>]; Locke & Jensen, [<reflink idref="bib40" id="ref261">40</reflink>]; Schoen, [<reflink idref="bib66" id="ref262">66</reflink>]), suggesting that they are not always produced as a demand effect. Nonetheless, studies that repeatedly present probes asking about lecture- and comprehension-related off-task thoughts have the potential to bias subjects' experiences or reporting. If stronger students particularly believe they should have these kinds of thoughts, or that such thoughts are likely to be helpful, they may come to have these thoughts more frequently or simply endorse them more frequently as a socially desirable response. 
Such selective biasing may contribute to the positive correlation between lecture-related off-task thoughts and learning (see also Jing et al., [<reflink idref="bib25" id="ref263">25</reflink>]; Kane et al., [<reflink idref="bib29" id="ref264">29</reflink>]), although it cannot explain the lack of correlation between comprehension-related off-task thoughts and learning (see also Kane et al., [<reflink idref="bib29" id="ref265">29</reflink>]).</p> <hd id="AN0156024414-35">Conclusion</hd> <p>Consistent with a small number of studies measuring TUTs during video lectures (Jing et al., [<reflink idref="bib25" id="ref266">25</reflink>]; Szpunar, Khan, et al., [<reflink idref="bib74" id="ref267">74</reflink>]), we found that interpolated tests significantly reduced TUT rates relative to interpolated restudy opportunities, but the standardized effect size was small (considerably smaller than in most prior studies), and the associated Bayes factor suggested inadequate evidence for either the null or the alternative model. The benefits of interpolated testing for engaging students' attention may thus be smaller or more fragile than anticipated. Indeed, they may be too small or fragile to be of much practical use in reducing TUTs in authentic educational settings.</p> <p>We did not find that a pretest matched in content to the upcoming lecture material reduced TUTs relative to a content-mismatched pretest. If the pretesting effect on TUTs found by Pan et al. ([<reflink idref="bib48" id="ref268">48</reflink>]) is genuine and generalizable, then pretesting may reduce TUTs by showing students how little they know about a general topic and thus motivating them to pay attention (detectable with the Pan et al. 
design), rather than by highlighting or foreshadowing test-specific material for enhanced attentional focus (detectable with our design).</p> <hd id="AN0156024414-36">Acknowledgements</hd> <p>For assistance in data collection, we thank Faiza Asif-Fraz, Lewis Faw, Kristen Fisher, Daniel Josephsohn, April Matthews, Aaron Newcomer, Hadley Palattella, Joshua Perkins, Patrick Redmond, and Devin Tilley. For assistance in study management, we thank Hadley Palattella.</p> <hd id="AN0156024414-37">Significance statement</hd> <p>Educators strive to create learning environments and practices that optimize student success. One barrier to effective learning is that students' attention drifts throughout educational activities, including live and prerecorded lectures. When students experience mind wandering, they less effectively encode the material into memory. Thus, educators face the challenge of keeping students' attention focused to optimize learning. The present laboratory study tested whether two instructional methods that promote learning might enhance students' attention during a video lecture. These two methods are: (a) providing pretests on upcoming to-be-learned information, and (b) periodically testing students on recently learned information. We found that periodically testing students modestly reduced their rates of mind wandering relative to periodically presenting them with lecture material to restudy, but the effectiveness of testing was less than in prior, smaller studies. Pretesting on lecture-relevant materials, however, did not reduce mind wandering relative to pretesting lecture-unrelated materials. 
Our results suggest that inserting brief tests into lectures may somewhat suppress students' tendencies to mind wander, perhaps by alerting them to their learning deficiencies and motivating more focused attention; the lack of a pretesting benefit will require additional investigation.</p> <hd id="AN0156024414-38">Authors' contributions</hd> <p>MSW drafted the initial manuscript, conducted all statistical analyses, and created all figures and tables. NEP, BAS, AM, and MJK contributed to the conception and design of the study. NEP and BAS programmed experimental tasks and measures. NEP, BAS, and MJK oversaw data collection and curation. AM and MJK provided feedback on, and revised, drafts of manuscript. All authors read and approved the final manuscript.</p> <hd id="AN0156024414-39">Funding</hd> <p>This research was supported by award numbers DRL1252333 (to Michael Kane) and DRL1252385 (to Akira Miyake) from the National Science Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Science Foundation.</p> <hd id="AN0156024414-40">Availability of data and materials</hd> <p>All data and materials created for the present study are available at the OSF site, https://osf.io/6ujsg/. Video lecture materials used here are available from the Kane et al. ([<reflink idref="bib29" id="ref269">29</reflink>]) OSF site, https://osf.io/u5bnw/</p> <hd id="AN0156024414-41">Declarations</hd> <p></p> <hd id="AN0156024414-42">Ethics approval and consent to participate</hd> <p>The study received ethics approval from the Institutional Review Board of the University of North Carolina at Greensboro (UNCG). 
All subjects were 18–35 years old and provided informed consent before participating.</p> <hd id="AN0156024414-43">Consent for publication</hd> <p>Not applicable.</p> <hd id="AN0156024414-44">Competing interests</hd> <p>The authors declare they have no competing interests.</p> <hd id="AN0156024414-45">Appendix A: Follow-up ANCOVA results for the ANOVA effects of interest</hd> <p>For all the dependent variables below, we re-conducted the reported 2 × 2 ANOVAs as ANCOVAs that controlled for pretest score (standardized within pretest type).</p> <ulist> <item> <emph>TUT rates.</emph> Most importantly, the ANCOVA indicated a significant effect of interpolated activity (testing versus restudy) on TUT rate, <emph>F</emph>(1, 190) = 4.88, <emph>p</emph> = 0.028, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.025, as did the ANOVA; the pretest covariate did not significantly predict TUT rate, <emph>F</emph>(1, 190) = 2.01, <emph>p</emph> = 0.158, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.010. 
Also as in the ANOVA, there was no significant effect of pretest-content match, <emph>F</emph>(1, 190) = 0.42, <emph>p</emph> = 0.518, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.002, nor an interaction of interpolated activity and pretest-content match, <emph>F</emph>(1, 190) = 0.10, <emph>p</emph> = 0.750, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> < 0.001.</item> <item> <emph>Lecture-related off-task thought rates.</emph> Pretest scores did not significantly predict lecture-related off-task thought rates, <emph>F</emph>(1, 190) = 1.68, <emph>p</emph> = 0.197, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.009; and, as in the ANOVA, there was no significant effect of interpolated activity, <emph>F</emph>(1, 190) = 0.01, <emph>p</emph> = 0.906, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.000, pretest-content match, <emph>F</emph>(1, 190) = 0.47, <emph>p</emph> = 0.495, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.002, or their interaction, <emph>F</emph>(1, 190) = 0.63, <emph>p</emph> = 0.429, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.003.</item> <item> <emph>Comprehension-related off-task thought rates.</emph> Pretest scores did not significantly predict comprehension-related thought rates, <emph>F</emph>(1, 190) = 0.62, <emph>p</emph> = 0.431, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.003. 
As in the ANOVA, there was no significant effect of interpolated activity, <emph>F</emph>(1, 190) = 0.28, <emph>p</emph> = 0.600, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.001, pretest-content match, <emph>F</emph>(1, 190) = 1.22, <emph>p</emph> = 0.271, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.006, or their interaction, <emph>F</emph>(1, 190) = 0.31, <emph>p</emph> = 0.578, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.002.</item> <item> <emph>Posttest scores.</emph> As in the ANOVA, there was no significant effect of interpolated activity (i.e., no testing effect), <emph>F</emph>(1, 190) = 1.23, <emph>p</emph> = 0.270, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.006, or of pretest-content match (i.e., no pretesting effect), <emph>F</emph>(1, 190) = 0.00, <emph>p</emph> = 0.995, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.000, and no interaction, <emph>F</emph>(1, 190) = 0.01, <emph>p</emph> = 0.907, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.000. The pretest score covariate did, however, significantly predict posttest score, <emph>F</emph>(1, 190) = 12.48, <emph>p</emph> < 0.001, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.062.</item> <item> <emph>Situational interest.</emph> 
Pretest scores did not significantly predict situational interest, <emph>F</emph>(1, 176) = 2.78, <emph>p</emph> = 0.097, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.016. As in the ANOVA, there was no significant effect of interpolated activity, <emph>F</emph>(1, 176) = 1.05, <emph>p</emph> = 0.306, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.006, pretest-content match, <emph>F</emph>(1, 176) = 0.02, <emph>p</emph> = 0.900, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.000, or their interaction, <emph>F</emph>(1, 176) = 1.52, <emph>p</emph> = 0.219, <emph>η</emph><subs><emph>p</emph></subs><sups>2</sups> = 0.009.</item> </ulist> <hd id="AN0156024414-46">Appendix B: Exploratory analyses of performance on the interpolated tests</hd> <p>The lack of a traditional interpolated-testing effect on learning, and the significant-but-modest interpolated-testing effect on TUTs, suggest that we should consider subjects' performance on the interpolated tests. Perhaps interpolated testing did not greatly help students to sustain attention or learn because the tests were too difficult to motivate continued or improved effort. For similar reasons, perhaps testing selectively helped only the students who answered more of the interpolated-test items correctly. We addressed these possibilities by analyzing interpolated-test performance (max score = 6 points per interpolated test) from the two interpolated-testing conditions, collapsed across pretest-match conditions (<emph>n</emph> = 98), and assessing correlations between interpolated-test performance and TUT rates and posttest scores.</p> <p>First, the mean score per interpolated test was 2.35 points (SD = 0.79). 
Although these scores were not at floor, they do indicate that subjects typically answered more than half of the interpolated-test items incorrectly. The Chan et al. ([<reflink idref="bib9" id="ref294">9</reflink>]) meta-analysis of test-potentiated new learning indicated, however, that performance levels on interpolated tests did not significantly moderate the effect. These meta-analytic findings suggest that the somewhat low mean performance in our sample is not driving the small interpolated-testing benefits we found here.</p> <p>Second, interpolated-test performance correlated with our outcome measures of interest. Specifically, interpolated-test performance was significantly correlated with TUT rate, <emph>r</emph>(97) = −0.26 [−0.44, −0.07], <emph>p</emph> < 0.05, and posttest performance, <emph>r</emph>(97) = 0.68 [0.56, 0.78], <emph>p</emph> < 0.001. Subjects who were more successful in their retrieval practice demonstrated fewer TUTs during the lecture and better mastery of the lecture material. Unfortunately, these correlational findings are causally ambiguous. They might support the claim that interpolated-test performance reduced TUTs and improved learning, or they might instead indicate that better sustained attention and learning increased subjects' performance on the interpolated tests.</p> <p>We attempted to examine this issue further, with respect to TUT rates, by assessing how interpolated-test performance correlated with TUTs during the current (just completed) block and the subsequent block (e.g., Block 1 Test × Block 1 TUT Rate vs. Block 1 Test × Block 2 TUT Rate). If interpolated-testing performance selectively predicts subsequent TUT rate, that would suggest that testing reduces mind wandering. 
However, if interpolated-testing performance selectively predicts the just-completed block's TUT rate, that would suggest that mind wandering reduces test performance.</p> <p>Table A1 presents the results of these block analyses. Again, the results are ambiguous. In some cases, interpolated-test performance is significantly negatively correlated with current block TUTs, consistent with TUTs causing poorer interpolated-test performance. In other cases, however, interpolated-test performance correlates with subsequent block TUT rates, consistent with interpolated testing causing a reduction in TUTs. Given this ambiguity—not to mention the possible reciprocal influences of interpolated-testing and TUTs on each other—we cannot confidently interpret the directionality of their association.</p> <p>Table A1 Correlations [with 95% CIs] between interpolated-test performance and TUTs, by current block and subsequent block.</p> <p> <ephtml> <table frame="hsides" rules="groups"><thead><tr><th align="left" /><th align="left"><p>Current block TUT rate</p></th><th align="left"><p>Subsequent block TUT rate</p></th></tr></thead><tbody><tr><td align="left"><p>Interpolated Test 1</p></td><td char="." align="char"><p><italic>r</italic>(97) = .10 [− .10,.29]</p></td><td char="." align="char"><p><italic>r</italic>(97) = .25 [.06,.43]*</p></td></tr><tr><td align="left"><p>Interpolated Test 2</p></td><td char="." align="char"><p><italic>r</italic>(97) =  − .27 [− .45, − .08]**</p></td><td char="." align="char"><p><italic>r</italic>(97) =  − .25 [− .43, − .06]*</p></td></tr><tr><td align="left"><p>Interpolated Test 3</p></td><td char="." align="char"><p><italic>r</italic>(97) =  − .13 [− .32,.07]</p></td><td char="." align="char"><p><italic>r</italic>(97) =  − .29 [− .46, − .09]**</p></td></tr><tr><td align="left"><p>Interpolated Test 4</p></td><td char="." align="char"><p><italic>r</italic>(97) =  − .20 [− .39, − .01]*</p></td><td char="." 
align="char"><p><italic>r</italic>(97) =  − .14 [− .33,.06]</p></td></tr><tr><td align="left"><p>Interpolated Test 5</p></td><td char="." align="char"><p><italic>r</italic>(97) =  − .19 [− .37,.01]</p></td><td char="." align="char" /></tr></tbody></table> </ephtml> </p> <p>*<emph>p</emph> < .05</p> <p>**<emph>p</emph> < .01</p> <hd id="AN0156024414-47">Publisher's Note</hd> <p>Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p> <ref id="AN0156024414-48"> <title> References </title> <blist> <bibl id="bib1" idref="ref35" type="bt">1</bibl> <bibtext> Adesope OO, Trevisan DA, Sundararajan N. Rethinking the use of tests: A meta-analysis of practice testing. Review of Educational Research. 2017; 87: 659-701</bibtext> </blist> <blist> <bibl id="bib2" idref="ref122" type="bt">2</bibl> <bibtext> Baribault B, Donkin C, Little DR, Trueblood JS, Oravecz Z, van Ravenzwaaij D, White CN, De Boeck P, Vandekerckhove J. Metastudies for robust tests of theory. Proceedings of the National Academy of Sciences. 2018; 115: 2607-2612</bibtext> </blist> <blist> <bibl id="bib3" idref="ref123" type="bt">3</bibl> <bibtext> Ben-Shachar, M. S, Makowski, D, & Lüdecke, D. (2020). Compute and interpret indices of effect size. CRAN. Available from https://github.com/easystats/effectsize.</bibtext> </blist> <blist> <bibl id="bib4" idref="ref12" type="bt">4</bibl> <bibtext> Bixler R, D'Mello S. Automatic gaze-based user-independent detection of mind wandering during computerized reading. User Modeling and User-Adapted Interaction. 2016; 26: 33-68</bibtext> </blist> <blist> <bibl id="bib5" idref="ref74" type="bt">5</bibl> <bibtext> Bjork RA, Dunlosky J, Kornell N. Self-regulated learning: Beliefs, techniques, and illusions. Annual Review of Psychology. 2013; 64: 417-444. 23020639</bibtext> </blist> <blist> <bibl id="bib6" idref="ref219" type="bt">6</bibl> <bibtext> Brunswik E. 
Representative design and probabilistic theory in a functional psychology. Psychological Review. 1955; 62: 193-217. 14371898</bibtext> </blist> <blist> <bibl id="bib7" idref="ref70" type="bt">7</bibl> <bibtext> Bull SG, Dizney HF. Epistemic-curiosity-arousing prequestions: Their effect on long-term retention. Journal of Educational Psychology. 1973; 65: 45-49</bibtext> </blist> <blist> <bibl id="bib8" idref="ref36" type="bt">8</bibl> <bibtext> Carpenter SK, Toftness AR. The effect of prequestions on learning from video presentations. Journal of Applied Research in Memory & Cognition. 2017; 6: 104-109</bibtext> </blist> <blist> <bibl id="bib9" idref="ref41" type="bt">9</bibl> <bibtext> Chan JCK, Meissner CA, Davis SD. Retrieval potentiates new learning: A theoretical and meta-analytic review. Psychological Bulletin. 2018; 144: 1111-1146. 30265011</bibtext> </blist> <blist> <bibtext> Cho KW, Neely JH, Crocco S, Vitrano D. Testing enhances both encoding and retrieval for both tested and untested items. Quarterly Journal of Experimental Psychology: Human Experimental Psychology. 2017; 70: 1211-1235</bibtext> </blist> <blist> <bibtext> Cohen J, Hansel CEM, Sylvester JD. Mind wandering. British Journal of Psychology. 1956; 47: 61-62. 13304281</bibtext> </blist> <blist> <bibtext> Craik FIM, Tulving E. Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General. 1975; 104: 268-294</bibtext> </blist> <blist> <bibtext> Dhindsa K, Acai A, Wagner N, Bosynak D, Kelly S, Bhandari M, Petrisor B, Sonnadara RR. Individualized pattern recognition for detecting mind wandering from EEG during live lectures. PLoS ONE. 2019; 14; 9. 10.1371/journal.pone.0222276. 31513622. 6742406</bibtext> </blist> <blist> <bibtext> Faber M, Bixler R, D'Mello SK. An automated behavioral measure of mind wandering during computerized reading. Behavior Research Methods. 2018; 50: 134-150. 
28181186</bibtext> </blist> <blist> <bibtext> Faul F, Erdfelder E, Lang AG, Buchner A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods. 2007; 39: 175-191</bibtext> </blist> <blist> <bibtext> Finn B, Tauber SK. When confidence is not a signal of knowing: How students' experiences and beliefs about processing fluency can lead to miscalibrated confidence. Educational Psychology Review. 2015; 27: 567-586</bibtext> </blist> <blist> <bibtext> Forrin ND, Mills C, D'Mello SK, Risko EF, Smilek D, Seli P. TL;DR: Longer sections of text increase rates of unintentional mind-wandering. Journal of Experimental Education. 2021. 10.1080/00220973.2020.1751578</bibtext> </blist> <blist> <bibtext> Fyfe ER, de Leeuw JR, Carvalho PF, Goldstone RL, Sherman J, Admiraal D, Alford LK, Bonner A, Brassil CE, Brooks CA, Carbonetto T, Chang SH, Cruz L, Czymoniewicz-Klippel M, Daniel F, Driessen M, Habashy N, Hanson-Bradley CL, Hirt ER, Motz BA. ManyClasses 1: Assessing the generalizable effect of immediate feedback versus delayed feedback across many college classes. Advances in Methods and Practices in Psychological Science. 2021; 4: 1-24</bibtext> </blist> <blist> <bibtext> Geller J, Carpenter SK, Lamm MH, Rahman S, Armstrong PI, Coffman CR. Prequestions do not enhance the benefits of retrieval in a STEM classroom. Cognitive Research: Principles and Implications. 2017; 2; 1: 1-13</bibtext> </blist> <blist> <bibtext> Greenwald AG, Pratkanis AR, Leippe MR, Baumgardner MH. Under what conditions does theory obstruct research progress? Psychological Review. 1986; 93: 216-229. 3714929</bibtext> </blist> <blist> <bibtext> Hannafin, M. J, & Hughes, C. W. (1986). A framework for incorporating orienting activities in computer-based interactive video. Instructional Science, 15, 239–255.</bibtext> </blist> <blist> <bibtext> Harder JA. 
The multiverse of methods: Extending the multiverse analysis to address data-collection decisions. Perspectives on Psychological Science. 2020; 15: 1158-1177. 32598854</bibtext> </blist> <blist> <bibtext> Hollis RB, Was CA. Mind wandering, control failures, and social media distraction in online learning. Learning and Instruction. 2016; 42: 104-112</bibtext> </blist> <blist> <bibtext> Immordino-Yang MH, Christodoulou JA, Singh V. Rest is not idleness: Implications of the brain's default mode for human development and education. Perspectives on Psychological Science. 2012; 7: 352-364. 26168472</bibtext> </blist> <blist> <bibtext> Jing HG, Szpunar KK, Schacter DL. Interpolated testing influences focused attention and improves integrations of information during a video-recorded lecture. Journal of Experimental Psychology: Applied. 2016; 22: 305-318. 27295464</bibtext> </blist> <blist> <bibtext> Jordano, M. J. (2018). How often do younger and older adults engage in monitoring? A new approach to studying metacognition. Unpublished doctoral dissertation. University of North Carolina at Greensboro.</bibtext> </blist> <blist> <bibtext> Kane MJ, Carruth N, Lurquin J, Silvia P, Smeekens BA, von Bastian CC, Miyake A. Individual differences in task-unrelated thought in university classrooms. Memory & Cognition. 2021; 49: 1247-1266</bibtext> </blist> <blist> <bibtext> Kane MJ, Smeekens BA, Meier M, Welhaf M, Phillips N. Testing the construct validity of competing measurement approaches to probed mind-wandering reports. Behavior Research Methods. 2021. 10.3758/s13428-021-01557-x. 34508289. 8613094</bibtext> </blist> <blist> <bibtext> Kane MJ, Smeekens BA, von Bastian CC, Lurquin JH, Carruth NP, Miyake A. A combined experimental and individual-differences investigation into mind wandering during a video lecture. Journal of Experimental Psychology: General. 2017; 146: 1649-1674</bibtext> </blist> <blist> <bibtext> Kass RE, Raftery AE. Bayes factors. 
Journal of the American Statistical Association. 1995; 90; 430: 773-795</bibtext> </blist> <blist> <bibtext> Kline RB. Beyond significance testing: Reforming data analysis methods in behavioral research. 2004; American Psychological Association. 10.1037/10693-000</bibtext> </blist> <blist> <bibtext> Kornell N, Vaughn KE. How retrieval attempts affect learning: A review and synthesis. In: Ross B, editor. The Psychology of Learning and Motivation, 65. 2016; Academic Press: 183-215</bibtext> </blist> <blist> <bibtext> Lakens D. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology. 2013; 4: 863. 10.3389/fpsyg.2013.00863. 24324449. 3840331</bibtext> </blist> <blist> <bibtext> Landy JF, Jia M, Ding IL, Viganola D, Tierney W, Dreber A, Johannesson M, Pfeiffer T, Ebersole CR, Gronau QF, Ly A, van den Bergh D, Marsman M, Derks K, Wagenmakers E-J, Proctor A, Bartels DM, Bauman CW, Brady WJ, Uhlmann EL. Crowdsourcing hypothesis tests: Making transparent how design choices shape research results. Psychological Bulletin. 2020; 146: 451-479. 31944796</bibtext> </blist> <blist> <bibtext> Lang, J. M. (2020). Distracted: Why students can't focus and what you can do about it. Basic Books.</bibtext> </blist> <blist> <bibtext> Lang, J. M. (2021). Small teaching: Everyday lessons from the science of learning (2nd ed.). Jossey-Bass.</bibtext> </blist> <blist> <bibtext> Lee HS, Ahn D. Testing prepares students to learn better: The forward effect of testing in category learning. Journal of Educational Psychology. 2017; 110: 203-217</bibtext> </blist> <blist> <bibtext> Lindquist SI, McLean JP. Daydreaming and its correlates in an education environment. Learning and Individual Differences. 2011; 21: 158-167</bibtext> </blist> <blist> <bibtext> Linnenbrink-Garcia L, Durik AM, Conley AM, Barron KE, Tauer JM, Karabenick SA, Harackiewicz JM. Measuring situational interest in academic domains. 
Educational and Psychological Measurement. 2010; 70: 647-671</bibtext> </blist> <blist> <bibtext> Locke LF, Jensen MK. Thought sampling: A study of student attention through self-report. Research Quarterly. 1974; 45: 263-275. 4529171</bibtext> </blist> <blist> <bibtext> Loh KK, Tan BZH, Lim SWH. Media multitasking predicts video-recorded lecture learning performance through mind wandering tendencies. Computers in Human Behavior. 2016; 63: 943-947</bibtext> </blist> <blist> <bibtext> Magnusson, K. (2020). Interpreting Cohen's d effect size: An interactive visualization (Version 2.4.2) [Web App]. R Psychologist. https://rpsychologist.com/cohend/</bibtext> </blist> <blist> <bibtext> Metcalfe J. Learning from errors. Annual Review of Psychology. 2017; 68: 465-489. 27648988</bibtext> </blist> <blist> <bibtext> Mills C, Gregg J, Bixler R, D'Mello SK. Eye-mind reader: An intelligent reading interface that promotes long-term comprehension by detecting and responding to mind wandering. Human-Computer Interaction. 2021; 36: 306-332</bibtext> </blist> <blist> <bibtext> Morey, R. D, & Rouder, J. N. (2018). BayesFactor: Computation of Bayes Factors for Common Designs. R package version 0.9.12–4.2. https://cran.r-project.org/web/packages/BayesFactor/index.html</bibtext> </blist> <blist> <bibtext> Pachai AA, Acai A, LoGiudice AB, Kim JA. The mind that wanders: Challenges and potential benefits of mind wandering in education. Scholarship of Teaching and Learning in Psychology. 2016; 2: 134-146</bibtext> </blist> <blist> <bibtext> Pan SC, Rickard TC. Transfer of test-enhanced learning: Meta-analytic review and synthesis. Psychological Bulletin. 2018; 144; 7: 710-756. 29733621</bibtext> </blist> <blist> <bibtext> Pan SC, Schmitt AG, Bjork EL, Sana F. Pretesting reduces mind wandering and enhances learning during online lectures. Journal of Applied Research in Memory and Cognition. 
2020; 9: 542-554</bibtext> </blist> <blist> <bibtext> Pastötter B, Schicker S, Niedernhuber J, Bäuml KHT. Retrieval during learning facilitates subsequent memory encoding. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2011; 37; 2: 287. 21171804</bibtext> </blist> <blist> <bibtext> Pastötter B, Bäuml KH. Retrieval practice enhances new learning: The forward effect of testing. Frontiers in Psychology. 2014. 10.3389/fpsyg.2014.00286. 24772101. 3983480</bibtext> </blist> <blist> <bibtext> Peeck J. Effect of prequestions on delayed retention of prose material. Journal of Educational Psychology. 1970; 61: 241-246</bibtext> </blist> <blist> <bibtext> Perugini M, Gallucci M, Costantini G. Safeguard power as a protection against imprecise power estimates. Perspectives on Psychological Science. 2014; 9: 319-332. 26173267</bibtext> </blist> <blist> <bibtext> Pham P, Wang J. AttentiveLearner: Improving mobile MOOC learning via implicit heart rate tracking. In: Conati C, Heffernan N, Mitrovic A, Verdejo MF, editors. Artificial intelligence in education. 2015; Springer: 367-376</bibtext> </blist> <blist> <bibtext> Pressley M, Tanenbaum R, McDaniel MA, Wood E. What happens when university students try to answer prequestions that accompany textbook material?. Contemporary Educational Psychology. 1990; 15: 27-35</bibtext> </blist> <blist> <bibtext> R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/</bibtext> </blist> <blist> <bibtext> Ralph BCW, Wammes JD, Barr N, Smilek D. Wandering minds and wavering goals: Examining the relation between mind wandering and grit in everyday life and the classroom. Canadian Journal of Experimental Psychology. 2017; 71: 120-132. 28604049</bibtext> </blist> <blist> <bibtext> Richland LE, Kornell N, Kao LS. The pretesting effect: Do unsuccessful retrieval attempts enhance learning?. 
Journal of Experimental Psychology: Applied. 2009; 15: 243-257. 19751074</bibtext> </blist> <blist> <bibtext> Risko EF, Anderson N, Sawal A, Engelhardt M, Kingstone A. Everyday attention: Variation in mind wandering and memory in a lecture. Applied Cognitive Psychology. 2011; 26: 234-242</bibtext> </blist> <blist> <bibtext> Risko EF, Buchanan D, Medimorec S, Kingstone A. Everyday attention: Mind wandering and computer use during lectures. Computers & Education. 2013; 68: 275-283</bibtext> </blist> <blist> <bibtext> Roediger HL III, Butler AC. The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences. 2011; 15: 20-27. 20951630</bibtext> </blist> <blist> <bibtext> Rouder JN, Speckman PL, Sun D, Morey RD, Iverson G. Bayesian t-tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review. 2009; 16: 225-237</bibtext> </blist> <blist> <bibtext> Rowland CA. The effect of testing versus restudy on retention: A meta-analytic review of the testing effect. Psychological Bulletin. 2014; 140: 1432-1463. 25150680</bibtext> </blist> <blist> <bibtext> Sagaria SD, Di Vesta F. Learner expectations induced by adjunct questions and the retrieval of intentional and incidental information. Journal of Educational Psychology. 1978; 70: 280-288</bibtext> </blist> <blist> <bibtext> Schäfer T, Schwarz MA. The meaningfulness of effect sizes in psychological research: Differences between sub-disciplines and the impact of potential biases. Frontiers in Psychology. 2019; 10: 813. 10.3389/fpsyg.2019.00813. 31031679. 6470248</bibtext> </blist> <blist> <bibtext> Schmalz X, Biurrun-Manresa J, Zhang L. What is a Bayes factor?. Psychological Methods. 2021. 10.1037/met0000421. 34780246</bibtext> </blist> <blist> <bibtext> Schoen JR. Use of consciousness sampling to study teaching methods. The Journal of Educational Research. 1970; 63: 387-390</bibtext> </blist> <blist> <bibtext> Schönbrodt FD, Perugini M. 
At what sample size do correlations stabilize?. Journal of Research in Personality. 2013; 47: 609-612</bibtext> </blist> <blist> <bibtext> Shukor, S. (2005). Insights into students' thoughts during problem based learning small group discussions and traditional tutorials. Unpublished manuscript. Retrieved March 18, 2016 from: http://www.tp.edu.sg/staticfiles/TP/files/centres/pbl/pbl%5fsuriya%5fshukor.pdf</bibtext> </blist> <blist> <bibtext> Simmons, J. P, Nelson, L. D, & Simonsohn, U. (2012). A 21 word solution. Dialogue: The Official Newsletter of the Society for Personality and Social Psychology, 26, 4–7.</bibtext> </blist> <blist> <bibtext> Simpson A. Princesses are bigger than elephants: Effect size as a category error in evidence-based education. British Educational Research Journal. 2018; 44: 897-913</bibtext> </blist> <blist> <bibtext> Singmann, H, Bolker, B, Westfall, J, Aust, F, & Ben-Shachar, M. S. (2020). afex: Analysis of Factorial Experiments. R package version 0.27-2. https://CRAN.R-project.org/package=afex</bibtext> </blist> <blist> <bibtext> Smallwood J, Fishman DJ, Schooler JW. Counting the cost of an absent mind: Mind wandering as an underrecognized influence on educational performance. Psychonomic Bulletin & Review. 2007; 14: 230-236</bibtext> </blist> <blist> <bibtext> St. Hilaire, K. J, & Carpenter, S. K. (2020). Prequestions enhance learning, but only when they are remembered. Journal of Experimental Psychology: Applied, 26(4), 705–716.</bibtext> </blist> <blist> <bibtext> Szpunar KK, Khan NY, Schacter DL. Interpolated memory tests reduce mind wandering and improve learning of online lectures. Proceedings of the National Academy of Sciences of the United States of America. 2013; 110: 6313-6317</bibtext> </blist> <blist> <bibtext> Szpunar KK, Moulton ST, Schacter DL. Mind wandering and education: From the classroom to online learning. 
Frontiers in Psychology. 2013; 4: 495. 10.3389/fpsyg.2013.00495. 23914183. 3730052</bibtext> </blist> <blist> <bibtext> Unsworth N, McMillan BD, Brewer GA, Spillers GJ. Everyday attention failures: An individual differences investigation. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2012; 38: 1765-1772. 22468805</bibtext> </blist> <blist> <bibtext> Varao-Sousa TL, Kingstone A. Memory for lectures: How lecture format impacts the learning experience. PLoS ONE. 2015; 10; 11. 10.1371/journal.pone.0141587. 26561235. 4641615</bibtext> </blist> <blist> <bibtext> Wammes JD, Boucher PO, Seli P, Cheyne JA, Smilek D. Mind wandering during lectures I: Changes in rates across an entire semester. Scholarship of Teaching and Learning in Psychology. 2016; 2: 13-32</bibtext> </blist> <blist> <bibtext> Wammes JD, Ralph BCW, Mills C, Bosch N, Duncan TL, Smilek D. Disengagement during lectures: Media multi-tasking and mind wandering in university classrooms. Computers & Education. 2019; 132: 76-89</bibtext> </blist> <blist> <bibtext> Wammes JD, Seli P, Cheyne JA, Boucher PO, Smilek D. Mind wandering during lectures II: Relation to academic performance. Scholarship of Teaching and Learning in Psychology. 2016; 2: 33-48</bibtext> </blist> <blist> <bibtext> Wammes JD, Smilek D. Examining the influence of lecture format on degree of mind wandering. Journal of Applied Research in Memory and Cognition. 2017; 6; 2: 174-184</bibtext> </blist> <blist> <bibtext> Wickham H. ggplot2: Elegant Graphics for Data Analysis. 2016; Springer-Verlag</bibtext> </blist> <blist> <bibtext> Wickham, H, Averick, M, Bryan, J, Chang, W, McGowan, L. D, François, R, Grolemund, G, Hayes, A, Henry, L, Hester, J, Kuhn, M, Pederson, T. L, Miller, E, Bache, S. M, Müller, K, Ooms, J, Robinson, D, Seidel, D. P, Spinu, V. & Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4, 1686.</bibtext> </blist> <blist> <bibtext> Wissman KT, Rawson KA, Pyc MA. 
The interim test effect: Testing prior material can facilitate the learning of new material. Psychonomic Bulletin & Review. 2011; 18: 1140-1147</bibtext> </blist> </ref> <ref id="AN0156024414-49"> <title> Footnotes </title> <blist> <bibtext> Kane et al. ([29]) showed that the pretest items from the matched-content condition were challenging in their sample for students with no background in statistics (i.e., fewer than 2% of subjects answered more than three questions accurately while reporting high confidence/not guessing). We did not analyze confidence reports for the current study.</bibtext> </blist> <blist> <bibtext> The results of a 2 (pretest condition) × 2 (interpolated activity) ANOVA on just Part 1 of the posttest, which comprised the same 10 multiple-choice items as the matched-content pretest, suggested no main effect of pretest match, <emph>F</emph>(1, 191) = 0.61, <emph>p</emph> = .435, η<subs>p</subs><sups>2</sups> = .003, no effect of interpolated activity, <emph>F</emph>(1, 191) = 0.01, <emph>p</emph> = .933, η<subs>p</subs><sups>2</sups> < .001, and no interaction, <emph>F</emph>(1, 191) = 0.02, <emph>p</emph> = .883, η<subs>p</subs><sups>2</sups> < .001. Thus, subjects who took a pretest that perfectly matched the eventual final test did not outperform subjects who completed a content-mismatched pretest.</bibtext> </blist> <blist> <bibtext> Rates of comprehension-related off-task thoughts did not correlate with posttest scores in the present study, <emph>r</emph>(193) = −.10 [−.24, .04], <emph>p</emph> = .166; the corresponding correlation from Kane et al. ([29]) was numerically positive but near zero, <emph>r</emph>(180) = .01 [−.14, .16]. 
The correlation was not significant here in either the testing condition, <emph>r</emph>(97) = −.10 [−.29, .10], <emph>p</emph> = .320, or the restudy condition, <emph>r</emph>(94) = −.10 [−.29, .11], <emph>p</emph> = .360.</bibtext> </blist> <blist> <bibtext> The primary meta-analytic test statistic for between-group comparisons is Hedges' <emph>g</emph>, which is based on a formula similar to that of Cohen's <emph>d</emph>. In fact, these two statistics are nearly identical for sample sizes above 20 (Kline, [31]; Lakens, [33]).</bibtext> </blist> <blist> <bibtext> We acknowledge that one aspect of our design may have promoted <emph>larger</emph> testing effects on final memory, according to Adesope et al. ([1]): Our interpolated tests presented items using multiple formats (multiple-choice and free response) rather than just one format (<emph>g</emph> = .80 vs. .70, respectively).</bibtext> </blist> <blist> <bibtext> We also acknowledge that other methodological differences between the present study and Pan et al. ([48]) might have affected the results, including the thought-probe type, video length, and average pretest performance. Moreover, our claim assumes that the Pan et al. control task (algebra problems) did not somehow <emph>suppress</emph> learning relative to other possible controls.</bibtext> </blist> </ref> <aug> <p>By Matthew S. Welhaf; Natalie E. Phillips; Bridget A. Smeekens; Akira Miyake and Michael J. 
Kane</p> </aug>
Header DbId: eric
DbLabel: ERIC
An: EJ1331359
AccessLevel: 3
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 0
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Interpolated Testing and Content Pretesting as Interventions to Reduce Task-Unrelated Thoughts during a Video Lecture
– Name: Language
  Label: Language
  Group: Lang
  Data: English
– Name: Author
  Label: Authors
  Group: Au
  Data: Welhaf, Matthew S.<br />Phillips, Natalie E.<br />Smeekens, Bridget A.<br />Miyake, Akira<br />Kane, Michael J.
– Name: TitleSource
  Label: Source
  Group: Src
  Data: <i>Cognitive Research: Principles and Implications</i>. 2022 7.
– Name: Avail
  Label: Availability
  Group: Avail
  Data: Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
– Name: PeerReviewed
  Label: Peer Reviewed
  Group: SrcInfo
  Data: Y
– Name: Pages
  Label: Page Count
  Group: Src
  Data: 22
– Name: DatePubCY
  Label: Publication Date
  Group: Date
  Data: 2022
– Name: SourceSuprt
  Label: Sponsoring Agency
  Group: SrcSuprt
  Data: National Science Foundation (NSF)
– Name: NumberContract
  Label: Contract Number
  Group: NumCntrct
  Data: DRL1252333<br />DRL1252385
– Name: TypeDocument
  Label: Document Type
  Group: TypDoc
  Data: Journal Articles<br />Reports - Research
– Name: Audience
  Label: Education Level
  Group: Audnce
  Data: Higher Education<br />Postsecondary Education
– Name: Subject
  Label: Descriptors
  Group: Su
  Data: Testing<br />Pretesting<br />Attention Control<br />Undergraduate Students<br />Lecture Method<br />Video Technology<br />Introductory Courses<br />Statistics Education<br />Situated Learning<br />Student Interests<br />Effect Size
– Name: DOI
  Label: DOI
  Group: ID
  Data: 10.1186/s41235-022-00372-y
– Name: ISSN
  Label: ISSN
  Group: ISSN
  Data: 2365-7464
– Name: Abstract
  Label: Abstract
  Group: Ab
  Data: Considerable research has examined the prevalence and apparent consequences of task-unrelated thoughts (TUTs) in both laboratory and authentic educational settings. Few studies, however, have explored methods to reduce TUTs during learning; those few studies tested small samples or used unvalidated TUT assessments. The present experimental study attempted to conceptually replicate or extend previous findings of interpolated testing and pretesting effects on TUT and learning. In a study of 195 U.S. undergraduates, we investigated whether interpolated testing (compared to interpolated restudy) and pretesting on lecture-relevant materials (compared to pretesting on conceptually related but lecture-irrelevant materials) would reduce TUTs during a video lecture on introductory statistics. Subjects completed either a content-matched or content-mismatched pretest on statistics concepts and then watched a narrated lecture slideshow. During the lecture, half of the sample completed interpolated tests on the lecture material and half completed interpolated restudy of that material. All subjects responded to unpredictably presented thought probes during the video to assess their immediately preceding thoughts, including TUTs. Following the lecture, students reported on their situational interest elicited by the lecture and then completed a posttest. Interpolated testing significantly reduced TUT rates during the lecture compared to restudying, conceptually replicating previous findings--but with a small effect size and no supporting Bayes-factor evidence. We found statistical evidence for neither an interpolated testing effect on learning, nor an effect of matched-content pretesting on TUT rates or learning. Interpolated testing might have limited utility to support students' attention, but varying effect sizes across studies warrants further work.
– Name: AbstractInfo
  Label: Abstractor
  Group: Ab
  Data: As Provided
– Name: Note
  Label: Notes
  Group: Note
  Data: https://osf.io/6ujsg
– Name: DateEntry
  Label: Entry Date
  Group: Date
  Data: 2022
– Name: AN
  Label: Accession Number
  Group: ID
  Data: EJ1331359
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1331359
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1186/s41235-022-00372-y
    Languages:
      – Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 22
    Subjects:
      – SubjectFull: Testing
        Type: general
      – SubjectFull: Pretesting
        Type: general
      – SubjectFull: Attention Control
        Type: general
      – SubjectFull: Undergraduate Students
        Type: general
      – SubjectFull: Lecture Method
        Type: general
      – SubjectFull: Video Technology
        Type: general
      – SubjectFull: Introductory Courses
        Type: general
      – SubjectFull: Statistics Education
        Type: general
      – SubjectFull: Situated Learning
        Type: general
      – SubjectFull: Student Interests
        Type: general
      – SubjectFull: Effect Size
        Type: general
    Titles:
      – TitleFull: Interpolated Testing and Content Pretesting as Interventions to Reduce Task-Unrelated Thoughts during a Video Lecture
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Welhaf, Matthew S.
      – PersonEntity:
          Name:
            NameFull: Phillips, Natalie E.
      – PersonEntity:
          Name:
            NameFull: Smeekens, Bridget A.
      – PersonEntity:
          Name:
            NameFull: Miyake, Akira
      – PersonEntity:
          Name:
            NameFull: Kane, Michael J.
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 01
              Type: published
              Y: 2022
          Identifiers:
            – Type: issn-electronic
              Value: 2365-7464
          Numbering:
            – Type: volume
              Value: 7
          Titles:
            – TitleFull: Cognitive Research: Principles and Implications
              Type: main
ResultId 1