
Applications and Modeling of Keystroke Logs in Writing Assessments


Bibliographic Details
Title:	Applications and Modeling of Keystroke Logs in Writing Assessments
Language:	English
Authors:	Mo Zhang (ORCID 0000-0003-2689-2089), Paul Deane, Andrew Hoang, Hongwen Guo (ORCID 0000-0002-1751-0918), Chen Li
Source:	Educational Measurement: Issues and Practice. 2025 44(2):5-19.
Availability:	Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Peer Reviewed:	Y
Page Count:	15
Publication Date:	2025
Document Type:	Journal Articles Reports - Research
Descriptors:	Writing Tests, Computer Assisted Testing, Keyboarding (Data Entry), Writing Processes, Individual Differences, Individual Characteristics, Context Effect, Artificial Intelligence, Models
DOI:	10.1111/emip.12668
ISSN:	0731-1745 1745-3992
Abstract:	In this paper, we describe two empirical studies that demonstrate the application and modeling of keystroke logs in writing assessments. We illustrate two different approaches of modeling differences in writing processes: analysis of mean differences in handcrafted theory-driven features and use of large language models to identify stable personal characteristics. In the first study, we examined the effects of test environment on writing characteristics: at-home versus in-center, using features extracted from keystroke logs. In a second study, we explored ways to measure stable personal characteristics and traits. As opposed to feature engineering that can be difficult to scale, raw keystroke logs were used as input in the second study, and large language models were developed to infer latent relations in the data. Implications, limitations, and future research directions are also discussed.
Abstractor:	As Provided
Entry Date:	2025
Accession Number:	EJ1472029
Database:	ERIC

<title id="AN0185399426-1">Applications and Modeling of Keystroke Logs in Writing Assessments </title> <p>Keywords: feature engineering; keystroke logs; large language model; writing assessment</p> <p>Theorists have long recognized the importance of the cognitive processes that underlie writing (Bereiter &amp; Scardamalia, [<reflink idref="bib7" id="ref1">7</reflink>]; Emig, [<reflink idref="bib23" id="ref2">23</reflink>]; Stallard, [<reflink idref="bib59" id="ref3">59</reflink>]). 
Skilled writing requires the coordination of multiple such processes (Hayes, [<reflink idref="bib31" id="ref4">31</reflink>]): idea generation; sentence generation; handwriting, typing, or keyboarding; and monitoring and evaluating what one has already written (Berninger, [<reflink idref="bib8" id="ref5">8</reflink>], [<reflink idref="bib9" id="ref6">9</reflink>]; Flower &amp; Hayes, [<reflink idref="bib24" id="ref7">24</reflink>]; Kellogg, [<reflink idref="bib37" id="ref8">37</reflink>]; McCutchen, [<reflink idref="bib46" id="ref9">46</reflink>]). However, as Stallard ([<reflink idref="bib59" id="ref10">59</reflink>]) observed, many of these processes are hard to observe when writing takes place with paper and pen; capturing them requires careful instrumentation and record keeping. When writing takes place in a digital environment, detailed records of the writing process can be captured automatically through keystroke logs. Every interaction between the writer and the text can be recorded, and the ways that writers choose to use their time can be more readily quantified. For this reason, keystroke log analysis has recently attracted significant attention in educational measurement and writing research. A well‐designed keystroke logging system can accurately capture individual keystrokes and other changes made to the text, along with timestamps that capture the tempo of text production (Leijten &amp; van Waes, [<reflink idref="bib40" id="ref11">40</reflink>]; Vandermeulen, Leijten, &amp; Van Waes, [<reflink idref="bib62" id="ref12">62</reflink>]). This makes it possible to reconstruct what the student's text looked like at every point in the writing process and, with additional analysis, to extract valuable information about the cognitive processes that take place during composition.</p> <p>However, this kind of low‐level data must be analyzed before it can support useful inferences about writing processes. 
It matters, for example, where someone pauses and for how long. Long pauses at certain locations, such as paragraph boundaries, tend to be associated with planning content and generating ideas, whereas long pauses at other locations may be associated with difficulties in typing and spelling. The complexity of keystroke logs reflects, in part, the fact that the way people write is shaped by multiple causal factors: task requirements, environmental factors, personal characteristics and skills, and the interactions among them. In many contexts, such as a large‐scale assessment, it is important to disentangle person, task, and environmental effects. One may want, for example, to consider the impact of testing conditions on test validity or to use personal characteristics to improve test security. In large‐scale assessments in particular, it is important to measure latent traits of the person rather than treat characteristics of the task or the testing environment as evidence of what the test‐taker knows and can do.</p> <p>This paper explores two such issues. The first is the effect of the testing environment on writing performance and characteristics (Study 1): whether the test is administered at home, where candidates can use their own devices, or in a test center, where testing conditions are standardized. The second concerns methods to measure personal characteristics (Study 2): traits that are stable within a person (at least in the large‐scale testing environment). Specifically, in the second study we attempted to understand individual differences by asking whether we could predict that a second essay was composed by the same person as the first. Personal characteristics can be critical for test security, for instance as a way to detect impostors who have been paid to take the test (or some part of the test, if the test is delivered on a computer) in the test‐taker's stead. 
Of course, such issues may be intertwined: features that appear to be stable measures of a person or skill in one context (for instance, in a test center) may not be stable in another. Many recent publications have used hand‐engineered, theory‐driven features to identify keystroke features that, for example, predict performance or help explain performance differences between demographic subgroups (e.g., Bennett, Zhang, &amp; Sinharay, [<reflink idref="bib6" id="ref13">6</reflink>]; Guo, Zhang, Deane, &amp; Bennett, [<reflink idref="bib28" id="ref14">28</reflink>]; Zhang &amp; Sinharay, [<reflink idref="bib70" id="ref15">70</reflink>]). However, this approach can be difficult to scale, especially when it is unclear which features are meaningful in which writing context (contexts vary by task specification, writer population, and delivery platform, among other factors). An alternative, which we explored in Study 2, is to use neural or deep learning models (including language models) to infer latent statistical relations in the data. Transformers may be appropriate when the data involve complex interactions and nonlinear relationships between features. Because such a model can take the entire raw log directly as input, the step of construct‐driven feature engineering, or feature extraction, can potentially be skipped.</p> <p>We are thus specifically concerned with two research questions:</p> <p></p> <ulist> <item> 1. Do people write differently at home versus in testing centers, as evidenced by information collected from keystroke logs? (Study 1)</item> <p></p> <item> 2. Can we distinguish the writing behaviors of the same person from those of different people? How accurately can we classify whether a second essay was written by the same person as the first or by a different person? 
(Study 2)</item> </ulist> <p>In the context of these two questions, we illustrate two different modes of modeling differences in writing processes: (a) analysis of mean differences in summary features extracted from keystroke logs (Study 1) and (b) use of deep learning models to identify stable personal characteristics (Study 2). The two studies thus demonstrate different approaches to treating and analyzing keystroke log data in writing research.</p> <p>The remainder of the paper is organized as follows. The "A Brief Description of Keystroke Logging" section describes what a keystroke log contains, gives examples of feature engineering, and reviews relevant literature. We then describe each study in turn, including its problem statement, participants, analysis and modeling methods, and results. Finally, in the "Discussion" section, we provide an overall discussion and address implications, limitations, and directions for future research.</p> <hd id="AN0185399426-2">A Brief Description of Keystroke Logging</hd> <p>A well‐designed keystroke logging system can accurately capture each key press and each change made to the text, together with individual time stamps (Leijten &amp; van Waes, [<reflink idref="bib40" id="ref16">40</reflink>]). In addition to InputLog (Leijten &amp; van Waes, [<reflink idref="bib40" id="ref17">40</reflink>]), other keystroke logging systems that have been used for educational purposes include WritingMaetriX (Kusanagi, Abe, Fukuta, &amp; Kawaguchi, [<reflink idref="bib39" id="ref18">39</reflink>]) and RUI, Recording User Input (Morgan, Cheng, Pike, &amp; Ritter, [<reflink idref="bib50" id="ref19">50</reflink>]). 
While we focus on writing assessment in the current study, keystroke logging is widely used in academic and industrial fields, not only in writing research in education but also in computer programming education (Edwards, Leinonen, Birthare, Zavgorodniaia, &amp; Hellas, [<reflink idref="bib22" id="ref20">22</reflink>]; Shrestha, Leinonen, Hellas, Ihantola, &amp; Edwards, [<reflink idref="bib56" id="ref21">56</reflink>]) and information security (Mondal, [<reflink idref="bib49" id="ref22">49</reflink>]), among others. Table 1 gives an example of a keystroke log. In the example, the writer typed "November 1st." While typing, the writer corrected two misspellings: the mistyped "n" in "Noven" was deleted and replaced with "m," and the mistyped "d" in "1d" was deleted and replaced with "st." Both edits, and how long they took, are represented in the keystroke log.</p> <p>Table 1. An Example of a Keystroke Log</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Index&lt;/th&gt;&lt;th align="center"&gt;Character&lt;/th&gt;&lt;th align="center"&gt;Action&lt;/th&gt;&lt;th align="center"&gt;GapTime&lt;/th&gt;&lt;th align="center"&gt;CursorPosition&lt;/th&gt;&lt;th align="center"&gt;PositionChange&lt;/th&gt;&lt;th align="center"&gt;TextToDate&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th /&gt;&lt;th&gt;(&lt;italic&gt;c&lt;/italic&gt;)&lt;/th&gt;&lt;th&gt;(&lt;italic&gt;a&lt;/italic&gt;)&lt;/th&gt;&lt;th&gt;(&lt;italic&gt;t&lt;/italic&gt;)&lt;/th&gt;&lt;th&gt;(&lt;italic&gt;p&lt;/italic&gt;)&lt;/th&gt;&lt;th /&gt;&lt;th /&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;N&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.015&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;N&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;o&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.021&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;v&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.024&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Nov&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;e&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.016&lt;/td&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Nove&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;n&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.016&lt;/td&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Noven&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;n&lt;/td&gt;&lt;td&gt;Delete&lt;/td&gt;&lt;td&gt;0.033&lt;/td&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;&amp;#8722;1&lt;/td&gt;&lt;td&gt;Nove&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;m&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.012&lt;/td&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Novem&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;b&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.032&lt;/td&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Novemb&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;e&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.049&lt;/td&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Novembe&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;9&lt;/td&gt;&lt;td&gt;r&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.010&lt;/td&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;November&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;(space)&lt;/td&gt;&lt;td&gt;
Insert&lt;/td&gt;&lt;td&gt;0.013&lt;/td&gt;&lt;td&gt;9&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;November&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;11&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.067&lt;/td&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;November 1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;12&lt;/td&gt;&lt;td&gt;d&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.028&lt;/td&gt;&lt;td&gt;11&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;November 1d&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;13&lt;/td&gt;&lt;td&gt;d&lt;/td&gt;&lt;td&gt;Delete&lt;/td&gt;&lt;td&gt;0.040&lt;/td&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;&amp;#8722;1&lt;/td&gt;&lt;td&gt;November 1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;14&lt;/td&gt;&lt;td&gt;s&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.022&lt;/td&gt;&lt;td&gt;11&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;November 1s&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;15&lt;/td&gt;&lt;td&gt;t&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.012&lt;/td&gt;&lt;td&gt;12&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;November 1st&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;16&lt;/td&gt;&lt;td&gt;.&lt;/td&gt;&lt;td&gt;Insert&lt;/td&gt;&lt;td&gt;0.016&lt;/td&gt;&lt;td&gt;13&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;November 1st.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p><emph>Note</emph>. "GapTime," also known as the interkey interval, is the pause time (in seconds) between two adjacent key presses. "CursorPosition" is the absolute position of the cursor in the item response box after the action; it is inferred from the location at which each keystroke action occurs, and the cursor can be moved with the mouse or the keyboard (e.g., arrow keys). "PositionChange" is the difference between the current and the previous cursor positions, which in theory can take any integer value. "TextToDate" shows the text as it stands after each action.</p> <p>With keystroke logging, all the interactions between the writer and the text are captured. 
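Because each row of Table 1 records the character, the action, and the resulting cursor position, the text can be rebuilt at every step by replaying the log. The Python sketch below is a minimal illustration, not the logging or replay software used in the studies. It assumes that CursorPosition is the cursor position after each action, that Insert places one character immediately before that position, and that Delete is a backspace that removes the character at the new cursor position; `KeyEvent`, `replay`, and `TABLE1_LOG` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyEvent:
    char: str        # (c) character inserted or deleted
    action: str      # (a) "Insert" or "Delete"
    gap_time: float  # (t) seconds since the previous key press
    cursor: int      # (p) cursor position after the action (assumption)


def replay(events: List[KeyEvent]) -> List[str]:
    """Return the text after each event, i.e., the TextToDate column."""
    text: List[str] = []
    snapshots: List[str] = []
    for ev in events:
        if ev.action == "Insert":
            # The cursor ends just after the newly inserted character.
            text.insert(ev.cursor - 1, ev.char)
        elif ev.action == "Delete":
            # Backspace: the cursor ends where the removed character was.
            removed = text.pop(ev.cursor)
            if removed != ev.char:
                raise ValueError(f"log says {ev.char!r} was deleted, found {removed!r}")
        else:
            raise ValueError(f"unsupported action: {ev.action}")
        snapshots.append("".join(text))
    return snapshots


# Rows of Table 1: (character, action, gap time in seconds, cursor position).
TABLE1_LOG = [KeyEvent(*row) for row in [
    ("N", "Insert", 0.015, 1), ("o", "Insert", 0.021, 2),
    ("v", "Insert", 0.024, 3), ("e", "Insert", 0.016, 4),
    ("n", "Insert", 0.016, 5), ("n", "Delete", 0.033, 4),
    ("m", "Insert", 0.012, 5), ("b", "Insert", 0.032, 6),
    ("e", "Insert", 0.049, 7), ("r", "Insert", 0.010, 8),
    (" ", "Insert", 0.013, 9), ("1", "Insert", 0.067, 10),
    ("d", "Insert", 0.028, 11), ("d", "Delete", 0.040, 10),
    ("s", "Insert", 0.022, 11), ("t", "Insert", 0.012, 12),
    (".", "Insert", 0.016, 13),
]]

if __name__ == "__main__":
    snapshots = replay(TABLE1_LOG)
    print(snapshots[-1])  # November 1st.
    print(sum(ev.action == "Delete" for ev in TABLE1_LOG))  # 2
```

Replaying Table 1 this way reproduces its TextToDate column. Real logs would also have to handle multi-character events such as paste, cut, and replace, which this sketch simply rejects.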
A well‐designed keystroke logging system for writing research should record four types of information: the type of each action, its length, its location, and a time stamp for when it occurred. The system may also track additional information, such as cursor movements (made with either the mouse or the keyboard); use of editing tools, a spell checker, or a thesaurus; access to external sources; and use of formatting tools. Each of the four core types of information can take many forms, for example:</p> <p></p> <ulist> <item> Type of action (<emph>What</emph>): pause, insert, delete, paste, cut, replace, line break/enter, etc.</item> <p></p> <item> Length of action (<emph>How long</emph>): measured in characters, words, or tokens, or as a time duration, etc.</item> <p></p> <item> Location of action (<emph>Where</emph>): inside a word, between words or tokens, between sentences, between lines or paragraphs, etc.</item> <p></p> <item> Time point of action (<emph>When</emph>): after jumping away from the current cursor position, at the start of a writing session, before final submission, etc.</item> </ulist> <p>Combining these kinds of data enables researchers and practitioners to address substantive questions about writing processes and the causal factors that affect them.</p> <p>From the keystroke logs, theoretically motivated aspects of the writing process, such as those suggested by Hayes ([<reflink idref="bib31" id="ref23">31</reflink>]) and Berninger ([<reflink idref="bib8" id="ref24">8</reflink>]) (e.g., translation, transcription, reviewing, and monitoring), can potentially be operationalized and measured via feature engineering. Table 2 provides examples of specific features that can be extracted and calculated from the keystroke logs; these features are a subset of those used in Study 1. 
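As one hypothetical way to operationalize the "Where" dimension listed above, the sketch below labels the location of an action from the text to the left of the cursor at the moment the action occurs (text that a replay of the log can supply). The rules, the category labels, and the name `classify_location` are illustrative assumptions, not the feature definitions used in Study 1.

```python
SENTENCE_END = ".!?"


def classify_location(left_context: str) -> str:
    """Label where an action occurs, given the text to the left of the cursor.

    Hypothetical categories mirroring the "Where" list: start of text,
    within a word, between words, between sentences, between lines/paragraphs.
    """
    if not left_context:
        return "start of text"
    last = left_context[-1]
    if last == "\n":
        return "between lines or paragraphs"
    if last.isalnum() or last in "'-":
        return "within word"
    # The cursor follows whitespace or punctuation: look back past any
    # whitespace to see whether the preceding token ended a sentence.
    previous = left_context.rstrip()
    if previous and previous[-1] in SENTENCE_END:
        return "between sentences"
    return "between words"


if __name__ == "__main__":
    for left in ["", "Nov", "November ", "It rained. ", "The end.\n"]:
        print(repr(left), "->", classify_location(left))
```

The rules are deliberately simple: an abbreviation such as "e.g. " would be labeled a sentence boundary, so a production feature set would need proper tokenization.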
At a high level, feature engineering involves a theoretical analysis of the construct to be measured and the development of computer programs that compute features designed to measure it. Alternatively, as shown in Study 2, labor‐intensive feature engineering can be avoided by training a transformer encoder on raw keystroke sequences so that it learns meaningful latent representations of the logs (an approach known as representation learning). The resulting pretrained model can then be used for various downstream tasks, such as feature extraction, performance prediction, or text classification.</p> <p>Table 2. Examples of Process Feature Engineering</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Process Indicators/Construct&lt;/th&gt;&lt;th align="center"&gt;Feature Engineering&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;BetweenWordSpeed&lt;/td&gt;&lt;td&gt;Median rate of between&amp;#8208;word whitespace keystrokes (in characters per second)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Median duration of the between&amp;#8208;word append keystroke intervals (in logged ms)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Large&amp;#8208;burst fluency&lt;/td&gt;&lt;td&gt;Mean append&amp;#8208;only burst length when a burst is defined using 8,000 ms as the threshold pause length for ending a burst (in characters)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Mean all&amp;#8208;action burst length where burst boundaries are defined as eight standard deviations above the median interkey pause time (in characters)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Productivity&lt;/td&gt;&lt;td&gt;Total number of keystrokes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Total number of between&amp;#8208;sentence punctuation mark keystrokes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Deletion editing&lt;/td&gt;&lt;td&gt;Total time spent on cut events&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Maximum keystroke 
efficiency: number of characters divided by number of keystrokes per word&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Jump editing&lt;/td&gt;&lt;td&gt;Total time spent on jump&amp;#8208;to&amp;#8208;edit&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Mean jumped distance across jump&amp;#8208;to&amp;#8208;edit within the same word (in characters)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Paragraphing&lt;/td&gt;&lt;td&gt;Total number of line&amp;#8208;break keystrokes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Max duration of line&amp;#8208;break keystrokes (in logged ms)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <hd id="AN0185399426-3">Previous Studies of Keystroke Logging in Writing Assessment</hd> <p>Prior studies have investigated keystroke log analysis in an assessment context, including the predictive power of process features with respect to writing performance, relations between keyboarding and composition fluency, longitudinal patterns of writing processes, subgroup differences, task effects, and repeater identification. Writing time and the number of keystrokes reflect general writing fluency and effort and are therefore related to writing quality. For example, Zhang, Zou, Wu, Deane, and Li ([<reflink idref="bib71" id="ref25">71</reflink>]) found that, in certain timed writing assessments, a shorter pause at the start of the log is associated with stronger writing, possibly because it indicates an adequate understanding of task requirements, greater familiarity with the writing topic, or more efficient task planning. In addition, the length of <emph>bursts</emph> of text production (where a burst is a sequence of rapid, uninterrupted typing events) is a predictor of writing quality frequently discussed in the psycholinguistic literature. The mean and standard deviation of burst length have been shown to predict essay quality in timed‐writing assessments. 
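To illustrate how burst features of the kind listed in Table 2 can be computed, the sketch below splits a sequence of interkey gaps into bursts wherever a gap exceeds a threshold, then summarizes burst length by its mean and standard deviation. It is a simplified sketch under stated assumptions: typing is treated as append-only, so each keystroke adds one character; the threshold in the example is arbitrary; and `adaptive_threshold` shows only one possible reading of Table 2's median-plus-standard-deviations rule, computed here on raw rather than log-transformed gaps. All function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def burst_lengths(gaps: Sequence[float], threshold: float) -> List[int]:
    """Split keystrokes into bursts; a gap above `threshold` starts a new burst.

    gaps[i] is the pause (in seconds) before keystroke i. Each keystroke is
    assumed to add one character, so burst length is a keystroke count.
    """
    lengths: List[int] = []
    current = 0
    for gap in gaps:
        if current > 0 and gap > threshold:
            lengths.append(current)  # a long pause closes the running burst
            current = 0
        current += 1
    if current:
        lengths.append(current)
    return lengths


def adaptive_threshold(gaps: Sequence[float], k: float) -> float:
    """Median gap plus k sample standard deviations (needs at least 2 gaps)."""
    return statistics.median(gaps) + k * statistics.stdev(gaps)


def burst_summary(lengths: Sequence[int]) -> Tuple[float, float]:
    """Mean and population standard deviation of burst lengths."""
    return statistics.fmean(lengths), statistics.pstdev(lengths)


if __name__ == "__main__":
    gaps = [3.0, 0.2, 0.1, 0.3, 2.5, 0.2, 0.2, 4.0, 0.1]
    lengths = burst_lengths(gaps, threshold=2.0)
    print(lengths)                 # [4, 3, 2]
    print(burst_summary(lengths))  # mean 3.0, SD about 0.816
```

The long pause before the first keystroke does not create an empty burst here; how initial pauses are treated is a design choice that a real feature definition would have to fix.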
Several studies (e.g., Almond, Deane, Quinlan, Wagner, &amp; Sydorenko, [<reflink idref="bib2" id="ref26">2</reflink>]) used linear regression to predict essay scores from process features extracted from keystroke logs. For example, Zhang and Deane ([<reflink idref="bib69" id="ref27">69</reflink>]) regressed essay scores on (a) process features alone, extracted from the keystroke logs and similar to those in Table 2, and (b) a combination of process <emph>and</emph> product features, where product features were extracted from the final written submissions (e.g., the proportion of grammatical errors relative to the total number of words). They found that process features accounted for a small but significant amount of variance above and beyond that accounted for by product features. Sinharay, Zhang, and Deane ([<reflink idref="bib57" id="ref28">57</reflink>]) used data mining methods to predict essay scores from both process and product features. Among the most useful features were low‐level timing measures, such as interkey intervals (the latency between successive keystrokes) and intra‐ and interword intervals (the latency between successive keystrokes within or between words). These types of pauses are likely to reflect a mixture of processes related to basic keyboarding skills and composition capabilities. Several studies have modeled interkey intervals or intra‐ and interword latencies using heavy‐tailed distributions (Almond et al., [<reflink idref="bib2" id="ref29">2</reflink>]; Guo, Deane, van Rijn, Zhang, &amp; Bennett, [<reflink idref="bib27" id="ref30">27</reflink>]). Almond et al. ([<reflink idref="bib2" id="ref31">2</reflink>]) modeled the distribution of within‐ and between‐word pause intervals with a mixture of lognormal distributions described by five parameters.
However, the Almond et al. ([<reflink idref="bib2" id="ref32">2</reflink>]) study was based on a rather small sample (20–80 students, or writing samples, per test form), and its results were inconclusive. To overcome these limitations, Guo et al. ([<reflink idref="bib27" id="ref33">27</reflink>]) used a substantially larger sample: six writing prompts, each answered by a few hundred students. The authors modeled within‐ and between‐word pause intervals as heavy‐tailed probability distributions, in which most pauses between key presses are short and progressively fewer pauses occur at longer durations. They fitted both lognormal and stable distributions and determined that both fit the data well and that the estimated parameters were robust across all writing prompts. The study by Guo et al. ([<reflink idref="bib27" id="ref34">27</reflink>]), in particular, provided the empirical evidence that motivated the use of logarithm transforms for much of the data used in Studies 1 and 2.</p> <p>The cited studies focused either on using keystroke logs to understand writing processes and their links to text quality or on building statistical models of features to support this goal. Their results are relevant to the current studies because they helped identify aspects of the writing process that are stable (or variable) across individuals, contexts, and tasks and therefore support the construction of statistical models of keystroke data. To the best of our knowledge, very little prior research has examined similar questions in a security context; exceptions include Choi, Hao, Deane, and Zhang ([<reflink idref="bib13" id="ref35">13</reflink>]), Deane, Zhang, Hao, and Li ([<reflink idref="bib19" id="ref36">19</reflink>]), and Jiang, Zhang, Hao, Deane, and Li ([<reflink idref="bib34" id="ref37">34</reflink>]). 
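The lognormal model of pause times discussed above has a convenient property: if pauses are lognormal, their logarithms are normally distributed, so the maximum-likelihood estimates are simply the mean and the (1/n) standard deviation of the log pauses. The sketch below illustrates this on simulated data. It fits a single lognormal only, not the five-parameter mixture of Almond et al. or the stable distributions also examined by Guo et al., and `fit_lognormal`, `lognormal_pdf`, and the simulation parameters are hypothetical.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def fit_lognormal(pauses: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood estimates (mu, sigma) for lognormal pause times.

    mu and sigma are the mean and standard deviation of log(pause); the MLE
    of sigma uses the population (1/n) variance.
    """
    if not all(p > 0 for p in pauses):
        raise ValueError("pause times must be positive to take logarithms")
    logs = [math.log(p) for p in pauses]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a lognormal distribution at x (x must be positive)."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    random.seed(0)
    # Simulated interkey intervals in seconds: mostly short, long right tail.
    pauses = [random.lognormvariate(-1.5, 0.6) for _ in range(20_000)]
    mu, sigma = fit_lognormal(pauses)
    print(f"mu={mu:.3f}, sigma={sigma:.3f}, median pause={math.exp(mu):.3f} s")
    print(f"mean pause={statistics.fmean(pauses):.3f} s")  # pulled up by the tail
```

On the log scale the long right tail becomes roughly symmetric, which is one reason log-transformed pause times make more stable inputs for summary features and models than raw times.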
It is plausible that analytical tools developed for the study of writing performance can be repurposed to build a test security model (i.e., an impostor detection model). The two studies presented in this paper therefore extend the existing writing research literature by investigating alternative methods and providing empirical evidence that could help advance test security in writing assessment.</p> <hd id="AN0185399426-4">Study 1</hd> <p>In this section, we first present the problem statement for Study 1. We then describe the participants and the data analyses conducted to answer Research Question 1. We end the section by presenting the results of Study 1.</p> <hd id="AN0185399426-5">Problem Statement in Study 1</hd> <p>Many high‐stakes tests transitioned to remote delivery when COVID‐19 became prevalent in early 2020. While offering much greater flexibility in scheduling and location, remotely proctored (also termed at‐home) testing presents unique challenges. Researchers have noted multiple issues with remotely proctored assessments (Camara, [<reflink idref="bib11" id="ref38">11</reflink>]; Jiao &amp; Lissitz, [<reflink idref="bib35" id="ref39">35</reflink>]). One concern is whether test‐taking experiences differ between a remote setting and a test center. Spence et al. ([<reflink idref="bib58" id="ref40">58</reflink>]) reported that test‐takers experienced a moderate number of issues with remotely proctored tests, such as technical problems (e.g., camera or audio malfunctions), delays in getting help from proctoring staff, and difficulty understanding the proctor's instructions; despite these downsides, however, the authors found that test‐takers still strongly preferred remote proctoring because it helped reduce test‐taking anxiety. 
In many cases, at‐home test administrators cannot monitor or control the hardware and the testing environment as well as they can in testing centers (Weiner &amp; Hurtz, [<reflink idref="bib65" id="ref41">65</reflink>]). Based on log data collected from a high‐stakes assessment early in the implementation of remote (at‐home) test administration, Guo ([<reflink idref="bib26" id="ref42">26</reflink>]) found no testing mode effect of practical significance. While Guo ([<reflink idref="bib26" id="ref43">26</reflink>]) focused on overall test‐taking behaviors, Kim and Walker ([<reflink idref="bib38" id="ref44">38</reflink>]) analyzed test scores and found no administration mode effects for the three tests they examined. Becker, Liu, and Jones ([<reflink idref="bib4" id="ref45">4</reflink>]) focused on test security breaches in the two testing modes and found higher levels of test collusion in remotely proctored tests than in test‐center administrations. This paper is concerned with mode effects in writing assessment, specifically in essay writing. While prior keystroke log research has investigated many aspects of the writing process, it has not examined the effects of differences in the test‐taking environment. In Study 1, we show how handcrafted features extracted from keystroke logs can provide valuable information about the effects of the test environment on writing assessment.</p> <hd id="AN0185399426-6">Participants in Study 1</hd> <p>We used a data set drawn from a standardized assessment that included five subtests: reading, mathematics, social studies, science, and writing. All subtests were scored on a 0‐to‐20‐point scale. The writing subtest contained an essay writing section, which was the focus of this analysis. All essays were scored by two human raters on a scale from 1 to 6, and interrater reliability was high (quadratically weighted kappa of 0.711). 
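Quadratically weighted kappa compares the two raters' observed disagreements with the disagreements expected by chance from each rater's score distribution, penalizing each disagreement by the squared distance between the two scores. With observed counts O_ij, chance-expected counts E_ij, and weights w_ij = (i − j)² / (K − 1)² for K score points, κ_w = 1 − Σ w_ij O_ij / Σ w_ij E_ij. The plain-Python sketch below implements this formula for the 1 to 6 essay scale; it does not use the study's data, and `quadratic_weighted_kappa` is a hypothetical name. (scikit-learn's `cohen_kappa_score` with `weights="quadratic"` should give the same value when all score points are supplied through its `labels` argument.)

```python
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    min_score: int = 1,
    max_score: int = 6,
) -> float:
    """Quadratically weighted kappa for two raters on an integer score scale."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    if max_score == min_score:
        raise ValueError("the score scale needs at least two points")
    valid = range(min_score, max_score + 1)
    k = len(valid)
    # Observed agreement matrix: rows are rater A's scores, columns rater B's.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        if a not in valid or b not in valid:
            raise ValueError(f"score outside {min_score}..{max_score}: {a}, {b}")
        observed[a - min_score][b - min_score] += 1
    n = len(rater_a)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement = 0.0  # weighted observed disagreement
    chance = 0.0        # weighted disagreement expected from the marginals alone
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = row_totals[i] * col_totals[j] / n
            disagreement += weight * observed[i][j]
            chance += weight * expected
    if chance == 0:
        raise ValueError("kappa is undefined: no disagreement is expected by chance")
    return 1.0 - disagreement / chance


if __name__ == "__main__":
    # Illustrative, made-up scores from two raters on the 1-6 essay scale.
    rater_1 = [4, 3, 5, 2, 4, 6, 3, 1]
    rater_2 = [4, 3, 4, 2, 5, 6, 2, 1]
    print(round(quadratic_weighted_kappa(rater_1, rater_2), 3))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; the quadratic weights make a 1-point disagreement on the 6-point scale cost 1/25 as much as a 5-point one.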
Keystroke logs were recorded for the essay writing task. A total of 17,609 candidates had valid essay responses and logs: 3,747 test‐takers completed the assessment at home with remote proctoring, and 13,862 completed it in a test center. The demographic background distributions of the original and matched test‐taker samples are given in Table 3.</p> <p>Table 3. Demographic Background Distribution of the Original and Matched Samples</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Background&lt;/th&gt;&lt;th align="center"&gt;Subgroup&lt;/th&gt;&lt;th align="center"&gt;At&amp;#8208;Home&lt;/th&gt;&lt;th align="center" colspan="2"&gt;In&amp;#8208;Center&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th /&gt;&lt;th /&gt;&lt;th /&gt;&lt;th align="center"&gt;Before Matching&lt;/th&gt;&lt;th align="center"&gt;After Matching&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Gender&lt;/td&gt;&lt;td&gt;Female&lt;/td&gt;&lt;td&gt;1,536 (50.10%)&lt;/td&gt;&lt;td&gt;5,184 (44.45%)&lt;/td&gt;&lt;td&gt;1,513 (49.35%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Male&lt;/td&gt;&lt;td&gt;1,530 (49.90%)&lt;/td&gt;&lt;td&gt;6,479 (55.55%)&lt;/td&gt;&lt;td&gt;1,553 (50.65%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Ethnicity&lt;/td&gt;&lt;td&gt;Hispanic&lt;/td&gt;&lt;td&gt;815 (26.58%)&lt;/td&gt;&lt;td&gt;1,569 (13.45%)&lt;/td&gt;&lt;td&gt;787 (25.67%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;White&lt;/td&gt;&lt;td&gt;1,252 (40.83%)&lt;/td&gt;&lt;td&gt;6,042 (51.80%)&lt;/td&gt;&lt;td&gt;1,286 (41.94%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;African American&lt;/td&gt;&lt;td&gt;403 (13.14%)&lt;/td&gt;&lt;td&gt;1,482 (12.71%)&lt;/td&gt;&lt;td&gt;400 (13.05%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Others&lt;/td&gt;&lt;td&gt;596 (19.44%)&lt;/td&gt;&lt;td&gt;2,570 (22.04%)&lt;/td&gt;&lt;td&gt;593 (19.34%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;English as best&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;2,776 
(90.54%)&lt;/td&gt;&lt;td&gt;10,825 (92.81%)&lt;/td&gt;&lt;td&gt;2,778 (90.61%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;275 (8.97%)&lt;/td&gt;&lt;td&gt;760 (6.52%)&lt;/td&gt;&lt;td&gt;270 (8.81%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Others&lt;/td&gt;&lt;td&gt;15 (0.49%)&lt;/td&gt;&lt;td&gt;78 (0.67%)&lt;/td&gt;&lt;td&gt;18 (0.59%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Employment status&lt;/td&gt;&lt;td&gt;Full&amp;#8208;time&lt;/td&gt;&lt;td&gt;458 (14.94%)&lt;/td&gt;&lt;td&gt;1,605 (13.76%)&lt;/td&gt;&lt;td&gt;475 (15.49%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Part&amp;#8208;time&lt;/td&gt;&lt;td&gt;773 (25.21%)&lt;/td&gt;&lt;td&gt;1,852 (15.88%)&lt;/td&gt;&lt;td&gt;830 (27.07%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Seeking job&lt;/td&gt;&lt;td&gt;875 (28.54%)&lt;/td&gt;&lt;td&gt;2,252 (19.31%)&lt;/td&gt;&lt;td&gt;841 (27.43%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Others&lt;/td&gt;&lt;td&gt;960 (31.31%)&lt;/td&gt;&lt;td&gt;5,954 (51.05%)&lt;/td&gt;&lt;td&gt;920 (30.01%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Education level&lt;/td&gt;&lt;td&gt;&amp;#60; 9th grade&lt;/td&gt;&lt;td&gt;150 (4.89%)&lt;/td&gt;&lt;td&gt;497 (4.26%)&lt;/td&gt;&lt;td&gt;137 (4.47%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;&amp;#8805; 9th grade&lt;/td&gt;&lt;td&gt;2,395 (78.11%)&lt;/td&gt;&lt;td&gt;6,359 (54.52%)&lt;/td&gt;&lt;td&gt;2,438 (79.52%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td /&gt;&lt;td&gt;Others&lt;/td&gt;&lt;td&gt;521 (16.99%)&lt;/td&gt;&lt;td&gt;4,807 (41.22%)&lt;/td&gt;&lt;td&gt;491 (16.01%)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Total 
sample size&lt;/td&gt;&lt;td /&gt;&lt;td align="right"&gt;3,066&lt;/td&gt;&lt;td align="right"&gt;11,663&lt;/td&gt;&lt;td align="right"&gt;3,066&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p><emph>Note</emph>. "English as best" refers to the participants' dichotomous (yes/no) report of whether English was their best communicative language. "Education level" refers to the participants' highest education level at the time they took the test.</p> <hd id="AN0185399426-7">Data Analysis in Study 1</hd> <p>An examination of the raw sample showed that the demographic distributions of the at‐home and in‐center samples differed (Table 3). Proportionally more female and Hispanic test‐takers, and fewer White test‐takers, took the at‐home edition of the assessment than took it in a test center. For example, 50.1% of at‐home test‐takers were female, compared with 44.45% of in‐center test‐takers. Among at‐home test‐takers, 26.58% were Hispanic and 40.83% were White, compared with 13.45% and 51.80%, respectively, among in‐center test‐takers. Discrepancies were also found in other background variables, such as best communicative language, employment status, and highest education level. Furthermore, similar to what was reported in Guo ([<reflink idref="bib26" id="ref46">26</reflink>]) and Kim and Walker ([<reflink idref="bib38" id="ref47">38</reflink>]), the performance level of candidates also differed between the at‐home and in‐center modes, with at‐home test‐takers tending to have somewhat higher scores than in‐center test‐takers.</p> <p>For analysis in Study 1, we controlled for demographic and score differences between at‐home and in‐center test‐takers. Specifically, we applied propensity score matching (Austin, [<reflink idref="bib3" id="ref48">3</reflink>]) to obtain comparable samples of test‐takers who completed the task in the at‐home and in‐center modes. 
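</p> <p>This section does not specify the matching algorithm. The following Python sketch shows one common variant purely as an illustration: greedy 1:1 nearest‐neighbor matching on the propensity score, without replacement, with the at‐home test‐takers as the group to be matched. It assumes each test‐taker's propensity score (the estimated probability of testing at home given the matching covariates, for example from a logistic regression that is not shown) has already been computed. The function name and the optional caliper parameter are hypothetical.</p>

```python
import bisect


def greedy_propensity_match(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbor matching on propensity scores, without replacement.

    Illustrative sketch (hypothetical helper); the paper does not state its algorithm.
    treated:  dict mapping each base-group ID (e.g., at-home) to its propensity score.
    controls: dict mapping each candidate ID (e.g., in-center) to its propensity score.
    caliper:  optional maximum score distance for an acceptable match (assumption).
    Returns a dict mapping each matched base-group ID to one control ID.
    """
    # Keep the unused controls sorted by score so the nearest one can be found by bisection.
    pool = sorted(controls.items(), key=lambda item: item[1])
    pool_ids = [cid for cid, _ in pool]
    pool_scores = [score for _, score in pool]

    matches = {}
    for tid, score in treated.items():  # greedy: units are processed in the given order
        if not pool_scores:
            break
        pos = bisect.bisect_left(pool_scores, score)
        # The nearest remaining control sits on one side of the insertion point.
        candidates = [p for p in (pos - 1, pos) if 0 <= p < len(pool_scores)]
        best = min(candidates, key=lambda p: abs(pool_scores[p] - score))
        if caliper is not None and abs(pool_scores[best] - score) > caliper:
            continue  # no acceptable control; leave this unit unmatched
        matches[tid] = pool_ids.pop(best)  # remove the control: no replacement
        pool_scores.pop(best)
    return matches
```

<p>Greedy matches depend on the order in which units are processed, so implementations often randomize that order. Optimal matching and matching with replacement are common alternatives.</p> <p>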
A related weighting method proposed by Haberman ([<reflink idref="bib29" id="ref49">29</reflink>]) was also applied and produced the same results. At‐home test‐takers served as the base sample, and matches were selected from in‐center test‐takers. The matching variables were gender, ethnicity, whether English was a candidate's best communicative language, employment status, highest education level at the time of testing, and subtest scores (i.e., scale scores in reading, mathematics, social studies, science, and the multiple‐choice portion of the writing subtest). The resulting matched samples showed comparable performance levels and demographic distributions (Table 3).</p> <p>Using the matched samples, we then performed two‐sample <emph>t</emph>‐tests to evaluate the statistical significance of mean differences in three writing process features extracted from keystroke logs, six composite process indicators that combined related keystroke features, and two basic metrics of the final submission: text length in words (Len) and essay score. The three writing process features were (a) total time spent on the writing task in seconds (TT), (b) in‐word speed across all words (LogIKI), and (c) typing speed on common words (KBS). Specifically, LogIKI was calculated as the log of the median of all within‐word pauses (interkey intervals), and KBS was a keyboarding speed measure calculated from the most common English words whenever they appeared during the writing process. The six composite indicators were generated by combining subsets of about 50 features, some of which are described in Table 2. Each composite process indicator was created by standardizing each of its feature scores to a mean of 0 and a standard deviation of 1 and then summing the results. 
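</p> <p>The standardize‐and‐sum construction can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes population standard deviations and treats each feature as a list of values over the same ordered test‐takers. It also omits any reverse coding of features, which this section does not describe. The optional final re‐standardization corresponds to the additional scaling step described next.</p>

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - m) / s for v in values]


def composite_indicator(feature_columns, restandardize=True):
    """Sum of per-feature z-scores for each test-taker (illustrative sketch).

    feature_columns: one list per feature, each holding that feature's values
    for the same ordered set of test-takers.
    """
    standardized = [zscore(column) for column in feature_columns]
    # zip(*...) regroups the standardized values by test-taker.
    summed = [sum(person_values) for person_values in zip(*standardized)]
    return zscore(summed) if restandardize else summed
```

<p>Summing z‐scores gives each feature equal weight within an indicator. As a result, indicators built from 4 versus 12 features are on different scales until the final standardization step.</p> <p>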
Because these process indicators were not composed of equal numbers of features, they were on different scales and could not be directly compared with one another. Therefore, we further standardized each composite process indicator to a mean of 0 and a standard deviation of 1 to aid interpretation of the results. Brief descriptions of each indicator follow (the number in parentheses after each indicator's short label is the number of process features aggregated into that composite indicator):</p> <ulist> <item> <emph>Between‐word speed</emph> (BWSpeed, 9 features): facility in transitioning to the next word, indicative of fluency in lexical retrieval, syntactic encoding, and typing.</item> <item> <emph>Large‐burst fluency</emph> (BstFlu, 10 features): automaticity in generating relatively long bursts of text without error, corrected or not.</item> <item> <emph>Productivity</emph> (Prod, 8 features): fluency and efficiency based mainly on the amount of text produced (though not necessarily retained 
in the final essay submission) during the text‐production process.</item> <item> <emph>Deletion editing</emph> (DelEdit, 9 features): features indicating the propensity to make many quick cuts, including at locations some distance from the current cursor position.</item> <item> <emph>Jump editing</emph> (JumpEdit, 12 features): actions indicating text‐monitoring behavior at the word, phrase, sentence, and occasionally whole‐text level.</item> <item> <emph>Paraphrasing</emph> (Para, 4 features): pauses indicating the extent of effort devoted to organization and planning between paragraphs.</item> </ulist> <p>In addition, to compare the two testing modes, we computed Cohen's <emph>d</emph>, a standardized effect size, to measure the 
practical significance of any identified differences (Cohen, [<reflink idref="bib15" id="ref50">15</reflink>]): $$d = \frac{M_1 - M_2}{\sqrt{(s_1^2 + s_2^2)/2}},$$ where $M_1$ and $s_1$ represent the mean and 
standard deviation of the at‐home sample and $M_2$ and $s_2$ represent the mean and standard deviation of the in‐center sample. To interpret the magnitude of effect sizes, we adopted the benchmarks commonly used in the social sciences: 0.2 is small, 0.5 is moderate, and 0.8 is large. These benchmarks have been used in previous keystroke logging studies of writing. We recognize, however, that they are arbitrary and should be weighed together with other evidence when interpreting the results in the current context. Finally, we compared the correlations of essay score with the writing process features and composite process indicators between the two testing conditions. In Study 1, feature engineering was done in Python and statistical analyses were conducted in SAS.</p> <hd id="AN0185399426-8">Study 1 Results</hd> <p>The two‐sample <emph>t</emph>‐test results are given in Table 4. Compared with in‐center test‐takers, at‐home test‐takers tended, on average, to spend more time on the writing task, submit longer responses, type faster (as indicated by two speed measures, LogIKI and KBS), and receive higher essay scores. 
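</p> <p>The effect sizes reported in Table 4 follow the formula for <emph>d</emph> given above. The following minimal Python sketch shows that computation together with the conventional 0.2, 0.5, and 0.8 benchmarks. The function names are illustrative, and the sketch assumes sample standard deviations.</p>

```python
from statistics import mean, stdev


def cohens_d(group1, group2):
    """Cohen's d: mean difference over the root mean of the two sample variances.

    Illustrative sketch. With equal group sizes (as in matched samples), this
    denominator equals the usual pooled standard deviation.
    """
    s1, s2 = stdev(group1), stdev(group2)
    return (mean(group1) - mean(group2)) / ((s1 ** 2 + s2 ** 2) / 2) ** 0.5


def label_effect(d):
    """Map |d| onto Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"
```

<p>Applied to values from Table 4, label_effect maps the essay score difference (0.09) to "negligible" and the productivity difference (0.28) to "small".</p> <p>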
With the exception of large‐burst fluency, the composite writing process indicators also revealed significant mean differences between the two groups: at‐home test‐takers demonstrated greater text generation fluency and did more text editing at the word, phrasal, and paragraph levels.</p> <p>Table 4. Comparisons of At‐Home and In‐Center Essay Writing</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Measure&lt;/th&gt;&lt;th align="center"&gt;Short Label&lt;/th&gt;&lt;th align="center"&gt;At&amp;#8208;Home Mean&lt;/th&gt;&lt;th align="center"&gt;In&amp;#8208;Center Mean&lt;/th&gt;&lt;th align="center"&gt;&lt;italic&gt;p&lt;/italic&gt; Value&lt;/th&gt;&lt;th align="center"&gt;Effect Size&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Essay score (1 to 6)&lt;/td&gt;&lt;td align="left"&gt;&amp;#8212;&lt;/td&gt;&lt;td&gt;2.99&lt;/td&gt;&lt;td&gt;2.91&lt;/td&gt;&lt;td&gt;0.0006&lt;/td&gt;&lt;td&gt;0.09&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Time on task (in seconds)&lt;/td&gt;&lt;td&gt;TT&lt;/td&gt;&lt;td&gt;2440.4&lt;/td&gt;&lt;td&gt;2136.5&lt;/td&gt;&lt;td&gt;&amp;#60;0.0001&lt;/td&gt;&lt;td&gt;0.20&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Text length (in words)&lt;/td&gt;&lt;td&gt;Len&lt;/td&gt;&lt;td&gt;309.6&lt;/td&gt;&lt;td&gt;278.2&lt;/td&gt;&lt;td&gt;&amp;#60;0.0001&lt;/td&gt;&lt;td&gt;0.23&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;In&amp;#8208;word median LogIKI&lt;/td&gt;&lt;td&gt;LogIKI&lt;/td&gt;&lt;td&gt;5.31&lt;/td&gt;&lt;td&gt;5.37&lt;/td&gt;&lt;td&gt;&amp;#60;0.0001&lt;/td&gt;&lt;td&gt;&amp;#8722;0.18&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Keyboarding speed&lt;/td&gt;&lt;td&gt;KBS&lt;/td&gt;&lt;td&gt;231.44&lt;/td&gt;&lt;td&gt;242.86&lt;/td&gt;&lt;td&gt;&amp;#60;0.0001&lt;/td&gt;&lt;td&gt;&amp;#8722;0.11&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Between&amp;#8208;word 
speed&lt;/td&gt;&lt;td&gt;BWSpeed&lt;/td&gt;&lt;td&gt;0.6997&lt;/td&gt;&lt;td&gt;&amp;#8722;0.6997&lt;/td&gt;&lt;td&gt;&amp;#60;0.0001&lt;/td&gt;&lt;td&gt;0.19&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Large&amp;#8208;burst fluency&lt;/td&gt;&lt;td&gt;BstFlu&lt;/td&gt;&lt;td&gt;&amp;#8722;0.0186&lt;/td&gt;&lt;td&gt;0.0186&lt;/td&gt;&lt;td&gt;0.8660&lt;/td&gt;&lt;td&gt;0.00&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Productivity&lt;/td&gt;&lt;td&gt;Prod&lt;/td&gt;&lt;td&gt;0.5740&lt;/td&gt;&lt;td&gt;&amp;#8722;0.5740&lt;/td&gt;&lt;td&gt;&amp;#60;0.0001&lt;/td&gt;&lt;td&gt;0.28&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Deletion editing&lt;/td&gt;&lt;td&gt;DelEdit&lt;/td&gt;&lt;td&gt;0.2636&lt;/td&gt;&lt;td&gt;&amp;#8722;0.2636&lt;/td&gt;&lt;td&gt;&amp;#60;0.0001&lt;/td&gt;&lt;td&gt;0.15&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Jump editing&lt;/td&gt;&lt;td&gt;JumpEdit&lt;/td&gt;&lt;td&gt;0.2074&lt;/td&gt;&lt;td&gt;&amp;#8722;0.2074&lt;/td&gt;&lt;td&gt;&amp;#60;0.0001&lt;/td&gt;&lt;td&gt;0.13&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Paraphrasing&lt;/td&gt;&lt;td&gt;Para&lt;/td&gt;&lt;td&gt;0.1032&lt;/td&gt;&lt;td&gt;&amp;#8722;0.1032&lt;/td&gt;&lt;td&gt;&amp;#60;0.0001&lt;/td&gt;&lt;td&gt;0.12&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p><emph>Note</emph>. The at‐home and in‐center samples have the same size after matching. Because the raw indicators were on different scales, each indicator was scaled to a mean of 0 and a standard deviation of 1.</p> <p>All but one of these mean differences were statistically significant, which is unsurprising given the large sample size. However, most effect sizes were negligible, indicating little practical significance. Only text length and productivity showed marginally notable effect sizes (above 0.20), and both measure the amount of text produced. To examine this finding further, we divided the responses into 10 equal‐sized buckets (deciles) based on text length. 
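</p> <p>The bucketing step can be sketched in Python as follows. The sketch makes two assumptions that this section does not specify: buckets are formed on the pooled matched sample, and ties in text length are broken by record order. The function names are hypothetical. The sketch also tabulates the mean essay score per mode and bucket, which is one way to summarize the data behind a plot like Figure 1.</p>

```python
from collections import defaultdict
from statistics import mean


def length_buckets(lengths, n_buckets=10):
    """Assign each response to one of n_buckets equal-count buckets by text length.

    Illustrative sketch. sorted() is stable, so ties keep record order and
    bucket sizes differ by at most one.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    buckets = [0] * len(lengths)
    for rank, idx in enumerate(order):
        buckets[idx] = rank * n_buckets // len(lengths)
    return buckets


def mean_score_by_bucket(lengths, scores, modes, n_buckets=10):
    """Mean essay score in each (mode, length bucket) cell."""
    cells = defaultdict(list)
    for bucket, score, mode in zip(length_buckets(lengths, n_buckets), scores, modes):
        cells[(mode, bucket)].append(score)
    return {cell: mean(values) for cell, values in sorted(cells.items())}
```

<p>Because the buckets are defined by rank rather than by fixed word counts, each bucket holds roughly one tenth of the responses regardless of how skewed the length distribution is.</p> <p>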
Figure 1 shows that responses from at‐home test‐takers tended to be somewhat longer at every score level above 1. One possible explanation is that at‐home test‐takers used their personal devices, so they may have been more familiar with the keyboards (including shortcut keys) and able to type faster. Critically, the correlations between essay writing characteristics and essay scores appeared to be generally comparable between the at‐home and in‐center modes (Figure 2). Again, only text length and productivity showed significant discrepancies, and those discrepancies had small effect sizes. Interestingly, the correlations were slightly higher in the at‐home sample, which warrants investigation in future studies. A plausible explanation is an interaction between the process features or indicators and text length, which tended to be slightly longer for higher scoring at‐home test‐takers. In conclusion, the differences in essay writing processes between the two testing modes were relatively small in magnitude.</p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/EMS/01jun25/emip12668-fig-0001.jpg?ephost1=dGJyMNXb4kSepq84yOvqOLCmsE6epq5Srqa4SK6WxWXS" alt="emip12668-fig-0001.jpg" title="1 Essay score against text length by mode." /> </p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/EMS/01jun25/emip12668-fig-0002.jpg?ephost1=dGJyMNXb4kSepq84yOvqOLCmsE6epq5Srqa4SK6WxWXS" alt="emip12668-fig-0002.jpg" title="2 Correlations of essay writing characteristics with essay score by mode." 
/> </p> <hd id="AN0185399426-11">Study 2</hd> <p>In this section, we first state the problem addressed in Study 2 and then describe the data and participants, the large language model we developed to process raw keystroke log sequences, the statistical analyses used to address Research Question 2, and the results.</p> <hd id="AN0185399426-12">Problem Statement in Study 2</hd> <p>How an individual types has been shown to be usable for biometric purposes (Dowland &amp; Furnell, [<reflink idref="bib21" id="ref51">21</reflink>]; Sahu, Banavar, &amp; Schuckers, [<reflink idref="bib54" id="ref52">54</reflink>]). Although keystroke dynamics may not offer the same level of precision as other forms of biometric authentication, such as fingerprints, signature dynamics, and voice (Walker, [<reflink idref="bib64" id="ref53">64</reflink>]), they are valuable for test security because keystroke data are readily available in tests that include essay responses and are difficult to spoof. Researchers have experimented with hand‐engineered features for person identification in writing assessment contexts (e.g., Choi et al., [<reflink idref="bib13" id="ref54">13</reflink>]; Jiang et al., [<reflink idref="bib34" id="ref55">34</reflink>]). However, this approach can be difficult to scale when it is not clear which features are meaningful. An alternative is to use neural or deep learning models to infer latent statistical relations in the data; such models may be appropriate when the data involve complex interactions and nonlinear relationships.</p> <p>In this work, we applied a modified transformer architecture to keystroke logs to train a model that produces dense representations of keystroke streams. 
By leveraging the transformer's strengths and incorporating adaptations tailored to keystroke data, we sought to advance the state of the art in modeling keystroke logs without the need for feature engineering. We also sought to demonstrate how keystroke logs can help determine whether a pair of essays (or writing behaviors) comes from the same person or from different people. In practice, skipping feature engineering also provides much greater flexibility in applications.</p> <hd id="AN0185399426-13">Participants in Study 2</hd> <p>We obtained essays written by approximately 250,000 test‐takers for a standardized writing assessment (different from the one in Study 1). As in Study 1, the test, including the writing section, was delivered on a computer, which allowed individual keystrokes to be logged. The writing section consisted of two tasks, so each test‐taker contributed a pair of essays; these same‐person pairs serve as the "positive pairs" in the analysis described below. A small portion of the responses was scored by two human raters, and interrater reliability was strong (quadratically weighted kappas of 0.762 and 0.759 for the two writing tasks, respectively). The sample included 121,122 females (48.7%), 127,254 males (51.2%), and 201 test‐takers who did not report gender. Only test‐takers who were US citizens (<emph>n</emph> = 70,364) were asked to report their ethnicity, on a voluntary basis. Of those, 7,622 self‐identified as Asian or Asian American, 6,139 as Black or African American, 8,475 as Hispanic, 44,687 as White, and 657 as American Indian or Alaskan Native/Native Hawaiian or Pacific Islander. The remainder were unknown or unreported.</p> <hd id="AN0185399426-14">Modeling</hd> <p>Figure 3 gives an overview of the modeling architecture and the technical decisions made during the analysis. 
The subsections below describe each step; here we give a high‐level overview of the approach. The modeling process began by defining the input variables from keystroke sequences ("Keystroke Sequence Input" section), followed by the creation of embeddings using transformers ("Encoder Model" and "Embedding" sections). We then applied contrastive learning to generate meaningful representations of the encoded keystroke logs ("Encoder and Vector Representation" section). As part of the contrastive learning process, we adopted a technique called "in‐batch negatives," used the cosine similarity measure to define a loss function, and built a classifier to determine whether a second essay was written by the same person (positive pairs of essays) or by different people (negative pairs). These steps are described in the "Similarity Measure and Loss Function" section, along with the sampling procedure for model training, validation, and testing. Finally, in the "Parameter Settings and Fine‐Tuning" section, we provide more details on setting the model parameters and fine‐tuning. All analyses in Study 2 were conducted using Python.</p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/EMS/01jun25/emip12668-fig-0003.jpg?ephost1=dGJyMNXb4kSepq84yOvqOLCmsE6epq5Srqa4SK6WxWXS" alt="emip12668-fig-0003.jpg" title="3 Keystroke sequence modeling architecture and technical choices." /> </p> <p>Because we sought methods that can effectively capture and represent complex keystroke behavior with minimal feature engineering, we focused on the domain of "representation learning" (Bengio, Courville, &amp; Vincent, [<reflink idref="bib5" id="ref56">5</reflink>]). 
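</p> <p>The following Python sketch makes the in‐batch negatives idea concrete by computing an InfoNCE‐style contrastive loss over cosine similarities. Within a batch, the two essays from the same test‐taker form the positive pair, and every other test‐taker's essay in the batch serves as a negative. The exact loss form, the temperature value, the decision threshold, and all function names are assumptions for illustration. The authors' own loss function is described in their "Similarity Measure and Loss Function" section.</p>

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (illustrative; temperature is assumed).

    anchors[i] and positives[i] embed the two essays of test-taker i; for
    anchors[i], every positives[j] with j != i acts as a negative.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        top = max(logits)  # subtract the max before exponentiating, for stability
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / n


def same_writer(u, v, threshold=0.5):
    """Hypothetical verification rule: same person if similarity reaches a threshold."""
    return cosine(u, v) >= threshold
```

<p>Minimizing this loss raises each anchor's similarity to its own positive relative to the other essays in the batch, so every batch of <emph>n</emph> test‐takers supplies <emph>n</emph> − 1 negatives per anchor without any extra sampling.</p> <p>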
Representation learning offers a powerful framework for training a model to discover features at multiple levels of abstraction in an unsupervised manner (i.e., without labeled data), and the resulting model can be reused in various ways through transfer learning (Zhuang et al., [<reflink idref="bib72" id="ref57">72</reflink>]). In this work, we adopted a training scheme called "contrastive learning" (Hadsell, Chopra, &amp; LeCun, [<reflink idref="bib30" id="ref58">30</reflink>]), which has emerged as a popular method for generating high‐quality representations. The objective of contrastive learning is to maximize the similarity between positive pairs of samples while minimizing the similarity between negative pairs.</p> <hd id="AN0185399426-16">Encoder model</hd> <p>The training scheme is only one piece of the puzzle; the other key component is the model, also known as the encoder. The encoder converts each input sample into an embedding vector, and it is the encoder that is trained to pull similar instances together and push dissimilar instances apart in the vector space. In recent years, the transformer model (Vaswani et al., [<reflink idref="bib63" id="ref59">63</reflink>]) has gained prominence as a powerful and flexible encoder, performing strongly in domains including natural language processing (Brown et al., [<reflink idref="bib10" id="ref60">10</reflink>]; Liu et al., [<reflink idref="bib41" id="ref61">41</reflink>]), computer vision (Dosovitskiy et al., [<reflink idref="bib20" id="ref62">20</reflink>]), and multimodal tasks (Lu, Batra, Parikh, &amp; Lee, [<reflink idref="bib45" id="ref63">45</reflink>]; Tan &amp; Bansal, [<reflink idref="bib60" id="ref64">60</reflink>]). 
With its attention mechanism, scalability (Brown et al., [<reflink idref="bib10" id="ref65">10</reflink>]; Rae et al., [<reflink idref="bib52" id="ref66">52</reflink>]; Tay et al., [<reflink idref="bib61" id="ref67">61</reflink>]), and adaptability to different input data (Jaegle et al., [<reflink idref="bib33" id="ref68">33</reflink>]), the transformer model is a strong candidate for our representation learning task.</p> <p>At a high level, the architecture consists of an embedding layer followed by multiple attention layers, as shown in Figure 4. Each attention layer combines a self‐attention mechanism, which lets the model focus on different parts of the input, with a positionwise feed‐forward network (FFN). The embedding layer encodes the input as vectors that the subsequent layers operate on; the self‐attention and FFN layers then transform and refine these representations to capture increasingly abstract, high‐level features.</p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/EMS/01jun25/emip12668-fig-0004.jpg?ephost1=dGJyMNXb4kSepq84yOvqOLCmsE6epq5Srqa4SK6WxWXS" alt="emip12668-fig-0004.jpg" title="4 The transformer encoder.Note. This figure is adapted based on Vaswani et al. ([63])." /> </p> <p>In our modified transformer model, the self‐attention mechanism is central to the model's ability to capture semantic and syntactic relationships within the input sequence, and it allows the entire keystroke log sequence to be processed in parallel. 
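</p> <p>The layer structure described above (self‐attention followed by a positionwise FFN) can be sketched in plain Python for a single attention head. The equations in this section define the same computations. This is a toy illustration with hand‐supplied weight matrices and hypothetical function names. It omits multiple heads, bias terms, residual connections, layer normalization, and dropout, and it does not reproduce the authors' modifications to the architecture.</p>

```python
import math


def matmul(a, b):
    """Multiply matrices given as lists of rows: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Row-wise softmax, subtracting each row's max for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x, w_q, w_k, w_v, w_o):
    """Toy single-head scaled dot-product attention; x has one row per keystroke."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    a = softmax_rows(scores)  # attention weights A; each row sums to 1
    return matmul(matmul(a, v), w_o)  # Z = Linear(AV), bias omitted


def feed_forward(z, w_1, w_2):
    """Positionwise FFN: O = Linear(ReLU(Linear(Z))), biases omitted."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(z, w_1)]
    return matmul(hidden, w_2)
```

<p>In this sketch, each row of A is the set of weights the self‐attention mechanism uses to combine every keystroke's value vector when building the corresponding output row.</p> <p>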
Self‐attention operates by projecting each embedding into three vectors: a query vector and a key vector, each of dimension $d_k$, and a value vector of dimension $d_v$. Stacking these vectors across the sequence gives the matrices $Q$, $K$, and $V$. The dot product of $Q$ and $K$, divided by $\sqrt{d_k}$, is passed through a softmax function to produce the scaled attention weights $A$. 
That is,
$$ A = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right). $$</p> <p>The attention output $Z$ is then obtained by multiplying the attention scores $A$ by the value matrix $V$ and passing the product through a linear layer. The weights of this layer, like the other attention parameters, are part of the encoder model and are learned during training:
$$ Z = \mathrm{Linear}(AV). $$</p> <p>The output then goes through a position‐wise feed‐forward network, which consists of two linear layers with a ReLU activation in between (Vaswani et al., [<reflink idref="bib63" id="ref69">63</reflink>]):
$$ O = \mathrm{Linear}(\mathrm{ReLU}(\mathrm{Linear}(Z))). $$</p> <p>By stacking multiple attention layers, the model can construct enriched, contextualized representations that capture the character‐level and temporal dependencies in the sequence.</p> <hd id="AN0185399426-18">Keystroke sequence input</hd> <p>In common use cases for transformer models, the input is a sequence of words or subwords. In the current study, by contrast, each keystroke is treated as a tuple of four components: the character, the action, the gaptime, and the position, defined as follows:</p> <ulist> <item> The character (<emph>c</emph>): the character being inserted into or deleted from the existing keystroke stream.</item> <item> The action (<emph>a</emph>): whether the keystroke inserts or deletes characters.</item> <item> The gaptime (<emph>t</emph>): the time gap between two adjacent key presses.</item> <item> The position (<emph>p</emph>): the position of the cursor where the edit is made.</item> </ulist> <p>The keystroke stream is represented by a sequence of these tuples.</p> <hd id="AN0185399426-19">Embedding</hd> <p>An embedding layer traditionally takes a sequence of tokens as input and maps it to a sequence of continuous vectors. This converts the discrete inputs into dense vectors that the attention layers can operate on. For our purpose, we did not use the usual embedding layer composed of a token embedding and/or a position embedding. Instead, we designed a custom embedding that accounts for additional information in a keystroke, such as the action, because the character alone does not convey all of the information in a keystroke (e.g., a letter "e" might be either inserted into or deleted from the sequence). We constructed a separate embedding layer for each component except the timing information, which is linearly projected. 
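</p> <p>The attention and feed‐forward steps defined above ($A$, $Z$, and $O$) can be illustrated with a minimal, dependency‐free Python sketch. This is not the study's implementation. It makes four simplifications. First, it uses a single head with dense attention, whereas the final model used eight heads and BigBird sparse attention. Second, it assumes, as in Vaswani et al., that $Q$, $K$, and $V$ are linear projections of the input embeddings. Third, it omits residual connections and layer normalization, which the equations above also leave out. Fourth, it uses random, untrained weights under hypothetical names such as `w_q` and `w_1`.</p>

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax; subtracting each row's max keeps exp() stable."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def linear(x: Matrix, w: Matrix, b: list[float]) -> Matrix:
    """Affine layer applied to every row of x: x @ w + b."""
    return [[v + bj for v, bj in zip(row, b)] for row in matmul(x, w)]


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def encoder_layer(h: Matrix, params: dict) -> tuple[Matrix, Matrix]:
    """Single-head dense attention plus position-wise FFN; returns (A, O)."""
    q = matmul(h, params["w_q"])
    k = matmul(h, params["w_k"])
    v = matmul(h, params["w_v"])
    scale = math.sqrt(len(k[0]))  # sqrt(d_k)
    a = softmax_rows([[s / scale for s in row] for row in matmul(q, transpose(k))])
    z = linear(matmul(a, v), params["w_o"], params["b_o"])   # Z = Linear(AV)
    hidden = relu(linear(z, params["w_1"], params["b_1"]))   # ReLU(Linear(Z))
    o = linear(hidden, params["w_2"], params["b_2"])         # O = Linear(...)
    return a, o


def random_params(d: int, d_ff: int, rng: random.Random) -> dict:
    """Hypothetical untrained weights; in the study these are learned."""
    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"w_q": mat(d, d), "w_k": mat(d, d), "w_v": mat(d, d),
            "w_o": mat(d, d), "b_o": [0.0] * d,
            "w_1": mat(d, d_ff), "b_1": [0.0] * d_ff,
            "w_2": mat(d_ff, d), "b_2": [0.0] * d}


rng = random.Random(0)
seq_len, d_model = 5, 8
h = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(seq_len)]
A, O = encoder_layer(h, random_params(d_model, 16, rng))
print(len(O), len(O[0]))  # prints: 5 8 (one d_model-sized vector per keystroke)
```

<p>Each row of $A$ is a probability distribution over keystrokes. Because $O$ keeps one vector per keystroke, layers of this kind can be stacked.</p> <p>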
For a keystroke sequence of length $l$,
$$ Seq = [x_{1}, x_{2}, \ldots, x_{i}, \ldots, x_{l}], $$ where
$$ x_{i} = (c_{i}, a_{i}, p_{i}, t_{i}). $$ The four components are defined as follows:
- $c_{i} \in V_{c}$, where $V_{c}$ is the vocabulary of input characters;
- $a_{i} \in V_{a}$, where $V_{a}$ is the vocabulary of input actions;
- $p_{i}$ is the cursor position; and
- $t_{i}$ is the time gap, in milliseconds, since keystroke $i-1$ was pressed.

See Table 4 for examples of <emph>c</emph>, <emph>a</emph>, <emph>p</emph>, and <emph>t</emph>. 
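</p> <p>The tuple $x_{i} = (c_{i}, a_{i}, p_{i}, t_{i})$ can be sketched as a small Python data structure, together with a helper that derives $t_{i}$ from event timestamps. Several details are hypothetical: the raw event format `(timestamp_ms, action, char, cursor_pos)`, the action labels, the function name `to_keystrokes`, and the zero gap assigned to the first keystroke. The study's own log format (Table 4) may differ.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keystroke:
    """One element x_i = (c_i, a_i, p_i, t_i) of the keystroke sequence."""
    char: str      # c_i: the character inserted or deleted
    action: str    # a_i: "insert" or "delete" (hypothetical labels)
    position: int  # p_i: cursor position where the edit is made
    gap_ms: int    # t_i: milliseconds since keystroke i-1 was pressed


def to_keystrokes(events: list[tuple[int, str, str, int]]) -> list[Keystroke]:
    """Convert hypothetical (timestamp_ms, action, char, cursor_pos) events."""
    seq = []
    prev_time = None
    for timestamp, action, char, pos in events:
        # The first keystroke has no predecessor; its gap is set to 0 (assumption).
        gap = 0 if prev_time is None else timestamp - prev_time
        seq.append(Keystroke(char, action, pos, gap))
        prev_time = timestamp
    return seq


# Hypothetical log: type "the", then delete the "e" with Backspace.
log = [(1000, "insert", "t", 0), (1180, "insert", "h", 1),
       (1300, "insert", "e", 2), (1900, "delete", "e", 2)]
keystrokes = to_keystrokes(log)
for k in keystrokes:
    print(k)
```

<p>The third and fourth tuples share the character "e" but differ in action. A character‐only token embedding would lose exactly this distinction.</p> <p>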
The vector representations of the keystroke sequence are produced as follows:
$$ H_{c} = Embed_{c}(X_{c}), $$
$$ H_{a} = Embed_{a}(X_{a}), $$
$$ H_{p} = Embed_{p}(X_{p}), $$
$$ H_{t} = f(X_{t}), $$ where
- $X_{c}$, $X_{a}$, $X_{p}$, and $X_{t}$ denote the sequences of characters, actions, positions, and gaptimes, respectively;
- $Embed()$ is an embedding layer that maps its input into dense vectors; and
- $f()$ can be an arbitrary transformation of the timing feature (a linear projection in this study).

The four components are then concatenated:
$$ H = Concat([H_{c}, H_{a}, H_{p}, H_{t}]). $$ The resulting representation $H \in \mathbb{R}^{l \times d}$, where $d$ is the embedding size, is passed to a stack of transformer encoders. 
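</p> <p>The Python sketch below illustrates this embedding scheme. It assumes lookup tables for $Embed_{c}$, $Embed_{a}$, and $Embed_{p}$, and a linear projection of the raw gaptime for $f()$. Some choices are purely illustrative: the class name `KeystrokeEmbedding`, the component sizes (which sum to 64 rather than the study's 768), and the clipping of positions at a hypothetical `max_position`. The weights are random rather than learned.</p>

```python
import random


class KeystrokeEmbedding:
    """Hypothetical sketch of H = Concat([H_c, H_a, H_p, H_t]) with random weights."""

    def __init__(self, chars, actions, max_position, dims, seed=0):
        rng = random.Random(seed)
        d_c, d_a, d_p, d_t = dims

        def table(n, d):
            # One row per vocabulary entry: a lookup-table embedding layer.
            return [[rng.gauss(0.0, 0.02) for _ in range(d)] for _ in range(n)]

        self.char_ids = {ch: i for i, ch in enumerate(chars)}    # vocabulary V_c
        self.action_ids = {a: i for i, a in enumerate(actions)}  # vocabulary V_a
        self.embed_c = table(len(self.char_ids), d_c)
        self.embed_a = table(len(self.action_ids), d_a)
        self.embed_p = table(max_position + 1, d_p)
        self.max_position = max_position
        # f(): a linear projection of the scalar gaptime to d_t dimensions.
        self.w_t = [rng.gauss(0.0, 0.02) for _ in range(d_t)]
        self.b_t = [0.0] * d_t

    def __call__(self, seq):
        """Map l tuples (c, a, p, t) to l vectors of size d_c + d_a + d_p + d_t."""
        h = []
        for c, a, p, t in seq:
            h_c = self.embed_c[self.char_ids[c]]   # KeyError if c is not in V_c
            h_a = self.embed_a[self.action_ids[a]]
            h_p = self.embed_p[min(p, self.max_position)]  # clip long positions
            h_t = [w * t + b for w, b in zip(self.w_t, self.b_t)]
            h.append(h_c + h_a + h_p + h_t)  # list concatenation = Concat
        return h


emb = KeystrokeEmbedding(chars="abcdefghijklmnopqrstuvwxyz ",
                         actions=["insert", "delete"],
                         max_position=4096, dims=(32, 8, 16, 8))
seq = [("t", "insert", 0, 0.0), ("h", "insert", 1, 180.0),
       ("e", "insert", 2, 120.0), ("e", "delete", 2, 600.0)]
H = emb(seq)
print(len(H), len(H[0]))  # prints: 4 64, i.e., l = 4 and d = 32 + 8 + 16 + 8
```

<p>The two "e" keystrokes receive identical character sub‐vectors but different action sub‐vectors, so their rows of $H$ still differ.</p> <p>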
This process is visualized in Figure 5.</p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/EMS/01jun25/emip12668-fig-0005.jpg?ephost1=dGJyMNXb4kSepq84yOvqOLCmsE6epq5Srqa4SK6WxWXS" alt="emip12668-fig-0005.jpg" title="5 Design of custom embedding layer." /> </p> <hd id="AN0185399426-21">Encoder and vector representation</hd> <p>The encoder used in this study was a stack of attention layers with the BigBird sparse attention mechanism (Zaheer et al., [<reflink idref="bib67" id="ref70">67</reflink>]). This architectural choice was motivated by the need to train on keystroke sequences, which can be much longer than the essays themselves. After encoding the keystroke sequences with the transformer model, we applied mean pooling to obtain the corresponding representation vectors, which were trained through contrastive learning. To generate these representations, we applied contrastive learning techniques available in the SimCSE package (Gao, Yao, &amp; Chen, [<reflink idref="bib25" id="ref71">25</reflink>]). In representation learning, various training schemes have been explored to acquire meaningful representations. These include contrastive learning, triplet networks (Hoffer &amp; Ailon, [<reflink idref="bib32" id="ref72">32</reflink>]), and Siamese networks (Chopra, Hadsell, &amp; LeCun, [<reflink idref="bib14" id="ref73">14</reflink>]).</p> <hd id="AN0185399426-22">Similarity measure and loss function</hd> <p>In representation learning, the model typically requires positive and negative pairs for effective training. In our context, a positive pair is a pair of essays written by the same person, and a negative pair is a pair of essays written by different people. Predefined triplets (each comprising an anchor, a positive sample, and a negative sample) are absent, so negative pairs must be generated artificially, which can lead to overfitting. 
However, augmenting the training data set with an exhaustive list of negative pairs quickly inflates its size, posing significant computational challenges. To tackle these issues, we leveraged a technique called "in‐batch negatives." This sampling trick, applied during model training, substantially increased the number of sample pairs the model saw. With in‐batch negatives, each batch of input was collated so that the essays of every other test‐taker in the batch served as negatives for a given test‐taker's essay. This approach efficiently generated a diverse set of negative samples without incurring additional storage costs. The larger number of negative pairs also enhanced the learning process.</p> <p>As a reminder, each test‐taker completed two essays. Figure 6 illustrates what in‐batch negatives entail. Each test‐taker in a batch has one positive pair, consisting of the two essays written by that test‐taker. The test‐taker also has a number of negative pairs, each consisting of one essay by the test‐taker and one essay by another person. In the Figure 6 example, "Essay A1" refers to the first essay written by person A, "Essay A2" refers to the second essay written by person A, and so on.</p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/EMS/01jun25/emip12668-fig-0006.jpg?ephost1=dGJyMNXb4kSepq84yOvqOLCmsE6epq5Srqa4SK6WxWXS" alt="emip12668-fig-0006.jpg" title="6 Illustration of in‐batch negatives." /> </p> <p>Subsequently, we computed the dot product between each input vector and the positive and negative candidate vectors, resulting in a matrix of similarity scores. These scores were then optimized using a cross‐entropy loss function. 
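</p> <p>The in‐batch negatives scheme and the cross‐entropy objective can be sketched in Python as follows. The sketch assumes that each test‐taker's first essay serves as the anchor and the second essay as the positive candidate, so the positive pairs sit on the diagonal of the similarity matrix, as in Figure 6. Mean pooling followed by L2 normalization stands in for the pooling and the normalization following Gao et al. No temperature term is used, since the loss defined in this section has none, and the function names are hypothetical.</p>

```python
import math


def mean_pool(token_vectors):
    """Average encoder outputs over the sequence (padding masks omitted)."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_batch_loss(anchors, candidates):
    """anchors[i] and candidates[i] are essays by the same test-taker.

    Row i of the similarity matrix holds dot(h_a, h_j) for every candidate j
    in the batch: the diagonal entry is the positive pair and the other
    B - 1 entries act as in-batch negatives. The loss for row i is
    -log(exp(row[i]) / sum_j exp(row[j])); the batch loss is the mean.
    """
    sims = [[dot(a, c) for c in candidates] for a in anchors]
    losses = []
    for i, row in enumerate(sims):
        m = max(row)  # log-sum-exp trick for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in row))
        losses.append(log_denominator - row[i])
    return sum(losses) / len(losses), sims


# Hypothetical encoder outputs (2 positions x 3 dims) for essays A1..C1 and A2..C2.
first_essays = [[[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
                [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
                [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]]
second_essays = [[[0.8, 0.2, 0.0], [1.0, 0.0, 0.0]],
                 [[0.1, 1.0, 0.0], [0.0, 0.8, 0.2]],
                 [[0.0, 0.0, 1.0], [0.2, 0.1, 0.8]]]
anchors = [l2_normalize(mean_pool(e)) for e in first_essays]
candidates = [l2_normalize(mean_pool(e)) for e in second_essays]
loss, sims = in_batch_loss(anchors, candidates)
print(round(loss, 4))
```

<p>With a batch of $B$ test‐takers, every anchor is contrasted with $B - 1$ negatives drawn from the batch itself, so no additional negative pairs need to be stored.</p> <p>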
We defined the loss function for a pair of input sequences $\hat{x}_{a}$ and $\hat{x}_{i}$ as follows. An encoder takes an input sequence and produces a vector representation, which is then normalized following Gao et al. 
([<reflink idref="bib25" id="ref74">25</reflink>]).</p> <p>For each batch, the loss function $L_{i\_sample}$ is defined as
$$ L_{i\_sample} = -\log \frac{e^{\mathrm{dot}(\hat{h}_{a}, \hat{h}_{i})}}{\sum_{i\_sample=1}^{N} e^{\mathrm{dot}(\hat{h}_{a}, \hat{h}_{i\_sample})}}, $$ where
- $\hat{h}_{a}$ and $\hat{h}_{i}$ are the encoded vector representations of the input sequences $\hat{x}_{a}$ and $\hat{x}_{i}$, respectively;
- $\hat{h}_{i}$ is the positive candidate for $\hat{h}_{a}$; and
- $\hat{h}_{i\_sample}$ ranges over all $N$ candidates, positive and negative, present in the batch.

The objective of training is to minimize this loss, which encourages the model to pull together the representations of similar samples while pushing apart those of dissimilar ones.</p> <p>The data were randomly divided into training, validation, and testing sets at the test‐taker level. The resulting training set consisted of 243,605 test‐takers; the validation set, 2,486 test‐takers; and the testing set, 2,486 test‐takers. This unbalanced division was intentional. Training and fine‐tuning deep learning models generally benefit from larger samples, whereas testing and independently evaluating the model in the current study context did not necessarily require a large sample. We considered a test set of more than 2,000 candidates to be sufficient for the current purpose. 
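</p> <p>A test‐taker‐level split can be sketched as follows. Identifiers, not essays, are shuffled, so both essays by a test‐taker always fall into the same partition and no writer appears in both training and testing. The record format and the function name `split_by_test_taker` are hypothetical.</p>

```python
import random


def split_by_test_taker(records, n_valid, n_test, seed=0):
    """Randomly partition (test_taker_id, essay) records by test-taker.

    Identifiers are shuffled rather than essays, so both essays written by a
    test-taker always end up in the same partition.
    """
    ids = sorted({tid for tid, _ in records})  # sort first for reproducibility
    random.Random(seed).shuffle(ids)
    test_ids = set(ids[:n_test])
    valid_ids = set(ids[n_test:n_test + n_valid])
    splits = {"train": [], "valid": [], "test": []}
    for tid, essay in records:
        if tid in test_ids:
            splits["test"].append((tid, essay))
        elif tid in valid_ids:
            splits["valid"].append((tid, essay))
        else:
            splits["train"].append((tid, essay))
    return splits


# Hypothetical data: 20 test-takers with two essays each.
records = [(f"T{i:03d}", f"essay{j}") for i in range(20) for j in (1, 2)]
splits = split_by_test_taker(records, n_valid=2, n_test=2)
print({name: len(rows) for name, rows in splits.items()})
# prints {'train': 32, 'valid': 4, 'test': 4}
```

<p>With the sizes reported above, the call would use `n_valid=2486` and `n_test=2486`, leaving the remaining test‐takers for training.</p> <p>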
Results reported in the "Study 2 Results" section are based on the testing set. In generating in‐batch negatives, we ignored the tasks, meaning that each of the two essays was randomly assigned to be the first or the second one. Note that we used a more stringent condition in model training and validation: we optimized the models to separate the positive essay pair from <emph>n</emph> negative pairs in each batch (e.g., <emph>n</emph> = 31 if the batch size is set at 32). In testing, however, we created a balanced sample consisting of half positive and half negative pairs, for comparability with other educational measurement results such as those in Choi et al. ([<reflink idref="bib13" id="ref75">13</reflink>]). Specifically, in testing, we supplied the model with two candidate essays: one from the same person and one randomly drawn from another test‐taker. If the model scored the positive pair higher than the negative pair, the prediction was considered correct. The classification accuracy was calculated as the proportion of samples in the testing set for which the positive pair had the higher similarity score.</p> <hd id="AN0185399426-24">Parameter settings and fine‐tuning</hd> <p>The final model used in our experiments had the following configuration:
- four layers;
- eight attention heads;
- a hidden size of 384, where the hidden size is the length of the intermediate vector representations of the keystroke sequence between attention layers; and
- an embedding size of 768, where the embedding size is the vector length of the representation immediately after the input embedding layer.

The final parameter count, including the embedding layer, was around 16 million. 
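</p> <p>The balanced testing procedure described above can be sketched as follows. Each test‐taker contributes one positive pair and one pair formed with a randomly drawn essay from another test‐taker. Accuracy is the proportion of test‐takers whose positive pair has the higher similarity. The function name is hypothetical, and the representations are assumed to be normalized already. Ties are counted as errors, a detail the text does not specify.</p>

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def balanced_pair_accuracy(first, second, seed=0):
    """first[i], second[i]: normalized representations of test-taker i's essays.

    Each test-taker contributes one positive pair (first[i], second[i]) and one
    negative pair (first[i], second[j]) with j drawn uniformly from the other
    test-takers. A case is correct if the positive pair has the higher
    similarity; ties count as errors (an assumption).
    """
    rng = random.Random(seed)
    n = len(first)
    correct = 0
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:          # skip i itself, keeping j uniform over the others
            j += 1
        if dot(first[i], second[i]) > dot(first[i], second[j]):
            correct += 1
    return correct / n


# Hypothetical orthonormal vectors: each test-taker's two essays coincide.
basis = [[1.0 if k == i else 0.0 for k in range(4)] for i in range(4)]
print(balanced_pair_accuracy(basis, basis))  # prints 1.0
```

<p>For independent random representations, the positive and negative similarities are exchangeable, so the expected accuracy is 0.5. This chance baseline is the reference point for reading the accuracies in Table 5.</p> <p>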
We trained our model using the AdamW optimizer (Loshchilov &amp; Hutter, [<reflink idref="bib44" id="ref76">44</reflink>]) at a learning rate of $10^{-5}$. Training ran for 3 epochs on a P40 GPU and took around 15 hours to complete. Similar to Vaswani et al. ([<reflink idref="bib63" id="ref77">63</reflink>]), we used a warmup scheduler in which the learning rate increased linearly over the first 300 training steps. To improve computational efficiency, training was done in mixed precision (Micikevicius et al., [<reflink idref="bib48" id="ref78">48</reflink>]); that is, the activations and gradients were computed in half precision. We tuned hyperparameters related to the learning rate and batch size. We also experimented with different transformations of the keystroke sequence input, namely relative (PositionChange) versus absolute (CursorPosition) positional encoding, and raw GapTime versus log‐transformed GapTime. PositionChange is a common variation on the absolute cursor position for treating keystroke log data (Deane &amp; Zhang, [<reflink idref="bib18" id="ref79">18</reflink>]). It is, to some extent, a simpler measure, in that only two adjacent key presses are compared to capture the flow of text production. 
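</p> <p>The two input transformations compared here, together with the linear warmup described above, can be sketched with three small Python helpers. The helper names are hypothetical. Three details are assumptions because the text does not state them: the previous position for the first keystroke is taken as 0, the log transform uses $\log(1 + t)$ so that zero gaps are defined, and the learning rate is held constant after warmup.</p>

```python
import math


def position_change(cursor_positions):
    """Relative encoding: cursor movement between adjacent keystrokes.

    The position before the first keystroke is taken to be 0 (an assumption).
    """
    changes = []
    prev = 0
    for p in cursor_positions:
        changes.append(p - prev)
        prev = p
    return changes


def log_gap_time(gaps_ms):
    """log(1 + t) compresses the long right tail of pause durations.

    The +1 offset (log1p) avoids log(0) for zero gaps; it is an assumption.
    """
    return [math.log1p(t) for t in gaps_ms]


def warmup_lr(step, base_lr=1e-5, warmup_steps=300):
    """Linear warmup to base_lr over the first warmup_steps steps (0-indexed).

    Holding the rate constant afterwards is an assumption; the text does not
    describe the schedule after warmup.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


# Hypothetical cursor positions: typing forward, one Backspace, a jump to the start.
print(position_change([0, 1, 2, 3, 2, 0]))  # prints [0, 1, 1, 1, -1, -2]
print([round(x, 2) for x in log_gap_time([0, 120, 180, 2500])])
print(warmup_lr(0), warmup_lr(299), warmup_lr(1000))
```

<p>Under this convention, ordinary left‐to‐right typing yields a PositionChange of 1, while deletions and cursor jumps appear as other values. This departure from the typing pattern is the flow of text production that the relative encoding is meant to capture.</p> <p>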
Log‐transforming the GapTime was motivated by two prior findings. First, human reaction times follow highly skewed distributions (Lo &amp; Andrews, [<reflink idref="bib42" id="ref80">42</reflink>]; Medina, Díaz, &amp; Norwich, [<reflink idref="bib47" id="ref81">47</reflink>]). Second, log‐transformation can lead to better estimation in the context of writing assessment (Guo et al., [<reflink idref="bib27" id="ref82">27</reflink>]).</p> <hd id="AN0185399426-25">Study 2 Results</hd> <p>Table 5 presents results from four models that varied in their keystroke input. Models i and ii both used $c$, $a$, $t$, and $p$ (absolute cursor position) as input and differed only in training batch size. We found a considerable increase in classification accuracy with a larger batch size. 
The classification accuracy was 0.9867 for Model ii, trained with a batch size of 32, compared with 0.8672 for Model i, trained with a batch size of 16. In practice, even a small difference in classification accuracy can translate into hundreds or thousands of affected test‐takers. For example, if a testing program has one million test‐takers, an improvement in classification accuracy of 0.001 means that 1,000 more test‐takers are classified correctly under the new model. We saw no notable differences when training models with different learning rates.</p> <p>Table 5. Classification Accuracy Based on Different Embedding Inputs</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th /&gt;&lt;th colspan="6"&gt;Embedding Input&lt;/th&gt;&lt;th /&gt;&lt;th /&gt;&lt;th /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th&gt;Model&lt;/th&gt;&lt;th&gt;Character (&lt;italic&gt;c&lt;/italic&gt;)&lt;/th&gt;&lt;th&gt;Action (&lt;italic&gt;a&lt;/italic&gt;)&lt;/th&gt;&lt;th&gt;GapTime (&lt;italic&gt;t&lt;/italic&gt;)&lt;/th&gt;&lt;th&gt;Log(GapTime)&lt;/th&gt;&lt;th&gt;CursorPosition (&lt;italic&gt;p&lt;/italic&gt;)&lt;/th&gt;&lt;th&gt;PositionChange&lt;/th&gt;&lt;th&gt;Training Batch Size&lt;/th&gt;&lt;th&gt;Accuracy&lt;/th&gt;&lt;th&gt;EER&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;i&lt;/td&gt;&lt;td&gt;•&lt;/td&gt;&lt;td&gt;•&lt;/td&gt;&lt;td&gt;•&lt;/td&gt;&lt;td /&gt;&lt;td&gt;•&lt;/td&gt;&lt;td /&gt;&lt;td&gt;16&lt;/td&gt;&lt;td&gt;0.8672&lt;/td&gt;&lt;td&gt;0.1385&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ii&lt;/td&gt;&lt;td&gt;•&lt;/td&gt;&lt;td&gt;•&lt;/td&gt;&lt;td&gt;•&lt;/td&gt;&lt;td /&gt;&lt;td&gt;•&lt;/td&gt;&lt;td /&gt;&lt;td&gt;32&lt;/td&gt;&lt;td&gt;0.9867&lt;/td&gt;&lt;td&gt;0.0153&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;iii&lt;/td&gt;&lt;td&gt;⋄&lt;/td&gt;&lt;td&gt;⋄&lt;/td&gt;&lt;td /&gt;&lt;td&gt;⋄&lt;/td&gt;&lt;td&gt;⋄&lt;/td&gt;&lt;td
/&gt;&lt;td&gt;32&lt;/td&gt;&lt;td&gt;0.9875&lt;/td&gt;&lt;td&gt;0.0129&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;iv&lt;/td&gt;&lt;td&gt;&lt;p&gt;&lt;math display="inline" altimg="urn:x-wiley:07311745:media:emip12668:emip12668-math-0072" xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;semantics xmlns=""&gt;&lt;mo&gt;&amp;#8226;&lt;/mo&gt;&lt;annotation encoding="application/x-tex"&gt;$\bullet$&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;&lt;math display="inline" altimg="urn:x-wiley:07311745:media:emip12668:emip12668-math-0073" xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;semantics xmlns=""&gt;&lt;mo&gt;&amp;#8226;&lt;/mo&gt;&lt;annotation encoding="application/x-tex"&gt;$\bullet$&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/p&gt;&lt;/td&gt;&lt;td /&gt;&lt;td&gt;&lt;p&gt;&lt;math display="inline" altimg="urn:x-wiley:07311745:media:emip12668:emip12668-math-0074" xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;semantics xmlns=""&gt;&lt;mo&gt;&amp;#8226;&lt;/mo&gt;&lt;annotation encoding="application/x-tex"&gt;$\bullet$&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/p&gt;&lt;/td&gt;&lt;td /&gt;&lt;td&gt;&lt;p&gt;&lt;math display="inline" altimg="urn:x-wiley:07311745:media:emip12668:emip12668-math-0075" xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;semantics xmlns=""&gt;&lt;mo&gt;&amp;#8226;&lt;/mo&gt;&lt;annotation encoding="application/x-tex"&gt;$\bullet$&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/p&gt;&lt;/td&gt;&lt;td&gt;32&lt;/td&gt;&lt;td&gt;0.9722&lt;/td&gt;&lt;td&gt;0.0356&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p><emph>Note</emph>. "Log(GapTime)" refers to log‐transformed GapTime.</p> <p>We also used the equal error rate (EER) to compare models. The EER is the error rate at the decision threshold where the false acceptance rate (FAR, or rate of Type I errors) equals the false rejection rate (FRR, or rate of Type II errors).
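</p> <p>As an illustration of how an EER can be computed from pairwise scores, the following minimal Python sketch (not the authors' implementation) assumes that each pair of keystroke sequences receives a similarity score and that a pair is accepted as coming from the same writer when its score is at or above a threshold. It sweeps the observed scores as thresholds and reports the average of FAR and FRR where the two are closest; practical estimators often interpolate between neighboring thresholds instead. The function names and toy scores are hypothetical.</p>

```python
import numpy as np

def far_frr(threshold, genuine, impostor):
    """FAR: share of impostor (different-writer) pairs wrongly accepted.
    FRR: share of genuine (same-writer) pairs wrongly rejected.
    A pair is accepted when its similarity score is >= threshold."""
    far = float(np.mean(impostor >= threshold))
    frr = float(np.mean(genuine < threshold))
    return far, frr

def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) at the observed threshold where FAR and FRR are closest."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, best_t, best_eer = np.inf, None, None
    for t in np.unique(np.concatenate([genuine, impostor])):
        far, frr = far_frr(t, genuine, impostor)
        if abs(far - frr) < best_gap:
            best_gap, best_t, best_eer = abs(far - frr), float(t), (far + frr) / 2
    return best_eer, best_t

# Hypothetical similarity scores for same-writer and different-writer pairs.
genuine_scores = [0.90, 0.80, 0.75, 0.40]
impostor_scores = [0.10, 0.20, 0.30, 0.85]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(eer, threshold)  # 0.25 0.75
```

<p>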
The EER is commonly used to evaluate the performance of biometric systems, which must balance accurate detection against the risk of raising too many false alarms (Shok, Shivashankar, &amp; Mudiraj, [<reflink idref="bib55" id="ref83">55</reflink>]). Depending on the cutoff threshold, the same underlying metric can produce higher FARs and lower FRRs, or lower FARs and higher FRRs. Typically, the tradeoff between FAR and FRR follows a hyperbolic pattern: very low FARs can be achieved at the cost of much higher FRRs, and very low FRRs at the cost of much higher FARs. The EER provides a single estimate for comparing classifiers when the goal is to minimize both FAR and FRR; a lower EER indicates better performance on the current classification task (Agrawal, Kapoor, &amp; Agrawal, [<reflink idref="bib1" id="ref84">1</reflink>]; Zhang, Wang, Cooper, Evans, &amp; Yamagishi, [<reflink idref="bib68" id="ref85">68</reflink>]). The EER dropped from 0.1385 (Model i, batch size of 16) to 0.0153 (Model ii, batch size of 32).</p> <p>The importance of batch size for model performance in contrastive learning has also been reported by others, such as Chen, Kornblith, Norouzi, and Hinton ([<reflink idref="bib12" id="ref86">12</reflink>]). Under the in‐batch negatives setup, a larger batch gives the model more negative candidates per sample to learn from. Overall, training with a sufficiently large batch size (Model ii) yielded classification results competitive with previous methods (Choi et al., [<reflink idref="bib13" id="ref87">13</reflink>]).</p> <p>In Model iii, we used log‐transformed GapTime instead of the raw GapTime, which further improved the classification accuracy from 0.9867 to 0.9875 and reduced the EER from 0.0153 to 0.0129.
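</p> <p>The two input transformations examined in Models iii and iv, log‐transformed GapTime and PositionChange, can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' preprocessing: it assumes log(1 + x) rather than a plain logarithm so that a zero gap stays finite, and it assumes that PositionChange is the difference between consecutive absolute cursor positions. The interval values, cursor trace, and function names are hypothetical.</p>

```python
import math

def skewness(xs):
    """Moment-based (population) skewness; positive values indicate a long right tail."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def log_gap_times(gap_times_ms):
    # Assumption: log1p(x) = log(1 + x), so a zero-millisecond gap maps to 0 instead of -inf.
    return [math.log1p(g) for g in gap_times_ms]

def position_changes(cursor_positions):
    # Assumption: PositionChange = current absolute cursor position minus the previous one,
    # with 0 for the first event.
    return [0] + [cur - prev for prev, cur in zip(cursor_positions, cursor_positions[1:])]

# Hypothetical inter-key intervals (ms): mostly quick keystrokes plus a few long pauses.
gaps = [80, 100, 120, 150, 200, 300, 500, 1000, 2500, 6000]
print(round(skewness(gaps), 2), round(skewness(log_gap_times(gaps)), 2))

# Hypothetical cursor trace: four characters typed, one backspace, two more characters.
cursor = [1, 2, 3, 4, 3, 4, 5]
print(position_changes(cursor))  # [0, 1, 1, 1, -1, 1, 1]
```

<p>On these toy values, skewness falls from about 2 on the raw scale to under 1 on the log scale, in line with the right‐skew argument for log‐transforming GapTime; the cursor trace shows how a backspace appears as a relative move of −1 once absolute positions are replaced by changes.</p> <p>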
Figure 7 shows the receiver operating characteristic (ROC) curve (Nahm, [<reflink idref="bib51" id="ref88">51</reflink>]) of Model iii, which plots FAR against FRR across decision thresholds; the EER is the point at which FAR equals FRR, and the lower the EER, the better. However, replacing the absolute CursorPosition with PositionChange (Model iv) lowered the classification accuracy to 0.9722 and increased the EER to 0.0356, relative to Model iii. We hypothesize that because the original encoder architecture, BigBird (Zaheer et al., [<reflink idref="bib67" id="ref89">67</reflink>]), is designed for absolute position embeddings, the relative position encoding in Model iv was the main cause of this loss in performance. Models that do rely on relative position encoding, such as Transformer‐XL and XLNet, also employ other modifications to the transformer architecture to accommodate this choice (Dai et al., [<reflink idref="bib16" id="ref90">16</reflink>]; Yang et al., [<reflink idref="bib66" id="ref91">66</reflink>]). Log‐transforming the GapTime, on the other hand, proved effective; we attribute this to the strongly right‐skewed distribution of the intervals (Guo et al., [<reflink idref="bib27" id="ref92">27</reflink>]), which arises because students sometimes pause for a long time to think before continuing to type. Finally, Figure 8 shows the cosine similarities of keystroke sequence embeddings from Model iii on the test set. Negative and positive pairs have clearly distinct cosine similarity distributions. Only a small number of negative pairs show similar keystroke sequence patterns, so the high classification accuracy is not surprising. In summary, this analysis showed that keystroke log data are highly effective for person classification.
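</p> <p>The contrast between positive and negative pairs in Figure 8 can be illustrated with a short NumPy sketch. It assumes a hypothetical layout in which row i of two embedding matrices holds embeddings of two keystroke sequences from the same writer, so diagonal entries of the cosine similarity matrix are positive pairs and off‐diagonal entries are negative pairs. The random vectors stand in for Model iii's embeddings and are not data from the study; the function names are hypothetical.</p>

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so that dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def pair_similarities(emb_a, emb_b):
    """Return cosine similarities for positive pairs (same row index) and negative pairs (all other rows)."""
    sim = l2_normalize(emb_a) @ l2_normalize(emb_b).T
    same_writer = np.eye(sim.shape[0], dtype=bool)
    return sim[same_writer], sim[~same_writer]

# Hypothetical embeddings: 4 writers in 8 dimensions, two noisy sessions per writer.
rng = np.random.default_rng(0)
writer_profiles = rng.normal(size=(4, 8))
emb_a = writer_profiles + 0.1 * rng.normal(size=(4, 8))
emb_b = writer_profiles + 0.1 * rng.normal(size=(4, 8))

positive, negative = pair_similarities(emb_a, emb_b)
print("positive pairs:", positive.round(3))
print("mean negative similarity:", round(float(negative.mean()), 3))
```

<p>A same‐writer decision would then compare each similarity with a threshold, for example one chosen at the EER operating point.</p> <p>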
The best‐performing Model iii achieved performance comparable to the current state of the art while requiring no feature engineering.</p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/EMS/01jun25/emip12668-fig-0007.jpg?ephost1=dGJyMNXb4kSepq84yOvqOLCmsE6epq5Srqa4SK6WxWXS" alt="emip12668-fig-0007.jpg" title="Figure 7. Equal error rate of the best‐performing Model iii. Notes: The curve represents the false acceptance rate (FAR) versus the false rejection rate (FRR) at various decision thresholds. The equal error rate (EER) is the point where FAR equals FRR (lower EER is better)." /> </p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/EMS/01jun25/emip12668-fig-0008.jpg?ephost1=dGJyMNXb4kSepq84yOvqOLCmsE6epq5Srqa4SK6WxWXS" alt="emip12668-fig-0008.jpg" title="Figure 8. Cosine similarities of sequence embeddings." /> </p> <hd id="AN0185399426-28">Discussion</hd> <p>Keystroke logging has become a valuable tool in writing research. In this paper, we use two empirical studies to demonstrate the application and modeling of keystroke logs in a writing assessment context. We were interested not only in keystroke process features related to writing quality, but also in features that reflect other factors contributing to writing behavior, including features that are relatively stable within a person and features that are heavily affected by specific contextual variables. As such, our work also contributes to the study of how typing behavior varies both within person (across contexts) and between people (Deane et al., [<reflink idref="bib17" id="ref93">17</reflink>]; Russell, [<reflink idref="bib53" id="ref94">53</reflink>]). Two research questions were addressed with empirical analyses.
We showed that it is possible to begin to tease these factors apart by exploiting, on the one hand, features that are sensitive to differences between at‐home and in‐center administrations of a writing task and, on the other hand, features that are stable measures of individual writing and typing habits. In doing so, we also illustrated two approaches to modeling differences in writing processes: (a) analysis of mean differences in handcrafted, theory‐driven process features and (b) use of large language models to generate representations of keystroke logs from which latent relationships can be inferred. This type of work is important because keystroke log data can give researchers and practitioners a window into the cognitive processes writers use during composition, with potential implications for improving writing skills and/or assessment design.</p> <p>In the first study, traditional handcrafted features were extracted from the keystroke logs and used to analyze differences between the two testing modes. The findings suggested that, from a process perspective, people write differently at home than in testing centers; however, the differences were of little practical significance. The features used in the first study were largely designed based on writing theories (e.g., Hayes, [<reflink idref="bib31" id="ref95">31</reflink>]) to capture important and interpretable aspects of writing processes; however, such features are time‐consuming to design and extract and, more importantly, hard to scale. For example, they are difficult to generalize across writing tasks (e.g., short‐response writing vs. extended essay writing), across writer populations (e.g., K‐12 learners vs. professional writers), and across task delivery platforms (which may log and track keyboarding activities differently).
Keyboarding fluency may differentiate performance in the K‐12 context, particularly among younger learners, but not in a professional writing context, where writers are typically fluent at keyboarding. In the second study, we introduced a novel approach to training a large language model that encodes keystroke logs and generates representations of them, eliminating the need for feature engineering. Traditionally, process data analysis requires labor‐intensive feature engineering, which is not only time‐consuming but also difficult to scale across contexts and tasks. The methods in Study 2 circumvent those limitations by feeding raw sequence data directly into the model. Broadly speaking, we believe that this study opens the way to broader applications of large language models in educational measurement. Not only does it hold significant promise for improving test security, it also shows promise for other aspects of educational assessment, such as personalization and AI scoring. Specifically, we implemented our model based on SimCSE (Gao et al., [<reflink idref="bib25" id="ref96">25</reflink>]) with modifications to the transformer architecture to accommodate raw keystroke logs as input. Our results showed that we can distinguish writing produced by the same person from writing produced by different people with a relatively high degree of accuracy. This work makes novel contributions to the use of neural methods to analyze writing behavior. It pioneers the use of transformer models for modeling writing behavior, making it possible to extend keystroke analysis to take advantage of the full power of deep learning.
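</p> <p>For readers unfamiliar with SimCSE‐style training, the in‐batch negatives objective can be sketched in NumPy as a generic InfoNCE loss; this is not the authors' code. Row i of one batch is paired with row i of the other, and every other row acts as a negative, so a batch of size N supplies N − 1 negatives per anchor (15 at a batch size of 16 and 31 at a batch size of 32), which is one way to see why batch size mattered in Table 5. The temperature of 0.05 is an assumed value, and the function name is hypothetical.</p>

```python
import numpy as np

def in_batch_contrastive_loss(emb_a, emb_b, temperature=0.05):
    """InfoNCE with in-batch negatives.

    emb_a[i] and emb_b[i] are embeddings of two keystroke sequences from the same
    writer (a positive pair); emb_b[j] for j != i serve as negatives for emb_a[i].
    The temperature default is an assumption for illustration.
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature                      # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the matching row (the diagonal) is the correct "class".
    return float(-np.mean(np.diag(log_softmax)))

# Toy check with orthogonal embeddings: correct pairing vs. a shuffled pairing.
emb = np.eye(4)
print(in_batch_contrastive_loss(emb, emb))                 # close to 0
print(in_batch_contrastive_loss(emb, emb[[1, 2, 3, 0]]))   # close to 20 (= 1 / 0.05)
```

<p>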
Furthermore, the in‐batch negative technique of contrastive learning can be implemented in a fully unsupervised manner (Karpukhin et al., [<reflink idref="bib36" id="ref97">36</reflink>]; Logeswaran et al., [<reflink idref="bib43" id="ref98">43</reflink>]).</p> <p>The analyses in both studies were exploratory, so it will be important to validate their conclusions with additional data sets. Relatedly, a major limitation is that the methods used in the two studies were not compared directly. For example, analyzing how the transformer model performs on the Study 1 data could help establish the model's generalizability. Future studies are strongly encouraged to apply different modeling approaches to the same data set (one that includes two writing samples per test‐taker). In Study 1, the writing subtest contained only one essay task, so generalization across prompts or writing genres is not guaranteed. In addition, all data were collected in a high‐stakes setting in which test‐takers had limited time to produce an original response and in which rewriting existing text was not appropriate. It will be worthwhile to replicate the analysis on other writing assessments, potentially with more than one writing task or under more variable testing conditions. However, we acknowledge that not all writing assessments require test‐takers to complete two writing samples. Another limitation is that we trained a relatively small‐scale transformer model in the second study because of GPU constraints. With a larger batch size and more training data, for example, we anticipate further performance improvements. Because Study 2 represents an early stage of model development, we allocated most of the data and computing resources to model training in order to gauge the model's potential.
Future studies are encouraged to invest more resources in testing the model in real applications. We also did not investigate whether the differences found between at‐home and in‐center testing modes, or the classification accuracy for true pairs of essays, were comparable across demographic groups. Because fairness and equity are central to validity in assessment, this is an important direction for future research. Other research possibilities are worth noting as natural continuations of the work presented in this paper. For example, using large language models to predict the testing mode and comparing them with a model based on handcrafted features could yield interesting insights. Similarly, for the second study, predicting whether the same person wrote two different essays using simpler models based on handcrafted features would provide a valuable comparative perspective. An ablation analysis would be a valuable complement to Study 2, because one or more of the features <ephtml> &lt;math display="inline" altimg="urn:x-wiley:07311745:media:emip12668:emip12668-math-0076" xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;semantics&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;annotation encoding="application/x-tex"&gt;$c$&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt; </ephtml> , <ephtml> &lt;math display="inline" altimg="urn:x-wiley:07311745:media:emip12668:emip12668-math-0077" xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;semantics&gt;&lt;mi&gt;a&lt;/mi&gt;&lt;annotation encoding="application/x-tex"&gt;$a$&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt; </ephtml> , <ephtml> &lt;math display="inline" altimg="urn:x-wiley:07311745:media:emip12668:emip12668-math-0078" xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;semantics&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;annotation encoding="application/x-tex"&gt;$p$&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt; </ephtml> , and <ephtml> &lt;math display="inline"
altimg="urn:x-wiley:07311745:media:emip12668:emip12668-math-0079" xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;semantics&gt;&lt;mi&gt;t&lt;/mi&gt;&lt;annotation encoding="application/x-tex"&gt;$t$&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt; </ephtml> may not be available in other keystroke logging systems. Knowing the impact of each of these features on prediction performance will therefore be valuable.</p> <ref id="AN0185399426-29"> <title> Footnotes </title> <blist> <bibl id="bib1" idref="ref84" type="bt">1</bibl> <bibtext> We used the same data set to conduct a supplementary analysis predicting at‐home (labeled 0) versus in‐center (labeled 1) administration at the individual level. We applied threefold cross‐validation, in which the training sample consisted of 66% of the data. Even with fine‐tuning of some machine learning models, prediction performance was low, with precision ranging from 0.475 to 0.569 across models trained using Support Vector Machine, Random Forest, Neural Network, Logistic Regression, and Gradient Boosting. This additional result, namely the inability to separate the two modes, aligns well with the results reported in the main analysis.</bibtext> </blist> </ref> <ref id="AN0185399426-30"> <title> References </title> <blist> <bibtext> Agrawal, P., Kapoor, R., &amp; Agrawal, S. (2014). A hybrid partial fingerprint matching algorithm for estimation of equal error rate. In 2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies (pp. 1295 – 1299). IEEE.</bibtext> </blist> <blist> <bibl id="bib2" idref="ref26" type="bt">2</bibl> <bibtext> Almond, R., Deane, P., Quinlan, T., Wagner, M., &amp; Sydorenko, T. (2012). A preliminary analysis of keystroke log data from a timed writing task. Research Report RR‐12‐23, ETS.</bibtext> </blist> <blist> <bibl id="bib3" idref="ref48" type="bt">3</bibl> <bibtext> Austin, P. C. (2011).
An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46 (3), 399 – 424.</bibtext> </blist> <blist> <bibl id="bib4" idref="ref45" type="bt">4</bibl> <bibtext> Becker, K. A., Liu, J., &amp; Jones, P. E. (2019). Test security and the pandemic: Comparison of test center and online proctor delivery modalities. Applied Psychological Measurement, 0 (0), 0.</bibtext> </blist> <blist> <bibl id="bib5" idref="ref56" type="bt">5</bibl> <bibtext> Bengio, Y., Courville, A., &amp; Vincent, P. (2014). Representation learning: A review and new perspectives. arXiv.</bibtext> </blist> <blist> <bibl id="bib6" idref="ref13" type="bt">6</bibl> <bibtext> Bennett, R. E., Zhang, M., &amp; Sinharay, S. (2021). How do educationally at‐risk men and women differ in their essay‐writing processes? NCME Chinese English Journal of Educational Measurement and Evaluation, 2, 1.</bibtext> </blist> <blist> <bibl id="bib7" idref="ref1" type="bt">7</bibl> <bibtext> Bereiter, C. S., &amp; Scardamalia, M. (1987). The psychology of written composition. Lawrence Erlbaum.</bibtext> </blist> <blist> <bibl id="bib8" idref="ref5" type="bt">8</bibl> <bibtext> Berninger, V. W. (1992). Lower‐level developmental skills in beginning writing. Reading and Writing, 4, 257 – 280.</bibtext> </blist> <blist> <bibl id="bib9" idref="ref6" type="bt">9</bibl> <bibtext> Berninger, V. W. (1999). Coordinating transcription and text generation in working memory during composing: Automatic and constructive processes. Learning Disability Quarterly, 22, 99 – 112.</bibtext> </blist> <blist> <bibtext> Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert‐Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. 
M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., &amp; Amodei, D. (2020). Language models are few‐shot learners. arXiv.</bibtext> </blist> <blist> <bibtext> Camara, W. (2020). Never let a crisis go to waste: Large‐scale assessment and the response to COVID‐19. Educational Measurement: Issues and Practice, 39 (3), 10 – 18.</bibtext> </blist> <blist> <bibtext> Chen, T., Kornblith, S., Norouzi, M., &amp; Hinton, G. (2020). A simple framework for contrastive learning of visual representations. arXiv.</bibtext> </blist> <blist> <bibtext> Choi, I., Hao, J., Deane, P., &amp; Zhang, M. (2021). Benchmark keystroke biometrics accuracy from high‐stakes writing tasks. Research Report RR‐12‐23, ETS.</bibtext> </blist> <blist> <bibtext> Chopra, S., Hadsell, R., &amp; LeCun, Y. (2005). Learning a similarity metric discriminatively, with application to face verification. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, (pp. 539 – 546). IEEE.</bibtext> </blist> <blist> <bibtext> Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Routledge Academic.</bibtext> </blist> <blist> <bibtext> Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., &amp; Salakhutdinov, R. (2019). Transformer‐XL: Attentive language models beyond a fixed‐length context. arXiv.</bibtext> </blist> <blist> <bibtext> Deane, P., Roth, A., Litz, A., Goswami, V., Steck, F., Lewis, M., &amp; Richter, T. (2018). The role of noncognitive constructs and other background variables in graduate education. Research Memorandum RM‐18‐06, ETS.</bibtext> </blist> <blist> <bibtext> Deane, P., &amp; Zhang, M. (2015). Exploring the feasibility of using writing process features to assess text production skills. ETS Research Report RR‐15‐26, ETS.</bibtext> </blist> <blist> <bibtext> Deane, P., Zhang, M., Hao, J., &amp; Li, C. (2025). 
Using keystroke dynamics to detect non-original text. Journal of Educational Measurement. https://onlinelibrary.wiley.com/doi/full/10.1111/jedm.12431</bibtext> </blist> <blist> <bibtext> Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., &amp; Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv.</bibtext> </blist> <blist> <bibtext> Dowland, P. S., &amp; Furnell, S. M. (2004). A long‐term trial of keystroke profiling using digraph, trigraph and keyword latencies. In Deswarte, Y., Cuppens, F., Jajodia, S., &amp; Wang, L. (Eds.), Security and Protection in Information Processing Systems: SEC 2004. IFIP—The International Federation for Information Processing, volume 147 (pp. 275 – 289). Springer.</bibtext> </blist> <blist> <bibtext> Edwards, J., Leinonen, J., Birthare, C., Zavgorodniaia, A., &amp; Hellas, A. (2020). Programming versus natural language: On the effect of context on typing in CS1. In ICER '20: Proceedings of the 2020 ACM Conference on International Computing Education Research (pp. 204 – 215). ACM.</bibtext> </blist> <blist> <bibtext> Emig, J. (1972). The composing processes of twelfth graders. NCTE Research Report NCTE, National Council of Teachers of English.</bibtext> </blist> <blist> <bibtext> Flower, L. S., &amp; Hayes, J. R. (1981). A cognitive process theory of writing. College Composition &amp; Communication, 32, 365 – 387.</bibtext> </blist> <blist> <bibtext> Gao, T., Yao, X., &amp; Chen, D. (2022). SimCSE: Simple contrastive learning of sentence embeddings. arXiv.</bibtext> </blist> <blist> <bibtext> Guo, H. (2022). How did students engage with a remote educational assessment? Educational Measurement: Issues and Practice, 41 (3), 58 – 68.</bibtext> </blist> <blist> <bibtext> Guo, H., Deane, P. D., van Rijn, P. W., Zhang, M., &amp; Bennett, R. E. (2018). 
Modeling basic writing processes from keystroke logs. Journal of Educational Measurement, 55 (2), 194 – 216.</bibtext> </blist> <blist> <bibtext> Guo, H., Zhang, M., Deane, P., &amp; Bennett, R. (2020). Effects of scenario‐based assessment on students' writing processes. Journal of Educational Data Mining, 12 (1), 19 – 45.</bibtext> </blist> <blist> <bibtext> Haberman, S. J. (1984). Adjustment by minimum discriminant information. Annals of Statistics, 12 (3), 971 – 988.</bibtext> </blist> <blist> <bibtext> Hadsell, R., Chopra, S., &amp; LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 2 (pp. 1735 – 1742). IEEE.</bibtext> </blist> <blist> <bibtext> Hayes, J. R. (2012). Modeling and remodeling writing. Written Communication, 29 (3), 369 – 388.</bibtext> </blist> <blist> <bibtext> Hoffer, E., &amp; Ailon, N. (2018). Deep metric learning using triplet network. arXiv.</bibtext> </blist> <blist> <bibtext> Jaegle, A., Gimeno, F., Brock, A., Zisserman, A., Vinyals, O., &amp; Carreira, J. (2021). Perceiver: General perception with iterative attention. arXiv.</bibtext> </blist> <blist> <bibtext> Jiang, Y., Zhang, M., Hao, J., Deane, P., &amp; Li, C. (2024). Using keystroke behavior patterns to detect nonauthentic texts in writing assessments: Evaluating the fairness of predictive models. Journal of Educational Measurement, 61 (4), 571 – 594.</bibtext> </blist> <blist> <bibtext> Jiao, H., &amp; Lissitz, R. W. (2020). What hath the coronavirus brought to assessment? Unprecedented challenges in educational assessment in 2020 and years to come. Educational Measurement: Issues and Practice, 39 (3), 45 – 48.</bibtext> </blist> <blist> <bibtext> Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., &amp; Yih, W.‐t. (2020). Dense passage retrieval for open‐domain question answering. In Webber, B., Cohn, T., He, Y., &amp; Liu, Y.
(Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), (pp. 6769 – 6781). Association for Computational Linguistics.</bibtext> </blist> <blist> <bibtext> Kellogg, R. T. (1987). Writing performance: Effects of cognitive strategies. Written Communication, 4 (3), 269 – 298.</bibtext> </blist> <blist> <bibtext> Kim, S., &amp; Walker, M. (2021). Assessing mode effects of at‐home testing without a randomized trial. ETS Research Report RR‐21‐10, ETS.</bibtext> </blist> <blist> <bibtext> Kusanagi, K., Abe, D., Fukuta, J., &amp; Kawaguchi, Y. (2013). Visualizing writing process using a key logging system: For construct feedback to enhance autonomous learning. Paper presented at the 81st Spring Conference of the Chubu Chapter, Japan Association for Language Education and Technology (LET), Tokai Gakuen University, Japan.</bibtext> </blist> <blist> <bibtext> Leijten, M., &amp; van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30 (3), 358 – 392.</bibtext> </blist> <blist> <bibtext> Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., &amp; Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv.</bibtext> </blist> <blist> <bibtext> Lo, S., &amp; Andrews, S. (2015). To transform or not to transform: Using generalized linear mixed models to analyze reaction time data. Frontiers in Psychology, 6, 1171.</bibtext> </blist> <blist> <bibtext> Logeswaran, L., Chang, M.‐W., Lee, K., Toutanova, K., Devlin, J., &amp; Lee, H. (2019). Zero‐shot entity linking by reading entity descriptions. arXiv.</bibtext> </blist> <blist> <bibtext> Loshchilov, I., &amp; Hutter, F. (2019). Decoupled weight decay regularization. arXiv.</bibtext> </blist> <blist> <bibtext> Lu, J., Batra, D., Parikh, D., &amp; Lee, S. (2019).
ViLBERT: Pretraining task‐agnostic visiolinguistic representations for vision‐and‐language tasks. arXiv.</bibtext> </blist> <blist> <bibtext> McCutchen, D. (1996). A capacity theory of writing: Working memory in composition. Educational Psychology Review, 8, 299 – 325.</bibtext> </blist> <blist> <bibtext> Medina, J. M., Díaz, J. A., &amp; Norwich, K. H. (2014). A theory of power laws in human reaction times: Insights from an information‐processing approach. Frontiers in Human Neuroscience, 8, 621.</bibtext> </blist> <blist> <bibtext> Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., Ginsburg, B., Houston, M., Kuchaiev, O., Venkatesh, G., &amp; Wu, H. (2018). Mixed precision training. arXiv.</bibtext> </blist> <blist> <bibtext> Mondal, S. (2016). Continuous user authentication and identification. PhD thesis, Norwegian University of Science and Technology.</bibtext> </blist> <blist> <bibtext> Morgan, J. H., Cheng, C. Y., Pike, C., &amp; Ritter, F. E. (2013). A design, test, and considerations for improving keystrokes and mouse loggers. Interacting with Computers, 25 (3), 242 – 258.</bibtext> </blist> <blist> <bibtext> Nahm, F. S. (2022). Receiver operating characteristic curve: Overview and practical use for clinicians. Korean Journal of Anesthesiology, 75 (1), 25 – 36.</bibtext> </blist> <blist> <bibtext> Rae, J. W., Borgeaud, S., Cai, T., Millican, K., Hoffmann, J., Song, F., Aslanides, J., Henderson, S., Ring, R., Young, S., Rutherford, E., Hennigan, T., Menick, J., Cassirer, A., Powell, R., van den Driessche, G., Hendricks, L. A., Rauh, M., Huang, P.‐S., Glaese, A., Welbl, J., Dathathri, S., Huang, S., Uesato, J., Mellor, J., Higgins, I., Creswell, A., McAleese, N., Wu, A., Elsen, E., Jayakumar, S., Buchatskaya, E., Budden, D., Sutherland, E., Simonyan, K., Paganini, M., Sifre, L., Martens, L., Li, X.
L., Kuncoro, A., Nematzadeh, A., Gribovskaya, E., Donato, D., Lazaridou, A., Mensch, A., Lespiau, J.‐B., Tsimpoukelli, M., Grigorev, N., Fritz, D., Sottiaux, T., Pajarskas, M., Pohlen, T., Gong, Z., Toyama, D., de Masson d'Autume, C., Li, Y., Terzi, T., Mikulik, V., Babuschkin, I., Clark, A., de Las Casas, D., Guy, A., Jones, C., Bradbury, J., Johnson, M., Hechtman, B., Weidinger, L., Gabriel, I., Isaac, W., Lockhart, E., Osindero, S., Rimell, L., Dyer, C., Vinyals, O., Ayoub, K., Stanway, J., Bennett, L., Hassabis, D., Kavukcuoglu, K., &amp; Irving, G. (2022). Scaling language models: Methods, analysis &amp; tnsights from training gopher. arXiv.</bibtext> </blist> <blist> <bibtext> Russell, M. (1999). Testing on computers: A follow‐up study comparing performance on computer and on paper. Education Policy Analysis Archives, 7 (20).</bibtext> </blist> <blist> <bibtext> Sahu, C., Banavar, M., &amp; Schuckers, S. (2020). A novel distance‐based algorithm for multi‐user classification in keystroke dynamics. In 2020 54th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA (pp. 63 – 67). IEEE.</bibtext> </blist> <blist> <bibtext> Shok, J., Shivashankar, V., &amp; Mudiraj, P. V. G. S. (2010). An overview of biometrics. International Journal on Computer Science and Engineering, 2 (7), 2402 – 2408.</bibtext> </blist> <blist> <bibtext> Shrestha, R., Leinonen, J., Hellas, A., Ihantola, P., &amp; Edwards, J. (2022). CodeProcess charts: Visualizing the process of writing code. In Sheard, J., &amp; Denny, P. (Eds.), ACE'22: Proceedings of the 24th Australasian Computing Education Conference. ACM.</bibtext> </blist> <blist> <bibtext> Sinharay, S., Zhang, M., &amp; Deane, P. (2019). Application of data mining methods for predicting essay scores from writing process and product features. 
Applied Measurement in Education, 32 (2), 116 – 137.</bibtext> </blist> <blist> <bibtext> Spence, D., Ward, R., Wooden, S., Browne, M., Song, H., Hawkins, R., &amp; Wojnakowski, M. (2024). Use of resources and method of proctoring during the NBCRNA Continued Professional Certification Assessment: Analysis of outcomes. Journal of Nursing Regulation, 10 (3), 37 – 46.</bibtext> </blist> <blist> <bibtext> Stallard, C. K. (1974). An analysis of the writing behavior of good student writers. Research in the Teaching of English, 8 (2), 206 – 218.</bibtext> </blist> <blist> <bibtext> Tan, H., &amp; Bansal, M. (2019). LXMERT: Learning cross‐modality encoder representations from transformers. arXiv.</bibtext> </blist> <blist> <bibtext> Tay, Y., Dehghani, M., Tran, V. Q., Garcia, X., Wei, J., Wang, X., Chung, H. W., Shakeri, S., Bahri, D., Schuster, T., Zheng, H. S., Zhou, D., Houlsby, N., &amp; Metzler, D. (2023). UL2: Unifying language learning paradigms. arXiv.</bibtext> </blist> <blist> <bibtext> Vandermeulen, N., Leijten, M., &amp; Van Waes, L. (2020). Reporting writing process feedback in the classroom: Using keystroke logging data to reflect on writing processes. Journal of Writing Research, 12 (1), 109 – 140.</bibtext> </blist> <blist> <bibtext> Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., &amp; Polosukhin, I. (2017). Attention is all you need. arXiv.</bibtext> </blist> <blist> <bibtext> Walker, S. (2002). Biometric selection: Body parts online. SANS Institute Reading Room.</bibtext> </blist> <blist> <bibtext> Weiner, J., &amp; Hurtz, G. (2017). A comparative study of online remote proctored versus onsite proctored high‐stakes exams. Journal of Applied Testing Technology, 18 (1), 13 – 20.</bibtext> </blist> <blist> <bibtext> Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., &amp; Le, Q. V. (2020). XLNet: Generalized autoregressive pretraining for language understanding.
In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., &amp; Lin, H. (Eds.), Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates.</bibtext> </blist> <blist> <bibtext> Zaheer, M., Guruganesh, G., Dubey, A., Ainslie, J., Alberti, C., Ontanon, S., Pham, P., Ravula, A., Wang, Q., Yang, L., &amp; Ahmed, A. (2020). Big Bird: Transformers for longer sequences. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., &amp; Lin, H. (Eds.), Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates.</bibtext> </blist> <blist> <bibtext> Zhang, L., Wang, X., Cooper, E., Evans, N., &amp; Yamagishi, J. (2023). Range‐based equal error rate for spoof localization. Proceedings of Interspeech 2023, 3212 – 3216.</bibtext> </blist> <blist> <bibtext> Zhang, M., &amp; Deane, P. (2015). Process features in writing: Internal structure and incremental value over product features. ETS Research Report RR‐15‐27, ETS.</bibtext> </blist> <blist> <bibtext> Zhang, M., &amp; Sinharay, S. (2022). Investigating the writing performance of educationally at‐risk examinees using technology. International Journal of Testing, 312 – 347.</bibtext> </blist> <blist> <bibtext> Zhang, M., Zou, D., Wu, A. D., Deane, P., &amp; Li, C. (2017). An investigation of the writing processes employed in scenario‐based assessment. In B. D. Zumbo &amp; A. M. Hubley (Eds.), Understanding and investigating response processes in validation research (pp. 321 – 339). Springer.</bibtext> </blist> <blist> <bibtext> Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., &amp; He, Q. (2020). A comprehensive survey on transfer learning.
arXiv.</bibtext> </blist> </ref> <aug> <p>By Mo Zhang; Paul Deane; Andrew Hoang; Hongwen Guo and Chen Li</p> </aug>
Publication Details:	Published June 2025 in Educational Measurement: Issues and Practice, volume 44, issue 2, starting on page 5 (15 pages). ISSN 0731-1745 (print), 1745-3992 (electronic). DOI: 10.1111/emip.12668.