The Promise and Pitfalls of Paraprofessional Tutors: Evidence from a Pair of Randomized Controlled Trials

Bibliographic Details
Title:	The Promise and Pitfalls of Paraprofessional Tutors: Evidence from a Pair of Randomized Controlled Trials
Language:	English
Authors:	Elizabeth Huffaker, Monica Lee, Helen Zhou, Carly Robinson, Susanna Loeb, Society for Research on Educational Effectiveness (SREE)
Source:	Society for Research on Educational Effectiveness. 2025.
Availability:	Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Peer Reviewed:	Y
Publication Date:	2025
Document Type:	Reports - Research
Education Level:	Early Childhood Education Elementary Education Kindergarten Primary Education Grade 1
Descriptors:	Paraprofessional Personnel, Tutors, Literacy Education, Numeracy, Reading Achievement, Mathematics Achievement, Kindergarten, Grade 1, Program Effectiveness, Tutoring, Urban Schools, School Districts, Intervention, Reading Instruction, Mathematics Instruction, Elementary School Students
Abstract:	Background: A strong evidence base for relationship-based, individualized tutoring programs -- especially in early grades -- has accumulated in recent years (Heinrich et al., 2014; Nickow et al., 2020; Wanzek et al., 2016). However, scaling implementation of best-practice tutoring programs (i.e., during the school day, at least three times per week) remains challenging (e.g., National Student Support Accelerator, 2023) due to two primary barriers: human capital constraints (finding reliable but affordable tutors) and insufficient dosage (students not receiving enough tutoring sessions). This paper evaluates two randomized controlled trials of early-grade tutoring programs -- one in literacy and one in numeracy -- that leverage classroom paraprofessionals ("paras") as tutors. This approach is promising but presents challenges: paras can build on existing relationships with students and teachers but may be overburdened with other duties. We use uniquely detailed provider data and institutional context from the District partner to supplement our evaluation with insights into the implementation of para-led tutoring. Purpose/Research Question: Our pre-registered research question for each study is: To what extent does the literacy (numeracy) tutoring program improve reading (math) test scores for grades K-1 students? Exploratory analyses unpack our topline results by estimating dosage (i.e., tutoring attendance) and effect heterogeneity by student and school characteristics. Setting: These programs were implemented in a large urban district serving approximately 48,000 students (56% Black, 22% Hispanic, 15% English learners). The literacy intervention ran in 13 elementary schools during 2022-23, while the numeracy intervention operated in 32 schools during 2023-24. Ten schools hosted both interventions, though with different student participants. 
Participants: The intent-to-treat (ITT) literacy study sample included 304 tutoring-eligible (i.e., below grade-level) kindergarten and first-grade students, with 105 randomized to receive reading tutoring. The numeracy study comprised 1,069 kindergartners, with 344 assigned to treatment and 725 to control. Tables A1-A2 summarize baseline characteristics of participants in the literacy and numeracy studies, respectively. Relative to District composition, Black students were somewhat overrepresented in both samples (69% literacy, 61% numeracy). Hispanic students comprised 25% and 19%, respectively, with balanced gender distribution. English learners were overrepresented only in the literacy study (23%). Interventions: Figure 1, below, summarizes the treatment-control contrast, study design, and implementation timeline for both studies. The literacy tutoring program was developed during the COVID-19 pandemic to promote K-1 reading fluency using evidence-based high-impact tutoring features. Students were scheduled to receive daily 15-minute one-on-one sessions during school hours, totaling 140 sessions. Core elements included highly scripted instruction, weekly tutor coaching, and a progress monitoring dashboard. The numeracy tutoring model aimed to provide students with paraprofessional-led tutoring in 20-minute increments, three times weekly, for 50 total numeracy-focused sessions. This study builds on promising pilot evaluations of the same program (Clarke et al., 2016). Beyond the expanded sample, this study differs from prior evaluations by removing one-on-one coaching to improve cost-effectiveness. Participating paraprofessionals still received training through two workshops. Research Design: Student-level randomization was stratified by classroom. Baseline trait comparisons (Tables A1 and A2) are consistent with successful randomization. Control group students received "business-as-usual" instruction, with no regular services withheld. 
We use this specification to estimate ITT effects: $Y_{ij} = \beta_0 + \beta_1 \mathrm{Treatment}_{ij} + \beta_2 \mathrm{Baseline}_i + \alpha_j + \gamma X_i + \epsilon_{ij}$, where $Y_{ij}$ is the outcome of interest for student $i$ in classroom $j$. $\mathrm{Treatment}_{ij}$ is a binary variable that indicates student assignment to treatment within the classroom randomization block. $\mathrm{Baseline}_i$ is the baseline measure of the main outcome variable. Due to floor effects, a minimum score indicator and quadratic control are added for the literacy study. $\alpha_j$ is a classroom fixed effect, while $X_i$ is a vector of pretreatment student-level covariates, including gender, grade level, and race/ethnicity. $\epsilon_{ij}$ is the error term, and we use robust standard errors. Data Collection and Analysis: Both studies utilized district administrative data including student demographics, English learner and special education statuses, and achievement data from DIBELS-8 literacy and i-Ready math assessments. We received rich provider-side data, especially with respect to student dosage. The literacy tutoring provider shared comprehensive participation metrics including start/end dates, time in tutoring, session attendance, lesson progress, tutor assignments, and curriculum. For the numeracy study we collected session data, student tutoring attendance, lessons taught, time in tutoring, and tutor demographic information via online survey platforms. Surveys also allowed for open-ended responses and inquired about paraprofessionals' experiences and perceptions of their roles, including but not limited to tutoring. Results: Top-line ITT results for both studies are null (Tables A3 and A4). To contextualize these findings, we examined actual tutoring dosage relative to program expectations. 
Students assigned to literacy tutoring received an average of 42 sessions (30% of the recommended 140), while numeracy tutoring students averaged 18 sessions (36% of the recommended 50). Only two schools exceeded half of the recommended literacy tutoring dosage, and only four schools reached 80% of recommended numeracy sessions (Figure A1). Given our within-classroom randomization design, we can recover internally valid causal estimates of ITT effects within each campus, or within subsets of campuses. We therefore estimate impacts across sub-samples restricted to schools with progressively higher dosage levels among students assigned to treatment. Figure A2 presents results from this strategy for the numeracy study and indicates that program effects do increase with dosage. Analogous estimates are, however, null and stable in magnitude for the literacy study. Conclusions: The para-led numeracy tutoring program appears effective when implemented towards recommended dosage. Insights from survey data and District partners suggest barriers and builders for para-tutor capacity: (1) a majority of respondents indicate having >4 "core" responsibilities (Figure A3), and (2) at least one high-dosage school had introduced dedicated collaborative time for paraprofessionals and teachers to promote aligned and informed tutoring. These data are being used to guide forthcoming qualitative research on para-tutor experiences across several districts. The literacy program, meanwhile, does not appear to improve achievement at any dosage level, although we note the limited sample size. Preliminary analysis of program materials suggests curricular features could independently limit efficacy.
Abstractor:	As Provided
Entry Date:	2026
Access URL:	https://www.sree.org/2025-conference
Accession Number:	ED677770
Database:	ERIC
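
Illustrative note: The ITT specification described in the abstract (outcome regressed on treatment assignment and a baseline score, with classroom fixed effects) can be sketched as below. This is a minimal sketch on simulated data, not the authors' code: the function name itt_estimate, the simulated sample sizes, and the assumed effect size are all hypothetical, and the covariate vector, the literacy-specific floor-effect controls, and robust standard errors are omitted for brevity. The classroom fixed effects are absorbed by demeaning each variable within classroom, which yields the same treatment coefficient as including one indicator per classroom.

```python
import numpy as np


def itt_estimate(y, treat, baseline, classroom):
    """Return the OLS coefficient on treatment assignment (beta_1).

    Classroom fixed effects are absorbed by demeaning the outcome and the
    regressors within each classroom (the within transformation).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([treat, baseline]).astype(float)
    classroom = np.asarray(classroom)

    y_w = y.copy()
    X_w = X.copy()
    for c in np.unique(classroom):
        in_class = classroom == c
        y_w[in_class] -= y[in_class].mean()
        X_w[in_class] -= X[in_class].mean(axis=0)

    coef, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return coef[0]


# Simulated data (all values below are assumptions for illustration only).
rng = np.random.default_rng(0)
n_class, per_class = 40, 20
classroom = np.repeat(np.arange(n_class), per_class)
class_effect = rng.normal(0.0, 1.0, n_class)[classroom]
baseline = rng.normal(0.0, 1.0, n_class * per_class)

# Randomize within classroom: 7 of 20 students assigned to treatment.
assignment = np.r_[np.ones(7), np.zeros(13)]
treat = np.concatenate([rng.permutation(assignment) for _ in range(n_class)])

true_effect = 0.25  # hypothetical effect in outcome units
noise = rng.normal(0.0, 0.3, n_class * per_class)
y = true_effect * treat + 0.6 * baseline + class_effect + noise

beta1 = itt_estimate(y, treat, baseline, classroom)
print(f"Estimated ITT effect: {beta1:.3f} (simulated truth: {true_effect})")

# Dosage shares reported in the abstract: sessions received / recommended.
print(f"Literacy dosage share: {42 / 140:.0%}")
print(f"Numeracy dosage share: {18 / 50:.0%}")
```

Because randomization is stratified by classroom, the same function could in principle be applied to a subset of schools (for example, those above a dosage threshold) by filtering the arrays before the call; that mirrors the sub-sample strategy the abstract describes but is not drawn from the authors' materials.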
