What's in a Word Family? The Assumptions of Lexical Units

Bibliographic Details
Title: What's in a Word Family? The Assumptions of Lexical Units
Language: English
Authors: Phil Bennett (ORCID 0000-0002-6313-6760)
Source: Vocabulary Learning and Instruction. 2026 15.
Availability: Castledown Publishers. Ground Level, 470 St Kilda Road, Melbourne, 3004, Australia. Tel: +61-3-7003-8355; e-mail: contact@castledown.com; Web site: https://www.castledown.com/journals/vli
Peer Reviewed: Y
Page Count: 26
Publication Date: 2026
Document Type: Journal Articles; Reports - Research
Descriptors: English, Morphemes, Etymology, Word Lists, Vocabulary, Form Classes (Languages), Dictionaries
ISSN: 2981-9954
Abstract: Lemmas, flemmas, and level 6 word families (WF6) are three commonly discussed lexical units. Because each makes differing assumptions about learner knowledge, the selection of one unit over another in research or pedagogy has a great impact on interpretations of the lexical challenge. It is therefore important to fully understand these assumptions so that practitioners can select the most suitable unit for a given purpose. This study introduces an enhanced version of Nation's BNC-COCA word lists that can be used to quantify several features of lexical units. The original WF6 lists were adapted by including flemma and lemma groupings, part-of-speech (POS) tags, morphological codings, frequency data, and an expanded list of proper nouns. Analyses of lexical unit composition reveal that, owing to their much greater inclusivity than flemmas or lemmas, WF6 units provide rapid corpus coverage over the 1-2k bands, and that irregular forms make up a considerable proportion of 1k tokens regardless of the unit chosen. Accuracy checks then suggest that POS-tagged lists offer an improvement over untagged lists due to the latter's overestimation of coverage and blocking of homographic concepts. Finally, lexical and morphological profiling shows that threshold coverage values are unlikely to be reached without knowledge of at least mid-frequency headwords and tens of derivational affixes in most genres.
Abstractor: As Provided
Notes: https://osf.io/4mz6y
Entry Date: 2026
Accession Number: EJ1501238
Database: ERIC
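The abstract contrasts lemmas, flemmas, and WF6 word families, and argues that untagged lists overestimate coverage. A minimal Python sketch can make both points concrete. It rests on several assumptions. The word lists, the example text, and the `coverage` helper are hypothetical toy data and are not taken from the study's enhanced BNC-COCA lists. The POS labels borrow the Universal Dependencies tag names. Coverage is simplified to the share of running tokens counted as known.

```python
from fractions import Fraction

# A token is a (lowercased word form, coarse POS tag) pair.
Token = tuple[str, str]

# Hypothetical toy lists for illustration only; NOT drawn from the BNC-COCA lists.
FUNCTION_WORDS: set[Token] = {("she", "PRON"), ("the", "DET"), ("a", "DET"), ("and", "CCONJ")}

# "see" as a verb lemma, including the irregular past form "saw".
SEE_LEMMA: set[Token] = {("see", "VERB"), ("sees", "VERB"), ("saw", "VERB"),
                         ("seen", "VERB"), ("seeing", "VERB")}

# Three increasingly inclusive units for the headword "walk".
WALK_LEMMA: set[Token] = {("walk", "VERB"), ("walks", "VERB"),
                          ("walked", "VERB"), ("walking", "VERB")}
# Flemma: the same inflected forms regardless of part of speech.
WALK_FLEMMA: set[Token] = WALK_LEMMA | {("walk", "NOUN"), ("walks", "NOUN")}
# Word family: also admits derived forms such as "walker".
WALK_WF6: set[Token] = WALK_FLEMMA | {("walker", "NOUN"), ("walkers", "NOUN")}

TEXT: list[Token] = [
    ("she", "PRON"), ("walked", "VERB"), ("the", "DET"), ("walk", "NOUN"),
    ("and", "CCONJ"), ("the", "DET"), ("walker", "NOUN"), ("saw", "VERB"),
    ("a", "DET"), ("saw", "NOUN"),  # the tool: a homograph of the past tense of "see"
]


def coverage(tokens: list[Token], known: set[Token], use_pos: bool) -> Fraction:
    """Share of running tokens counted as known.

    With use_pos=True a token must match on both form and POS tag. With
    use_pos=False any matching form counts, which mimics an untagged list.
    """
    if use_pos:
        hits = sum(1 for tok in tokens if tok in known)
    else:
        known_forms = {form for form, _ in known}
        hits = sum(1 for form, _ in tokens if form in known_forms)
    return Fraction(hits, len(tokens))


if __name__ == "__main__":
    for name, unit in [("lemma", WALK_LEMMA), ("flemma", WALK_FLEMMA), ("WF6", WALK_WF6)]:
        known = FUNCTION_WORDS | SEE_LEMMA | unit
        tagged = coverage(TEXT, known, use_pos=True)
        untagged = coverage(TEXT, known, use_pos=False)
        print(f"{name:6}  tagged={float(tagged):.0%}  untagged={float(untagged):.0%}")
```

On this toy text, tagged coverage rises from 70% (lemma) to 80% (flemma) to 90% (WF6) as the unit becomes more inclusive. The untagged WF6 count reaches 100% only because the noun "saw" is credited through the verb "see". This mirrors, on a trivial scale, the overestimation attributed to untagged lists in the abstract.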