What's in a Word Family? The Assumptions of Lexical Units

Bibliographic Details
Title: What's in a Word Family? The Assumptions of Lexical Units
Language: English
Authors: Phil Bennett (ORCID 0000-0002-6313-6760)
Source: Vocabulary Learning and Instruction, 2026, Vol. 15.
Availability: Castledown Publishers. Ground Level, 470 St Kilda Road, Melbourne, 3004, Australia. Tel: +61-3-7003-8355; e-mail: contact@castledown.com; Web site: https://www.castledown.com/journals/vli
Peer Reviewed: Y
Page Count: 26
Publication Date: 2026
Document Type: Journal Articles; Reports - Research
Descriptors: English, Morphemes, Etymology, Word Lists, Vocabulary, Form Classes (Languages), Dictionaries
ISSN: 2981-9954
Abstract: Lemmas, flemmas, and level 6 word families (WF6) are three commonly discussed lexical units. Because each makes differing assumptions about learner knowledge, the selection of one unit over another in research or pedagogy has a great impact on interpretations of the lexical challenge. It is therefore important to fully understand these assumptions so that practitioners can select the most suitable unit for a given purpose. This study introduces an enhanced version of Nation's BNC-COCA word lists that can be used to quantify several features of lexical units. The original WF6 lists were adapted by including flemma and lemma groupings, part-of-speech (POS) tags, morphological codings, frequency data, and an expanded list of proper nouns. Analyses of lexical unit composition reveal that, owing to their much greater inclusivity than flemmas or lemmas, WF6 units provide rapid corpus coverage over the 1-2k bands, and that irregular forms make up a considerable proportion of 1k tokens regardless of the unit chosen. Accuracy checks then suggest that POS-tagged lists offer an improvement over untagged lists due to the latter's overestimation of coverage and blocking of homographic concepts. Finally, lexical and morphological profiling shows that threshold coverage values are unlikely to be reached without knowledge of at least mid-frequency headwords and tens of derivational affixes in most genres.
Abstractor: As Provided
Notes: https://osf.io/4mz6y
Entry Date: 2026
Accession Number: EJ1501238
Database: ERIC