Big Data-driven Chinese Vocabulary Database: Design, Implementation and Linguistic Insights.

Bibliographic Details
Title: Big Data-driven Chinese Vocabulary Database: Design, Implementation and Linguistic Insights.
Authors: Cheng, Xuan (AUTHOR) 18925120220@163.com; Ni, Yali (AUTHOR) niyali1703@outlook.com
Source: Journal of Circuits, Systems & Computers. 6/1/2026, Vol. 35 Issue 10, p1-17. 17p.
Subjects: Big data, Database design, Chinese language, Natural language processing, Linguistic analysis
Abstract: As society and its needs continue to develop, the shifting of old words and the emergence of new words have expanded the content of readable dictionaries, but these dictionaries still cannot meet the needs of the current Chinese language industry. Driven by big data technology, Chinese vocabulary design has become connected with computer networks, and big data databases have emerged in response. It is imperative to establish a set of Chinese language databases. After introducing the principles and methods of Chinese vocabulary database construction, this paper constructs a complete vocabulary database system that supports the statistics and classification of Chinese vocabulary and also enhances people's interest in learning Chinese. The database applies automatic word segmentation and part-of-speech tagging to the vocabulary, together with manual proofreading and analysis. The results show that nouns account for the largest proportion of all parts of speech, at 46.4%. Nouns, verbs and adjectives together account for 90.3% of all morphemes, and the noun, verb and adjective combination of "2" is the majority category. Under the same experimental conditions, the big data-driven model achieves the highest test value. The experimental data show that recall gradually decreases in the open test. In the system exchange test, the big data-driven database model performs well, with recall, accuracy and F-value all higher than those of the other two models. [ABSTRACT FROM AUTHOR]
Copyright of Journal of Circuits, Systems & Computers is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Header DbId: egs
DbLabel: Engineering Source
An: 193816478
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=193816478
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1142/S0218126626500313
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 17
        StartPage: 1
    Subjects:
      – SubjectFull: Big data
        Type: general
      – SubjectFull: Database design
        Type: general
      – SubjectFull: Chinese language
        Type: general
      – SubjectFull: Natural language processing
        Type: general
      – SubjectFull: Linguistic analysis
        Type: general
    Titles:
      – TitleFull: Big Data-driven Chinese Vocabulary Database: Design, Implementation and Linguistic Insights.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Cheng, Xuan
      – PersonEntity:
          Name:
            NameFull: Ni, Yali
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 06
              Text: 6/1/2026
              Type: published
              Y: 2026
          Identifiers:
            – Type: issn-print
              Value: 02181266
          Numbering:
            – Type: volume
              Value: 35
            – Type: issue
              Value: 10
          Titles:
            – TitleFull: Journal of Circuits, Systems & Computers
              Type: main