Big Data-driven Chinese Vocabulary Database: Design, Implementation and Linguistic Insights.
| Title: | Big Data-driven Chinese Vocabulary Database: Design, Implementation and Linguistic Insights. |
|---|---|
| Authors: | Cheng, Xuan (AUTHOR) 18925120220@163.com; Ni, Yali (AUTHOR) niyali1703@outlook.com |
| Source: | Journal of Circuits, Systems & Computers. 6/1/2026, Vol. 35 Issue 10, p1-17. 17p. |
| Subjects: | Big data, Database design, Chinese language, Natural language processing, Linguistic analysis |
| Abstract: | As society and its needs continue to develop, the transfer of old words and the emergence of new words have expanded the content of readable dictionaries, yet these dictionaries still cannot meet the needs of the current Chinese language industry. Driven by big data technology, Chinese vocabulary design has become connected with computer networks, and big data databases have emerged in response, so it is imperative to establish a set of Chinese language databases. After introducing the principles and methods of Chinese vocabulary database construction, this paper constructs a complete vocabulary database system, which not only supports the statistics and classification of Chinese vocabulary but also enhances people's interest in learning Chinese. The database performs automatic word segmentation and part-of-speech tagging on the vocabulary, followed by manual proofreading and analysis. The results show that nouns account for the largest proportion of all parts of speech, reaching 46.4%. Nouns, verbs and adjectives together account for 90.3% of the total morphemes, and the noun, verb and adjective combination of "2" is the majority category. The test value of the big data-driven model is the highest under the same experimental conditions. The experimental data show that recall gradually decreases in the open test state. Under the system exchange test, the big data-driven database model performs well, with recall, accuracy and F-value all above those of the other two models. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Journal of Circuits, Systems & Computers is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Engineering Source |
| Publication Type: | Academic Journal |
| Language: | English |
| DOI: | 10.1142/S0218126626500313 |
| ISSN: | 0218-1266 (print) |
| Accession Number: | 193816478 |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=193816478 |