Bibliographic Details
Title: Big Data-driven Chinese Vocabulary Database: Design, Implementation and Linguistic Insights.
Authors: Cheng, Xuan¹ (AUTHOR) 18925120220@163.com; Ni, Yali² (AUTHOR) niyali1703@outlook.com
Source: Journal of Circuits, Systems & Computers. 6/1/2026, Vol. 35, Issue 10, pp. 1-17 (17 pp.).
Subjects: Big data; Database design; Chinese language; Natural language processing; Linguistic analysis
Abstract:
As society and its needs change over time, existing words shift in meaning and new words emerge, which has expanded the content of dictionaries; even so, dictionaries still cannot meet the needs of today's Chinese language industry. With the rise of big data technology, Chinese vocabulary work has become connected with computer networks, and big data databases have emerged in response. Establishing a Chinese vocabulary database is therefore imperative. Building on an introduction to the principles and methods of Chinese vocabulary database construction, this paper constructs a complete vocabulary database system that supports the statistical analysis and classification of Chinese vocabulary and also enhances interest in learning Chinese. The database performs automatic word segmentation and part-of-speech tagging on the vocabulary, followed by manual proofreading and analysis. The results show that nouns account for the largest share of all parts of speech, at 46.4%. Nouns, verbs and adjectives together account for 90.3% of all morphemes, and noun, verb and adjective combinations of "2" form the majority category. Under identical experimental conditions, the big data-driven model achieves the highest test values. The experimental data also show that recall gradually decreases in the open test. In the system exchange test, the big data-driven database model performs well: its recall, accuracy and F-value all exceed those of the other two models. [ABSTRACT FROM AUTHOR]
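The abstract reports recall, accuracy and F-value for word segmentation without defining them. As an illustration only, the sketch below assumes the span-based scoring commonly used to evaluate Chinese word segmentation: a predicted word counts as correct only if its exact character span also appears in the gold segmentation, and F is the harmonic mean of precision and recall. This is not the authors' evaluation code. Whether the abstract's "accuracy" corresponds to precision here is an assumption. The function names `to_spans` and `segmentation_prf` and the example sentence are hypothetical.

```python
"""Span-based precision/recall/F1 for Chinese word segmentation (illustrative sketch)."""


def to_spans(words):
    """Convert a list of segmented words into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for w in words:
        end = start + len(w)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold_words, predicted_words):
    """Return (precision, recall, f1) by comparing word spans.

    A predicted word is correct only if the same (start, end) span
    exists in the gold segmentation of the same underlying text.
    """
    if "".join(gold_words) != "".join(predicted_words):
        raise ValueError("gold and predicted segmentations cover different text")
    gold = to_spans(gold_words)
    pred = to_spans(predicted_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    # Harmonic mean of precision and recall; 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: the system merges 中文 and 词汇 into one word.
    gold = ["我们", "研究", "中文", "词汇"]
    pred = ["我们", "研究", "中文词汇"]
    p, r, f = segmentation_prf(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.667 R=0.500 F=0.571
```

In this example two of the three predicted words match gold spans, so precision is 2/3, recall is 2/4, and F is 4/7. Merging words lowers recall more than precision, which is one way recall can fall on unseen (open-test) text containing compounds absent from the vocabulary.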
Copyright of Journal of Circuits, Systems & Computers is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source