Category Discrimination Based Feature Selection Algorithm in Chinese Text Classification.
| Title: | Category Discrimination Based Feature Selection Algorithm in Chinese Text Classification. |
|---|---|
| Authors: | JUNKAI YI (yijk@mail.buct.edu.cn), GUANG YANG (jensen-yg@163.com), JING WAN (wanj@mail.buct.edu.cn) |
| Source: | Journal of Information Science & Engineering. Sep2016, Vol. 32 Issue 5, p1145-1159. 15p. |
| Subjects: | Computer software selection, Computational linguistics, Text recognition, Chinese language, Vector processing (Computer science), Computer network resources |
| Abstract: | Improving classification precision is a major issue in Chinese text classification. The tf-idf algorithm is a classic and widely used feature selection algorithm based on the vector space model (VSM). However, the traditional tf-idf algorithm neglects a feature term's distribution within a category and among categories, which leads to many unreasonable selection results. This paper improves the traditional tf-idf algorithm by introducing the concept of Category Discrimination. We evaluate our algorithm experimentally and compare it with other algorithms. The experimental results show that the improved tf-idf algorithm consistently achieves higher precision and recall than the traditional tf-idf algorithm and is superior to the other algorithms overall. It is therefore a more effective feature selection algorithm for text classification. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Journal of Information Science & Engineering is the property of Institute of Information Science, Academia Sinica and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Engineering Source |
| Publication Type: | Academic Journal |
| Language: | English |
| ISSN: | 1016-2364 (print) |
| Accession Number: | 118107134 |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=118107134 |