Category Discrimination Based Feature Selection Algorithm in Chinese Text Classification.

Bibliographic Details
Title: Category Discrimination Based Feature Selection Algorithm in Chinese Text Classification.
Authors: JUNKAI YI (yijk@mail.buct.edu.cn), GUANG YANG (jensen-yg@163.com), JING WAN (wanj@mail.buct.edu.cn)
Source: Journal of Information Science & Engineering, Vol. 32, Issue 5, September 2016, pp. 1145-1159 (15 pages).
Subjects: Computer software selection, Computational linguistics, Text recognition, Chinese language, Vector processing (Computer science), Computer network resources
Abstract: How to improve classification precision is a major issue in Chinese text classification. The tf-idf algorithm is a classic and widely used feature selection algorithm based on the vector space model (VSM). However, the traditional tf-idf algorithm neglects how a feature term is distributed within a category and across categories, which leads to many unreasonable selection results. This paper improves the traditional tf-idf algorithm by introducing the concept of Category Discrimination. We evaluate our algorithm experimentally and compare it with other algorithms. The experimental results show that the improved tf-idf algorithm consistently achieves higher precision and recall than the traditional tf-idf algorithm and outperforms the other algorithms overall. It is therefore a more effective feature selection algorithm for text classification. [ABSTRACT FROM AUTHOR]
Copyright of Journal of Information Science & Engineering is the property of Institute of Information Science, Academia Sinica and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
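The abstract's criticism of traditional tf-idf can be made concrete with a small sketch. This is not the authors' Category Discrimination method, which this record does not define. It only implements a textbook tf-idf variant (tf as relative frequency in a document, idf = log(N/df); other variants exist) on a hypothetical four-document, two-category toy corpus. The corpus, the terms, and the helper `category_spread` are illustrative assumptions. The point it shows: two terms with the same document frequency receive identical idf, even when one occurs only in a single category and the other is spread across both.

```python
import math
from collections import Counter

# Hypothetical labelled toy corpus: each document is already split into terms,
# as Chinese text would be after word segmentation.
corpus = [
    ("sports",  ["match", "goal", "report"]),
    ("sports",  ["match", "team", "coach"]),
    ("finance", ["stock", "market", "report"]),
    ("finance", ["stock", "bank", "rate"]),
]

def idf(term, docs):
    """Classic inverse document frequency log(N / df).
    Assumes the term occurs in at least one document (df > 0)."""
    n = len(docs)
    df = sum(1 for _, terms in docs if term in terms)
    return math.log(n / df)

def tf_idf(term, terms, docs):
    """Classic tf-idf: relative frequency of the term in one document times its idf.
    Category labels are never consulted."""
    tf = Counter(terms)[term] / len(terms)
    return tf * idf(term, docs)

def category_spread(term, docs):
    """Hypothetical helper: number of documents per category that contain the term."""
    return Counter(label for label, terms in docs if term in terms)

first_doc = corpus[0][1]
for term in ("match", "report"):
    print(term,
          round(idf(term, corpus), 3),
          round(tf_idf(term, first_doc, corpus), 3),
          dict(category_spread(term, corpus)))
# match 0.693 0.231 {'sports': 2}
# report 0.693 0.231 {'sports': 1, 'finance': 1}
```

Here `match` appears only in `sports` documents and would be the more useful feature for separating the two categories, yet classic tf-idf weights it exactly like `report`. This within-category and across-category distribution is what the abstract says the traditional algorithm neglects.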
Database: Engineering Source
Accession Number: 118107134
Publication Type: Academic Journal
Permalink: https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=118107134
Language: English
ISSN (print): 1016-2364