CAPS: a supervised technique for classifying Stack Overflow posts concerning API issues.

Saved in:
Bibliographic Details
Title: CAPS: a supervised technique for classifying Stack Overflow posts concerning API issues.
Authors: Ahasanuzzaman, Md1 md.ahasanuzzaman@queensu.ca, Asaduzzaman, Muhammad1, Roy, Chanchal K.2, Schneider, Kevin A.2
Source: Empirical Software Engineering. Mar2020, Vol. 25 Issue 2, p1493-1532. 40p.
Subjects: Application program interfaces, Supervised learning, Random fields, Statistical models, Feature extraction
Abstract: The design and maintenance of APIs (Application Programming Interfaces) are complex tasks due to the constantly changing requirements of their users. Despite the efforts of their designers, APIs may suffer from a number of issues (such as incomplete or erroneous documentation, poor performance, and backward incompatibility). To maintain a healthy client base, API designers must learn these issues to fix them. Question answering sites, such as Stack Overflow (SO), have become a popular place for discussing API issues. These posts about API issues are invaluable to API designers, not only because they can help to learn more about the problem but also because they can facilitate learning the requirements of API users. However, the unstructured nature of posts and the abundance of non-issue posts make the task of detecting SO posts concerning API issues difficult and challenging. In this paper, we first develop a supervised learning approach using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences. We use the above information together with different features collected from posts, the experience of users, readability metrics and centrality measures of collaboration network to build a technique, called CAPS, that can classify SO posts concerning API issues. In total, we consider 34 features along eight different dimensions. Evaluation of CAPS using carefully curated SO posts on three popular API types reveals that the technique outperforms all three baseline approaches we consider in this study. We then conduct studies to find important features and also evaluate the performance of the CRF-based technique for classifying issue sentences. Comparison with two other baseline approaches shows that the technique has high potential. We also test the generalizability of CAPS results, evaluate the effectiveness of different classifiers, and identify the impact of different feature sets. [ABSTRACT FROM AUTHOR]
Copyright of Empirical Software Engineering is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Description
Abstract:The design and maintenance of APIs (Application Programming Interfaces) are complex tasks due to the constantly changing requirements of their users. Despite the efforts of their designers, APIs may suffer from a number of issues (such as incomplete or erroneous documentation, poor performance, and backward incompatibility). To maintain a healthy client base, API designers must learn these issues to fix them. Question answering sites, such as Stack Overflow (SO), have become a popular place for discussing API issues. These posts about API issues are invaluable to API designers, not only because they can help to learn more about the problem but also because they can facilitate learning the requirements of API users. However, the unstructured nature of posts and the abundance of non-issue posts make the task of detecting SO posts concerning API issues difficult and challenging. In this paper, we first develop a supervised learning approach using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences. We use the above information together with different features collected from posts, the experience of users, readability metrics and centrality measures of collaboration network to build a technique, called CAPS, that can classify SO posts concerning API issues. In total, we consider 34 features along eight different dimensions. Evaluation of CAPS using carefully curated SO posts on three popular API types reveals that the technique outperforms all three baseline approaches we consider in this study. We then conduct studies to find important features and also evaluate the performance of the CRF-based technique for classifying issue sentences. Comparison with two other baseline approaches shows that the technique has high potential. We also test the generalizability of CAPS results, evaluate the effectiveness of different classifiers, and identify the impact of different feature sets. [ABSTRACT FROM AUTHOR]
ISSN:13823256
DOI:10.1007/s10664-019-09743-4