PyAnalyzer: An Effective and Practical Approach for Dependency Extraction from Python Code.

Bibliographic Details
Title: PyAnalyzer: An Effective and Practical Approach for Dependency Extraction from Python Code.
Authors: Jin, Wuxia [1] jinwuxia@mail.xjtu.edu.cn; Xu, Shuo [2] spoon1116@stu.xjtu.edu.cn; Chen, Dawei [3] thisrabbit@stu.xjtu.edu.cn; He, Jiajun [2] znzz_hjj@stu.xjtu.edu.cn; Zhong, Dinghong [2] ahong_934@163.com; Fan, Ming [3] mingfan@mail.xjtu.edu.cn; Chen, Hongxu [4] chenhongxu5@huawei.com; Zhang, Huijia [5] zhanghuijia1@huawei.com; Liu, Ting [3] tingliu@mail.xjtu.edu.cn
Source: ICSE: International Conference on Software Engineering. 2024, p1-12. 12p.
Subjects: Python programming language, Dylan (Computer program language), Recall (Information retrieval), Precision (Information retrieval), Software engineering
Abstract: Dependency extraction based on static analysis lays the groundwork for a wide range of applications. However, dynamic language features in Python make code behavior obscure and nondeterministic, which makes it very challenging for static analyses to resolve symbol-level dependencies. Although many techniques and tools are available, they still lack sufficient capabilities to handle object changes, first-class citizens, varying call sites, and library dependencies. To address this fundamental difficulty for dynamic languages, this work proposes PyAnalyzer, an effective and practical method for dependency extraction. PyAnalyzer uniformly models functions, classes, and modules as first-class heap objects, propagating the dynamic changes of these objects and class inheritance. This approach better simulates dynamic features such as duck typing, object changes, and first-class citizens, yielding high recall without compromising precision. Moreover, PyAnalyzer leverages optional type annotations as a shortcut to express varying call sites and resolve library dependencies on demand. We collected two micro-benchmarks (278 small programs), two macro-benchmarks (59 real-world applications), and 191 real-world projects (10 MSLOC) for comprehensive comparisons with 7 advanced techniques (i.e., Understand, Sourcetrail, Depends, ENRE19, PySonar2, PyCG, and Type4Py). The results demonstrate that PyAnalyzer achieves high recall and hence improves F1 by 24.7% on average, while running at least 1.4x faster without an obvious compromise in memory efficiency. Our work will benefit diverse client applications. [ABSTRACT FROM AUTHOR]
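Illustrative note: the dynamic features the abstract names (duck typing, object changes, first-class citizens) can be seen in a few lines of Python. The sketch below is not taken from the paper and does not show PyAnalyzer's algorithm; every class and function name in it is hypothetical. It only shows why the target of a call such as `obj.speak()` cannot be read off the source text alone.

```python
# Illustrative only: all names here are hypothetical, not from the paper.

class Duck:
    def speak(self):
        return "quack"


class Robot:
    def speak(self):
        return "beep"


def shout(self):
    return "QUACK"


def call_speak(obj):
    # Duck typing: the callee depends on the runtime type of obj,
    # so an analysis must track which objects can flow into this call.
    return obj.speak()


def apply(func, arg):
    # First-class citizen: func is an ordinary value passed as an argument.
    return func(arg)


results = []
results.append(call_speak(Duck()))     # resolves to Duck.speak
results.append(call_speak(Robot()))    # resolves to Robot.speak

# Object change: rebinding a class attribute at runtime redirects later calls.
Duck.speak = shout
results.append(call_speak(Duck()))     # now resolves to shout

# A function stored under an alias and passed around as data.
alias = call_speak
results.append(apply(alias, Robot()))  # apply -> call_speak -> Robot.speak

print(results)
```

A purely name-based extractor would likely record a single `speak` dependency here, whereas the calls reach three different function objects over the course of the run.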
Copyright of ICSE: International Conference on Software Engineering is the property of Association for Computing Machinery and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
PubType: Conference
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=185196519
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1145/3597503.3640325
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 12
        StartPage: 1
  BibRelationships:
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 05
              Text: 2024
              Type: published
              Y: 2024
          Titles:
            – TitleFull: ICSE: International Conference on Software Engineering
              Type: main