skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Exploring the Architectural Impact of Possible Dependencies in Python Software
Dependencies among software entities are the basis for many software analytic research and architecture analysis tools. Dynamically typed languages, such as Python, JavaScript and Ruby, tolerate the lack of explicit type references, making certain syntactic dependencies indiscernible in source code. We call these possible dependencies, in contrast with the explicit dependencies that are directly referenced in source code. Type inference techniques have been widely studied and applied, but existing architecture analytic research and tools have not taken possible dependencies into consideration. The fundamental question is, to what extent will these missing possible dependencies impact the architecture analysis? To answer this question , we conducted an empirical study with 105 Python projects, using type inference techniques to manifest possible dependencies. Our study revealed that the architectural impact of possible dependencies is substantial-higher than that of explicit dependencies: (1) file-level possible dependencies account for at least 27.93% of all file-level dependencies, and create different dependency structures than that of explicit dependencies only, with an average difference of 30.71%; (2) adding possible dependencies significantly improves the precision (0.52%∼14.18%), recall(31.73%∼39.12%), and F1 scores (22.13%∼32.09%) of capturing co-change relations; (3) on average, a file involved in possible dependencies influences 28% more files and 42% more dependencies within architectural sub-spaces than a file involved in just explicit dependencies; (4) on average, a file involved in possible dependencies consumes 32% more maintenance effort. Consequently, maintainability scores reported by existing tools make a system written in these dynamic languages appear to be better modularized than it actually is. This evidence strongly * with the Ministry suggests that possible dependencies have a more significant impact than explicit dependencies on architecture quality, that architecture analysis and tools should assess and even emphasize the architectural impact of possible dependencies due to dynamic typing.  more » « less
Award ID(s):
1835292
PAR ID:
10300792
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
2020 35th IEEE/ACM International Conference on Automated Software Engineering
Page Range / eLocation ID:
758 to 770
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null; null; null (Ed.)
    Code clones are fragments of code that are duplicated in the codebase of an application. They create problems with maintainability, duplicate buggy code, and increase the size of the repository. To combat these issues, there currently exists a multitude of programs to detect duplicated code segments. However, there are not many varieties of languages among the benchmarks for code clone detection tools. Without covering enough languages for modern software development, the development of code-clone detection tools remains stunted. This paper describes a novel tool that will take a seed of Python source code and generate Type 1, 2, and 3 code clones in Python. As one of the most used and rapidly-growing languages in modern software development, our testbed will provide the opportunity for Python code-clone detection tools to be developed and tested. 
    more » « less
  2. The work aims to enable the use of common software engineering techniques and tools for quantum programming languages (e.g., OpenQASM). With the increased interest in quantum computing, researchers are adopting the use of higher-level quantum programming languages versus low-level circuit diagrams. While general purpose programming languages (e.g., C++, Python) are highly supported by a variety of software engineering tools, these novel programming languages for quantum computing have almost no support. Useable tools for debugging, static analysis, error detection, and transformation are currently non-existent. This work extends an existing software infrastructure (i.e., srcML) for the analysis, exploration, and manipulation of source code to OpenQASM. The srcML infrastructure, via parsing, generates abstract syntax information of programs to support high-level querying and analysis of the source code. With this, quantum developers can extract information and identify possible errors or inefficiencies in their programs. The paper presents the basic syntactic markup for OpenQASM. Also, a number of relevant quantum-based problems (e.g., iteration patterns, control recusion) are described and examples of how they are addressed using srcML is given. 
    more » « less
  3. Lightweight syntactic analysis tools like Semgrep and Comby leverage the tree structure of code, making them more expressive than string and regex search. Unlike traditional language frameworks (e.g., ESLint) that analyze codebases via explicit syntax tree manipulations, these tools use query languages that closely resemble the source language. However, state-of-the-art matching techniques for these tools require queries to be complete and parsable snippets, which makes in-progress query specifications useless. We propose a new search architecture that relies only on tokenizing (not parsing) a query. We introduce a novel language and matching algorithm to support tree-aware wildcards on this architecture by building on tree automata. We also presentstsearch, a syntactic search tool leveraging our approach. In contrast to past work, our approach supports syntactic searcheven for previously unparsable queries.We show empirically that stsea rch can support all tokenizable queries, while still providing results comparable to Semgrep for existing queries. Our work offers evidence that lightweight syntactic code search can accept in-progress specifications, potentially improving support for interactive settings. CCS Concepts: •Software and its engineering→Formal language definitions;Software maintenance tools;•Information systems→Query representation;•Theory of computation→ Tree languages. 
    more » « less
  4. Statically typed languages offer numerous benefits to developers, such as improved code quality and reduced runtime errors, but they also require the overhead of manual type annotations. To mitigate this burden, language designers have started incorporating support for type inference, where the compiler infers the type of a variable based on its declaration/usage context. As a result, type annotations are optional in certain contexts, and developers are empowered to use type inference in these situations. However, the usage patterns of type annotations in languages that support type inference are unclear. These patterns can help provide evidence for further research in program comprehension, in language design, and for education. We conduct a large-scale empirical study using Boa, a tool for mining software repositories, to investigate when and where developers use type inference in 498,963 Kotlin projects. We choose Kotlin because it is the default language for Android development, one of the largest software marketplaces. Additionally, Kotlin has supported declaration-site optional type annotations from its initial release. Our findings reveal that type inference is frequently employed for local variables and variables initialized with method calls declared outside the file are more likely to use type inference. These results have significant implications for language designers, providing valuable insight into where to allow type inference and how to optimize type inference algorithms for maximum efficiency, ultimately improving the development experience for developers. 
    more » « less
  5. Context: Design anti-patterns can be symptoms of problems that lead to long-term maintenance difficulty. How should development teams prioritize their treatment? Which ones are more severe and deserve more attention? Does the impact of anti-patterns and general maintenance efforts differ with different programming languages? Objective: In this study, we assess the prevalence and severity of anti-patterns in different programming languages and the impact of dynamic typing in Python, as well as the impact scopes of prevalent anti-patterns that manifest the violation of design principles. Method: We conducted a large-scale study of anti-patterns using 1717 open-source projects written in Java, C/C++, and Python. For the 288 Python projects, we extracted both explicit and dynamic dependencies and compared how the detected anti-patterns and maintenance costs changed. Finally, we removed anti-patterns involving five or fewer files to assess the impact of trivial anti-patterns. Results: The results reveal that 99.55% of these projects contain anti-patterns. Modularity Violation – frequent co-changes among seemingly unrelated files – is most prevalent (detected in 83.54% of all projects) and costly (incurred 61.55% of maintenance effort on average). Unstable Interface and Crossing, caused by influential but unstable files, although not as prevalent, tend to incur severe maintenance costs. Duck typing in Python incurs more anti-patterns, and the churn spent on Python files multiplies that of C/C++ and Java files. Several prevalent anti-patterns have a large portion of trivial instances, meaning that these common symptoms are usually not harmful. Conclusion: Implicit and visible dependencies are the most expensive to maintain, and dynamic typing in Python exacerbates the issue. Influential but unstable files need to be monitored and rectified early to prevent the accumulation of high maintenance costs. The violations of design principles are widespread, but many are not high-maintenance. 
    more » « less