Title: What kinds of contracts do ML APIs need?
Abstract:
Recent work has shown that Machine Learning (ML) programs are error-prone and has called for contracts for ML code. Contracts, as in the design-by-contract methodology, help document APIs and aid API users in writing correct code. The question is: what kinds of contracts would provide the most help to API users? We are especially interested in contracts that help API users catch errors at earlier stages of the ML pipeline. We describe an empirical study of Stack Overflow posts about the four most-discussed ML libraries: TensorFlow, Scikit-learn, Keras, and PyTorch. For these libraries, our study extracted 413 informal (English) API specifications. We used these specifications to answer the following questions. What are the root causes and effects of ML contract violations? Are there common patterns of ML contract violations? When does understanding ML contracts require an advanced level of ML software expertise? Could checking contracts at the API level help detect violations at earlier stages of the ML pipeline? Our key finding is that the most commonly needed contracts for ML APIs check either constraints on single arguments of an API or the order of API calls. The software engineering community could employ existing contract mining approaches to mine these contracts and thereby promote a better understanding of ML APIs. We also noted a need to combine behavioral and temporal contract mining approaches. We report on the categories of required ML contracts, which may help designers of contract languages.
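To make the two most common contract kinds concrete, here is a minimal Python sketch of (1) a contract on a single argument and (2) a temporal contract on the order of API calls; the Estimator class and its checks are hypothetical illustrations written for this summary, not code from any of the four libraries studied.

    class ContractViolation(Exception):
        """Raised when an API-level contract is violated."""

    class Estimator:
        # Hypothetical estimator with explicit API-level contracts.
        def __init__(self):
            self._fitted = False

        def fit(self, X, y):
            # (1) Single-argument contract: X must be a non-empty 2-D array-like.
            if not X or not isinstance(X[0], (list, tuple)):
                raise ContractViolation("fit: X must be a non-empty 2-D array-like")
            self._fitted = True
            return self

        def predict(self, X):
            # (2) Temporal contract: fit() must be called before predict().
            if not self._fitted:
                raise ContractViolation("predict: call fit() before predict()")
            return [0.0 for _ in X]

    est = Estimator()
    est.fit([[1.0], [2.0]], [0, 1])
    print(est.predict([[3.0]]))  # [0.0]; calling predict() first would fail fast

Checking such contracts at the API boundary surfaces the mistake at the offending call, instead of letting it manifest as an obscure failure deeper in the ML pipeline.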
Award ID(s):
2223812 2120448 2152117 2512857 2512858
PAR ID:
10543738
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Feldt, Robert; Zimmermann, Thomas; Basili, Victor R; Briand, Lionel C
Publisher / Repository:
Springer Nature
Date Published:
Journal Name:
Empirical Software Engineering
Volume:
28
Issue:
6
ISSN:
1382-3256
Subject(s) / Keyword(s):
Software engineering for machine learning, Empirical software engineering, API contracts, Machine Learning
Format(s):
Medium: X; Size: 3.8 MB; Other: .pdf
Size(s):
3.8MB
Sponsoring Org:
National Science Foundation
More Like This
  1. APIs are becoming the fundamental building blocks of modern software, and their usability is crucial to programming efficiency and software quality. Yet API designers find it hard to gather and interpret user feedback on their APIs. To close the gap, we interviewed 23 API designers from 6 companies and 11 open-source projects to understand their practices and needs. The primary ways of gathering user feedback are bug reports and peer reviews, as formal usability testing is prohibitively expensive to conduct in practice. Participants expressed a strong desire to gather real-world use cases and understand users' mental models, but tool support for such needs is lacking. In particular, participants were curious about where users got stuck, their workarounds, common mistakes, and unanticipated corner cases. We highlight several opportunities to address these unmet needs, including developing new mechanisms that systematically elicit users' mental models, building mining frameworks that identify recurring patterns beyond shallow statistics about API usage, and exploring alternative design choices made in similar libraries.
  2. Aldrich, Jonathan; Silva, Alexandra (Eds.)
    Contract systems enable programmers to state specifications and have them enforced at run time. First-order contracts are expressed using ordinary code, while higher-order contracts are expressed using the notation familiar from type systems. Most interface descriptions, though, come with properties that involve not just assertions about single method calls, but entire call chains. Typical contract systems cannot express these specifications concisely. Such specifications demand domain-specific notations. In response, this paper proposes that contract systems abstract over the notation used for stating specifications. It presents an architecture for such a system, some illustrative examples, and an evaluation in terms of common notations from the literature. 
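    As a toy illustration of a call-chain specification, here is a Python sketch of the classic "open (read | write)* close" protocol enforced by a small finite-state monitor; the MonitoredFile class and its transition table are assumptions made for this summary, not the paper's notation or system.

        class TraceContractError(Exception):
            pass

        # Allowed next calls after each call: a tiny finite-state encoding
        # of the call-chain contract "open (read | write)* close".
        TRANSITIONS = {
            "start": {"open"},
            "open": {"read", "write", "close"},
            "read": {"read", "write", "close"},
            "write": {"read", "write", "close"},
            "close": set(),
        }

        class MonitoredFile:
            def __init__(self):
                self.state = "start"

            def _step(self, call):
                if call not in TRANSITIONS[self.state]:
                    raise TraceContractError(f"{call}() not allowed after {self.state}")
                self.state = call

            def open(self):
                self._step("open")

            def read(self):
                self._step("read")
                return b""

            def write(self, data):
                self._step("write")

            def close(self):
                self._step("close")

        f = MonitoredFile()
        f.open(); f.read(); f.close()  # legal trace
        # f.read() here would raise TraceContractError: a per-call assertion
        # cannot state this property, but a trace-level contract can.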
  3. Deep learning has gained substantial popularity in recent years. Developers mainly rely on libraries and tools to add deep learning capabilities to their software. What kinds of bugs are frequently found in such software? What are the root causes of these bugs? What impacts do they have? Which stages of the deep learning pipeline are more bug-prone? Are there any antipatterns? Understanding such characteristics of bugs in deep learning software can foster the development of better deep learning platforms, debugging mechanisms, and development practices, and encourage the creation of analysis and verification frameworks. Therefore, we study 2,716 high-quality posts from Stack Overflow and 500 bug-fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, TensorFlow, Theano, and Torch) to understand the types of bugs, their root causes and impacts, the bug-prone stages of the deep learning pipeline, and whether there are common antipatterns in this buggy software. The key findings of our study include: data bugs and logic bugs are the most severe bug types in deep learning software, appearing more than 48% of the time, and the major root causes of these bugs are Incorrect Model Parameter or Structure (IPS) and Structural Inefficiency (SI), showing up more than 43% of the time. We also found that bugs in the usage of deep learning libraries follow some common antipatterns.
  4. Software developers often struggle to update APIs, leading to manual, time-consuming, and error-prone migration processes. We introduce Melt, a new approach that generates lightweight API migration rules directly from pull requests in popular library repositories. Our key insight is that pull requests merged into open-source libraries are a rich source of information sufficient to mine API migration rules. By leveraging code examples mined from the library source and automatically generated code examples based on the pull requests, we infer transformation rules in Comby, a language for structural code search and replace. Since rules inferred from single code examples may be too specific, we propose a generalization procedure to make the rules more applicable to client projects. Melt rules are syntax-driven, interpretable, and easily adaptable. Moreover, unlike previous work, our approach enables rule inference to integrate seamlessly into the library workflow, removing the need to wait for client code migrations. We evaluated Melt on pull requests from four popular libraries, successfully mining 461 migration rules from code examples in pull requests and 114 rules from auto-generated code examples. Our generalization procedure increases the number of matches for mined rules by 9×. We applied these rules to client projects and ran their tests, which led to an overall decrease in the number of warnings and fixed some test cases, demonstrating Melt's effectiveness in real-world scenarios.
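    To give a flavor of what a mined, syntax-driven migration rule looks like, here is a regex-based Python toy. It is an assumption for illustration only: Melt expresses rules in Comby's structural templates with named holes rather than regexes, and the pandas df.append-to-pd.concat migration is used here purely as a familiar example.

        import re

        # Toy rule: migrate "<obj>.append(<arg>)" to "pd.concat([<obj>, <arg>])".
        # A Comby template would write the holes as :[obj] and :[arg].
        PATTERN = re.compile(r"(\w+)\.append\((\w+)\)")
        REWRITE = r"pd.concat([\1, \2])"

        def apply_rule(code: str) -> str:
            # Apply the rewrite to every match in a line of client code.
            return PATTERN.sub(REWRITE, code)

        print(apply_rule("result = df.append(new_rows)"))
        # -> result = pd.concat([df, new_rows])

    In this toy setting, generalization corresponds to loosening a rule mined from one concrete example (say, a literal variable name) into one with holes that matches many client call sites.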
  5. High-quality source code comments are valuable for software development and maintenance; however, code often contains low-quality comments or lacks them altogether. We call such comments suboptimal comments. Suboptimal comments create challenges in code comprehension and maintenance. Despite substantial research on low-quality source code comments, empirical knowledge about the commenting practices that produce suboptimal comments and the reasons behind them is lacking. We help bridge this knowledge gap by investigating (1) independent comment changes (ICCs), that is, comment changes committed independently of code changes, which likely address suboptimal comments; (2) commenting guidelines; and (3) comment-checking and comment-generating tools, which are often employed to support commenting practice, especially to prevent suboptimal comments. We collect 24M+ comment changes from 4,392 open-source GitHub Java repositories and find that ICCs are widespread. The ICC ratio (the proportion of ICCs among all comment changes) is ~15.5%, and 98.7% of the repositories have ICCs. Our thematic analysis of 3,533 randomly sampled ICCs provides a three-dimensional taxonomy covering what is changed (four comment categories and 13 subcategories), how it changed (six commenting activity categories), and what factors are associated with the change (three factors). We investigate 600 repositories to understand the prevalence, content, impact, and violations of commenting guidelines. We find that only 15.5% of the 600 sampled repositories have any commenting guidelines. We provide the first taxonomy of elements in commenting guidelines: where and what to comment are particularly important. Repositories without such guidelines have a statistically significantly higher ICC ratio, indicating the negative impact of lacking commenting guidelines. However, commenting guidelines are not strictly followed: 85.5% of the checked repositories have violations. We also systematically study how developers use comment-checking tools and comment-generating tools in the 4,392 repositories. We find that use of the Javadoc tool is negatively correlated with the ICC ratio, while use of Checkstyle has no statistically significant correlation; use of comment-generating tools leads to a higher ICC ratio. In conclusion, we reveal issues and challenges in current commenting practice that help explain how suboptimal comments are introduced. We propose potential research directions on comment location prediction, comment generation, and comment quality assessment; suggest how developers can formulate commenting guidelines and enforce them with tools; and recommend how to enhance current comment-checking and comment-generating tools.