Title: What kinds of contracts do ML APIs need?
Recent work has shown that Machine Learning (ML) programs are error-prone and has called for contracts for ML code. Contracts, as in the design-by-contract methodology, help document APIs and aid API users in writing correct code. The question is: what kinds of contracts would provide the most help to API users? We are especially interested in what kinds of contracts help API users catch errors at earlier stages of the ML pipeline. We describe an empirical study of Stack Overflow posts about the four most often-discussed ML libraries: TensorFlow, Scikit-learn, Keras, and PyTorch. For these libraries, our study extracted 413 informal (English) API specifications. We used these specifications to answer the following questions. What are the root causes and effects of ML contract violations? Are there common patterns of ML contract violations? When does understanding ML contracts require an advanced level of ML software expertise? Could checking contracts at the API level help detect violations in early ML pipeline stages? Our key finding is that the most commonly needed contracts for ML APIs check either constraints on single arguments of an API or constraints on the order of API calls. The software engineering community could employ existing contract mining approaches to mine these contracts and promote an increased understanding of ML APIs. We also noted a need to combine behavioral and temporal contract mining approaches. We report on categories of required ML contracts, which may help designers of contract languages.
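To make the two most common contract kinds concrete, here is a minimal illustrative sketch (ours, not taken from the paper): a hypothetical CheckedKMeans wrapper, in the spirit of a Scikit-learn-style estimator, that enforces a behavioral contract on a single argument and a temporal contract on the order of API calls.

    # Illustrative sketch only: a hypothetical estimator showing the two contract
    # kinds the study found most commonly needed for ML APIs.
    class CheckedKMeans:
        def __init__(self, n_clusters):
            # Behavioral contract: a constraint on a single argument.
            if not isinstance(n_clusters, int) or n_clusters < 1:
                raise ValueError("contract violation: n_clusters must be a positive integer")
            self.n_clusters = n_clusters
            self._fitted = False

        def fit(self, X):
            # (training logic elided)
            self._fitted = True
            return self

        def predict(self, X):
            # Temporal contract: fit() must be called before predict().
            if not self._fitted:
                raise RuntimeError("contract violation: call fit() before predict()")
            # (inference logic elided)
            return [0] * len(X)

    # Both violations are reported at the call site:
    # CheckedKMeans(n_clusters=0)               -> ValueError
    # CheckedKMeans(3).predict([[1.0], [2.0]])  -> RuntimeError

Checking such contracts at the API boundary surfaces the error where it is made, rather than as a confusing failure later in the ML pipeline, which is the early-detection benefit discussed above.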
Award ID(s):
2223812 2120448 2152117
PAR ID:
10543738
Author(s) / Creator(s):
Editor(s):
Feldt, Robert; Zimmermann, Thomas; Basili, Victor R; Briand, Lionel C
Publisher / Repository:
Springer Nature
Date Published:
Journal Name:
Empirical Software Engineering
Volume:
28
Issue:
6
ISSN:
1382-3256
Subject(s) / Keyword(s):
Software engineering for machine learning, Empirical software engineering, API contracts, Machine Learning
Format(s):
Medium: X; Size: 3.8 MB; Other: .pdf
Sponsoring Org:
National Science Foundation
More Like this
  1. APIs are becoming the fundamental building block of modern software and their usability is crucial to programming efficiency and software quality. Yet API designers find it hard to gather and interpret user feedback on their APIs. To close the gap, we interviewed 23 API designers from 6 companies and 11 open-source projects to understand their practices and needs. The primary way of gathering user feedback is through bug reports and peer reviews, as formal usability testing is prohibitively expensive to conduct in practice. Participants expressed a strong desire to gather real-world use cases and understand users' mental models, but there was a lack of tool support for such needs. In particular, participants were curious about where users got stuck, their workarounds, common mistakes, and unanticipated corner cases. We highlight several opportunities to address those unmet needs, including developing new mechanisms that systematically elicit users' mental models, building mining frameworks that identify recurring patterns beyond shallow statistics about API usage, and exploring alternative design choices made in similar libraries. 
  2. Deep learning has gained substantial popularity in recent years. Developers mainly rely on libraries and tools to add deep learning capabilities to their software. What kinds of bugs are frequently found in such software? What are the root causes of such bugs? What impacts do such bugs have? Which stages of the deep learning pipeline are more bug-prone? Are there any antipatterns? Understanding such characteristics of bugs in deep learning software has the potential to foster the development of better deep learning platforms, debugging mechanisms, and development practices, and to encourage the development of analysis and verification frameworks. Therefore, we study 2716 high-quality posts from Stack Overflow and 500 bug-fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, TensorFlow, Theano, and Torch) to understand the types of bugs, their root causes, their impacts, the bug-prone stages of the deep learning pipeline, and whether there are common antipatterns in this buggy software. The key findings of our study include: data bugs and logic bugs are the most severe bug types in deep learning software, appearing in more than 48% of the cases, and the major root causes of these bugs are Incorrect Model Parameter or Structure (IPS) and Structural Inefficiency (SI), showing up in more than 43% of the cases. We have also found that bugs in the usage of deep learning libraries follow some common antipatterns.
  3. Software developers often struggle to update APIs, leading to manual, time-consuming, and error-prone processes. We introduce Melt, a new approach that generates lightweight API migration rules directly from pull requests in popular library repositories. Our key insight is that pull requests merged into open-source libraries are a rich source of information, sufficient to mine API migration rules. By leveraging code examples mined from the library source and code examples automatically generated from the pull requests, we infer transformation rules in Comby, a language for structural code search and replace. Since rules inferred from single code examples may be too specific, we propose a generalization procedure that makes the rules more applicable to client projects. Melt rules are syntax-driven, interpretable, and easily adaptable. Moreover, unlike previous work, our approach enables rule inference to integrate seamlessly into the library workflow, removing the need to wait for client code migrations. We evaluated Melt on pull requests from four popular libraries, successfully mining 461 migration rules from code examples in pull requests and 114 rules from auto-generated code examples. Our generalization procedure increases the number of matches for mined rules by 9×. We applied these rules to client projects and ran their tests, which led to an overall decrease in the number of warnings and fixed some test cases, demonstrating Melt's effectiveness in real-world scenarios.
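As a rough illustration of what a lightweight API migration rule amounts to, the sketch below (not Melt's actual rule format; Melt expresses rules as Comby templates) encodes a match-and-rewrite pair in plain Python, using the familiar pandas DataFrame.append-to-pd.concat migration as the example.

    # Simplified sketch: an API migration rule is essentially a "match -> rewrite"
    # pair applied to client code. Melt infers such rules as Comby templates; the
    # regex below is only a stand-in for structural matching.
    import re

    # Rule: df.append(x) was removed from pandas; use pd.concat([df, x]) instead.
    MATCH = re.compile(r"(\w+)\.append\((\w+)\)")
    REWRITE = r"pd.concat([\1, \2])"

    def apply_rule(source: str) -> str:
        """Apply the migration rule to a snippet of client source code."""
        return MATCH.sub(REWRITE, source)

    print(apply_rule("result = df.append(new_rows)"))
    # -> result = pd.concat([df, new_rows])

A structural matcher such as Comby operates on syntax rather than raw text, avoiding the brittleness of the regex stand-in shown here.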
  4. Due to under-specified interfaces, developers face challenges in correctly integrating machine learning (ML) APIs into software. Even when the ML API and the software are well designed on their own, the resulting application misbehaves when the API output is incompatible with the software. It is desirable to have an adapter that converts ML API output at runtime to better fit the software's needs and prevent integration failures. In this paper, we conduct an empirical study to understand ML API integration problems in real-world applications. Guided by this study, we present SmartGear, a tool that automatically detects and converts mismatching or incorrect ML API output at run time, serving as a middle layer between the ML API and the software. Our evaluation on a variety of open-source applications shows that SmartGear detects 70% of incompatible API outputs and prevents 67% of potential integration failures, outperforming alternative solutions.
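To illustrate the adapter idea, here is a hypothetical sketch (not SmartGear's implementation): a middle layer that normalizes an ML API's output before the application consumes it. The output shapes handled below are invented for illustration.

    # Hypothetical adapter between an ML API and application code (not SmartGear).
    # It normalizes API output to the shape the application expects:
    # a list of (label, confidence-in-[0, 1]) pairs.
    def adapt_output(raw):
        if isinstance(raw, dict):        # e.g. {"cat": 87.5, "dog": 3.1}
            items = raw.items()
        else:                            # e.g. [("cat", 87.5), ("dog", 3.1)]
            items = raw
        adapted = []
        for label, score in items:
            if score > 1.0:              # percentage -> probability
                score = score / 100.0
            adapted.append((label, score))
        return adapted

    # The application only ever sees the normalized shape, so a change in the
    # API's output format does not surface as an integration failure downstream.
    print(adapt_output({"cat": 87.5, "dog": 3.1}))  # [('cat', 0.875), ('dog', 0.031)]

Detecting and converting the mismatch at this boundary is what lets a tool like SmartGear prevent integration failures before they propagate into application logic.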

     
  5. Creating modern software inevitably requires using application programming interfaces (APIs). While software developers can sometimes use APIs by simply copying and pasting code examples, a lack of robust knowledge of how an API works can lead to defects, complicate software maintenance, and limit what someone can express with an API. Prior work has uncovered the many ways that API documentation fails to be helpful, though it rarely describes precisely why. We present a theory of robust API knowledge that attempts to explain why, arguing that effective understanding and use of APIs depends on three components of knowledge: (1) the domain concepts the API models, along with their terminology; (2) the usage patterns of the API, along with their rationale; and (3) facts about the API's execution that support reasoning about its runtime behavior. We derive five hypotheses from this theory and present a study to test them. Our study investigated the effect of having access to these components of knowledge, finding that while learners requested all three components when they were not available, whether the knowledge helped the learner use or understand the API depended on the task and, likely, on the relevance and quality of the specific information provided. The theory and our evidence in support of its claims have implications for what content API documentation, tutorials, and instruction should contain, for the importance of giving the right information at the right time, for what information API tools should compute, and even for how APIs should be designed. Future work is necessary both to further test and refine the theory and to exploit its ideas for better instructional design.