
Title: A Meta-Summary of Challenges in Building Products with ML Components – Collecting Experiences from 4758+ Practitioners
Incorporating machine learning (ML) components into software products raises new software-engineering challenges and exacerbates existing ones. Many researchers have invested significant effort in understanding the challenges faced by industry practitioners who build products with ML components, through interviews and surveys. To aggregate and present their collective findings, we conduct a meta-summary study: following guidelines for systematic literature reviews, we collect 50 relevant papers that together interacted with over 4758 practitioners. We then collect, group, and organize the more than 500 mentions of challenges within those papers. We highlight the most commonly reported challenges and hope this meta-summary will be a useful resource for the research community to prioritize research and education in this field.
Journal Name:
2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN)
Page Range / eLocation ID:
171 to 183
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process, and collect recommendations to address these challenges. 
  2. Why the new findings matter

    The process of teaching and learning is complex, multifaceted and dynamic. This paper contributes a seminal resource to highlight the digitisation of the educational sciences by demonstrating how new machine learning methods can be effectively and reliably used in research, education and practical application.

    Implications for educational researchers and policy makers

    The progressing digitisation of societies around the globe and the impact of the SARS‐CoV‐2 pandemic have highlighted the vulnerabilities and shortcomings of educational systems. These developments have shown the necessity to provide effective educational processes that can support sometimes overwhelmed teachers to digitally impart knowledge on the plan of many governments and policy makers. Educational scientists, corporate partners and stakeholders can make use of machine learning techniques to develop advanced, scalable educational processes that account for individual needs of learners and that can complement and support existing learning infrastructure. The proper use of machine learning methods can contribute essential applications to the educational sciences, such as (semi‐)automated assessments, algorithmic grading, personalised feedback and adaptive learning approaches. However, these promises are strongly tied to at least a basic understanding of the concepts of machine learning and a degree of data literacy, which has to become the standard in education and the educational sciences.

    Demonstrating both the promises and the challenges that are inherent to the collection and the analysis of large educational data with machine learning, this paper covers the essential topics that their application requires and provides easy‐to‐follow resources and code to facilitate the process of adoption.

  3. Women are underrepresented in Open Source Software (OSS) projects. As a result, not only do women lose career and skill development opportunities, but the projects themselves suffer from a lack of diversity of perspectives. Practitioners and researchers need to understand more about the phenomenon; however, studies about women in open source are spread across multiple fields, including information systems, software engineering, and social science. This paper systematically maps, aggregates, and synthesizes the state-of-the-art on women’s participation in OSS. It focuses on women contributors’ representation and demographics, how they contribute, their motivations and challenges, and strategies employed by communities to attract and retain women. We identified 51 articles (published between 2000 and 2021) that investigated women’s participation in OSS. We found evidence in these papers about who the women who contribute are, what motivates them to contribute, what types of contributions they make, challenges they face, and strategies proposed to support their participation. According to these studies, only about 5% of projects were reported to have women as core developers, and women authored less than 5% of pull requests, but had similar or even higher rates of pull request acceptance than men. Women make both code and non-code contributions, and their motivations to contribute include learning new skills, altruism, reciprocity, and kinship. Challenges that women face in OSS are mainly social, including lack of peer parity and non-inclusive communication from a toxic culture. We found ten strategies reported in the literature, which we mapped to the reported challenges. Based on these results, we provide guidelines for future research and practice.
  4. An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks are those rooted in the concept of Deep Learning (DL). The popularity of such techniques largely stems from their automated feature engineering capabilities, which aid in modeling software artifacts. However, due to the rapid pace at which DL techniques have been adopted, it is difficult to distill the successes, failures, and opportunities of the current research landscape. In an effort to bring clarity to this cross-cutting area of work, from its modern inception to the present, this article presents a systematic literature review of research at the intersection of SE & DL. The review canvasses work appearing in the most prominent SE and DL conferences and journals and spans 128 papers across 23 unique SE tasks. We center our analysis around the components of learning, a set of principles that governs the application of machine learning (ML) techniques to a given problem domain, discussing several aspects of the surveyed work at a granular level. The end result of our analysis is a research roadmap that both delineates the foundations of DL techniques applied to SE research and highlights likely areas of fertile exploration for the future.
  5. As societies rely increasingly on computers for critical functions, the importance of cybersecurity becomes ever more paramount. Even in recent months there have been attacks that halted oil production, disrupted online learning at the height of COVID, and put medical records at risk at prominent hospitals. This constant threat of privacy leaks and infrastructure disruption has led to an increase in the adoption of artificial intelligence (AI) techniques, mainly machine learning (ML), in state-of-the-art cybersecurity approaches. Oftentimes, these techniques are borrowed from other disciplines without context and devoid of the depth of understanding as to why such techniques are best suited to solve the problem at hand. This is largely due to the fact that in many ways cybersecurity curricula have failed to keep up with advances in cybersecurity research and integrating AI and ML into cybersecurity curricula is extremely difficult. To address this gap, we propose a new methodology to integrate AI and ML techniques into cybersecurity education curricula. Our methodology consists of four components: i) Analysis of Literature which aims to understand the prevalence of AI and ML in cybersecurity research, ii) Analysis of Cybersecurity Curriculum that intends to determine the materials already present in the curriculum and the possible intersection points in the curricula for the new AI material, iii) Design of Adaptable Modules that aims to design highly adaptable modules that can be directly used by cybersecurity educators where new AI material can naturally supplement/substitute for concepts or material already present in the cybersecurity curriculum, and iv) Curriculum Level Evaluation that aims to evaluate the effectiveness of the proposed methodology from both student and instructor perspectives. 
    In this paper, we focus on the first component of our methodology, Analysis of Literature, and systematically analyze over 5000 papers that were published in the top cybersecurity conferences during the last five years. Our results clearly indicate that more than 78% of the cybersecurity papers mention AI terminology. To determine the prevalence of the use of AI, we randomly selected 300 papers and performed a thorough analysis. Our results show that more than 19% of the papers implement ML techniques. These findings suggest that AI and ML techniques should be considered for future integration into cybersecurity curricula to better align with advancements in the field.