Abstract
Context: Practitioners prefer to achieve performance without sacrificing productivity when developing scientific software. The Julia programming language is designed for developing performant programs without sacrificing productivity by providing a scripting-style syntax. According to the Julia programming language website, common application areas include data science, machine learning, scientific domains, and parallel computing. While Julia has yielded benefits with respect to productivity, programs written in Julia can include security weaknesses, which can hamper the security of Julia-based scientific software. A systematic derivation of security weaknesses can facilitate secure development of Julia programs, an area that remains under-explored. Objective: The goal of this paper is to help practitioners securely develop Julia programs by conducting an empirical study of security weaknesses found in Julia programs. Method: We apply qualitative analysis on 4,592 Julia programs used in 126 open-source Julia projects to identify security weakness categories. Next, we construct a static analysis tool called Julia Static Analysis Tool (JSAT) that automatically identifies security weaknesses in Julia programs. We apply JSAT to automatically identify security weaknesses in 558 open-source Julia projects consisting of 25,008 Julia programs. Results: We identify 7 security weakness categories, which include the usage of hard-coded passwords and unsafe invocation. From our empirical study we identify 23,839 security weaknesses. On average, we observe 24.9% of Julia source code files to include at least one of the 7 security weakness categories. Conclusion: Based on our research findings, we recommend rigorous inspection efforts during code reviews. We also recommend further development and application of security static analysis tools so that security weaknesses in Julia programs can be detected before execution.
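The abstract does not show how JSAT detects a weakness, so the following is a minimal, hypothetical sketch (in Python) of rule-based detection of one reported category, hard-coded passwords, in Julia source. The regex rule and function name are illustrative assumptions, not the actual JSAT rules.

```python
import re

# Hypothetical rule, loosely inspired by the hard-coded password
# category described in the abstract; NOT the actual JSAT rule set.
HARDCODED_PASSWORD = re.compile(
    r'\b(password|passwd|pwd|secret|token)\s*=\s*"[^"]+"', re.IGNORECASE
)

def find_hardcoded_passwords(julia_source: str) -> list:
    """Return 1-based line numbers where a password-like variable
    is assigned a string literal in Julia source code."""
    return [
        lineno
        for lineno, line in enumerate(julia_source.splitlines(), start=1)
        if HARDCODED_PASSWORD.search(line)
    ]

julia_code = '''
db_host = "localhost"
password = "s3cr3t!"   # weakness: hard-coded password
conn = connect(db_host, password)
'''

print(find_hardcoded_passwords(julia_code))  # → [3]
```

A production tool would parse the Julia AST rather than scan lines, but the sketch shows why such weaknesses are detectable before execution.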
Shifting Left for Machine Learning: An Empirical Study of Security Weaknesses in Supervised Learning-based Projects
Context: Supervised learning-based projects (SLPs), i.e., software projects that use supervised learning algorithms, such as decision trees, are useful for performing classification-related tasks. Yet, security weaknesses, such as the use of hard-coded passwords in SLPs, can make SLPs susceptible to security attacks. A characterization of security weaknesses in SLPs can help practitioners understand the security weaknesses that are frequent in SLPs and adopt adequate mitigation strategies. Objective: The goal of this paper is to help practitioners securely develop supervised learning-based projects by conducting an empirical study of security weaknesses in supervised learning-based projects. Methodology: We conduct an empirical study by quantifying the frequency of security weaknesses in 278 open source SLPs. Results: We identify 22 types of security weaknesses that occur in SLPs. We observe 'use of potentially dangerous function' to be the most frequently occurring security weakness in SLPs. Of the 3,964 identified security weaknesses, 23.79% and 40.49%, respectively, appear in source code files used to train and test models. We also observe evidence of co-location, e.g., instances of command injection co-locate with instances of potentially dangerous function. Conclusion: Based on our findings, we advocate for a shift-left approach for SLP development with security-focused code reviews, and application of security static analysis.
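The 'use of potentially dangerous function' weakness reported above can be illustrated with a short, hypothetical Python example. The function names are illustrative; Bandit's rule B301 is a real linter rule for `pickle`, but the safer JSON variant below is a common mitigation, not a recommendation taken from the paper.

```python
import io
import json
import pickle

def load_model_unsafe(stream):
    # Weakness: 'use of potentially dangerous function'.
    # Unpickling untrusted input can execute arbitrary code;
    # security linters such as Bandit flag this call (rule B301).
    return pickle.load(stream)

def load_params_safe(stream):
    # Safer for plain hyperparameters or metadata: JSON parsing
    # cannot execute code during deserialization.
    return json.load(stream)

params = load_params_safe(io.StringIO('{"learning_rate": 0.01}'))
print(params)  # → {'learning_rate': 0.01}
```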
- Award ID(s):
- 2026869
- PAR ID:
- 10350205
- Date Published:
- Journal Name:
- 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), 2022
- Page Range / eLocation ID:
- 798 to 808
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Sebastian Uchitel (Ed.) Despite being beneficial for managing computing infrastructure automatically, Puppet manifests are susceptible to security weaknesses, e.g., hard-coded secrets and use of weak cryptography algorithms. Adequate mitigation of security weaknesses in Puppet manifests is thus necessary to secure computing infrastructure that is managed with Puppet manifests. A characterization of how security weaknesses propagate and affect Puppet-based infrastructure management can inform practitioners on the relevance of the detected security weaknesses, as well as help them take necessary actions for mitigation. We conduct an empirical study with 17,629 Puppet manifests with Taint Tracker for Puppet Manifests (TaintPup). We observe 2.4 times higher precision and 1.8 times higher F-measure for TaintPup, compared to that of a state-of-the-art security static analysis tool. From our empirical study, we observe security weaknesses to propagate into 4,457 resources, i.e., Puppet-specific code elements used to manage infrastructure. A single instance of a security weakness can propagate into as many as 35 distinct resources. We observe security weaknesses to propagate into 7 categories of resources, which include resources used to manage continuous integration servers and network controllers. According to our survey with 24 practitioners, propagation of security weaknesses into data storage-related resources is rated to have the most severe impact for Puppet-based infrastructure management.
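The propagation the abstract describes is essentially reachability over data flow. A minimal Python sketch, loosely in the spirit of the TaintPup idea (a weakness source taints every resource reachable through variable references), might look like the following. The graph data model and node labels are hypothetical, not the actual TaintPup implementation.

```python
from collections import deque

def propagate(sources, edges):
    """Return the set of nodes reachable from tainted sources.
    `edges` maps a node to the nodes that consume its value."""
    tainted, frontier = set(sources), deque(sources)
    while frontier:
        node = frontier.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in tainted:
                tainted.add(nxt)
                frontier.append(nxt)
    return tainted

# Hypothetical example: a hard-coded secret assigned to $db_pass
# flows into two resources, one of which feeds a third.
edges = {
    "$db_pass": ["file[/etc/app.conf]", "exec[db-init]"],
    "exec[db-init]": ["service[db]"],
}
print(sorted(propagate({"$db_pass"}, edges)))
```

This also illustrates how a single weakness instance can taint many distinct resources, as the study observed (up to 35).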
-
Machine learning (ML) deployment projects are used by practitioners to automatically deploy ML models. While ML deployment projects aid practitioners, security vulnerabilities in these projects can make ML deployment infrastructure susceptible to security attacks. A systematic characterization of vulnerabilities can aid in identifying activities to secure ML deployment projects used by practitioners. We conduct an empirical study with 149 vulnerabilities mined from 12 open source ML deployment projects to characterize vulnerabilities in ML deployment projects. From our empirical study, we (i) find 68 of the 149 vulnerabilities are critically or highly severe; (ii) derive 10 consequences of vulnerabilities, e.g., unauthorized access to trigger ML deployments; and (iii) observe established quality assurance activities, such as code review, to be used in the ML deployment projects. We conclude our paper by providing a set of recommendations for practitioners and researchers. The dataset used for our paper is available online.
-
Adversarial attacks can be conducted against supervised learning algorithms, which necessitates the application of logging while using supervised learning algorithms in software projects. Logging enables practitioners to conduct postmortem analysis, which can be helpful to diagnose any conducted attacks. We conduct an empirical study to identify and characterize log-related coding patterns, i.e., recurring coding patterns that can be leveraged to conduct adversarial attacks and therefore need to be logged. A list of log-related coding patterns can guide practitioners on what to log while using supervised learning algorithms in software projects. We apply qualitative analysis on 3,004 Python files used to implement 103 supervised learning-based software projects. We identify a list of 54 log-related coding patterns that map to six attacks related to supervised learning algorithms. Using Log Assistant to conduct Postmortems for Supervised Learning (LOPSUL), we quantify the frequency of the identified log-related coding patterns with 278 open-source software projects that use supervised learning. We observe log-related coding patterns to appear for 22% of the analyzed files, where training data forensics is the most frequently occurring category.
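The 'training data forensics' category mentioned above can be sketched as follows: log a digest of the training data before fitting, so a postmortem can check whether the data was tampered with (e.g., a poisoning attack). This is a hypothetical Python example; the helper name and digest scheme are illustrative assumptions, not code from the paper or the LOPSUL tool.

```python
import hashlib
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("slp.forensics")

def log_training_data(X, y):
    """Log a SHA-256 digest of the training data before model fitting,
    enabling postmortem detection of data tampering. Hypothetical
    illustration of the 'training data forensics' logging pattern."""
    digest = hashlib.sha256((repr(X) + repr(y)).encode()).hexdigest()
    log.info("training data: sha256=%s n_samples=%d", digest, len(X))
    return digest

X = [[0.1, 0.2], [0.3, 0.4]]
y = [0, 1]
d = log_training_data(X, y)
```

Comparing the logged digest against a freshly computed one during incident response reveals whether the training set changed between runs.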
-
Feldt, Robert; Zimmermann, Thomas (Ed.) Context: Despite being beneficial for managing computing infrastructure at scale, Ansible scripts include security weaknesses, such as hard-coded passwords. Security weaknesses can propagate into tasks, i.e., code constructs used for managing computing infrastructure with Ansible. Propagation of security weaknesses into tasks makes the provisioned infrastructure susceptible to security attacks. A systematic characterization of task infection, i.e., the propagation of security weaknesses into tasks, can aid practitioners and researchers in understanding how security weaknesses propagate into tasks and derive insights for practitioners to develop Ansible scripts securely. Objective: The goal of the paper is to help practitioners and researchers understand how Ansible-managed computing infrastructure is impacted by security weaknesses by conducting an empirical study of task infections in Ansible scripts. Method: We conduct an empirical study where we quantify the frequency of task infections in Ansible scripts. Upon detection of task infections, we apply qualitative analysis to determine task infection categories. We also conduct a survey with 23 practitioners to determine the prevalence and severity of identified task infection categories. With logistic regression analysis, we identify development factors that correlate with the presence of task infections. Results: In all, we identify 1,805 task infections in 27,213 scripts. We identify six task infection categories: anti-virus, continuous integration, data storage, message broker, networking, and virtualization. From our survey, we observe tasks used to manage data storage infrastructure to be perceived as having the most severe consequences. We also find three development factors, namely age, minor contributors, and scatteredness, to correlate with the presence of task infections. Conclusion: Our empirical study shows computing infrastructure managed by Ansible scripts to be impacted by security weaknesses. We conclude the paper by discussing the implications of our findings for practitioners and researchers.

