

Title: DNA as Features: Organic Software Product Lines
Software product line engineering is a best practice for managing reuse in families of software systems. In this work, we explore the use of product line engineering in the emerging programming domain of synthetic biology. In synthetic biology, living organisms are programmed to perform new functions or improve existing functions. These programs are designed and constructed using small building blocks made out of DNA. We conjecture that there are families of products that consist of common and variable DNA parts, and we can leverage product line engineering to help synthetic biologists build, evolve, and reuse these programs. As a first step towards this goal, we perform a domain engineering case study that leverages an open-source repository of more than 45,000 reusable DNA parts. We are able to identify features and their related artifacts, all of which can be composed to make different programs. We demonstrate that we can successfully build feature models representing families for two commonly engineered functions. We then analyze an existing synthetic biology case study and demonstrate how product line engineering can be beneficial in this domain.
Award ID(s):
1901543
NSF-PAR ID:
10186659
Journal Name:
Proceedings of the 23rd International Systems and Software Product Line Conference
Volume:
A
Page Range / eLocation ID:
108-118
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Software product line engineering is a best practice for managing reuse in families of software systems that is increasingly being applied to novel and emerging domains. In this work we investigate the use of software product line engineering in one of these new domains, synthetic biology. In synthetic biology living organisms are programmed to perform new functions or improve existing functions. These programs are designed and constructed using small building blocks made out of DNA. We conjecture that there are families of products that consist of common and variable DNA parts, and we can leverage product line engineering to help synthetic biologists build, evolve, and reuse DNA parts. In this paper we perform an investigation of domain engineering that leverages an open-source repository of more than 45,000 reusable DNA parts. We show the feasibility of these new types of product line models by identifying features and related artifacts in up to 93.5% of products, and that there is indeed both commonality and variability. We then construct feature models for four commonly engineered functions leading to product lines ranging from 10 to 7.5 × 10^20 products. In a case study we demonstrate how we can use the feature models to help guide new experimentation in aspects of application engineering. Finally, in an empirical study we demonstrate the effectiveness and efficiency of automated reverse engineering on both complete and incomplete sets of products. In the process of these studies, we highlight key challenges and uncover limitations of existing SPL techniques and tools, which provide a roadmap for making SPL engineering applicable to new and emerging domains.
  2. As assurance cases have grown in popularity for safety-critical systems, so too has their complexity and thus the need for methods to systematically build them. Assurance cases can grow too large and too abstract for anyone but the original builders to understand, making reuse difficult. Reuse is important because different systems might have identical or similar components, and a good solution for one system should be applicable to similar systems. Prior research has shown engineers can alleviate some of the complexity issues through modularity and by identifying common patterns, which are more easily understood for reuse across different systems. However, we believe these patterns are too complicated for users who lack expertise in software engineering or assurance cases. This paper suggests the concept of lower-level patterns, which we call recipes. We use the safety-critical field of synthetic biology as an example discipline to demonstrate how a recipe can be built and applied.
  3. Ruby, Edward G. (Ed.)
    ABSTRACT

    A conspicuous roadblock to studying marine bacteria for fundamental research and biotechnology is a lack of modular synthetic biology tools for their genetic manipulation. Here, we applied, and generated new parts for, a modular plasmid toolkit to study marine bacteria in the context of symbioses and host-microbe interactions. To demonstrate the utility of this plasmid system, we genetically manipulated the marine bacterium Pseudoalteromonas luteoviolacea, which stimulates the metamorphosis of the model tubeworm, Hydroides elegans. Using these tools, we quantified constitutive and native promoter expression, developed reporter strains that enable the imaging of host-bacteria interactions, and used CRISPR interference (CRISPRi) to knock down a secondary metabolite and a host-associated gene. We demonstrate the broader utility of this modular system for testing the genetic tractability of marine bacteria that are known to be associated with diverse host-microbe symbioses. These efforts resulted in the successful conjugation of 12 marine strains from the Alphaproteobacteria and Gammaproteobacteria classes. Altogether, the present study demonstrates how synthetic biology strategies enable the investigation of marine microbes and marine host-microbe symbioses with potential implications for environmental restoration and biotechnology.

    IMPORTANCE

    Marine Proteobacteria are attractive targets for genetic engineering due to their ability to produce a diversity of bioactive metabolites and their involvement in host-microbe symbioses. Modular cloning toolkits have become a standard for engineering model microbes, such as Escherichia coli, because they enable innumerable mix-and-match DNA assembly and engineering options. However, such modular tools have not yet been applied to most marine bacterial species. In this work, we adapt a modular plasmid toolkit for use in a set of 12 marine bacteria from the Gammaproteobacteria and Alphaproteobacteria classes. We demonstrate the utility of this genetic toolkit by engineering a marine Pseudoalteromonas bacterium to study its association with its host animal, Hydroides elegans. This work provides a proof of concept that modular genetic tools can be applied to diverse marine bacteria to address basic science questions and for biotechnology innovations.

     
  4. Building on prior studies that show a sense of belonging and community bolster student success, we developed a pilot program for computer engineering (CpE) and computer science (CS) undergraduates and their families that focused on building a sense of belonging and community supported by co-curricular and socioeconomic scaffolding. As a dually designated Hispanic-Serving Institution (HSI) and Asian American and Native American Pacific Islander-Serving Institution (AANAPISI) – two types of federally designated Minority-Serving Institutions (MSI) – with 55% of our undergraduates being first-generation students, we aimed to demonstrate the importance of these principles for underrepresented and first-generation students. Using a student cohort model (for each incoming group of students) and also providing supports to build community across cohorts as well as including students’ families in their college experiences, our program aimed to increase student satisfaction and academic success. We recruited two cohorts of nine incoming students each across two years, 2019 and 2020; 69% of participants were from underrepresented racial or minority groups and 33% were women. Each participant was awarded an annual scholarship and given co-curricular support including peer and faculty mentoring, a dedicated cohort space for studying and gathering, monthly co-curricular activities, enhanced tutoring, and summer bridge and orientation programs. Students’ families were also included in the orientation and semi-annual meetings. The program has resulted in students exceeding the retention rates of their comparison groups, which were undergraduates majoring in CpE and CS who entered college in the same semester as the cohorts; first- and second-year retention rates for participants were 83% (compared to 72%) and 67% (compared to 57%). 
The GPAs of participants were 0.35 points higher on average than the comparison group and, most notably, participants completed 50% more credits than their comparison groups, on average. In addition, 9 of the 18 scholars (all of the students who wanted to participate) engaged in summer research or internships. In combination, the cohort building, inclusion of families, financial literacy education and support, and formal and informal peer and faculty mentoring have correlated with increased academic success. The cohorts are finishing their programs in Spring 2023 and Spring 2024, but data up to this point already show increases in GPA, course completion, and retention and graduation rates, with three students having already graduated early, within three and a half years. The findings from this study are now being used to expand the successful parts of the program and inform university initiatives, with the PI serving on a campus-wide STEM pipeline committee that aims to recruit, retain, and support more STEM students at the institution.
  5. The key challenge of software reverse engineering is that the source code of the program under investigation is typically not available. Identifying differences between two executable binaries (binary diffing) can reveal valuable information in the absence of source code, such as vulnerability patches, software plagiarism evidence, and malware variant relations. Recently, a new binary diffing method based on symbolic execution and constraint solving has been proposed to look for code pairs with the same semantics, even though they are ostensibly different in syntax. Such a semantics-based method captures intrinsic differences/similarities of binary code, making it a compelling choice to analyze highly obfuscated malicious programs. However, due to the nature of symbolic execution, semantics-based binary diffing suffers from significant performance slowdown, hindering it from analyzing large numbers of malware samples. In this paper, we attempt to mitigate the high overhead of semantics-based binary diffing with application to malware lineage inference. We first study the key obstacles that contribute to the performance bottleneck. Then we propose normalized basic block memoization to speed up semantics-based binary diffing. We introduce a union-find set structure that records semantically equivalent basic blocks. Managing the union-find structure during successive comparisons allows direct reuse of previously computed results. Moreover, we utilize a set of enhanced optimization methods to further cut down the number of constraint solver invocations. We have implemented our technique, called MalwareHunt, on top of a trace-oriented binary diffing tool and evaluated it on 15 polymorphic and metamorphic malware families. We perform intra-family comparisons for the purpose of malware lineage inference. Our experimental results show that MalwareHunt can accelerate symbolic execution from 2.8X to 5.3X (with an average 4.1X), and reduce constraint solver invocation by a factor of 3.0X to 6.0X (with an average 4.5X).
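The union-find memoization described in item 5 can be illustrated with a minimal sketch. This is not MalwareHunt's implementation: the class `EquivalenceMemo`, the counter `solver_calls`, and the string-based `toy_check` (a stand-in for a real symbolic-execution and constraint-solver query) are all hypothetical names chosen for illustration. The sketch only shows the general idea that once two blocks are proven equivalent, later comparisons within the same equivalence class need no further solver call.

```python
from typing import Callable, Dict, Hashable


class EquivalenceMemo:
    """Union-find over block keys; a shared root means 'already proven equivalent'."""

    def __init__(self, expensive_check: Callable[[Hashable, Hashable], bool]) -> None:
        self._parent: Dict[Hashable, Hashable] = {}
        self._check = expensive_check  # hypothetical stand-in for a solver query
        self.solver_calls = 0          # counts how often the expensive check ran

    def _find(self, block: Hashable) -> Hashable:
        # Register unseen blocks as their own singleton class.
        self._parent.setdefault(block, block)
        root = block
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[block] != root:
            self._parent[block], block = root, self._parent[block]
        return root

    def equivalent(self, a: Hashable, b: Hashable) -> bool:
        root_a, root_b = self._find(a), self._find(b)
        if root_a == root_b:
            return True  # reuse an earlier result; no expensive check needed
        self.solver_calls += 1
        if self._check(a, b):
            self._parent[root_a] = root_b  # merge the two equivalence classes
            return True
        return False


def toy_check(a: str, b: str) -> bool:
    """Toy equivalence (assumption): same operands of '+' in any order."""
    return sorted(a.split("+")) == sorted(b.split("+"))


memo = EquivalenceMemo(toy_check)
memo.equivalent("x+1+y", "1+x+y")  # runs the expensive check
memo.equivalent("1+x+y", "y+x+1")  # runs the expensive check
memo.equivalent("x+1+y", "y+x+1")  # answered from the union-find structure
print(memo.solver_calls)
```

In this sketch only positive results are memoized; pairs found to be inequivalent are re-checked on every query, so a practical tool would likely cache those as well.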