Title: Design and Implementation of Phylotastic, a Service Architecture for Evolutionary Biology
Access to and reuse of authoritative phylogenetic knowledge have been longstanding challenges in the evolutionary biology community, leading to a number of research efforts (e.g., focused on interoperation, standardization of formats, and development of minimum reporting requirements). The Phylotastic project was launched to address these challenges as an architectural concept collaboratively designed by evolutionary biologists and computer scientists. This paper describes the first comprehensive implementation of the Phylotastic architecture, based on an open platform for Web services composition. The implementation provides a portal, which composes Web services along a fixed collection of workflows, as well as an interface that allows users to develop novel workflows. The Web services composition is guided by automated planning algorithms and built on a Web services registry and an execution monitoring engine. The platform provides resilience through seamless automated recovery from failed services.
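The recovery behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the Phylotastic implementation: the paper's composition is guided by automated planning algorithms, whereas this sketch only tries interchangeable services in registry order. All names here (`ServiceFailure`, `run_step`, `run_workflow`, and the stub services) are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List

# A service is modeled as a plain function from input data to output data.
Service = Callable[[object], object]


class ServiceFailure(Exception):
    """Raised by a (hypothetical) service that cannot complete its step."""


def run_step(step: str, candidates: List[Service], data: object) -> object:
    """Try each candidate registered for a step; return the first success."""
    errors = []
    for service in candidates:
        try:
            return service(data)
        except ServiceFailure as exc:
            # Record the failure and fall through to the next alternative.
            errors.append(f"{service.__name__}: {exc}")
    raise ServiceFailure(f"all services failed for step '{step}': {errors}")


def run_workflow(steps: List[str],
                 registry: Dict[str, List[Service]],
                 data: object) -> object:
    """Run the steps in order, feeding each step's output to the next."""
    for step in steps:
        data = run_step(step, registry[step], data)
    return data


# --- Stub services standing in for real Web services (assumptions) ---
def resolve_names_primary(names):
    raise ServiceFailure("service unavailable")


def resolve_names_backup(names):
    return [n.strip().capitalize() for n in names]


def get_tree(names):
    # Returns a flat Newick-style string purely for demonstration.
    return "(" + ",".join(names) + ");"


registry = {
    "resolve_names": [resolve_names_primary, resolve_names_backup],
    "get_tree": [get_tree],
}
```

Calling `run_workflow(["resolve_names", "get_tree"], registry, ["homo sapiens", " pan troglodytes "])` completes even though the first name-resolution stub fails, because the step falls back to the second registered service.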
Award ID(s):
1914635
PAR ID:
10209021
Journal Name:
International Journal of Software Engineering and Knowledge Engineering
Volume:
30
Issue:
10
ISSN:
0218-1940
Page Range / eLocation ID:
1525 to 1550
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper describes the development of services and tools for scaling data curation services at the Qualitative Data Repository (QDR). Through a set of open-source tools, semi-automated workflows, and extensions to the Dataverse platform, our team has built services for curators to efficiently and effectively publish collections of qualitatively derived data. The contributions we seek to make in this paper are as follows: (1) we describe ‘human-in-the-loop’ curation and the tools that facilitate this model at QDR; (2) we provide an in-depth discussion of the design and implementation of these tools, including applications specific to the Dataverse software repository, as well as standalone archiving tools written in R; and (3) we highlight the role of providing a service layer for data discovery and accessibility of qualitative data.
  2. A comprehensive phylogeny of species, i.e., a tree of life, has potential uses in a variety of contexts, including research, education, and public policy. Yet, accessing the tree of life typically requires special knowledge, complex software, or long periods of training. The Phylotastic project aims to make it as easy to get a phylogeny of species as it is to get driving directions from mapping software. In prior work, we presented a design for an open system to validate and manage taxon names, find phylogeny resources, extract subtrees matching a user’s taxon list, scale trees to time, and integrate related resources such as species images. Here, we report the implementation of a set of tools that together represent a robust, accessible system for on-the-fly delivery of phylogenetic knowledge. This set of tools includes a web portal to execute several customizable workflows to obtain species phylogenies (scaled by geologic time and decorated with thumbnail images); more than 30 underlying web services (accessible via a common registry); and code toolkits in R and Python (allowing others to develop custom applications using Phylotastic services). The Phylotastic system, accessible via http://www.phylotastic.org, provides a unique resource to access the current state of phylogenetic knowledge, useful for a variety of cases in which a tree extracted quickly from online resources (as distinct from a tree custom-made from character data) is sufficient, as it is for many casual uses of trees identified here.
  3. The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. SciTokens introduces a capabilities-based authorization infrastructure for distributed scientific computing, to help scientists manage their security credentials more reliably and securely. SciTokens uses IETF-standard OAuth JSON Web Tokens for capability-based secure access to remote scientific data. These access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems. In this extended abstract, we present the results over the past year of our open source implementation of the SciTokens model and its deployment in the Open Science Grid, including new OAuth support added in the HTCondor 8.8 release series. 
  4. The rapid development of computation power and machine learning algorithms has paved the way for automating scientific discovery with a scanning probe microscope (SPM). The key elements toward operationalization of the automated SPM are the interface to enable SPM control from Python codes, availability of high computing power, and development of workflows for scientific discovery. Here, we build a Python interface library that enables controlling an SPM from either a local computer or a remote high-performance computer, which satisfies the high computation power need of machine learning algorithms in autonomous workflows. We further introduce a general platform to abstract the operations of SPM in scientific discovery into fixed-policy or reward-driven workflows. Our work provides a full infrastructure to build automated SPM workflows for both routine operations and autonomous scientific discovery with machine learning. 
  5. Tree House Explorer (THEx) is a genome browser that integrates phylogenomic data and genomic annotations into a single interactive platform for combined analysis. THEx allows users to visualize genome-wide variation in evolutionary histories and genetic divergence on a chromosome-by-chromosome basis, with continuous sliding window comparisons to gene annotations, recombination rates, and other user-specified, highly customizable feature annotations. THEx provides a new platform for interactive phylogenomic data visualization to analyze and interpret the diverse evolutionary histories woven throughout genomes. Hosted on Conda, THEx integrates seamlessly into new or pre-existing workflows.