  1. Abstract

    The task of learning a quantum circuit to prepare a given mixed state is a fundamental quantum subroutine. We present a variational quantum algorithm (VQA) to learn mixed states which is suitable for near-term hardware. Our algorithm generalizes previous VQAs that aimed at learning preparation circuits for pure states. We consider two different ansätze for compiling the target state: the first is based on learning a purification of the state, and the second on representing it as a convex combination of pure states. In both cases, the resources required to store and manipulate the compiled state grow with the rank of the approximation. Thus, by learning a lower-rank approximation of the target state, our algorithm provides a means of compressing a state for more efficient processing. As a byproduct of our algorithm, one effectively learns the principal components of the target state, and hence our algorithm further provides a new method for principal component analysis. We investigate the efficacy of our algorithm through extensive numerical implementations, showing that typical random states and thermal states of many-body systems may be learnt this way. Additionally, we demonstrate on quantum hardware how our algorithm can be used to study hardware noise-induced states.
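The link between rank and compression can be illustrated classically for a thermal state, which is diagonal in the energy eigenbasis. The following is a minimal sketch, not the variational algorithm of the abstract: the toy energy levels, the inverse temperature, and the function names `thermal_spectrum` and `truncate_spectrum` are all assumptions made for illustration. It keeps the largest eigenvalues (the principal components) and reports how much of the state's weight survives the truncation.

```python
import math

def thermal_spectrum(energies, beta):
    """Eigenvalues of exp(-beta * H) / Z for a Hamiltonian with the given energy levels."""
    e_min = min(energies)
    # Shift by the ground-state energy so the exponentials cannot overflow.
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def truncate_spectrum(probs, rank):
    """Keep the `rank` largest eigenvalues and renormalise them to unit trace.

    Returns the renormalised spectrum and the weight retained before renormalising.
    """
    kept = sorted(probs, reverse=True)[:rank]
    retained = sum(kept)
    return [p / retained for p in kept], retained

# Hypothetical toy spectrum: two non-interacting two-level systems.
energies = [0.0, 1.0, 1.0, 2.0]
probs = thermal_spectrum(energies, beta=2.0)
approx, retained = truncate_spectrum(probs, rank=3)
print(f"rank-3 approximation retains weight {retained:.4f}")
```

At low temperature (large `beta`) most of the weight sits in a few low-energy eigenstates, so a small rank already captures nearly the whole state; at high temperature the spectrum flattens and the same rank retains less.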

  2. Planktic foraminifera test iodine-to-calcium (I/Ca) ratios are an emerging proxy for assessing subsurface seawater oxygenation. Several core-top studies show lower planktic foraminifera I/Ca in locations with oxygen-depleted subsurface waters than in well-oxygenated environments. The reasoning behind this trend is that only the oxidized species of iodine, iodate, is incorporated in foraminiferal calcite. The I/Ca of foraminiferal calcite is thought to reflect iodate contents in seawater. To test this hypothesis, we compare planktic foraminifera I/Ca ratios, obtained from plankton tows, with published and new seawater iodate concentrations from 1) the Eastern North Pacific with extensive oxygen depletion, 2) the Benguela Current System with moderately depleted oxygen concentrations, and 3) the well-oxygenated North and South Atlantic. We find the lowest I/Ca ratios (0.07 µmol/mol) in planktic foraminifera retrieved from the Eastern North Pacific, and higher values (up to 0.72 µmol/mol) for samples obtained from the Benguela Current System and the North and South Atlantic. The I/Ca ratios of plankton tow foraminifera from environments with well-oxygenated subsurface waters, however, are an order of magnitude lower than those of core-tops from similarly well-oxygenated regions. This suggests that planktic foraminifera gain iodine post-mortem, either when sinking through the water column or during burial.
  3. Core samples obtained from scientific drilling could provide large volumes of direct microstructural and compositional data, but generating results via the traditional treatment of such data is often time-consuming and inefficient. Unifying microstructural data within a spatially referenced Geographic Information System (GIS) environment provides an opportunity to readily locate, visualize, correlate, and apply remote sensing techniques to the data. Using 26 core billet samples from the San Andreas Fault Observatory at Depth (SAFOD), this study developed GIS-based procedures for: (1) spatially referenced visualization and storage of various microstructural data from core billets; (2) 3D modeling of billets and thin section positions within each billet, which serve as a digital record after irreversible fragmentation of the physical billets; and (3) vector feature creation and unsupervised classification of a multi-generation calcite vein network from cathodoluminescence (CL) imagery. Building on existing work, which is predominantly limited to the 2D space of single thin sections, our results indicate that a GIS can facilitate spatial treatment of data even at centimeter to nanometer scales. They also reveal challenges involving intensive 3D representations and the complex matrix transformations required to create geographically translated forms of the within-billet coordinate systems, which are suggested for consideration in future studies.
  4. No systematic approach has yet been adopted to reliably reference and provide access to digital biodiversity datasets. Based on accumulated evidence, we argue that location-based identifiers such as URLs are not sufficient to ensure long-term data access. We introduce a method that uses dedicated data observatories to evaluate long-term URL reliability. From March 2019 through May 2020, we took periodic inventories of the data provided to major biodiversity aggregators, including GBIF, iDigBio, DataONE, and BHL, by accessing the URL-based dataset references from which the aggregators retrieve data. Over the period of observation, we found that, for the URL-based dataset references available in each of the aggregators' data provider registries, 5% to 70% of URLs were intermittently or consistently unresponsive, 0% to 66% produced unstable content, and 20% to 75% became either unresponsive or unstable. We propose the use of cryptographic hashing to generate content-based identifiers that can reliably reference datasets. We show that content-based identifiers facilitate decentralized archival and reliable distribution of biodiversity datasets to enable long-term accessibility of the referenced datasets.
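The three reliability categories reported above can be sketched as a small classification over repeated observations of each URL. This is an illustrative reading of the abstract, not the authors' tooling: the data layout (one content hash per successful retrieval, `None` for a failed one), the function names, and the `example.org` URLs are assumptions.

```python
from typing import Dict, Optional, Sequence

CATEGORIES = ("unresponsive", "unstable", "unreliable")

def classify(observations: Sequence[Optional[str]]) -> Dict[str, bool]:
    """Classify one URL from its periodic observations.

    Each observation is the content hash of a successful retrieval,
    or None when the URL did not respond (assumed encoding).
    """
    hashes = {h for h in observations if h is not None}
    unresponsive = any(h is None for h in observations)  # failed at least once
    unstable = len(hashes) > 1                           # content changed over time
    return {
        "unresponsive": unresponsive,
        "unstable": unstable,
        "unreliable": unresponsive or unstable,
    }

def summarize(registry: Dict[str, Sequence[Optional[str]]]) -> Dict[str, float]:
    """Fraction of URLs in a registry that fall into each category."""
    results = [classify(obs) for obs in registry.values()]
    if not results:
        return {key: 0.0 for key in CATEGORIES}
    return {key: sum(r[key] for r in results) / len(results) for key in CATEGORIES}

# Hypothetical registry: four URLs observed three times each.
registry = {
    "https://example.org/stable.zip": ["aa", "aa", "aa"],
    "https://example.org/flaky.zip": ["bb", None, "bb"],
    "https://example.org/changing.zip": ["cc", "cd", "cd"],
    "https://example.org/both.zip": ["ee", None, "ef"],
}
print(summarize(registry))
```

Because a URL can be both unresponsive and unstable, the combined fraction is generally smaller than the sum of the two individual fractions, which is consistent with the overlapping ranges quoted in the abstract.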
  5. DOI: 10.17605/OSF.IO/AT4XE. Despite increased use of digital biodiversity data in research, reliable methods to identify datasets are not widely adopted. While commonly used location-based dataset identifiers such as URLs make it easy to download data today, additional identification schemes are needed to ensure long-term access to datasets. We propose to augment existing location- and DOI-based identification schemes with cryptographic content-based identifiers. These content-based identifiers can be calculated from the datasets themselves using available cryptographic hashing algorithms (e.g., sha256). These algorithms take only the digital content as input to generate a unique identifier, without needing a centralized identification administration. The use of content-based identifiers is not new; it re-applies change management techniques used in the popular version control system "git". We show how content-based identifiers can be used to version datasets, to track dataset locations, to monitor their reliability, and to efficiently detect dataset changes. We discuss the results of using our approach on datasets registered in GBIF and iDigBio from Sept 2018 to May 2020. We also propose how reliable, decentralized dataset indexing and archiving systems can be devised. Lastly, we outline a modification to existing data citation practices to help work towards more reproducible and reusable research workflows.
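Computing a content-based identifier of the kind described above needs nothing beyond a standard hashing library. The following is a minimal sketch using Python's `hashlib`; the function name `content_id` and the `sha256:` prefix are assumptions chosen for illustration, not the identifier syntax used by the authors.

```python
import hashlib
import io

def content_id(stream, chunk_size=65536):
    """Return a sha256-based identifier computed only from the bytes of `stream`.

    The stream is read in chunks so large datasets need not fit in memory.
    The "sha256:" prefix is a hypothetical convention for this sketch.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "sha256:" + digest.hexdigest()

# Two copies of the same bytes yield the same identifier, wherever they are stored.
original = io.BytesIO(b"occurrenceID,scientificName\n1,Apis mellifera\n")
mirror = io.BytesIO(b"occurrenceID,scientificName\n1,Apis mellifera\n")
changed = io.BytesIO(b"occurrenceID,scientificName\n1,Apis cerana\n")

same = content_id(original) == content_id(mirror)
mirror.seek(0)
differs = content_id(mirror) != content_id(changed)
print(same, differs)
```

Since the identifier depends only on the content, any party holding a copy can verify it independently, and a changed dataset is detected by a changed identifier without consulting a central registry.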
  7. Abstract

    Understanding the molecular evolution of the SARS‐CoV‐2 virus as it continues to spread in communities around the globe is important for mitigation and future pandemic preparedness. Three‐dimensional structures of SARS‐CoV‐2 proteins and those of other coronaviruses archived in the Protein Data Bank were used to analyze viral proteome evolution during the first 6 months of the COVID‐19 pandemic. Analyses of spatial locations, chemical properties, and structural and energetic impacts of the observed amino acid changes in >48 000 viral isolates revealed how each one of 29 viral proteins has undergone amino acid changes. Catalytic residues in active sites and binding residues in protein–protein interfaces showed modest, but significant, numbers of substitutions, highlighting the mutational robustness of the viral proteome. Energetics calculations showed that the impact of substitutions on the thermodynamic stability of the proteome follows a universal bi‐Gaussian distribution. Detailed results are presented for potential drug discovery targets and the four structural proteins that comprise the virion, highlighting substitutions with the potential to impact protein structure, enzyme activity, and protein–protein and protein–nucleic acid interfaces. Characterizing the evolution of the virus in three dimensions provides testable insights into viral protein function and should aid in structure‐based drug discovery efforts as well as the prospective identification of amino acid substitutions with potential for drug resistance.

