

Title: The PySAL Ecosystem: Philosophy and Implementation

PySAL is a library for geocomputation and spatial data science. Written in Python, the library has a long history of supporting novel scholarship and broadening methodological impacts far afield of academic work. Recently, many new techniques, methods of analyses, and development modes have been implemented, making the library much larger and more encompassing than that previously discussed in the literature. As such, we provide an introduction to the library as it stands now, as well as the scientific and conceptual underpinnings of its core set of components. Finally, we provide a prospective look at the library's future evolution.

 
Award ID(s):
1831615
NSF-PAR ID:
10445450
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Geographical Analysis
Volume:
54
Issue:
3
ISSN:
0016-7363
Format(s):
Medium: X Size: p. 467-487
Sponsoring Org:
National Science Foundation
More Like this
  1. Background

    Viruses strongly influence microbial population dynamics and ecosystem functions. However, our ability to quantitatively evaluate those viral impacts is limited to the few cultivated viruses and double-stranded DNA (dsDNA) viral genomes captured in quantitative viral metagenomes (viromes). This leaves the ecology of non-dsDNA viruses nearly unknown, including single-stranded DNA (ssDNA) viruses that have been frequently observed in viromes, but not quantified due to amplification biases in sequencing library preparations (Multiple Displacement Amplification, Linker Amplification or Tagmentation).

    Methods

    Here we designed mock viral communities including both ssDNA and dsDNA viruses to evaluate the capability of a sequencing library preparation approach including an Adaptase step prior to Linker Amplification for quantitative amplification of both dsDNA and ssDNA templates. We then surveyed aquatic samples to provide first estimates of the abundance of ssDNA viruses.

    Results

    Mock community experiments confirmed the biased nature of existing library preparation methods for ssDNA templates (either largely enriched or selected against) and showed that the protocol using Adaptase plus Linker Amplification yielded viromes that were ±1.8-fold quantitative for ssDNA and dsDNA viruses. Application of this protocol to community virus DNA from three freshwater and three marine samples revealed that ssDNA viruses as a whole represent only a minor fraction (<5%) of DNA virus communities, though individual ssDNA genomes, both eukaryote-infecting Circular Rep-Encoding Single-Stranded DNA (CRESS-DNA) viruses and bacteriophages from the Microviridae family, can be among the most abundant viral genomes in a sample.

    Discussion

    Together these findings provide empirical data for a new virome library preparation protocol, and a first estimate of ssDNA virus abundance in aquatic systems.

     
  2. Abstract

    CRISPR‐Cas9 genome editing technologies have enabled complex genetic manipulations in situ, including large‐scale, pooled screening approaches to probe and uncover mechanistic insights across various biological processes. The RNA‐programmable nature of CRISPR‐Cas9 greatly empowers tiling mutagenesis approaches to elucidate molecular details of protein function, in particular the interrogation of mechanisms of resistance to small molecules, an approach termed CRISPR‐suppressor scanning. In a typical CRISPR‐suppressor scanning experiment, a pooled library of single‐guide RNAs is designed to target across the coding sequence(s) of one or more genes, enabling the Cas9 nuclease to systematically mutate the targeted proteins and generate large numbers of diverse protein variants in situ. This cellular pool of protein variants is then challenged with drug treatment to identify mutations conferring a fitness advantage. Drug‐resistance mutations identified with this approach can not only elucidate drug mechanism of action but also reveal deeper mechanistic insights into protein structure‐function relationships. In this article, we outline the framework for a standard CRISPR‐suppressor scanning experiment. Specifically, we provide instructions for the design and construction of a pooled sgRNA library, execution of a CRISPR‐suppressor scanning screen, and basic computational analysis of the resulting data. © 2022 Wiley Periodicals LLC.

    Basic Protocol 1: Design and generation of a pooled sgRNA library

    Support Protocol 1: sgRNA library design using command‐line CRISPOR

    Support Protocol 2: Production and titering of pooled sgRNA library lentivirus

    Basic Protocol 2: Execution and analysis of a CRISPR‐suppressor scanning experiment

     
  3. ABSTRACT

    Stellar population synthesis (SPS) models are invaluable for studying star clusters and galaxies. They provide a means to extract stellar masses, stellar ages, star formation histories, chemical enrichment, and dust content of galaxies from their integrated spectral energy distributions, colours, or spectra. Like most models, they contain uncertainties that can hamper our ability to model and interpret observed spectra. This work aims at studying a specific source of model uncertainty: the choice of an empirical versus a synthetic stellar spectral library. Empirical libraries suffer from limited coverage of parameter space, while synthetic libraries suffer from modelling inaccuracies. Given our current inability to have both ideal stellar-parameter coverage and ideal stellar spectra, what should one favour: better coverage of the parameters (synthetic library) or better spectra on a star-by-star basis (empirical library)? To study this question, we build a synthetic stellar library mimicking the coverage of an empirical library, and SPS models with different choices of stellar library tailored to these investigations. Through the comparison of model predictions and the spectral fitting of a sample of nearby galaxies, we learned that predicted colours are more affected by the coverage effect than by the choice of a synthetic versus empirical library; the effects on predicted spectral indices are multiple and defy simple conclusions; derived galaxy ages are virtually unaffected by the choice of the library, but are underestimated when SPS models with limited parameter coverage are used; metallicities are robust against limited HRD coverage, but are underestimated when using synthetic libraries.

     
  4. ABSTRACT

    Nearly a hundred stellar streams have been found to date around the Milky Way and the number keeps growing at an ever faster pace. Here we present the galstreams library, a compendium of angular position, distance, proper motion, and radial velocity track data for nearly a hundred (95) Galactic stellar streams. The information published in the literature has been collated and homogenized in a consistent format and used to provide a set of features uniformly computed throughout the library: e.g. stream length, end points, mean pole, stream’s coordinate frame, polygon footprint, and pole and angular momentum tracks. We also use the information compiled to analyse the distribution of several observables across the library and to assess where the main deficiencies are found in the characterization of individual stellar streams, as a resource for future follow-up efforts. The library is intended to facilitate keeping track of new discoveries and to encourage the use of automated methods to characterize and study the ensemble of known stellar streams by serving as a starting point. The galstreams library is publicly available as a python package and served at the galstreams GitHub repository.

     
  5. Abstract Motivation

    Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, the computational cost of performing pairwise structural alignment against all structures in PDB is prohibitively expensive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and only computing similarity between vectors. As a notable example, FragBag represents each protein by a ‘bag of fragments’, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Despite being efficient, the accuracy of FragBag is unsatisfactory because its backbone fragment library may not be optimally constructed and long-range interacting patterns are omitted.

    Results

    Here we present a new approach to learning effective structural motif presentations using deep learning. We develop DeepFold, a deep convolutional neural network model to extract structural motif features of a protein structure. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs.

    Availability and implementation

    https://github.com/largelymfs/DeepFold

     