Abstract: Concentrations and elemental stoichiometry of suspended particulate organic carbon, nitrogen, phosphorus, and oxygen demand for respiration (C:N:P:−O2) play a vital role in characterizing and quantifying marine elemental cycles. Here, we present Version 2 of the Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN) dataset. Version 1 is a previously published compilation of particulate organic matter data from 70 different studies conducted between 1971 and 2010, while Version 2 comprises data collected on recent cruises between 2011 and 2020. The combined GO-POPCORN dataset contains 2673 paired surface POC/N/P measurements spanning 70°S to 73°N across all major ocean basins at high spatial resolution. Version 2 also includes 965 measurements of oxygen demand for organic carbon respiration. This new dataset can help validate and calibrate the next generation of global ocean biogeochemical models with flexible elemental stoichiometry. We expect that incorporating variable C:N:P:−O2 into models will help improve estimates of key ocean biogeochemical fluxes such as carbon export, nitrogen fixation, and organic matter remineralization.
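As an illustration of how paired measurements like these translate into stoichiometric ratios, here is a minimal Python sketch; the file and column names are hypothetical stand-ins, not the dataset's published schema.

```python
import pandas as pd

# Hypothetical file and column names -- consult the published
# GO-POPCORN files for the actual schema.
df = pd.read_csv("go-popcorn_v2.csv")

# Paired particulate organic C, N, and P (e.g., in umol/L).
# Elemental stoichiometry is the molar ratio of each pair.
df["C_to_N"] = df["POC_umol_L"] / df["PON_umol_L"]
df["C_to_P"] = df["POC_umol_L"] / df["POP_umol_L"]
df["N_to_P"] = df["PON_umol_L"] / df["POP_umol_L"]

# Compare the medians against the canonical Redfield ratio (106:16:1).
print(df[["C_to_N", "C_to_P", "N_to_P"]].median())
```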
Bursts of rapid diversification, multiple dispersals out of southern Africa, and two origins of dioecy punctuate the evolutionary history of Asparagus
Dataset and result files from phylogenomic analysis and ancestral biogeography estimation across the genus Asparagus using the Asparagaceae1726 bait set. Contents of this version replace Quartet_Sampling.zip from the previous version. Contents: Quartet_Sampling_FINAL.zip = input dataset and final results from Quartet Sampling (i.e., 1000 quartet replicates sampled per node).
- PAR ID: 10661268
- Publisher / Repository: Zenodo
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Schwartz, Russell (Ed.). Abstract: Motivation: Site concordance factors (sCFs) have become a widely used way to summarize discordance in phylogenomic datasets. However, the original version of sCFs was calculated by sampling a quartet of tip taxa and then applying parsimony-based criteria for discordance. This approach has the potential to be strongly affected by multiple hits at a site (homoplasy), especially when substitution rates are high or taxa are not closely related. Results: Here, we introduce a new method for calculating sCFs. The updated version uses likelihood to generate probability distributions of ancestral states at internal nodes of the phylogeny. By sampling from the states at internal nodes adjacent to a given branch, this approach substantially reduces—but does not abolish—the effects of homoplasy and taxon sampling. Availability and implementation: Updated sCFs are implemented in IQ-TREE 2.2.2. The software is freely available at https://github.com/iqtree/iqtree2/releases. Supplementary information: Supplementary information is available at Bioinformatics online.
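The abstract points to IQ-TREE 2.2.2, whose documentation describes a `--scfl` option for these likelihood-based sCFs. The sketch below wraps one plausible invocation in Python; the flag should be verified against the IQ-TREE release notes, and the input file names are placeholders.

```python
import subprocess

# Hedged sketch: --scfl is the likelihood-based sCF option as we read
# the IQ-TREE 2.2.2 release notes; verify before use. File names are
# placeholders for your own tree and alignment.
subprocess.run(
    [
        "iqtree2",
        "-te", "species.treefile",   # fixed tree whose branches receive sCFs
        "-s", "alignment.fasta",     # alignment used to infer ancestral states
        "--scfl", "100",             # number of resampled quartets per branch
        "--prefix", "scf_likelihood",
    ],
    check=True,
)
```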
-
Abstract: Motivation: Phylogenomics faces a dilemma: on the one hand, the most accurate species and gene tree estimation methods are those that co-estimate them; on the other hand, these co-estimation methods do not scale to moderately large numbers of species. Summary-based methods, which first infer gene trees independently and then combine them, are much more scalable but are prone to gene tree estimation error, which is inevitable when inferring trees from limited-length data. Gene tree estimation error is not just random noise and can create biases such as long-branch attraction. Results: We introduce a scalable likelihood-based approach to co-estimation under the multi-species coalescent model. The method, called quartet co-estimation (QuCo), takes as input independently inferred distributions over gene trees and computes the most likely species tree topology and internal branch length for each quartet, marginalizing over gene tree topologies and ignoring branch lengths by making several simplifying assumptions. It then updates the gene tree posterior probabilities based on the species tree. The focus on gene tree topologies and the heuristic division into quartets enables fast likelihood calculations. We benchmark our method with extensive simulations for quartet trees in zones known to produce biased species trees, and further with larger trees. We also run QuCo on a biological dataset of bees. Our results show better accuracy than the summary-based approach ASTRAL run on estimated gene trees. Availability and implementation: QuCo is available at https://github.com/maryamrabiee/quco. Supplementary information: Supplementary data are available at Bioinformatics online.
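QuCo's full likelihood machinery lives in the linked repository, but the core multi-species coalescent relationship for a single quartet has a closed form: the major (species-tree) quartet topology appears in a gene tree with probability 1 − (2/3)e^(−t), where t is the internal branch length in coalescent units. A minimal sketch inverting that formula from an observed quartet frequency follows; it illustrates the standard coalescent identity, not QuCo's actual estimator.

```python
import math

def msc_branch_length(p_major: float) -> float:
    """Invert P(major topology) = 1 - (2/3) * exp(-t) to recover the
    internal branch length t in coalescent units."""
    if p_major <= 1 / 3:
        return 0.0  # at or below 1/3 all three topologies are equally likely
    return -math.log(1.5 * (1.0 - p_major))

# Example: if 70% of gene trees show the major quartet topology,
# the implied internal branch is about 0.8 coalescent units.
print(round(msc_branch_length(0.70), 3))
```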
-
{"Abstract":["# DeepCaImX## Introduction#### Two-photon calcium imaging provides large-scale recordings of neuronal activities at cellular resolution. A robust, automated and high-speed pipeline to simultaneously segment the spatial footprints of neurons and extract their temporal activity traces while decontaminating them from background, noise and overlapping neurons is highly desirable to analyze calcium imaging data. In this paper, we demonstrate DeepCaImX, an end-to-end deep learning method based on an iterative shrinkage-thresholding algorithm and a long-short-term-memory neural network to achieve the above goals altogether at a very high speed and without any manually tuned hyper-parameters. DeepCaImX is a multi-task, multi-class and multi-label segmentation method composed of a compressed-sensing-inspired neural network with a recurrent layer and fully connected layers. It represents the first neural network that can simultaneously generate accurate neuronal footprints and extract clean neuronal activity traces from calcium imaging data. We trained the neural network with simulated datasets and benchmarked it against existing state-of-the-art methods with in vivo experimental data. DeepCaImX outperforms existing methods in the quality of segmentation and temporal trace extraction as well as processing speed. DeepCaImX is highly scalable and will benefit the analysis of mesoscale calcium imaging. \n\n## System and Environment Requirements#### 1. Both CPU and GPU are supported to run the code of DeepCaImX. A CUDA compatible GPU is preferred. * In our demo of full-version, we use a GPU of Quadro RTX8000 48GB to accelerate the training speed.* In our demo of mini-version, at least 6 GB momory of GPU/CPU is required.#### 2. Python 3.9 and Tensorflow 2.10.0#### 3. Virtual environment: Anaconda Navigator 2.2.0#### 4. Matlab 2023a\n\n## Demo and installation#### 1 (_Optional_) GPU environment setup. We need a Nvidia parallel computing platform and programming model called _CUDA Toolkit_ and a GPU-accelerated library of primitives for deep neural networks called _CUDA Deep Neural Network library (cuDNN)_ to build up a GPU supported environment for training and testing our model. The link of CUDA installation guide is https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html and the link of cuDNN installation guide is https://docs.nvidia.com/deeplearning/cudnn/installation/overview.html. #### 2 Install Anaconda. Link of installation guide: https://docs.anaconda.com/free/anaconda/install/index.html#### 3 Launch Anaconda prompt and install Python 3.x and Tensorflow 2.9.0 as the virtual environment.#### 4 Open the virtual environment, and then pip install mat73, opencv-python, python-time and scipy.#### 5 Download the "DeepCaImX_training_demo.ipynb" in folder "Demo (full-version)" for a full version and the simulated dataset via the google drive link. Then, create and put the training dataset in the path "./Training Dataset/". If there is a limitation on your computing resource or a quick test on our code, we highly recommand download the demo from the folder "Mini-version", which only requires around 6.3 GB momory in training. #### 6 Run: Use Anaconda to launch the virtual environment and open "DeepCaImX_training_demo.ipynb" or "DeepCaImX_testing_demo.ipynb". 
Then, please check and follow the guide of "DeepCaImX_training_demo.ipynb" or or "DeepCaImX_testing_demo.ipynb" for training or testing respectively.#### Note: Every package can be installed in a few minutes.\n\n## Run DeepCaImX#### 1. Mini-version demo* Download all the documents in the folder of "Demo (mini-version)".* Adding training and testing dataset in the sub-folder of "Training Dataset" and "Testing Dataset" separately.* (Optional) Put pretrained model in the the sub-folder of "Pretrained Model"* Using Anaconda Navigator to launch the virtual environment and opening "DeepCaImX_training_demo.ipynb" for training or "DeepCaImX_testing_demo.ipynb" for predicting.\n\n#### 2. Full-version demo* Download all the documents in the folder of "Demo (full-version)".* Adding training and testing dataset in the sub-folder of "Training Dataset" and "Testing Dataset" separately.* (Optional) Put pretrained model in the the sub-folder of "Pretrained Model"* Using Anaconda Navigator to launch the virtual environment and opening "DeepCaImX_training_demo.ipynb" for training or "DeepCaImX_testing_demo.ipynb" for predicting.\n\n## Data Tailor#### A data tailor developed by Matlab is provided to support a basic data tiling processing. In the folder of "Data Tailor", we can find a "tailor.m" script and an example "test.tiff". After running "tailor.m" by matlab, user is able to choose a "tiff" file from a GUI as loading the sample to be tiled. Settings include size of FOV, overlapping area, normalization option, name of output file and output data format. The output files can be found at local folder, which is at the same folder as the "tailor.m".\n\n## Simulated Dataset#### 1. Dataset generator (FISSA Version): The algorithm for generating simulated dataset is based on the paper of FISSA (_Keemink, S.W., Lowe, S.C., Pakan, J.M.P. et al. FISSA: A neuropil decontamination toolbox for calcium imaging signals. Sci Rep 8, 3493 (2018)_) and SimCalc repository (https://github.com/rochefort-lab/SimCalc/). For the code used to generate the simulated data, please download the documents in the folder "Simulated Dataset Generator". #### Training dataset: https://drive.google.com/file/d/1WZkIE_WA7Qw133t2KtqTESDmxMwsEkjJ/view?usp=share_link#### Testing Dataset: https://drive.google.com/file/d/1zsLH8OQ4kTV7LaqQfbPDuMDuWBcHGWcO/view?usp=share_link\n\n#### 2. Dataset generator (NAOMi Version): The algorithm for generating simulated dataset is based on the paper of NAOMi (_Song, A., Gauthier, J. L., Pillow, J. W., Tank, D. W. & Charles, A. S. Neural anatomy and optical microscopy (NAOMi) simulation for evaluating calcium imaging methods. Journal of neuroscience methods 358, 109173 (2021)_). For the code use to generate the simulated data, please go to this link: https://bitbucket.org/adamshch/naomi_sim/src/master/code/## Experimental Dataset#### We used the samples from ABO dataset:https://github.com/AllenInstitute/AllenSDK/wiki/Use-the-Allen-Brain-Observatory-%E2%80%93-Visual-Coding-on-AWS.#### The segmentation ground truth can be found in the folder "Manually Labelled ROIs". #### The segmentation ground truth of depth 175, 275, 375, 550 and 625 um are manually labeled by us. #### The code for creating ground truth of extracted traces can be found in "Prepro_Exp_Sample.ipynb" in the folder "Preprocessing of Experimental Sample"."]}more » « less
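Since the README's dependency list includes mat73 (a reader for MATLAB v7.3, HDF5-based .mat files), a minimal sketch for inspecting one training file might look like the following; the path and variable keys are placeholders rather than the dataset's documented layout.

```python
import mat73  # MATLAB v7.3 .mat reader, per the README's pip list

# Placeholder path -- check the downloaded training dataset for the
# actual file names and variable keys.
data = mat73.loadmat("Training Dataset/sample_0001.mat")
print(sorted(data.keys()))  # inspect what the file actually contains
```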
-
We introduce the UConn Bubbles with Swatches dataset. This dataset contains images of voting bubbles, scanned from Connecticut ballots, captured as either grayscale (8 bpp) or color (RGB, 24 bpp) artifacts and extracted through segmentation using ballot geometry. The images are organized into 4 groups of datasets. The stored file contains all data together in color; we manually convert to grayscale. Each bubble image is 40x50 pixels. The labels are produced from an optical lens scanner. The first dataset, Gray-B (Bubbles), uses 42,679 images (40x50, 8 bpp) with blank (35,429 images) and filled (7,250 images) bubbles filled in by humans, but no marginal marks. There are two classes, mark and nonmark. The second dataset, RGB-B, is a 24 bpp color (RGB) version of Gray-B. The third dataset, Gray-C (Combined), augments Gray-B with a collection of marginal marks called "swatches": synthetic images that vary the position of signal to create samples close to the decision boundary of an optical lens scanner. The 423,703 randomly generated swatches place equal amounts of random noise throughout each image such that the amount of light is the same, yielding 466,382 labeled images in total. The fourth dataset, RGB-C, is a 24 bpp color (RGB) version of Gray-C. The empty bubbles were printed by a commercial vendor and have undergone registration and segmentation using predetermined coordinates; marks are on paper printed by the same vendor. These datasets can be used for classification training. The .h5 file has several levels of datasets, as shown below. The main dataset used for training is POSITIONAL, which is separated only into blank (non-mark) and vote (mark); whether an example is a bubble or a swatch is indicated by its batch number. See https://github.com/VoterCenter/Busting-the-Ballot/blob/main/Utilities/LoadVoterData.py for code that creates torch arrays for RGB-B and RGB-C, and the linked GitHub repo (https://github.com/VoterCenter/Busting-the-Ballot/blob/main/Utilities/VoterLab_Classifier_Functions.py) for grayscale conversion functions and other utilities. Dataset structure:
- COLOR → B/V/Q → IMAGE
- POSITIONAL → B/V/Q → IMAGE
- INFORMATION → COLOR/POSITIONAL → B/V/Q → BACKGROUND RGB VALUES
Images are divided into 'batches', not all of which contain data. INFORMATION contains labels for all images. Q is the swatch data, while B and V are non-mark and mark, respectively.
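Because the nested layout is easier to verify programmatically than to picture, here is a short h5py sketch that enumerates every group and dataset in the file; the file name is a placeholder, and the group names follow the structure listed above.

```python
import h5py

def show(name, obj):
    # Print each dataset with its shape/dtype; groups get a trailing slash.
    if isinstance(obj, h5py.Dataset):
        print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    else:
        print(f"{name}/")

# Placeholder file name; group names follow the structure described above.
with h5py.File("uconn_bubbles.h5", "r") as f:
    f.visititems(show)
```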
