Data from: Inheritance of DNA methylation differences in the mangrove Rhizophora mangle
Abstract
<p>This record contains supplementary information for the article "Inheritance of DNA methylation differences in the mangrove Rhizophora mangle" published in Evolution&Development. It contains the barcodes (barcodes.txt), the reference contigs- Publisher:
- Zenodo
- Publication Year:
- NSF-PAR ID:
- 10384317
- Award ID(s):
- 1930451
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Obeid, I. (Ed.)The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do notmore »
-
Abstract
<p>The intended use of this archive is to facilitate meta-analysis of the Data Observation Network for Earth (DataONE, [1]). </p> <p>DataONE is a distributed infrastructure that provides information about earth observation data. This dataset was derived from the DataONE network using Preston [2] between 17 October 2018 and 6 November 2018, resolving 335,213 urls at an average retrieval rate of about 5 seconds per url, or 720 files per hour, resulting in a data gzip compressed tar archive of 837.3 MB . </p> <p>The archive associates 325,757 unique metadata urls [3] to 202,063 unique ecological metadata files [4]. Also, the DataONE search index was captured to establish provenance of how the dataset descriptors were found and acquired. During the creation of the snapshot (or crawl), 15,389 urls [5], or 4.7% of urls, did not successfully resolve. </p> <p>To facilitate discovery, the record of the Preston snapshot crawl is included in the preston-ls-* files . There files are derived from the rdf/nquad file with hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f . This file can also be found in the data.tar.gz at data/8c/67/e0/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f/data . For more information about concepts and format, please see [2]. </p> <p>To extract all EML files from the included Preston archive, first extract the hashes -
INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implementedmore »
-
Abstract
<p>A biodiversity dataset graph: GBIF, iDigBio, BioCASe</p> <p>The intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe). GBIF, iDigBio and BioCASe help provide access to biological data collections. </p> <p>This dataset provides versioned provenance logs of snapshots of the GBIF, iDigBio, BioCASe network as tracked by Preston [2] between 2018-09-03 and 2019-10-02 using "preston update -u https://gbif.org,https://idigbio.org,http://biocase.org". </p> <p>This publication contains two types of files: index files and provenance logs. Associated data files are hosted elsewhere for pragmatic reasons. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance logs describe how, when, what and where the GBIF, iDigBio, BioCASe content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 . </p> <p>To retrieve and verify the downloaded GBIF, iDigBio, BioCASe biodiversity dataset graph, use the preston[2] command-line tool to "clone" this dataset using:</p> <p>$ java -jar preston.jar ls --remote https://zenodo.org/record/3484205/files > /dev/null</p> <p>Optionally, you can retrieve all associated data (~500GB) files using:</p> <p>$ java -jar preston.jar clone --remote https://zenodo.org/record/3484205/files,https://deeplinker.bio</p> <p>Please note https://deeplinker.bio is a Preston remote that provided access to GBIF, iDigBio, BioCASe data files at