- Publication Date:
- NSF-PAR ID:
- 10329273
- Journal Name:
- Earth System Science Data
- Volume:
- 14
- Issue:
- 1
- Page Range or eLocation-ID:
- 95 to 116
- ISSN:
- 1866-3516
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract. Internally consistent, quality-controlled (QC) data products play an important role in promoting regional-to-global research efforts to understand societal vulnerabilities to ocean acidification (OA). However, there are currently no such data products for the coastal ocean, where most of the OA-susceptible commercial and recreational fisheries and aquaculture industries are located. In this collaborative effort, we compiled, quality-controlled, and synthesized 2 decades of discrete measurements of inorganic carbon system parameters, oxygen, and nutrient chemistry data from the North American continental shelves to generate a data product called the Coastal Ocean Data Analysis Product in North America (CODAP-NA). There are few deep-water (> 1500 m) sampling locations in the current data product. As a result, crossover analyses, which rely on comparisons between measurements on different cruises in the stable deep ocean, could not form the basis for cruise-to-cruise adjustments. For this reason, care was taken in the selection of data sets to include in this initial release of CODAP-NA, and only data sets from laboratories with known quality assurance practices were included. New consistency checks and outlier detections were used to QC the data. Future releases of this CODAP-NA product will use this core data product as the basis for cruise-to-cruise comparisons. We worked closely with the investigators who collected and measured these data during the QC process. This version (v2021) of the CODAP-NA…
-
Abstract
This dataset incorporates essential Mexico City-related data files associated with Beth Tellman's dissertation: Mapping and Modeling Illicit and Clandestine Drivers of Land Use Change: Urban Expansion in Mexico City and Deforestation in Central America. It contains spatio-temporal datasets covering three domains: (i) urban expansion from 1992-2015, (ii) district and section electoral records for 6 elections from 2000-2015, and (iii) land titling (regularization) data for informal settlements from 1997-2012 on private and ejido land. The urban expansion data include 30 m resolution urban land cover for 1992 and 2013 (methods published in Goldblatt et al. 2018), and a shapefile of digitized urban informal expansion in conservation land from 2000-2015 using the Worldview-2 satellite. The electoral records include shapefiles with the geospatial boundaries of electoral districts and sections for each election, and .csv files of the number of votes per party for mayoral, delegate, and legislature candidates. The private land titling data include the approximate location (in coordinates) and date of titles given by the city government (DGRT), extracted from public records (Diario Oficial) from 1997-2012. The titling data on ejido land include a shapefile of georeferenced polygons taken from photos in the CORETT office or ejido land that has been expropriated…
-
Obeid, I. (Ed.) The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state-of-the-art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not…
-
Abstract
A biodiversity dataset graph: UCSB-IZC
The intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at Cheadle Center of Biodiversity and Ecological Restoration, University of California Santa Barbara.
This dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] between 2021-10-08 and 2021-11-04 using [preston track "https://api.gbif.org/v1/occurrence/search/?datasetKey=d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"].
This archive contains 14349 images related to 32533 occurrence/specimen records. See the included sample-image.jpg and its associated metadata sample-image.json [4].
The images were counted using:
$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
 | grep -o -P ".*depict"\
 | sort\
 | uniq\
 | wc -l
And the occurrences were counted using:
$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
 | grep -o -P "occurrence/([0-9])+"\
 | sort\
 | uniq\
 | wc -l
The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files are included and have been individually included in this dataset publication. Index files provide a way to link provenance files in time to establish…
-
Abstract. Recent observations of near-surface soil temperatures over the circumpolar Arctic show accelerated warming of permafrost-affected soils. The availability of a comprehensive near-surface permafrost and active layer dataset is critical to better understanding climate impacts and to constraining permafrost thermal conditions and its spatial distribution in land system models. We compiled a soil temperature dataset from 72 monitoring stations in Alaska using data collected by the U.S. Geological Survey, the National Park Service, and the University of Alaska Fairbanks permafrost monitoring networks. The array of monitoring stations spans a large range of latitudes from 60.9 to 71.3° N and elevations from near sea level to ∼1300 m, comprising tundra and boreal forest regions. This dataset consists of monthly ground temperatures at depths up to 1 m, volumetric soil water content, snow depth, and air temperature during 1997–2016. These data have been quality controlled in collection and processing. Meanwhile, we implemented data harmonization evaluation for the processed dataset. The final product (PF-AK, v0.1) is available at the Arctic Data Center (https://doi.org/10.18739/A2KG55).
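Note on the UCSB-IZC entry above: its two counting commands share one pattern, which is to extract every regex match, deduplicate, and count. The following is a minimal Python sketch of that pattern, not the archive authors' tooling. The function name `count_unique_matches` and the sample text are hypothetical, and the sample is not real `preston cat` output.

```python
import re


def count_unique_matches(text: str, pattern: str) -> int:
    """Count distinct regex matches in text.

    Roughly mirrors `grep -o -P PATTERN | sort | uniq | wc -l`:
    `.` does not cross newlines here, so matches stay within a line.
    """
    return len({m.group(0) for m in re.finditer(pattern, text)})


# Made-up stand-in for provenance lines (assumption, for illustration only).
sample = (
    "occurrence/101 hasImage a\n"
    "occurrence/101 hasImage b\n"
    "occurrence/202 hasImage c\n"
)

# Same pattern as the occurrence-counting command in the entry.
occurrence_count = count_unique_matches(sample, r"occurrence/([0-9])+")
print(occurrence_count)
```

Using a set replaces the `sort | uniq` step, since only the number of distinct matches matters and not their order.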