

Title: Replication Data for: Housing Policies Accelerate Energy Efficiency Participation
Human- and machine-readable replication dataset for "Housing Policies Accelerate Energy Efficiency Participation" by Omar I. Asensio, Olga Churkina, Becky Rafter, and Kira E. O'Hare.
Award ID(s):
1945332
NSF-PAR ID:
10333160
Author(s) / Creator(s):
Omar I. Asensio; Olga Churkina; Becky Rafter; Kira E. O'Hare
Publisher / Repository:
Harvard Dataverse
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Information about grants funded by NSF to support SES research from 2000-2015. The grants included in this dataset are a subset that we identified as having an SES research focus from a set of grants accessed using the Dimensions platform (https://dimensions.ai). CSV file with 35 columns and names in header row:

    "Grant Searched" lists the granting NSF program (text);
    "Grant Searched 2" lists a secondary granting NSF program, if applicable (text);
    "Grant ID" is the ID from the Dimensions platform (string);
    "Grant Number" is the NSF Award number (integer);
    "Number of Papers (NSF)" is the count of publications listed under "PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH" in the NSF Award Search page for the grant (integer);
    "Number of Pubs Tracked" is the count of publications from "Number of Papers (NSF)" included in our analysis (integer);
    "Publication notes" are our notes about the publication information. We used "subset" to denote when a grant was associated with >10 publications and we used a random sample of 10 publications in our analysis (text);
    "Unique ID" is our unique identifier for each grant in the dataset (integer);
    "Collaborative/Cross Program" denotes whether the grant was submitted as part of a set of collaborative or cross-program proposals. In this case, all linked proposals are given the same unique identifier and treated together in the analysis (text);
    "Title" is the title of the grant (text);
    "Title translated" is the title of the grant translated to English, where applicable (text);
    "Abstract" is the abstract of the grant (text);
    "Abstract translated" is the abstract of the grant translated to English, where applicable (text);
    "Funding Amount" is the numeric value of funding awarded to the grant (integer);
    "Currency" is the currency associated with the field "Funding Amount" (text);
    "Funding Amount in USD" is the numeric value of funding awarded to the grant expressed in US Dollars (integer);
    "Start Date" is the start date of the grant (mm/dd/yyyy);
    "Start Year" is the year in which grant funding began (year);
    "End Date" is the end date of the grant (mm/dd/yyyy);
    "End Year" is the year in which the term of the grant expired (year);
    "Researchers" lists the Principal Investigators on the grant in First Name Last Name format, separated by semi-colons (text);
    "Research Organization - original" gives the affiliation of the lead PI as listed in the grant (text);
    "Research Organization - standardized" gives the affiliation of each PI in the list, separated by semi-colons (text);
    "GRID ID" is a list of the unique identifiers for each Research Organization in the Global Research Identifier Database [https://grid.ac/], separated by semi-colons (string);
    "Country of Research organization" is a list of the countries in which each Research Organization is located, separated by semi-colons (text);
    "Funder" gives the NSF Directorate that funded the grant (text);
    "Source Linkout" is a link to the NSF Award Search page with information about the grant (URL);
    "Dimensions URL" is a link to information about the grant in Dimensions (URL);
    "FOR (ANZSRC) Categories" is a list of Field of Research categories [from the Australian and New Zealand Standard Research Classification (ANZSRC) system] associated with each grant, separated by semi-colons (string);
    "FOR [1-5]" give the individual FOR categories in separate columns;
    "NOTES" provide any other notes added by the authors of this dataset during our processing of these data.
  2. The intended use of this archive is to facilitate meta-analysis of the Data Observation Network for Earth (DataONE, [1]). 

    DataONE is a distributed infrastructure that provides information about earth observation data. This dataset was derived from the DataONE network using Preston [2] between 17 October 2018 and 6 November 2018, resolving 335,213 urls at an average retrieval rate of about 5 seconds per url, or 720 files per hour, resulting in a gzip-compressed tar archive of 837.3 MB.

    The archive associates 325,757 unique metadata urls [3] to 202,063 unique ecological metadata files [4]. Also, the DataONE search index was captured to establish provenance of how the dataset descriptors were found and acquired. During the creation of the snapshot (or crawl), 15,389 urls [5], or 4.7% of urls, did not successfully resolve. 
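
    The figures quoted above can be cross-checked with a little arithmetic. The sketch below is not part of the original archive, and the function names are made up for illustration. Note that the 4.7% figure appears to correspond to 15,389 out of the 325,757 unique metadata urls; relative to all 335,213 resolved urls, the share would be about 4.6%.

```shell
# Files retrieved per hour at a given number of seconds per url.
files_per_hour() {
  awk -v s="$1" 'BEGIN { printf "%d\n", 3600 / s }'
}

# Percentage (one decimal place) that "part" represents of "total".
percent_of() {
  awk -v p="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * p / t }'
}

files_per_hour 5          # retrieval rate stated in the description
percent_of 15389 325757   # unresolved urls vs. unique metadata urls
percent_of 15389 335213   # unresolved urls vs. all resolved urls
```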

    To facilitate discovery, the record of the Preston snapshot crawl is included in the preston-ls-* files. These files are derived from the rdf/nquad file with hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f . This file can also be found in the data.tar.gz at data/8c/67/e0/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f/data . For more information about concepts and format, please see [2].

    To extract all EML files from the included Preston archive, first extract the hashes associated with EML files using:

    cat preston-ls.tsv.gz | gunzip | grep "Version" | grep -v "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep "hash://" | sort | uniq > eml-hashes.txt

    Next, extract data.tar.gz using:

    ~/preston-archive$ tar xzf data.tar.gz 

    Then use Preston to retrieve the content for each hash, for example:

    ~/preston-archive$ preston get hash://sha256/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa
    <eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml_1.1" packageId="doi:10.18739/A24P9Q" system="https://arcticdata.io" scope="system" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 ~/development/eml/eml.xsd">
      <dataset>
        <alternateIdentifier>urn:x-wmo:md:org.aoncadis.www::d76bc3b5-7b19-11e4-8526-00c0f03d5b7c</alternateIdentifier>
        <alternateIdentifier>d76bc3b5-7b19-11e4-8526-00c0f03d5b7c</alternateIdentifier>
        <title>Airglow Image Data 2011 4 of 5</title>
    ...

    Alternatively, without using Preston, you can extract the data using the naming convention:

    data/[x]/[y]/[z]/[hash]/data

    where x is the first 2 characters of the hash, y the second 2 characters, z the third 2 characters, and hash the full sha256 content hash of the EML file.

    For example, the hash hash://sha256/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa can be found in the file: data/00/00/2d/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa/data . For more information, see [2].
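
    The naming convention above can be scripted. The following is a minimal sketch, not part of the original archive: the function name hash_to_path and the output folder eml/ are made up for illustration, and the loop assumes that eml-hashes.txt and the extracted data/ folder are in the current directory.

```shell
# Map a hash://sha256/... identifier to its path inside the extracted archive,
# following the data/[x]/[y]/[z]/[hash]/data convention described above.
hash_to_path() {
  hex=${1#hash://sha256/}
  x=$(printf '%s' "$hex" | cut -c1-2)
  y=$(printf '%s' "$hex" | cut -c3-4)
  z=$(printf '%s' "$hex" | cut -c5-6)
  printf 'data/%s/%s/%s/%s/data\n' "$x" "$y" "$z" "$hex"
}

# Copy every EML file listed in eml-hashes.txt into eml/ (a hypothetical folder),
# skipping hashes whose content is not present locally.
if [ -f eml-hashes.txt ]; then
  mkdir -p eml
  while IFS= read -r hash; do
    src=$(hash_to_path "$hash")
    if [ -f "$src" ]; then
      cp "$src" "eml/${hash#hash://sha256/}.xml"
    fi
  done < eml-hashes.txt
fi
```

    Naming the copies after their content hash keeps them unique; the .xml extension is only a convenience and is not prescribed by the archive.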

    The intended use of this archive is to facilitate meta-analysis of the DataONE dataset network. 

    [1] DataONE, https://www.dataone.org
    [2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 . DataONE was crawled via Preston with "preston update -u https://dataone.org".
    [3] cat preston-ls.tsv.gz | gunzip | grep "Version" | grep -v "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep -v "hash://" | sort | uniq | wc -l
    [4] cat preston-ls.tsv.gz | gunzip | grep "Version" | grep -v "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep "hash://" | sort | uniq | wc -l
    [5] cat preston-ls.tsv.gz | gunzip | grep "Version" | grep  "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep -v "hash://" | sort | uniq | wc -l

    This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.

     
  3. Information about individual publications associated with grants funded by NSF to support SES research from 2000-2015 (see "SES grants, 2000-2015"). For grants with ten or fewer publications, we included information about all available publications in this dataset. For grants with more than ten publications, we randomly selected ten to include in this dataset. CSV file with 13 columns and names in header row:

    "Grant ID" is the ID from the Dimensions platform (string);
    "Grant Number" is the NSF Award number (integer);
    "Publication Title" is the title of the paper (text);
    "Publication Year" is the year in which the paper was published (year);
    "Authors" is a list or abbreviated list of the authors of the paper (text);
    "Journal" is the name of the scientific journal or outlet in which the paper is published (text);
    "Interdis Rubric 1" is a metric representing the dataset authors' assessment of the level of interdisciplinarity represented by the paper (integer: “1” indicated social and natural science interdisciplinarity where both social and environmental conditions are measured or explored and/or author affiliations included departments across these disciplines; “2” indicated general interdisciplinarity between two or more different fields (that may both be within natural or social science); and “3” indicated single-disciplinarity);
    "Citations" is the count of citations the paper had received as of the date listed in "date for cite count", as reported in Google Scholar (integer);
    "date for cite count" is the date on which citation count for the paper was obtained (ddBBByy);
    "Abstract" is the text of the abstract of the paper, where available (text);
    "Notes" are any notes added by the authors of the dataset (text).
  4. A biodiversity dataset graph: UCSB-IZC

    The intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at the Cheadle Center for Biodiversity and Ecological Restoration, University of California Santa Barbara.

    This dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] between 2021-10-08 and 2021-11-04 using [preston track "https://api.gbif.org/v1/occurrence/search/?datasetKey=d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"].

    This archive contains 14349 images related to 32533 occurrence/specimen records. See the included sample-image.jpg and its associated metadata, sample-image.json [4].

    The images were counted using:

    $ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
     | grep -o -P ".*depict"\
     | sort\
     | uniq\
     | wc -l

    And the occurrences were counted using:

    $ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
     | grep -o -P "occurrence/([0-9])+"\
     | sort\
     | uniq\
     | wc -l
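
    The -P option (Perl-compatible patterns) is a GNU grep extension and may be unavailable elsewhere. In that case, the same occurrence pattern can be written as an extended regular expression. A minimal sketch, not part of the original archive; the function name count_occurrences and the sample input lines are made up for illustration:

```shell
# Count distinct occurrence identifiers read from standard input.
count_occurrences() {
  grep -o -E 'occurrence/[0-9]+' | sort -u | wc -l | tr -d ' '
}

# Made-up sample lines: three mentions, two distinct occurrence ids.
printf '%s\n' \
  '<https://example.org/occurrence/101> <p> <o> .' \
  '<https://example.org/occurrence/102> <p> <o> .' \
  '<https://example.org/occurrence/101> <q> <o> .' \
  | count_occurrences
```

    Against the real archive, the function would take the place of the grep, sort, uniq and wc steps, with the output of the "preston cat" command shown above piped into it.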

    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files are included, and these have also been added individually to this dataset publication. Index files provide a way to link provenance files in time to establish a versioning mechanism.
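
    Because the parts carry two-digit hexadecimal suffixes (preston-[00-ff].tar.gz in the file descriptions), their names can be generated rather than typed. A minimal sketch, not part of the original archive; the function names are made up for illustration, and no download location is assumed:

```shell
# Print the 256 expected part names, preston-00.tar.gz through preston-ff.tar.gz.
part_names() {
  awk 'BEGIN { for (i = 0; i < 256; i++) printf "preston-%02x.tar.gz\n", i }'
}

# List expected parts that are not present in the current directory.
missing_parts() {
  part_names | while IFS= read -r name; do
    if [ ! -f "$name" ]; then
      printf '%s\n' "$name"
    fi
  done
}

missing_parts
```

    An empty result from missing_parts suggests that all parts have been downloaded; it says nothing about whether their content is intact.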

    To retrieve and verify the downloaded UCSB-IZC biodiversity dataset graph, first download preston-*.tar.gz. Then, extract the archives into a "data" folder. Alternatively, you can use the Preston [2,3] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar clone --remote https://archive.org/download/preston-ucsb-izc/data.zip/,https://zenodo.org/record/5557670/files,https://zenodo.org/record/5557670/files/5660088

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history
    <urn:uuid:0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .
    <hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c> <http://purl.org/pav/previousVersion> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .

    To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" resembles those shown below and includes "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c    file:/home/jhpoelen/ucsb-izc/data/ce/1d/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c    OK    CONTENT_PRESENT_VALID_HASH    66438    hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c
    hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844    file:/home/jhpoelen/ucsb-izc/data/f6/8d/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844    OK    CONTENT_PRESENT_VALID_HASH    4093    hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844
    hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef    file:/home/jhpoelen/ucsb-izc/data/3e/70/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef    OK    CONTENT_PRESENT_VALID_HASH    5746    hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef
    hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b    file:/home/jhpoelen/ucsb-izc/data/99/58/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b    OK    CONTENT_PRESENT_VALID_HASH    6147    hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b
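
    Scanning the output by eye does not scale to large archives. A minimal sketch, not part of the original archive, that counts lines lacking the expected status; the function name count_not_valid is made up, and the sample lines are shortened stand-ins (with a placeholder status) for real "preston verify" output:

```shell
# Count input lines that do not contain CONTENT_PRESENT_VALID_HASH.
count_not_valid() {
  awk '!/CONTENT_PRESENT_VALID_HASH/ { n++ } END { print n + 0 }'
}

# Shortened, made-up sample: one valid line and one line with a placeholder status.
printf '%s\n' \
  'hash://sha256/aaaa file:/tmp/data/aa/aa/aaaa OK CONTENT_PRESENT_VALID_HASH 10 hash://sha256/aaaa' \
  'hash://sha256/bbbb file:/tmp/data/bb/bb/bbbb FAIL SOME_OTHER_STATUS' \
  | count_not_valid
```

    With the real archive, piping the output of "java -jar preston.jar verify" into count_not_valid should print 0 when every line reports the expected status.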

    Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --
    README

    -- executable java jar containing preston [2,3] v0.3.1. --
    preston.jar

    -- preston archive containing UCSB-IZC (meta-)data/image files, associated provenance logs and a provenance index --
    preston-[00-ff].tar.gz

    -- individual provenance index files --
    2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a

    -- example image and meta-data --
    sample-image.jpg (with hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c)
    sample-image.json (with hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844)

    --- end of file descriptions ---


    References

    [1] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-11-04 as indexed by the Global Biodiversity Information Facility (GBIF) with provenance hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36 hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c.
    [2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .
    [3] MJ Elliott, JH Poelen, JAB Fortes (2020). Toward Reliable Biodiversity Dataset References. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2020.101132
    [4] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-10-08. https://www.gbif.org/occurrence/3323647301 . hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c

    This work is funded in part by grants NSF OAC 1839201 and NSF DBI 2102006 from the National Science Foundation.
  5. Data Description:

    To improve soil organic carbon (SOC) estimation in the United States, we upscaled site-based SOC measurements to the continental scale using a multivariate geographic clustering (MGC) approach coupled with machine learning models. First, we used the MGC approach to segment the United States at 30 arc second resolution into 20 SOC regions, based on principal component information from environmental covariates (gNATSGO soil properties, WorldClim bioclimatic variables, MODIS biological variables, and physiographic variables). We then trained separate random forest model ensembles for each of the SOC regions identified, using environmental covariates and soil profile measurements from the International Soil Carbon Network (ISCN) and an Alaska soil profile dataset. We estimated United States SOC stocks for the 0-30 cm and 0-100 cm depths at 52.6 ± 3.2 and 108.3 ± 8.2 Pg C, respectively.

    Files in collection (32):

    Collection contains 22 soil properties geospatial rasters, 4 soil SOC geospatial rasters, 2 ISCN site SOC observations csv files, and 4 R scripts

    gNATSGO TIF files:

    ├── available_water_storage_30arc_30cm_us.tif                   [30 cm depth soil available water storage]
    ├── available_water_storage_30arc_100cm_us.tif                 [100 cm depth soil available water storage]
    ├── caco3_30arc_30cm_us.tif                                                 [30 cm depth soil CaCO3 content]
    ├── caco3_30arc_100cm_us.tif                                               [100 cm depth soil CaCO3 content]
    ├── cec_30arc_30cm_us.tif                                                     [30 cm depth soil cation exchange capacity]
    ├── cec_30arc_100cm_us.tif                                                   [100 cm depth soil cation exchange capacity]
    ├── clay_30arc_30cm_us.tif                                                     [30 cm depth soil clay content]
    ├── clay_30arc_100cm_us.tif                                                   [100 cm depth soil clay content]
    ├── depthWT_30arc_us.tif                                                        [depth to water table]
    ├── kfactor_30arc_30cm_us.tif                                                 [30 cm depth soil erosion factor]
    ├── kfactor_30arc_100cm_us.tif                                               [100 cm depth soil erosion factor]
    ├── ph_30arc_100cm_us.tif                                                      [100 cm depth soil pH]
    ├── ph_30arc_30cm_us.tif                                                        [30 cm depth soil pH]
    ├── pondingFre_30arc_us.tif                                                     [ponding frequency]
    ├── sand_30arc_30cm_us.tif                                                    [30 cm depth soil sand content]
    ├── sand_30arc_100cm_us.tif                                                  [100 cm depth soil sand content]
    ├── silt_30arc_30cm_us.tif                                                        [30 cm depth soil silt content]
    ├── silt_30arc_100cm_us.tif                                                      [100 cm depth soil silt content]
    ├── water_content_30arc_30cm_us.tif                                      [30 cm depth soil water content]
    └── water_content_30arc_100cm_us.tif                                   [100 cm depth soil water content]

    SOC TIF files:

    ├──30cm SOC mean.tif                             [30 cm depth soil SOC]
    ├──100cm SOC mean.tif                           [100 cm depth soil SOC]
    ├──30cm SOC CV.tif                                 [30 cm depth soil SOC coefficient of variation]
    └──100cm SOC CV.tif                              [100 cm depth soil SOC coefficient of variation]

    site observations csv files:

    ISCN_rmNRCS_addNCSS_30cm.csv       30cm ISCN sites SOC replaced NRCS sites with NCSS centroid removed data

    ISCN_rmNRCS_addNCSS_100cm.csv       100cm ISCN sites SOC replaced NRCS sites with NCSS centroid removed data


    Data format:

    Geospatial files are provided in GeoTIFF format, in Lat/Lon WGS84 (EPSG:4326) projection, at 30 arc second resolution.

    Geospatial projection

    GEOGCS["GCS_WGS_1984", DATUM["D_WGS_1984", SPHEROID["WGS_1984",6378137,298.257223563]], PRIMEM["Greenwich",0], UNIT["Degree",0.017453292519943295]]

    $ g.proj -w
    GEOGCS["wgs84", DATUM["WGS_1984", SPHEROID["WGS_1984",6378137,298.257223563]], PRIMEM["Greenwich",0], UNIT["degree",0.0174532925199433]]

     

     