

Title: Identifying datasets for global wildlife trafficking

We describe a novel database on wildlife trafficking that can be used for exploring supply chain coordination via game-theoretic collaboration models, geographic spread of wildlife products trafficked via multi-item knapsack problems, or illicit network interdiction via multi-armed bandit problems.

An interactive visualization of this dataset is publicly available at: https://public.tableau.com/views/IWTDataDirectory-Gore/Sheet2?:language=en-US&:display_count=n&:origin=viz_share_link
Award ID(s):
2120065 1935451
NSF-PAR ID:
10403308
Publisher / Repository:
Zenodo
Date Published:
Edition / Version:
1.0
Subject(s) / Keyword(s):
["wildlife trafficking","wildlife crime","illicit supply networks"]
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. PLEASE CONTACT AUTHORS IF YOU CONTRIBUTE AND WOULD LIKE TO BE LISTED AS A CO-AUTHOR. (this message will be removed some time weeks/months after the first publication)

    Terrestrial Parasite Tracker indexed biotic interactions and review summary.

    The Terrestrial Parasite Tracker (TPT) project began in 2019 and is funded by the National Science Foundation to mobilize data from vector and ectoparasite collections to data aggregators (e.g., iDigBio, GBIF). The goal is to help build a comprehensive picture of arthropod host-association evolution, distributions, and the ecological interactions of disease vectors, which will assist scientists, educators, land managers, and policy makers. Arthropod parasites are often important to human and wildlife health and safety as vectors of pathogens, and it is critical to digitize these specimens so that they, and their biotic interaction data, will be available to help understand and predict the spread of human and wildlife disease.

    This data publication contains versioned TPT associated datasets and related data products that were tracked, reviewed and indexed by Global Biotic Interactions (GloBI) and associated tools. GloBI provides open access to finding species interaction data (e.g., predator-prey, pollinator-plant, pathogen-host, parasite-host) by combining existing open datasets using open source software.

    If you have questions or comments about this publication, please open an issue at https://github.com/ParasiteTracker/tpt-reporting or contact the authors by email.

    Funding:
    The creation of this archive was made possible by the National Science Foundation award "Collaborative Research: Digitization TCN: Digitizing collections to trace parasite-host associations and predict the spread of vector-borne disease," Award numbers DBI:1901932 and DBI:1901926

    References:
    Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.

    GloBI Data Review Report

    Datasets under review:
     - University of Michigan Museum of Zoology Insect Division. Full Database Export 2020-11-20 provided by Erika Tucker and Barry Oconner. accessed via https://github.com/EMTuckerLabUMMZ/ummzi/archive/6731357a377e9c2748fc931faa2ff3dc0ce3ea7a.zip on 2022-06-24T14:02:48.801Z
     - Academy of Natural Sciences Entomology Collection for the Parasite Tracker Project accessed via https://github.com/globalbioticinteractions/ansp-para/archive/5e6592ad09ec89ba7958266ad71ec9d5d21d1a44.zip on 2022-06-24T14:04:22.091Z
     - Bernice Pauahi Bishop Museum, J. Linsley Gressitt Center for Research in Entomology accessed via https://github.com/globalbioticinteractions/bpbm-ent/archive/c085398dddd36f8a1169b9cf57de2a572229341b.zip on 2022-06-24T14:04:37.692Z
     - Texas A&M University, Biodiversity Teaching and Research Collections accessed via https://github.com/globalbioticinteractions/brtc-para/archive/f0a718145b05ed484c4d88947ff712d5f6395446.zip on 2022-06-24T14:06:40.154Z
     - Brigham Young University Arthropod Museum accessed via https://github.com/globalbioticinteractions/byu-byuc/archive/4a609ac6a9a03425e2720b6cdebca6438488f029.zip on 2022-06-24T14:06:51.420Z
     - California Academy of Sciences Entomology accessed via https://github.com/globalbioticinteractions/cas-ent/archive/562aea232ec74ab615f771239451e57b057dc7c0.zip on 2022-06-24T14:07:16.371Z
     - Clemson University Arthropod Collection accessed via https://github.com/globalbioticinteractions/cu-cuac/archive/6cdcbbaa4f7cec8e1eac705be3a999bc5259e00f.zip on 2022-06-24T14:07:40.925Z
     - Denver Museum of Nature and Science (DMNS) Parasite specimens (DMNS:Para) accessed via https://github.com/globalbioticinteractions/dmns-para/archive/a037beb816226eb8196533489ee5f98a6dfda452.zip on 2022-06-24T14:08:00.730Z
     - Field Museum of Natural History IPT accessed via https://github.com/globalbioticinteractions/fmnh/archive/6bfc1b7e46140e93f5561c4e837826204adb3c2f.zip on 2022-06-24T14:18:51.995Z
     - Illinois Natural History Survey Insect Collection accessed via https://github.com/globalbioticinteractions/inhs-insects/archive/38692496f590577074c7cecf8ea37f85d0594ae1.zip on 2022-06-24T14:19:37.563Z
     - UMSP / University of Minnesota / University of Minnesota Insect Collection accessed via https://github.com/globalbioticinteractions/min-umsp/archive/3f1b9d32f947dcb80b9aaab50523e097f0e8776e.zip on 2022-06-24T14:20:27.232Z
     - Milwaukee Public Museum Biological Collections Data Portal accessed via https://github.com/globalbioticinteractions/mpm/archive/9f44e99c49ec5aba3f8592cfced07c38d3223dcd.zip on 2022-06-24T14:20:46.185Z
     - Museum for Southern Biology (MSB) Parasite Collection accessed via https://github.com/globalbioticinteractions/msb-para/archive/178a0b7aa0a8e14b3fe953e770703fe331eadacc.zip on 2022-06-24T15:16:07.223Z
     - The Albert J. Cook Arthropod Research Collection accessed via https://github.com/globalbioticinteractions/msu-msuc/archive/38960906380443bd8108c9e44aeff4590d8d0b50.zip on 2022-06-24T16:09:40.702Z
     - Ohio State University Acarology Laboratory accessed via https://github.com/globalbioticinteractions/osal-ar/archive/876269d66a6a94175dbb6b9a604897f8032b93dd.zip on 2022-06-24T16:10:00.281Z
     - Frost Entomological Museum, Pennsylvania State University accessed via https://github.com/globalbioticinteractions/psuc-ento/archive/30b1f96619a6e9f10da18b42fb93ff22cc4f72e2.zip on 2022-06-24T16:10:07.741Z
     - Purdue Entomological Research Collection accessed via https://github.com/globalbioticinteractions/pu-perc/archive/e0909a7ca0a8df5effccb288ba64b28141e388ba.zip on 2022-06-24T16:10:26.654Z
     - Texas A&M University Insect Collection accessed via https://github.com/globalbioticinteractions/tamuic-ent/archive/f261a8c192021408da67c39626a4aac56e3bac41.zip on 2022-06-24T16:10:58.496Z
     - University of California Santa Barbara Invertebrate Zoology Collection accessed via https://github.com/globalbioticinteractions/ucsb-izc/archive/825678ad02df93f6d4469f9d8b7cc30151b9aa45.zip on 2022-06-24T16:12:29.854Z
     - University of Hawaii Insect Museum accessed via https://github.com/globalbioticinteractions/uhim/archive/53fa790309e48f25685e41ded78ce6a51bafde76.zip on 2022-06-24T16:12:41.408Z
     - University of New Hampshire Collection of Insects and other Arthropods UNHC-UNHC accessed via https://github.com/globalbioticinteractions/unhc/archive/f72575a72edda8a4e6126de79b4681b25593d434.zip on 2022-06-24T16:12:59.500Z
     - Scott L. Gardner and Gabor R. Racz (2021). University of Nebraska State Museum - Parasitology. Harold W. Manter Laboratory of Parasitology. University of Nebraska State Museum. accessed via https://github.com/globalbioticinteractions/unl-nsm/archive/6bcd8aec22e4309b7f4e8be1afe8191d391e73c6.zip on 2022-06-24T16:13:06.914Z
     - Data were obtained from specimens belonging to the United States National Museum of Natural History (USNM), Smithsonian Institution, Washington DC and digitized by the Walter Reed Biosystematics Unit (WRBU). accessed via https://github.com/globalbioticinteractions/usnmentflea/archive/ce5cb1ed2bbc13ee10062b6f75a158fd465ce9bb.zip on 2022-06-24T16:13:38.013Z
     - US National Museum of Natural History Ixodes Records accessed via https://github.com/globalbioticinteractions/usnm-ixodes/archive/c5fcd5f34ce412002783544afb628a33db7f47a6.zip on 2022-06-24T16:13:45.666Z
     - Price Institute of Parasite Research, School of Biological Sciences, University of Utah accessed via https://github.com/globalbioticinteractions/utah-piper/archive/43da8db550b5776c1e3d17803831c696fe9b8285.zip on 2022-06-24T16:13:54.724Z
     - University of Wisconsin Stevens Point, Stephen J. Taft Parasitological Collection accessed via https://github.com/globalbioticinteractions/uwsp-para/archive/f9d0d52cd671731c7f002325e84187979bca4a5b.zip on 2022-06-24T16:14:04.745Z
     - Giraldo-Calderón, G. I., Emrich, S. J., MacCallum, R. M., Maslen, G., Dialynas, E., Topalis, P., … Lawson, D. (2015). VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic acids research, 43(Database issue), D707–D713. doi:10.1093/nar/gku1117. accessed via https://github.com/globalbioticinteractions/vectorbase/archive/00d6285cd4e9f4edd18cb2778624ab31b34b23b8.zip on 2022-06-24T16:14:11.965Z
     - WIRC / University of Wisconsin Madison WIS-IH / Wisconsin Insect Research Collection accessed via https://github.com/globalbioticinteractions/wis-ih-wirc/archive/34162b86c0ade4b493471543231ae017cc84816e.zip on 2022-06-24T16:14:29.743Z
     - Yale University Peabody Museum Collections Data Portal accessed via https://github.com/globalbioticinteractions/yale-peabody/archive/43be869f17749d71d26fc820c8bd931d6149fe8e.zip on 2022-06-24T16:23:29.289Z

    Generated on:
    2022-06-24

    by:
    GloBI's Elton 0.12.4 
    (see https://github.com/globalbioticinteractions/elton).

    Note that all files ending with .tsv are files formatted 
    as UTF8 encoded tab-separated values files.

    https://www.iana.org/assignments/media-types/text/tab-separated-values


    Included in this review archive are:

    README:
      This file.

    review_summary.tsv:
      Summary across all reviewed collections of total number of distinct review comments.

    review_summary_by_collection.tsv:
      Summary by reviewed collection of total number of distinct review comments.

    indexed_interactions_by_collection.tsv: 
      Summary of number of indexed interaction records by institutionCode and collectionCode.

    review_comments.tsv.gz:
      All review comments by collection.

    indexed_interactions_full.tsv.gz:
      All indexed interactions for all reviewed collections.

    indexed_interactions_simple.tsv.gz:
      All indexed interactions for all reviewed collections selecting only sourceInstitutionCode, sourceCollectionCode, sourceCatalogNumber, sourceTaxonName, interactionTypeName and targetTaxonName.

    datasets_under_review.tsv:
      Details on the datasets under review.

    elton.jar: 
      Program used to update datasets and generate the review reports and associated indexed interactions.

    datasets.zip:
      Source datasets used by elton.jar in process of executing the generate_report.sh script.

    generate_report.sh:
      Program used to generate the report

    generate_report.log:
      Log file generated as part of running the generate_report.sh script
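
The file list above can be explored with a few lines of standard-library Python. The following is a minimal sketch, assuming that indexed_interactions_simple.tsv.gz has a header row containing the column names listed above (in particular interactionTypeName); the function name is hypothetical and is not part of Elton or GloBI.

```python
import csv
import gzip
from collections import Counter


def count_interaction_types(path):
    """Count records per interactionTypeName in a gzipped, UTF-8 encoded TSV file.

    Assumes the first line is a header row that includes an
    'interactionTypeName' column, as described for
    indexed_interactions_simple.tsv.gz.
    """
    counts = Counter()
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # Tab-separated values carry no quoting convention, so disable quote handling.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            counts[row["interactionTypeName"]] += 1
    return counts
```

Calling `count_interaction_types("indexed_interactions_simple.tsv.gz").most_common(5)` would then list the five most frequent interaction types in the archive.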
     

     
  2. The historical settlement data compilation for Spain (HISDAC-ES) is a geospatial dataset consisting of over 240 gridded surfaces measuring the physical, functional, age-related, and evolutionary characteristics of the Spanish building stock. We scraped, harmonized, and aggregated cadastral building footprint data for Spain, covering over 12,000,000 building footprints including construction year attributes, to create a multi-faceted series of gridded surfaces (GeoTIFF format) describing the evolution of human settlements in Spain from 1900 to 2020, at 100 m spatial and 5-year temporal resolution. The dataset also contains aggregated characteristics and completeness statistics at the municipality level, in CSV and GeoPackage format.

    !!! UPDATE 08-2023 !!!: We provide a new, improved version of HISDAC-ES. Specifically, we fixed two bugs in the production code that caused an incorrect rasterization of the multitemporal BUFA layers and of the PHYS layers (BUFA, BIA, DWEL, BUNITS sum and mean). Moreover, we added decadal raster datasets measuring residential building footprint and building indoor area (1900-2020), and provide a country-wide, harmonized building footprint centroid dataset in GeoPackage vector data format.

    File descriptions:

    Datasets are available in three spatial reference systems:

    1. HISDAC-ES_All_LAEA.zip: Raster data in Lambert Azimuthal Equal Area (LAEA) covering all Spanish territory.
    2. HISDAC-ES_IbericPeninsula_UTM30.zip: Raster data in UTM Zone 30N covering the Iberian Peninsula plus Ceuta and Melilla.
    3. HISDAC-ES_CanaryIslands_REGCAN.zip: Raster data in REGCAN-95, covering the Canary Islands only.
    4. HISDAC-ES_MunicipAggregates.zip: Municipality-level aggregates and completeness statistics (CSV, GeoPackage), in LAEA projection.
    5. ES_building_centroids_merged_spatjoin.gpkg: 7,000,000+ building footprint centroids in GeoPackage format, harmonized from the different cadastral systems, representing the input data for HISDAC-ES. These data can be used for sanity checks or for the creation of further, user-defined gridded surfaces.

    Source data:

    HISDAC-ES is derived from cadastral building footprint data, available from different authorities in Spain:

    • Araba province: https://geo.araba.eus/WFS_Katastroa?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetCapabilities
    • Bizkaia province: https://web.bizkaia.eus/es/inspirebizkaia
    • Gipuzkoa province: https://b5m.gipuzkoa.eus/web5000/es/utilidades/inspire/edificios/
    • Navarra region: https://inspire.navarra.es/services/BU/wfs
    • Other regions: http://www.catastro.minhap.es/INSPIRE/buildings/ES.SDGC.bu.atom.xml
    • Data source of municipality polygons: Centro Nacional de Información Geográfica (https://centrodedescargas.cnig.es/CentroDescargas/index.jsp)

    Technical notes:

    Gridded data

    File nomenclature:

    ./region_projection_theme/hisdac_es_theme_variable_version_resolution[m][_year].tif

    Regions:

    • all: complete territory of Spain
    • can: Canary Islands only
    • ibe: Iberian Peninsula + Ceuta + Melilla

    Projections:

    • laea: Lambert azimuthal equal area (EPSG:3035)
    • regcan: REGCAN95 / UTM zone 28N (EPSG:4083)
    • utm: ETRS89 / UTM zone 30N (EPSG:25830)

    Themes:

    • evolution / evol: multi-temporal physical measurements
    • landuse: multi-temporal building counts per land use (i.e., building function) class
    • physical / phys: physical building characteristics in 2020
    • temporal / temp: temporal characteristics (construction year statistics)

    Variables: evolution

    • budens: building density (count per grid cell area)
    • bufa: building footprint area
    • deva: developed area (any grid cell containing at least one building)
    • resbufa: residential building footprint area
    • resbia: residential building indoor area

    Variables: physical

    • bia: building indoor area
    • bufa: building footprint area
    • bunits: number of building units
    • dwel: number of dwellings

    Variables: temporal

    • mincoy: minimum construction year per grid cell
    • maxcoy: maximum construction year per grid cell
    • meancoy: mean construction year per grid cell
    • medcoy: median construction year per grid cell
    • modecoy: mode (most frequent) construction year per grid cell
    • varcoy: variety of construction years per grid cell

    Variable: landuse

    Counts of buildings per grid cell and land use type.
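
The file nomenclature above lends itself to a small parser. The sketch below is one possible reading of the template and rests on assumptions: that the version token looks like "v1", that the resolution is an integer optionally followed by "m", and that the optional year has four digits. The example file names in the usage note are hypothetical and may not match actual files in the archive.

```python
import re

# Assumed interpretation of:
#   hisdac_es_theme_variable_version_resolution[m][_year].tif
HISDAC_NAME = re.compile(
    r"^hisdac_es_(?P<theme>[a-z]+)_(?P<variable>[a-z0-9]+)_(?P<version>v\d+)"
    r"_(?P<resolution>\d+)m?(?:_(?P<year>\d{4}))?\.tif$"
)


def parse_hisdac_name(filename):
    """Split a HISDAC-ES raster file name into its components."""
    match = HISDAC_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized HISDAC-ES file name: {filename!r}")
    parts = match.groupdict()
    parts["resolution"] = int(parts["resolution"])
    # Single-date layers (e.g. the physical theme) carry no year suffix.
    parts["year"] = int(parts["year"]) if parts["year"] else None
    return parts
```

For instance, `parse_hisdac_name("hisdac_es_evolution_bufa_v1_100_1950.tif")` would yield the theme "evolution", the variable "bufa", and the year 1950 under these assumptions.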

    Municipality-level data

    • hisdac_es_municipality_stats_multitemporal_longform_v1.csv: This CSV file contains the zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in long form. Note that a value of 0 for the year attribute denotes the statistics for records without construction year information.
    • hisdac_es_municipality_stats_multitemporal_wideform_v1.csv: This CSV file contains the zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in wide form. Note that a value of 0 for the year suffix denotes the statistics for records without construction year information.
    • hisdac_es_municipality_stats_completeness_v1.csv: This CSV file contains the missingness rates (in %) of the building attribute per municipality, ranging from 0.0 (attribute exists for all buildings) to 100.0 (attribute exists for none of the buildings) in a given municipality.

    Column names for the completeness statistics tables:

    • NATCODE: National municipality identifier*
    • num_total: number of buildings per municipality
    • perc_bymiss: Percentage of buildings with missing built year (construction year)
    • perc_lumiss: Percentage of buildings with missing landuse attribute
    • perc_luother: Percentage of buildings with landuse type "other"
    • perc_num_floors_miss: Percentage of buildings without valid number of floors attribute
    • perc_num_dwel_miss: Percentage of buildings without valid number of dwellings attribute
    • perc_num_bunits_miss: Percentage of buildings without valid number of building units attribute
    • perc_offi_area_miss: Percentage of buildings without valid official area (building indoor area, BIA) attribute
    • perc_num_dwel_and_num_bunits_miss: Percentage of buildings missing both number of dwellings and number of building units attribute

    The same statistics are available as a GeoPackage file including municipality polygons in Lambert azimuthal equal area (EPSG:3035).

    *From the NATCODE, other regional identifiers can be derived as follows:

    • NATCODE: 34 01 04 04001
    • Country: 34
    • Comunidad autónoma (CA_CODE): 01
    • Province (PROV_CODE): 04
    • LAU code: 04001 (province + municipality code)
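
The NATCODE breakdown above can be expressed as simple string slicing. This is a minimal sketch, assuming the NATCODE is available as an 11-digit string (spaces, as in the example above, are ignored); the function name and dictionary keys are illustrative only.

```python
def decode_natcode(natcode):
    """Derive regional identifiers from an 11-digit NATCODE."""
    digits = "".join(str(natcode).split())  # drop any spaces used for readability
    if len(digits) != 11 or not digits.isdigit():
        raise ValueError(f"Expected an 11-digit NATCODE, got {natcode!r}")
    return {
        "country": digits[0:2],
        "ca_code": digits[2:4],     # Comunidad autónoma
        "prov_code": digits[4:6],   # Province
        "lau_code": digits[6:11],   # Province + municipality code
    }


print(decode_natcode("34 01 04 04001"))
```

Keeping the codes as strings preserves leading zeros such as the "04" province code, which would be lost if the fields were converted to integers.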
     
  3. Between 2018 and 2021, the PIs for National Science Foundation Awards #1758781 and #1758814 (EAGER: Collaborative Research: Developing and Testing an Incubator for Digital Entrepreneurship in Remote Communities), in partnership with the Tanana Chiefs Conference (TCC), the traditional tribal consortium of the 42 villages of Interior Alaska, jointly developed and conducted large-scale digital and in-person surveys of multiple communities in Interior Alaska. The survey was distributed via a combination of in-person paper surveys, digital surveys, social media links, verbal in-person interviews and telephone-based responses. Analysis of this measure using SAS demonstrated a statistically significant need for enhanced digital infrastructure and reworked digital entrepreneurial and technological education in the Tanana Chiefs Conference region.

    1. Two statistical measures were created during this research: Entrepreneurial Readiness (ER) and Digital Technology needs and skills (DT), both of which showed high internal consistency (.89, .81).
    2. The measures revealed entrepreneurial readiness challenges and evidence of specific addressable barriers that currently hinder regional digital economic activity. The survey data showed statistically significant correlation with the mixed-methodological in-person focus groups and interview research conducted by the PIs and TCC collaborators in Hughes and Huslia, AK, which further corroborated stated barriers to entrepreneurship development in the region.
    3. Data generated by the survey and fieldwork is maintained by the Tanana Chiefs Conference under data sovereignty agreements. The survey and focus group data contains aggregated statistical/empirical data as well as qualitative/subjective detail that runs the risk of becoming personally identifiable, especially (but not only) because of the exceedingly small population sizes of Arctic communities.
    4. This metadata is provided to serve as a record of the data collection and analysis conducted, and also to share some high-level findings that, while revealing no personal information, may be helpful for policymaking, regional planning and efforts towards educational curricular development and infrastructural investment.

    The sample demographics consist of 272 women, 79 men, and 4 with gender not indicated as a response. Barriers to Entrepreneurial Readiness were a component of the measure. Lack of education is the #1 barrier, followed closely by lack of access to childcare. Among women who participated in the survey measure, 30% of those with 2 or more children report lack of childcare to be a significant barrier to entrepreneurial and small business activity.

    For entrepreneurial readiness and digital economy, the scales perform well from a psychometric standpoint. The summary scores are roughly normally distributed. Cronbach’s alphas are greater than 0.80 for both. They are moderately correlated with each other (r = 0.48, p < .0001). Men and women do not differ significantly on either measure. Education is significantly related to the digital economy measure.

    The detail provided in the survey related to educational needs enabled optimized development of the Incubator for Digital Entrepreneurship in Remote Communities. Enhanced digital entrepreneurship training with clear cultural linkages to traditions and community needs, along with additional childcare opportunities, are two among several specific recommendations provided to the TCC. The project PIs are working closely with the TCC administration and community members on culturally-aligned curricular development that respects tribal data sovereignty, local data management protocols, data anonymity and adherence to human subjects (IRB) protocols.

    While the survey data is currently embargoed and unable to be submitted publicly for reasons of anonymity, the project PIs are working with the NSF Arctic Data Center towards determining pathways for sharing personally-protected data with the larger scientific community. These approaches may consist of aggregating and digitally anonymizing sensitive data in ways that cannot be de-aggregated and that meet agency and scientific community needs (while also fully respecting and protecting participants’ rights and personal privacy). At present the data sensitivity protocols are not yet adapted to TCC requirements and the datasets will remain in their care.
  4. This data set contains all classifications that the Gravity Spy machine learning model made for LIGO glitches from the first three observing runs (O1, O2 and O3, where O3 is split into O3a and O3b). Gravity Spy classified all noise events identified by the Omicron trigger pipeline for which Omicron found a signal-to-noise ratio above 7.5 and a peak frequency between 10 Hz and 2048 Hz. To classify noise events, Gravity Spy made Omega scans of every glitch at 4 different durations, which helps capture the morphology of both short and long noise events.

    There are 22 classes used for O1 and O2 data (including No_Glitch and None_of_the_Above), while there are two additional classes used to classify O3 data.

    For O1 and O2, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Chirp, Extremely_Loud, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle

    For O3, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Blip_Low_Frequency, Chirp, Extremely_Loud, Fast_Scattering, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle
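
As a quick consistency check on the two class lists above, the following sketch compares them as Python sets. It uses only the class names given in this description; the variable names are illustrative.

```python
# Glitch classes used for O1 and O2, as listed above.
O1_O2_CLASSES = {
    "1080Lines", "1400Ripples", "Air_Compressor", "Blip", "Chirp",
    "Extremely_Loud", "Helix", "Koi_Fish", "Light_Modulation",
    "Low_Frequency_Burst", "Low_Frequency_Lines", "No_Glitch",
    "None_of_the_Above", "Paired_Doves", "Power_Line", "Repeating_Blips",
    "Scattered_Light", "Scratchy", "Tomte", "Violin_Mode",
    "Wandering_Line", "Whistle",
}

# O3 keeps every O1/O2 class and adds two more.
O3_CLASSES = O1_O2_CLASSES | {"Blip_Low_Frequency", "Fast_Scattering"}

# Classes that appear only in the O3 data.
o3_only = sorted(O3_CLASSES - O1_O2_CLASSES)
print(len(O1_O2_CLASSES), len(O3_CLASSES), o3_only)
```

Code that reads files from several observing runs may want to treat the two O3-only classes as optional columns.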

    If you would like to download the Omega scans associated with each glitch, you can use the gravitational-wave data-analysis tool GWpy. To use this tool, please install Anaconda if you have not already and create a virtual environment using the following command:

    ```
    conda create --name gravityspy-py38 -c conda-forge python=3.8 gwpy pandas psycopg2 sqlalchemy
    ```

    After downloading one of the CSV files for a specific era and interferometer, please run the following Python script if you would like to download the data associated with the metadata in the CSV file. We recommend not trying to download too many images at one time. For example, the script below will read data on Hanford glitches from O2 that were classified by Gravity Spy and filter for only glitches that were labelled as Blips with 90% confidence or higher, and then download the first 4 rows of the filtered table.

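    ```

    from gwpy.table import GravitySpyTable

    # Read the Gravity Spy metadata for Hanford (H1) glitches from O2
    H1_O2 = GravitySpyTable.read('H1_O2.csv')

    # Keep only glitches labelled as Blip with a confidence of 0.9 or higher
    blips = H1_O2[(H1_O2["ml_label"] == "Blip") & (H1_O2["ml_confidence"] >= 0.9)]

    # Download the Omega scans for the first 4 rows of the filtered table
    blips[0:4].download(nproc=1)

    ```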

    The columns in the CSV files are taken from several different inputs:

    [‘event_time’, ‘ifo’, ‘peak_time’, ‘peak_time_ns’, ‘start_time’, ‘start_time_ns’, ‘duration’, ‘peak_frequency’, ‘central_freq’, ‘bandwidth’, ‘channel’, ‘amplitude’, ‘snr’, ‘q_value’] contain metadata about the signal from the Omicron pipeline. 

    [‘gravityspy_id’] is the unique identifier for each glitch in the dataset. 

    [‘1400Ripples’, ‘1080Lines’, ‘Air_Compressor’, ‘Blip’, ‘Chirp’, ‘Extremely_Loud’, ‘Helix’, ‘Koi_Fish’, ‘Light_Modulation’, ‘Low_Frequency_Burst’, ‘Low_Frequency_Lines’, ‘No_Glitch’, ‘None_of_the_Above’, ‘Paired_Doves’, ‘Power_Line’, ‘Repeating_Blips’, ‘Scattered_Light’, ‘Scratchy’, ‘Tomte’, ‘Violin_Mode’, ‘Wandering_Line’, ‘Whistle’] contain the machine learning confidence for a glitch being in a particular Gravity Spy class (the confidence in all these columns should sum to unity). 

    [‘ml_label’, ‘ml_confidence’] provide the machine-learning predicted label for each glitch, and the machine learning confidence in its classification. 

    [‘url1’, ‘url2’, ‘url3’, ‘url4’] are the links to the publicly-available Omega scans for each glitch. ‘url1’ shows the glitch for a duration of 0.5 seconds, ‘url2’ for 1 second, ‘url3’ for 2 seconds, and ‘url4’ for 4 seconds.
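
The column descriptions above suggest a simple sanity check per row: the per-class confidences should sum to unity. The sketch below also reports whether ‘ml_label’ coincides with the highest-scoring class; that relationship is an assumption to be tested, since the description does not state it explicitly. The function name, tolerance, and toy row are hypothetical, and the function works on a plain dict such as a row from `csv.DictReader`.

```python
import math


def check_confidences(row, class_columns, tol=1e-3):
    """Check one CSV row: do class confidences sum to 1, and which class scores highest?"""
    scores = {name: float(row[name]) for name in class_columns}
    top_class = max(scores, key=scores.get)
    return {
        "sums_to_one": math.isclose(sum(scores.values()), 1.0, abs_tol=tol),
        "top_class": top_class,
        # Assumption under test: ml_label is the highest-confidence class.
        "label_is_top_class": top_class == row["ml_label"],
    }


# Toy row with only three of the class columns, for illustration.
toy_row = {"Blip": "0.95", "Whistle": "0.03", "Koi_Fish": "0.02", "ml_label": "Blip"}
print(check_confidences(toy_row, ["Blip", "Whistle", "Koi_Fish"]))
```

On real files the `class_columns` argument would be the full list of 22 class columns (24 for O3).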


    For the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.

    For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo. 

     
  5. Abstract

    The NeonTreeCrowns dataset is a set of individual-level crown estimates for 100 million trees at 37 geographic sites across the United States surveyed by the National Ecological Observatory Network’s Airborne Observation Platform. Each rectangular bounding box crown prediction includes height, crown area, and spatial location.

    How can I see the data?

    A web server to look through predictions is available through idtrees.org

    Dataset Organization

    The shapefiles.zip contains 11,000 shapefiles, each corresponding to a 1km^2 RGB tile from NEON (ID: DP3.30010.001). For example "2019_SOAP_4_302000_4100000_image.shp" are the predictions from "2019_SOAP_4_302000_4100000_image.tif" available from the NEON data portal: https://data.neonscience.org/data-products/explore?search=camera. NEON's file convention refers to the year of data collection (2019), the four-letter site code (SOAP), the sampling event (4), and the UTM coordinate of the top left corner (302000_4100000). For NEON site abbreviations and UTM zones see https://www.neonscience.org/field-sites/field-sites-map.
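
The file convention described above can be unpacked with a regular expression. This is a minimal sketch based only on the example name given here; it assumes four uppercase letters for the site code and an "_image" suffix, and the function name is hypothetical.

```python
import re
from pathlib import Path

TILE_NAME = re.compile(
    r"^(?P<year>\d{4})_(?P<site>[A-Z]{4})_(?P<event>\d+)"
    r"_(?P<easting>\d+)_(?P<northing>\d+)_image$"
)


def parse_tile_name(filename):
    """Extract year, site code, sampling event and UTM corner from a tile name."""
    stem = Path(filename).stem  # drops the .shp or .tif extension
    match = TILE_NAME.match(stem)
    if match is None:
        raise ValueError(f"Unrecognized tile name: {filename!r}")
    parts = match.groupdict()
    for key in ("year", "event", "easting", "northing"):
        parts[key] = int(parts[key])
    return parts


print(parse_tile_name("2019_SOAP_4_302000_4100000_image.shp"))
```

Because the extension is stripped first, the same function pairs a prediction shapefile with its source RGB tile.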

    The predictions are also available as a single csv for each site. All available tiles for that site and year are combined into one large file. These data are not projected, but contain the UTM coordinates for each bounding box (left, bottom, right, top). For both file types the following fields are available:

    Height: The crown height measured in meters. Crown height is defined as the 99th percentile of all canopy height pixels from a LiDAR height model (ID: DP3.30015.001)

    Area: The crown area in m^2 of the rectangular bounding box.

    Label: All data in this release are "Tree".

    Score: The confidence score from the DeepForest deep learning algorithm. The score ranges from 0 (low confidence) to 1 (high confidence)

    How were predictions made?

    The DeepForest algorithm is available as a python package: https://deepforest.readthedocs.io/. Predictions were overlaid on the LiDAR-derived canopy height model. Predictions with heights less than 3m were removed.
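
The fields and the 3 m height cutoff described above can be illustrated on rows read from one of the per-site csv files. This is a rough sketch rather than the authors' processing code: it assumes the columns are named left, bottom, right, top, Height and Score exactly as written in this description, and that coordinates are UTM meters.

```python
def box_area(row):
    """Area in m^2 of the rectangular bounding box, from its UTM corner coordinates."""
    width = float(row["right"]) - float(row["left"])
    height = float(row["top"]) - float(row["bottom"])
    return width * height


def keep_prediction(row, min_height=3.0, min_score=0.0):
    """Mirror the stated rule that predictions shorter than 3 m are removed.

    min_score is an optional, user-chosen confidence threshold (hypothetical;
    the description does not state a score cutoff).
    """
    return float(row["Height"]) >= min_height and float(row["Score"]) >= min_score


example = {"left": "302000", "bottom": "4100000", "right": "302004",
           "top": "4100005", "Height": "12.5", "Score": "0.8"}
print(box_area(example), keep_prediction(example))
```

Raising `min_score` trades recall for precision when only high-confidence crowns are wanted.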

    How were predictions validated?

    Please see

    Weinstein, B. G., Marconi, S., Bohlman, S. A., Zare, A., & White, E. P. (2020). Cross-site learning in deep learning RGB tree crown detection. Ecological Informatics, 56, 101061.

    Weinstein, B., Marconi, S., Aubry-Kientz, M., Vincent, G., Senyondo, H., & White, E. (2020). DeepForest: A Python package for RGB deep learning tree crown delineation. bioRxiv.

    Weinstein, Ben G., et al. "Individual tree-crown detection in RGB imagery using semi-supervised deep learning neural networks." Remote Sensing 11.11 (2019): 1309.

    Were any sites removed?

    Several sites were removed due to poor NEON data quality. GRSM and PUUM both had lower quality RGB data that made them unsuitable for prediction. NEON surveys are updated annually and we expect future flights to correct these errors. We removed the GUIL Puerto Rico site due to its very steep topography and poor sun angle during data collection. The DeepForest algorithm performed poorly when predicting crowns in intensely shaded areas where there was very little sun penetration. We are happy to make these data available upon request.

    Contact

    We welcome questions, ideas and general inquiries. The data can be used for many applications and we look forward to hearing from you. Contact ben.weinstein@weecology.org. 

    Gordon and Betty Moore Foundation: GBMF4563 