
Title: GESLA Version 3: A major update to the global higher‐frequency sea‐level dataset
This paper describes a major update to the quasi-global, higher-frequency sea-level dataset known as GESLA (Global Extreme Sea Level Analysis). Versions 1 (released 2009) and 2 (released 2016) of the dataset have been used in many published studies, across a wide range of oceanographic and coastal engineering-related investigations concerned with evaluating tides, storm surges, extreme sea levels, and other related processes. The third version of the dataset (released 2021), presented here, contains double the number of years of data, and nearly four times the number of records, compared to Version 2. The dataset consists of records obtained from multiple sources around the world. This paper describes the assembly of the dataset, its processing, and its format, and outlines potential future improvements.
Journal Name:
Geoscience Data Journal
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Bottlenose dolphins (Tursiops spp.) are found in waters around Australia, with T. truncatus typically occupying deeper, more oceanic habitat, while T. aduncus occur in shallower, coastal waters. Little is known about the colonization history of T. aduncus along the Western Australian coastline; however, it has been hypothesized that extant populations are the result of an expansion along the coastline originating from a source in the north of Australia. To investigate the history of coastal T. aduncus populations in the area, we generated a genomic SNP dataset using a double‐digest restriction‐site‐associated DNA (ddRAD) sequencing approach. The resulting dataset consisted of 103,201 biallelic SNPs for 112 individuals which were sampled from eleven coastal and two offshore sites between Shark Bay and Cygnet Bay, Western Australia. Our population genomic analyses showed a pattern consistent with the proposed source in the north with significant isolation by distance along the coastline, as well as a reduction in genomic diversity measures along the coastline with Shark Bay showing the most pronounced reduction. Our demographic analysis indicated that the expansion of T. aduncus along the coastline began around the last glacial maximum and progressed southwards with the Shark Bay population being founded only 13 kya. Our results are in line with coastal colonization histories inferred for Tursiops globally, highlighting the ability of delphinids to rapidly colonize novel coastal niches as habitat is released during glacial cycle‐related global sea level and temperature changes.

  2. South American (SA) societies are highly vulnerable to droughts and pluvials, but lack of long-term climate observations severely limits our understanding of the global processes driving climatic variability in the region. The number and quality of SA climate-sensitive tree ring chronologies have significantly increased in recent decades, now providing a robust network of 286 records for characterizing hydroclimate variability since 1400 CE. We combine this network with a self-calibrated Palmer Drought Severity Index (scPDSI) dataset to derive the South American Drought Atlas (SADA) over the continent south of 12°S. The gridded annual reconstruction of austral summer scPDSI is the most spatially complete estimate of SA hydroclimate to date, and well matches past historical dry/wet events. Relating the SADA to the Australia–New Zealand Drought Atlas, sea surface temperatures and atmospheric pressure fields, we determine that the El Niño–Southern Oscillation (ENSO) and the Southern Annular Mode (SAM) are strongly associated with spatially extended droughts and pluvials over the SADA domain during the past several centuries. SADA also exhibits more extended severe droughts and extreme pluvials since the mid-20th century. Extensive droughts are consistent with the observed 20th-century trend toward positive SAM anomalies concomitant with the weakening of midlatitude Westerlies, while low-level moisture transport intensified by global warming has favored extreme rainfall across the subtropics. The SADA thus provides a long-term context for observed hydroclimatic changes and for 21st-century Intergovernmental Panel on Climate Change (IPCC) projections that suggest SA will experience more frequent/severe droughts and rainfall events as a consequence of increasing greenhouse gas emissions. 
  3. Abstract. Biogeochemical cycling in the semi-enclosed Arctic Ocean is strongly influenced by land–ocean transport of carbon and other elements and is vulnerable to environmental and climate changes. Sediments of the Arctic Ocean are an important part of biogeochemical cycling in the Arctic and provide the opportunity to study present and historical input and the fate of organic matter (e.g., through permafrost thawing). Comprehensive sedimentary records are required to compare differences between the Arctic regions and to study Arctic biogeochemical budgets. To this end, the Circum-Arctic Sediment CArbon DatabasE (CASCADE) was established to curate data primarily on concentrations of organic carbon (OC) and OC isotopes (δ13C, Δ14C) yet also on total N (TN) as well as terrigenous biomarkers and other sediment geochemical and physical properties. This new database builds on the published literature and earlier unpublished records through an extensive international community collaboration. This paper describes the establishment, structure and current status of CASCADE. The first public version includes OC concentrations in surface sediments at 4244 oceanographic stations including 2317 with TN concentrations, 1555 with δ13C-OC values and 268 with Δ14C-OC values and 653 records with quantified terrigenous biomarkers (high-molecular-weight n-alkanes, n-alkanoic acids and lignin phenols). CASCADE also includes data from 326 sediment cores, retrieved by shallow box or multi-coring, deep gravity/piston coring, or sea-bottom drilling. The comprehensive dataset reveals large-scale features of both OC content and OC sources between the shelf sea recipients. This offers insight into release of pre-aged terrigenous OC to the East Siberian Arctic shelf and younger terrigenous OC to the Kara Sea. Circum-Arctic sediments thereby reveal patterns of terrestrial OC remobilization and provide clues about thawing of permafrost.
    CASCADE enables synoptic analysis of OC in Arctic Ocean sediments and facilitates a wide array of future empirical and modeling studies of the Arctic carbon cycle. The database is openly and freely available online (Martens et al., 2021), is provided in various machine-readable data formats (data tables, GIS shapefile, GIS raster), and also provides ways for contributing data for future CASCADE versions. We will continuously update CASCADE with newly published and contributed data over the foreseeable future as part of the database management of the Bolin Centre for Climate Research at Stockholm University.
  4. Obeid, I. ; Selesnick, I. (Ed.)
    The Neural Engineering Data Consortium at Temple University has been providing key data resources to support the development of deep learning technology for electroencephalography (EEG) applications [1-4] since 2012. We currently have over 1,700 subscribers to our resources and have been providing data, software and documentation from our web site [5] since 2012. In this poster, we introduce additions to our resources that have been developed within the past year to facilitate software development and big data machine learning research. Major resources released in 2019 include:

    ● Data: The most current release of our open source EEG data is v1.2.0 of TUH EEG and includes the addition of 3,874 sessions and 1,960 patients from mid-2015 through 2016.
    ● Software: We have recently released a package, PyStream, that demonstrates how to correctly read an EDF file and access samples of the signal. This software demonstrates how to properly decode channels based on their labels and how to implement montages. Most existing open source packages to read EDF files do not directly address the problem of channel labels [6].
    ● Documentation: We have released two documents that describe our file formats and data representations: (1) electrodes and channels [6]: describes how to map channel labels to physical locations of the electrodes, and includes a description of every channel label appearing in the corpus; (2) annotation standards [7]: describes our annotation file format and how to decode the data structures used to represent the annotations.

    Additional significant updates to our resources include:

    ● NEDC TUH EEG Seizure (v1.6.0): This release includes the expansion of the training dataset from 4,597 files to 4,702. Calibration sequences have been manually annotated and added to our existing documentation. Numerous corrections were made to existing annotations based on user feedback.
    ● IBM TUSZ Pre-Processed Data (v1.0.0): A preprocessed version of the TUH Seizure Detection Corpus using two methods [8], both of which use an FFT sliding window approach (STFT). In the first method, FFT log magnitudes are used. In the second method, the FFT values are normalized across frequency buckets and correlation coefficients are calculated. The eigenvalues are calculated from this correlation matrix. The eigenvalues and the upper triangle of the correlation matrix are used to generate features.
    ● NEDC TUH EEG Artifact Corpus (v1.0.0): This corpus was developed to support modeling of non-seizure signals for problems such as seizure detection. We have been using the data to build better background models. Five artifact events have been labeled: (1) eye movements (EYEM), (2) chewing (CHEW), (3) shivering (SHIV), (4) electrode pop, electrostatic artifacts, and lead artifacts (ELPP), and (5) muscle artifacts (MUSC). The data is cross-referenced to TUH EEG v1.1.0 so you can match patient numbers, sessions, etc.
    ● NEDC Eval EEG (v1.3.0): In this release of our standardized scoring software, the False Positive Rate (FPR) definition of the Time-Aligned Event Scoring (TAES) metric has been updated [9]. The standard definition is the number of false positives divided by the number of false positives plus the number of true negatives: #FP / (#FP + #TN).

    We also recently introduced the ability to download our data from an anonymous rsync server. The rsync command [10] effectively synchronizes both a remote directory and a local directory and copies the selected folder from the server to the desktop. It is available as part of most, if not all, Linux and Mac distributions (unfortunately, there is not an acceptable port of this command for Windows). To use the rsync command to download the content from our website, both a username and password are needed. An automated registration process on our website grants both.

    An example of a typical rsync command to access our data on our website is: rsync -auxv

    Rsync is a more robust option for downloading data. We have also experimented with Google Drive and Dropbox, but these types of technology are not suitable for such large amounts of data. All of the resources described in this poster are open source and freely available. We will demonstrate how to access and utilize these resources during the poster presentation and collect community feedback on the most needed additions to enable significant advances in machine learning performance.
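The standard false positive rate quoted in the abstract above, #FP / (#FP + #TN), can be sketched in a few lines. This is only an illustration of that textbook ratio. It is not the updated TAES definition, which the abstract does not spell out, and the function name `false_positive_rate` is a hypothetical helper rather than part of the NEDC scoring software.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Return the standard FPR, #FP / (#FP + #TN).

    fp: number of false positives
    tn: number of true negatives
    """
    if fp < 0 or tn < 0:
        raise ValueError("counts must be non-negative")
    total_negatives = fp + tn
    if total_negatives == 0:
        # No negative examples at all: the ratio is undefined.
        raise ValueError("FPR is undefined when fp + tn == 0")
    return fp / total_negatives


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    print(false_positive_rate(fp=5, tn=95))
```

Raising an error for an empty denominator is a design choice made here; a scoring tool might instead report the value as missing.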
  5. The historical settlement data compilation for Spain (HISDAC-ES) is a geospatial dataset consisting of over 240 gridded surfaces measuring the physical, functional, age-related, and evolutionary characteristics of the Spanish building stock. We scraped, harmonized, and aggregated cadastral building footprint data for Spain, covering over 12,000,000 building footprints including construction year attributes, to create a multi-faceted series of gridded surfaces (GeoTIFF format) describing the evolution of human settlements in Spain from 1900 to 2020 at 100 m spatial and 5-year temporal resolution. The dataset also contains aggregated characteristics and completeness statistics at the municipality level, in CSV and GeoPackage format.

    !!! UPDATE 08-2023 !!!: We provide a new, improved version of HISDAC-ES. Specifically, we fixed two bugs in the production code that caused an incorrect rasterization of the multitemporal BUFA layers and of the PHYS layers (BUFA, BIA, DWEL, BUNITS sum and mean). Moreover, we added decadal raster datasets measuring residential building footprint and building indoor area (1900-2020), and provide a country-wide, harmonized building footprint centroid dataset in GeoPackage vector data format.

    File descriptions:

    Datasets are available in three spatial reference systems:

    1. Raster data in Lambert Azimuthal Equal Area (LAEA) covering all Spanish territory.
    2. Raster data in UTM Zone 30N covering the Iberian Peninsula plus Ceuta and Melilla.
    3. Raster data in REGCAN-95, covering the Canary Islands only.
    4. Municipality-level aggregates and completeness statistics (CSV, GeoPackage), in LAEA projection.
    5. ES_building_centroids_merged_spatjoin.gpkg: 7,000,000+ building footprint centroids in GeoPackage format, harmonized from the different cadastral systems, representing the input data for HISDAC-ES. These data can be used for sanity checks or for the creation of further, user-defined gridded surfaces.

    Source data:

    HISDAC-ES is derived from cadastral building footprint data, available from different authorities in Spain:

    • Araba province:
    • Bizkaia province:
    • Gipuzkoa province:
    • Navarra region:
    • Other regions:
    • Data source of municipality polygons: Centro Nacional de Información Geográfica

    Technical notes:

    Gridded data

    File nomenclature:



    • all: complete territory of Spain
    • can: Canary Islands only
    • ibe: Iberian Peninsula + Ceuta + Melilla


    • laea: Lambert azimuthal equal area (EPSG:3035)
    • regcan: REGCAN95 / UTM zone 28N (EPSG:4083)
    • utm: ETRS89 / UTM zone 30N (EPSG:25830)


    • evolution / evol: multi-temporal physical measurements
    • landuse: multi-temporal building counts per land use (i.e., building function) class
    • physical / phys: physical building characteristics in 2020
    • temporal / temp: temporal characteristics (construction year statistics)

    Variables: evolution

    • budens: building density (count per grid cell area)
    • bufa: building footprint area
    • deva: developed area (any grid cell containing at least one building)
    • resbufa: residential building footprint area
    • resbia: residential building indoor area

    Variables: physical

    • bia: building indoor area
    • bufa: building footprint area
    • bunits: number of building units
    • dwel: number of dwellings

    Variables: temporal

    • mincoy: minimum construction year per grid cell
    • maxcoy: maximum construction year per grid cell
    • meancoy: mean construction year per grid cell
    • medcoy: median construction year per grid cell
    • modecoy: mode (most frequent) construction year per grid cell
    • varcoy: variety of construction years per grid cell

    Variable: landuse

    Counts of buildings per grid cell and land use type.

    Municipality-level data

    • hisdac_es_municipality_stats_multitemporal_longform_v1.csv: This CSV file contains the zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in long form. Note that a value of 0 for the year attribute denotes the statistics for records without construction year information.
    • hisdac_es_municipality_stats_multitemporal_wideform_v1.csv: This CSV file contains the zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in wide form. Note that a value of 0 for the year suffix denotes the statistics for records without construction year information.
    • hisdac_es_municipality_stats_completeness_v1.csv: This CSV file contains the missingness rates (in %) of the building attributes per municipality, ranging from 0.0 (attribute exists for all buildings) to 100.0 (attribute exists for none of the buildings) in a given municipality.

    Column names for the completeness statistics tables:

    • NATCODE: National municipality identifier*
    • num_total: number of buildings per municipality
    • perc_bymiss: Percentage of buildings with missing built year (construction year)
    • perc_lumiss: Percentage of buildings with missing landuse attribute
    • perc_luother: Percentage of buildings with landuse type "other"
    • perc_num_floors_miss: Percentage of buildings without valid number of floors attribute
    • perc_num_dwel_miss: Percentage of buildings without valid number of dwellings attribute
    • perc_num_bunits_miss: Percentage of buildings without valid number of building units attribute
    • perc_offi_area_miss: Percentage of buildings without valid official area (building indoor area, BIA) attribute
    • perc_num_dwel_and_num_bunits_miss: Percentage of buildings missing both number of dwellings and number of building units attribute

    The same statistics are available as a GeoPackage file including municipality polygons in Lambert azimuthal equal area (EPSG:3035).

    *From the NATCODE, other regional identifiers can be derived as follows:

    • NATCODE: 34 01 04 04001
    • Country: 34
    • Comunidad autónoma (CA_CODE): 01
    • Province (PROV_CODE): 04
    • LAU code: 04001 (province + municipality code)
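
The NATCODE breakdown above can be sketched as a small parser. This is a minimal illustration that assumes every NATCODE is 11 digits laid out as 2 + 2 + 2 + 5, as in the single example given. The function name `split_natcode` and the dictionary keys `country` and `lau_code` are hypothetical, while `ca_code` and `prov_code` follow the CA_CODE and PROV_CODE labels used in the list.

```python
def split_natcode(natcode: str) -> dict:
    """Split a NATCODE such as '34 01 04 04001' into its parts.

    Assumes the 2 + 2 + 2 + 5 digit layout shown in the example above.
    Parts are kept as strings so that leading zeros are preserved.
    """
    digits = str(natcode).replace(" ", "")
    if len(digits) != 11 or not digits.isdigit():
        raise ValueError(f"expected an 11-digit NATCODE, got {natcode!r}")
    return {
        "country": digits[0:2],     # e.g. 34
        "ca_code": digits[2:4],     # Comunidad autónoma
        "prov_code": digits[4:6],   # Province
        "lau_code": digits[6:11],   # province + municipality code
    }


if __name__ == "__main__":
    parts = split_natcode("34 01 04 04001")
    print(parts)
```

In the example from the list, the LAU code starts with the province code (04), which matches its description as province plus municipality code.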