Title: Benchmarking of automatic quality control checks for ocean temperature profiles and recommendations for optimal sets
Millions of in situ ocean temperature profiles have been collected historically using various instrument types with varying sensor accuracy and then assembled into global databases. These are essential to our current understanding of the changing state of the oceans, sea level, Earth’s climate, marine ecosystems and fisheries, and for constraining model projections of future change that underpin mitigation and adaptation solutions. Profiles distributed shortly after collection are also widely used in operational applications such as real-time monitoring and forecasting of the ocean state and weather prediction. Before use in scientific or societal service applications, quality control (QC) procedures need to be applied to flag and ultimately remove erroneous data. Automatic QC (AQC) checks are vital to the timeliness of operational applications and for reducing the volume of dubious data which later require QC processing by a human for delayed mode applications. Despite the large suite of evolving AQC checks developed by institutions worldwide, the most effective set of AQC checks was not known. We have developed a framework to assess the performance of AQC checks, under the auspices of the International Quality Controlled Ocean Database (IQuOD) project. The IQuOD-AQC framework is an open-source collaborative software infrastructure built in Python (available from https://github.com/IQuOD). Sixty AQC checks have been implemented in this framework. Their performance was benchmarked against three reference datasets which contained a spectrum of instrument types and error modes flagged in their profiles. One of these (a subset of the Quality-controlled Ocean Temperature Archive (QuOTA) dataset that had been manually inspected for quality issues by its creators) was also used to identify optimal sets of AQC checks.
Results suggest that the AQC checks are effective for most historical data, but less so for data from Mechanical Bathythermographs (MBTs), and much less effective for Argo data. The optimal AQC sets will be applied to generate quality flags for the next release of the IQuOD dataset. This will further elevate the quality and historical value of millions of temperature profiles which have already been improved by IQuOD intelligent metadata and observational uncertainty information (https://doi.org/10.7289/v51r6nsf).
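The idea of benchmarking a check against a reference dataset can be sketched as follows. This is a minimal illustration and not the IQuOD-AQC code: the function names `range_check` and `benchmark`, the temperature thresholds, the sample profiles, and the choice of true-positive and false-positive rates as the performance measures are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

Profile = Sequence[float]  # temperatures in degrees C (hypothetical representation)


def range_check(profile: Profile, low: float = -2.5, high: float = 40.0) -> bool:
    """Hypothetical AQC check: flag a profile if any value lies outside [low, high].

    The thresholds are illustrative, not taken from any published check.
    """
    return any(t < low or t > high for t in profile)


def benchmark(
    check: Callable[[Profile], bool],
    profiles: List[Profile],
    reference_flags: List[bool],
) -> Tuple[float, float]:
    """Compare a check with reference flags (True = profile known to be bad).

    Returns (true_positive_rate, false_positive_rate).
    """
    tp = fp = fn = tn = 0
    for profile, is_bad in zip(profiles, reference_flags):
        flagged = check(profile)
        if flagged and is_bad:
            tp += 1
        elif flagged and not is_bad:
            fp += 1
        elif not flagged and is_bad:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up profiles and reference flags, for illustration only.
profiles = [
    [10.0, 9.5, 8.0],    # good
    [12.0, 99.0, 7.0],   # bad: value far out of range
    [5.0, 4.0, 3.5],     # good
    [6.0, 5.8, -9.0],    # bad: value far out of range
    [15.0, 30.0, 14.0],  # bad: spike that stays within the allowed range
]
reference_flags = [False, True, False, True, True]

tpr, fpr = benchmark(range_check, profiles, reference_flags)
print(f"true-positive rate: {tpr:.2f}, false-positive rate: {fpr:.2f}")
```

In this toy case the range check misses the in-range spike, which illustrates why a single check is rarely sufficient and why sets of complementary checks are evaluated together.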
Award ID(s):
1840868
PAR ID:
10423655
Journal Name:
Frontiers in Marine Science
Volume:
9
ISSN:
2296-7745
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Internally consistent, quality-controlled (QC) data products play an important role in promoting regional-to-global research efforts to understand societal vulnerabilities to ocean acidification (OA). However, there are currently no such data products for the coastal ocean, where most of the OA-susceptible commercial and recreational fisheries and aquaculture industries are located. In this collaborative effort, we compiled, quality-controlled, and synthesized 2 decades of discrete measurements of inorganic carbon system parameters, oxygen, and nutrient chemistry data from the North American continental shelves to generate a data product called the Coastal Ocean Data Analysis Product in North America (CODAP-NA). There are few deep-water (> 1500 m) sampling locations in the current data product. As a result, crossover analyses, which rely on comparisons between measurements on different cruises in the stable deep ocean, could not form the basis for cruise-to-cruise adjustments. For this reason, care was taken in the selection of data sets to include in this initial release of CODAP-NA, and only data sets from laboratories with known quality assurance practices were included. New consistency checks and outlier detections were used to QC the data. Future releases of this CODAP-NA product will use this core data product as the basis for cruise-to-cruise comparisons. We worked closely with the investigators who collected and measured these data during the QC process. This version (v2021) of the CODAP-NA is comprised of 3391 oceanographic profiles from 61 research cruises covering all continental shelves of North America, from Alaska to Mexico in the west and from Canada to the Caribbean in the east. Data for 14 variables (temperature; salinity; dissolved oxygen content; dissolved inorganic carbon content; total alkalinity; pH on total scale; carbonate ion content; fugacity of carbon dioxide; and substance contents of silicate, phosphate, nitrate, nitrite, nitrate plus nitrite, and ammonium) have been subjected to extensive QC. CODAP-NA is available as a merged data product (Excel, CSV, MATLAB, and NetCDF; https://doi.org/10.25921/531n-c230, https://www.ncei.noaa.gov/data/oceans/ncei/ocads/metadata/0219960.html, last access: 15 May 2021) (Jiang et al., 2021a). The original cruise data have also been updated with data providers' consent and summarized in a table with links to NOAA's National Centers for Environmental Information (NCEI) archives (https://www.ncei.noaa.gov/access/ocean-acidification-data-stewardship-oads/synthesis/NAcruises.html).
  2. This dataset contains salinity-calibrated Conductivity Temperature Depth (CTD) data from the 2020 Ocean Observatories Initiative (OOI) Irminger Sea 7 and Overturning in the Subpolar North Atlantic Program – Greenland Deep Western Boundary Current (OSNAP GDWBC) cruise (AR46). Data quality control methods have been used to assess performance of the CTD instrument. Resulting high-quality profiles were then used together with salinity bottle data analyzed at sea to create a post-cruise salinity-calibrated CTD product. This dataset has been produced as part of an ongoing effort to more fully utilize CTD data collected by OOI Irminger cruises, which have been taking place annually since 2014. The hydrographic data collection facilitated by OOI in the Irminger Sea currently supports science for not only OOI end users, but also international oceanographic research projects, including the Overturning in the Subpolar North Atlantic Program (https://www.o-snap.org/), Atlantic Meridional Overturning Circulation Program (https://usclivar.org/amoc) and BioGeoChemical Array for Real-time Geostrophic Oceanography program (https://biogeochemical-argo.org/index.php). Such programs require a higher-level data product than what OOI provides through its standard data dissemination, and hence a quality controlled, salinity-calibrated data product has been produced. Data are in text formats.
  3. A high-quality hydrographic observational database is essential for ocean and climate studies and operational applications. Because there are numerous global and regional ocean databases, duplicate data continue to be an issue in data management, data processing and database merging, posing a challenge to the effective and accurate use of oceanographic data to derive robust statistics and reliable data products. This study aims to provide algorithms to identify the duplicates and assign labels to them. We propose first a set of criteria to define the duplicate data; and second, an open-source and semi-automatic system to detect duplicate data and erroneous metadata. This system includes several algorithms for automatic checks using statistical methods (such as Principal Component Analysis and entropy weighting) and an additional expert (manual) check. The robustness of the system is then evaluated with a subset of the World Ocean Database (WOD18) with over 600,000 in-situ temperature and salinity profiles. This system is an open-source Python package (named DC_OCEAN) allowing users to effectively use the software. Users can customize their settings. The application result from the WOD18 subset also forms a benchmark dataset, which is available to support future studies on duplicate checks, metadata error identification, and machine learning applications. This duplicate checking system will be incorporated into the International Quality-controlled Ocean Database (IQuOD) data quality control system to guarantee the uniqueness of ocean observation data in this product.
  4. This dataset contains salinity-calibrated Conductivity Temperature Depth (CTD) and bottle data from the 2021 Ocean Observatories Initiative (OOI) Irminger Sea 8 cruise of the research vessel Neil Armstrong (AR60-01). Data quality control methods have been used to assess performance of the CTD instrument. Resulting high-quality profiles were then used together with salinity bottle data analyzed at sea to create a post-cruise salinity-calibrated CTD product. This submission has been produced as part of an ongoing effort to more fully utilize CTD data collected by OOI Irminger cruises, which have been taking place annually since 2014. The hydrographic data collection facilitated by OOI in the Irminger Sea currently supports science for not only OOI end users, but also international oceanographic research projects, including the Overturning in the Subpolar North Atlantic Program (https://www.o-snap.org/), Atlantic Meridional Overturning Circulation Program (https://usclivar.org/amoc) and BioGeoChemical Array for Real-time Geostrophic Oceanography program (https://biogeochemical-argo.org). Such programs require a higher-level data product than what OOI provides through its standard data dissemination, and hence a quality controlled, salinity-calibrated data product has been produced. Data are in text format; the data description is in PDF.
  5. This submission contains salinity-calibrated Conductivity Temperature Depth (CTD) data from the 2018 Ocean Observatories Initiative (OOI) Irminger Sea 5 cruise (AR30-03). Data quality control methods have been used to assess performance of the CTD instrument. Resulting high-quality profiles were then used together with salinity bottle data analyzed at sea to create a post-cruise salinity-calibrated CTD product. This submission has been produced as part of an ongoing effort to more fully utilize CTD data collected by OOI Irminger cruises, which have been taking place annually since 2014. The hydrographic data collection facilitated by OOI in the Irminger Sea currently supports science for not only OOI end users, but also international oceanographic research projects, including the Overturning in the Subpolar North Atlantic Program (https://www.o-snap.org/), Atlantic Meridional Overturning Circulation Program (https://usclivar.org/amoc) and BioGeoChemical Array for Real-time Geostrophic Oceanography program (https://biogeochemical-argo.org/index.php). Such programs require a higher-level data product than what OOI provides through its standard data dissemination, and hence a quality controlled, salinity-calibrated data product has been produced. Data are in text formats.