Abstract. This paper presents the quantitative imaging datasets collected during the Tara Pacific expedition (2016–2018) carried out on the schooner Tara. The datasets cover a wide range of plankton sizes, from microphytoplankton (> 20 µm in size) to mesozooplankton (a few centimetres in size), and non-living particles such as plastic and detrital particles. They consist of surface samples collected across the North Atlantic and the North and South Pacific Ocean from open-ocean stations (a total of 357 samples) and from stations located in coastal waters, lagoons or reefs of 32 Pacific islands (a total of 228 samples). As this expedition involved long distances and long sailing times, we designed two sampling systems to collect plankton while sailing at speeds of up to 9 knots. To sample microplankton, surface water was pumped aboard using a customised pumping system and filtered through a 20 µm mesh size plankton net (hereafter referred to as the deck net – DN). A high-speed net (HSN; 330 µm mesh size) was developed to sample the mesoplankton. In addition, a manta net (330 µm) was used, when possible, to collect mesoplankton and plastics simultaneously. We could not deploy these nets at the reef and lagoon stations of islands. Instead, two bongo nets (20 µm) attached to an underwater scooter were used to sample microplankton. In addition to describing and presenting the datasets, the complementary aim of this paper is to investigate and quantify the potential sampling biases associated with these two high-speed sampling systems and the different net types, in order to improve further ecological interpretations. Regarding the imaging techniques, microplankton (20–200 µm) from the DN and bongo net were imaged directly aboard Tara using a FlowCam instrument (Fluid Imaging Technologies), whereas mesoplankton (>200 µm) from the HSN and manta net were analysed back on land, in the laboratory, with a ZooScan system.
Organisms and other particles were taxonomically and morphologically classified using the automatic sorting tools of the EcoTaxa web application; following this, validation or correction was carried out by taxonomic experts. For microplankton smaller than 45 µm, a subsample of 30 % of the annotations was 100 % visually validated by experts. More than 300 different taxonomic and morphological groups were identified. The datasets include the metadata and the raw data from which morphological traits such as size (equivalent spherical diameter) and biovolume were calculated for each particle as well as a number of quantitative descriptors of the surface plankton communities. These descriptors include abundance, biovolumes, the Shannon diversity index and normalised biovolume size spectrum, allowing the study of their structures (e.g. taxonomic, functional, size and trophic structures) according to a wide range of environmental parameters at the basin scale (https://doi.org/10.5281/zenodo.6445609, Lombard et al., 2023).
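The two per-particle and per-community descriptors named above (equivalent spherical diameter and the Shannon diversity index) can be illustrated with a minimal sketch. It assumes that a biovolume is already available for each particle and that each particle carries one taxonomic label; the function names and the natural-log convention are illustrative choices, not the procedure used to build the datasets.

```python
import math
from collections import Counter


def esd_from_biovolume(biovolume):
    """Diameter of the sphere whose volume equals `biovolume`.

    Follows from V = (pi / 6) * d**3. The result is in the linear unit
    matching the biovolume (e.g. um for um^3).
    """
    return (6.0 * biovolume / math.pi) ** (1.0 / 3.0)


def shannon_index(labels):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxonomic labels.

    `labels` holds one label per counted particle (assumed input layout).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# A sphere of diameter 2 has volume (4/3) * pi, so its ESD is 2.
example_esd = esd_from_biovolume(4.0 / 3.0 * math.pi)

# Two equally abundant groups give H = ln(2).
example_h = shannon_index(["copepod", "copepod", "diatom", "diatom"])
```

The natural logarithm is used here; other bases only rescale the index by a constant factor.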
First release of the Pelagic Size Structure database: global datasets of marine size spectra obtained from plankton imaging devices
Abstract. In marine ecosystems, most physiological, ecological, or physical processes are size dependent. These include metabolic rates, the uptake of carbon and other nutrients, swimming and sinking velocities, and trophic interactions, which eventually determine the stocks of commercial species, as well as biogeochemical cycles and carbon sequestration. As such, broad-scale observations of plankton size distribution are important indicators of the general functioning and state of pelagic ecosystems under anthropogenic pressures. Here, we present the first global datasets of the Pelagic Size Structure database (PSSdb), generated from plankton imaging devices. This release includes the bulk particle normalized biovolume size spectrum (NBSS) and the bulk particle size distribution (PSD), along with their related parameters (slope, intercept, and R²) measured within the epipelagic layer (0–200 m) by three imaging sensors: the Imaging FlowCytobot (IFCB), the Underwater Vision Profiler (UVP), and benchtop scanners. Collectively, these instruments effectively image organisms and detrital material in the 7–10 000 µm size range. A total of 92 472 IFCB samples, 3068 UVP profiles, and 2411 scans passed our quality control and were standardized to produce consistent instrument-specific size spectra averaged to 1° × 1° latitude and longitude and by year and month. Our instrument-specific datasets span most major ocean basins, except for the IFCB datasets we have ingested, which were exclusively collected in northern latitudes, and cover decadal time periods (2013–2022 for IFCB, 2008–2021 for UVP, and 1996–2022 for scanners), allowing for a further assessment of the pelagic size spectrum in space and time. The datasets that constitute PSSdb's first release are available at https://doi.org/10.5281/zenodo.11050013 (Dugenne et al., 2024b). In addition, future updates to these data products can be accessed at https://doi.org/10.5281/zenodo.7998799.
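As an illustration of what an NBSS and its slope, intercept, and R² represent, the sketch below bins particle biovolumes into octave (base-2) size classes, divides the total biovolume in each class by the sampled volume and by the class width, and fits a straight line in log10-log10 space. The binning scheme, units, and function names are assumptions made for this example and are not taken from the PSSdb processing pipeline.

```python
import numpy as np


def nbss(biovolumes, sampled_volume):
    """Normalized biovolume size spectrum on octave (base-2) bins.

    biovolumes: per-particle biovolumes (assumed unit, e.g. um^3)
    sampled_volume: volume of water imaged (assumed unit, e.g. L)
    Returns (bin geometric midpoints, NBSS values) for non-empty bins.
    """
    v = np.asarray(biovolumes, dtype=float)
    lo = np.floor(np.log2(v.min()))
    hi = np.ceil(np.log2(v.max()))
    if hi == lo:  # all particles sit on a single power of two
        hi += 1
    edges = 2.0 ** np.arange(lo, hi + 1)
    # Total biovolume per size class (histogram weighted by biovolume).
    total, _ = np.histogram(v, bins=edges, weights=v)
    widths = np.diff(edges)
    mids = np.sqrt(edges[:-1] * edges[1:])
    spectrum = total / sampled_volume / widths
    keep = spectrum > 0
    return mids[keep], spectrum[keep]


def fit_loglog(x, y):
    """Least-squares line through (log10 x, log10 y): slope, intercept, R^2."""
    lx, ly = np.log10(x), np.log10(y)
    slope, intercept = np.polyfit(lx, ly, 1)
    residuals = ly - (slope * lx + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((ly - ly.mean()) ** 2)
    return slope, intercept, r2


# Synthetic particles with equal total biovolume in every octave class,
# which corresponds to an NBSS slope of -1.
particles = [1.5] * 8 + [3.0] * 4 + [6.0] * 2 + [12.0]
mids, spectrum = nbss(particles, sampled_volume=1.0)
slope, intercept, r2 = fit_loglog(mids, spectrum)
```

Empty size classes are dropped before fitting because their logarithm is undefined; a real pipeline would also need rules for under-sampled classes at both ends of the spectrum.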
- PAR ID: 10592015
- Publisher / Repository: Copernicus Publications
- Journal Name: Earth System Science Data
- Volume: 16
- Issue: 6
- ISSN: 1866-3516
- Page Range / eLocation ID: 2971 to 2999
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract. Background: Computational cell type deconvolution enables the estimation of cell type abundance from bulk tissues and is important for understanding the tissue microenvironment, especially in tumor tissues. With the rapid development of deconvolution methods, many benchmarking studies have been published that aim to evaluate these methods comprehensively. Benchmarking studies rely on cell-type-resolved single-cell RNA-seq data to create simulated pseudobulk datasets by adding individual cell types in controlled proportions. Results: In our work, we show that the standard application of this approach, which uses randomly selected single cells regardless of the intrinsic differences between them, generates synthetic bulk expression values that lack appropriate biological variance. We demonstrate why and how the current bulk simulation pipeline with random cells is unrealistic and propose a heterogeneous simulation strategy as a solution. The heterogeneously simulated bulk samples match the variance observed in real bulk datasets and therefore provide concrete benefits for benchmarking in several ways. We demonstrate that conceptual classes of deconvolution methods differ dramatically in their robustness to heterogeneity, with reference-free methods performing particularly poorly. For regression-based methods, the heterogeneous simulation provides an explicit framework to disentangle the contributions of reference construction and regression methods to performance. Finally, we perform an extensive benchmark of diverse methods across eight different datasets and find BayesPrism and a hybrid MuSiC/CIBERSORTx approach to be the top performers. Conclusions: Our heterogeneous bulk simulation method and the entire benchmarking framework are implemented in a user-friendly package available at https://github.com/humengying0907/deconvBenchmarking and https://doi.org/10.5281/zenodo.8206516, enabling further developments in deconvolution methods.
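The standard pseudobulk construction described in this abstract (randomly selected single cells combined in controlled proportions) can be sketched as follows. This is a minimal illustration under assumed inputs, namely a cells-by-genes expression matrix and one cell-type label per cell; it is not the deconvBenchmarking implementation, and it does not reproduce the heterogeneous strategy the authors propose.

```python
import numpy as np


def simulate_pseudobulk(expr, cell_types, proportions, n_cells=100, seed=0):
    """Sum the expression of randomly drawn cells in fixed proportions.

    expr: (cells x genes) array of single-cell expression values
    cell_types: one label per row of `expr`
    proportions: dict mapping cell type -> fraction of the mixture
    All names and the sampling-with-replacement choice are illustrative.
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    cell_types = np.asarray(cell_types)
    bulk = np.zeros(expr.shape[1])
    for ctype, fraction in proportions.items():
        pool = np.flatnonzero(cell_types == ctype)
        if pool.size == 0:
            raise ValueError(f"no cells labelled {ctype!r}")
        k = int(round(fraction * n_cells))
        if k == 0:
            continue
        # Random draw that ignores any structure within the cell type.
        picked = rng.choice(pool, size=k, replace=True)
        bulk += expr[picked].sum(axis=0)
    return bulk


# Toy data: type "A" cells express only gene 0, type "B" only gene 1.
toy_expr = [[1, 0], [1, 0], [0, 1], [0, 1]]
toy_types = ["A", "A", "B", "B"]
toy_bulk = simulate_pseudobulk(toy_expr, toy_types, {"A": 0.25, "B": 0.75})
```

Because every draw pools cells of a type regardless of their origin, repeated simulations of this kind differ mainly through sampling noise, which is the limitation the abstract points to.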
The NAIRR Pilot Inaugural Annual Meeting was held on February 19-21, 2025 at the Hyatt Regency Crystal City in Arlington, VA, USA. The meeting highlighted resource offerings, AI, science, education and innovation outcomes, and the NAIRR pilot’s progress in democratizing access to AI resources, and its vision for the future of AI research in the United States. Final Report: https://doi.org/10.5281/zenodo.15263283 Proceedings: https://zenodo.org/communities/nairr2025 Program: https://doi.org/10.5281/zenodo.15106915