Title: Circularity in fisheries data weakens real world prediction
Abstract The systematic substitution of direct observational data with synthesized data derived from models during the stock assessment process has emerged as a low-cost alternative to direct data collection efforts. What is not widely appreciated, however, is how the use of such synthesized data can overestimate predictive skill when forecasting recruitment is part of the assessment process. Using a global database of stock assessments, we show that Standard Fisheries Models (SFMs) can successfully predict synthesized data based on presumed stock-recruitment relationships; however, they are generally less skillful at predicting observational data that are either raw or minimally filtered (denoised without using explicit stock-recruitment models). Additionally, we find that an equation-free approach that does not presume a specific stock-recruitment relationship is better than SFMs at predicting synthesized data, and, moreover, it can also predict observational recruitment data very well. Thus, while synthesized datasets are cheaper in the short term, they carry costs that can limit their utility in predicting real-world recruitment.
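To make the comparison in the abstract concrete, here is a minimal sketch (not the authors' actual analysis pipeline) that contrasts a parametric stock-recruitment fit, with a Ricker model standing in for an SFM, against an equation-free simplex projection, scoring each by the correlation between leave-one-out forecasts and observations. The arrays `spawners` and `recruits` are hypothetical annual series.

```python
import numpy as np

def ricker_forecast(spawners, recruits):
    """Leave-one-out forecasts from a fitted Ricker stock-recruitment model,
    R = a * S * exp(-b * S), estimated by regressing log(R/S) on S."""
    spawners = np.asarray(spawners, dtype=float)
    recruits = np.asarray(recruits, dtype=float)
    preds = np.empty(len(recruits))
    for i in range(len(recruits)):
        keep = np.arange(len(recruits)) != i
        X = np.column_stack([np.ones(keep.sum()), spawners[keep]])
        y = np.log(recruits[keep] / spawners[keep])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)      # coef = [log a, -b]
        preds[i] = spawners[i] * np.exp(coef[0] + coef[1] * spawners[i])
    return preds

def simplex_forecast(series, E=2):
    """Leave-one-out, one-step-ahead simplex projection (equation-free):
    forecast each point from the exponentially weighted successors of its
    E + 1 nearest neighbours in an E-dimensional delay embedding."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    preds = np.full(n, np.nan)
    times = np.arange(E - 1, n - 1)                        # points with a known successor
    vectors = np.array([series[t - E + 1: t + 1] for t in times])
    targets = series[times + 1]
    for j in range(len(times)):
        d = np.linalg.norm(vectors - vectors[j], axis=1)
        d[j] = np.inf                                      # exclude the focal point
        nn = np.argsort(d)[: E + 1]                        # E + 1 nearest neighbours
        w = np.exp(-d[nn] / max(d[nn].min(), 1e-12))
        preds[times[j] + 1] = np.sum(w * targets[nn]) / np.sum(w)
    return preds

def skill(obs, pred):
    """Forecast skill as the Pearson correlation between predictions and observations."""
    ok = np.isfinite(pred)
    return np.corrcoef(np.asarray(obs, dtype=float)[ok], pred[ok])[0, 1]

# Hypothetical usage with annual arrays `spawners` and `recruits`:
# print("Ricker (SFM stand-in):", skill(recruits, ricker_forecast(spawners, recruits)))
# print("Simplex (equation-free):", skill(recruits, simplex_forecast(recruits)))
```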
Award ID(s):
1655203
PAR ID:
10298674
Author(s) / Creator(s):
Date Published:
Journal Name:
Scientific Reports
Volume:
10
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Griffith, Gary (Ed.)
    Abstract The stock–recruitment relationship is the basis of any stock prediction and thus fundamental for fishery management. Traditional parametric stock–recruitment models often fit empirical data poorly; nevertheless, they remain the rule in fish stock assessment procedures. Here we apply a multi-model approach to predict recruitment of 20 Atlantic cod (Gadus morhua) stocks as a function of adult biomass and environmental variables. We compare the traditional Ricker model with two non-parametric approaches: (i) the stochastic cusp model from catastrophe theory and (ii) multivariate simplex projections, based on attractor state-space reconstruction. We show that the performance of each model is contingent on the historical dynamics of individual stocks, and that stocks which experienced abrupt and state-dependent dynamics are best modelled using non-parametric approaches. These dynamics are pervasive in Western stocks, highlighting a geographical distinction between cod stocks which has implications for their recovery potential. Furthermore, the addition of environmental variables always improved the models' predictive power, indicating that they should be considered in stock assessment and management routines. Using our multi-model approach, we demonstrate that we should be more flexible when modelling recruitment and tailor our approaches to the dynamical properties of each individual stock.
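For reference, the parametric baseline named in this abstract is the classical Ricker curve; the environmentally extended form shown second is one common way to add a covariate and is an assumption on our part, since the abstract does not state the exact formulation used.

```latex
% Classical Ricker stock-recruitment model:
%   R_t = recruitment, S_t = adult (spawning) biomass,
%   alpha = density-independent productivity, beta = density dependence,
%   eps_t = process noise.
R_t = \alpha \, S_t \, e^{-\beta S_t + \varepsilon_t}

% Hypothetical environmentally extended form, with covariate X_t and coefficient gamma:
R_t = \alpha \, S_t \, e^{-\beta S_t + \gamma X_t + \varepsilon_t}
```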
  2. Understanding population dynamics is essential for achieving sustainable and productive fisheries. However, estimating recruitment in a stock assessment model involves the challenging task of identifying a self-sustaining population, which often includes representing complex geographic structure. A review of several case studies demonstrated that alternative stock assessment models can influence estimates of recruitment. Incorporating spatial population structure and connectivity into stock assessment models changed the perception of recruitment events for a wide diversity of fisheries, but the degree to which estimates were impacted depended on movement rates and relative stock sizes. For multiple population components, estimates of strong recruitment events and the productivity of smaller population units were often more sensitive to connectivity assumptions. Simulation testing, conditioned on these case studies, suggested that accurately accounting for population structure, either in management unit definitions or stock assessment model structure, improved recruitment estimates. An understanding of movement dynamics improved estimation of connected sub-populations. The challenge of representing geographic structure in stock assessment emphasizes the importance of defining self-sustaining management units to justify a unit-stock assumption.
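As a rough illustration of why connectivity assumptions matter, the sketch below projects two hypothetical sub-populations forward with and without a movement matrix; all parameter values are invented for illustration and are not taken from the case studies reviewed above.

```python
import numpy as np

# Hypothetical two-area projection: M[i, j] is the fraction of fish that move
# from area j to area i each year (columns sum to 1).
M = np.array([[0.9, 0.2],
              [0.1, 0.8]])

survival = 0.7                        # assumed annual survival, same in both areas
recruits = np.array([1000.0, 300.0])  # assumed area-specific recruitment per year
N = np.array([5000.0, 1500.0])        # starting abundance by area

for year in range(20):
    N = M @ (survival * N) + recruits       # move survivors, then add new recruits

# Ignoring connectivity (M = identity) attributes all local abundance to local
# recruitment; with movement, part of each area's abundance originates elsewhere,
# which is why connectivity assumptions change perceived recruitment strength.
N_no_move = np.array([5000.0, 1500.0])
for year in range(20):
    N_no_move = survival * N_no_move + recruits

print("with movement:   ", N.round(1))
print("without movement:", N_no_move.round(1))
```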
  3. Speech foundation models (SFMs) have achieved state-of-the-art results for various speech tasks in supervised (e.g. Whisper) or self-supervised systems (e.g. WavLM). However, the performance of SFMs for child ASR has not been systematically studied. In addition, there is no benchmark for child ASR with standard evaluations, making comparisons of novel ideas difficult. In this paper, we initiate and present a comprehensive benchmark on several child speech databases based on various SFMs (Whisper, Wav2vec2.0, HuBERT, and WavLM). Moreover, we investigate finetuning strategies by comparing various data augmentation and parameter-efficient finetuning (PEFT) methods. We observe that the behaviors of these methods differ as the model size increases. For example, PEFT matches the performance of full finetuning for large models but performs worse for small models. To stabilize finetuning using augmented data, we propose a perturbation invariant finetuning (PIF) loss as a regularization.
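The exact form of the PIF loss is not given in this summary. The PyTorch sketch below shows one generic way a perturbation-invariance regularizer of this kind can be written: a consistency term between model outputs on clean and augmented versions of the same utterance, added to the usual ASR training loss. The function name, the KL form of the consistency term, and the weight `lam` are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def pif_regularized_loss(model, clean_feats, augmented_feats, task_loss_fn, targets, lam=0.1):
    """Hypothetical perturbation-invariant finetuning objective:
    task loss on augmented inputs plus a consistency penalty that keeps the
    outputs on perturbed audio close to the outputs on the clean audio."""
    logits_clean = model(clean_feats)
    logits_aug = model(augmented_feats)

    # Standard supervised term (e.g. CTC or cross-entropy) on the augmented batch.
    task_loss = task_loss_fn(logits_aug, targets)

    # Consistency between output distributions on clean vs. perturbed input;
    # the clean branch is detached so it acts as the target.
    consistency = F.kl_div(
        F.log_softmax(logits_aug, dim=-1),
        F.softmax(logits_clean.detach(), dim=-1),
        reduction="batchmean",
    )
    return task_loss + lam * consistency
```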
  4. ABSTRACT We compare the star-forming main sequence (SFMS) of galaxies – both integrated and resolved on 1 kpc scales – between the high-resolution TNG50 simulation of IllustrisTNG and observations from the 3D-HST slitless spectroscopic survey at z ∼ 1. Contrasting integrated star formation rates (SFRs), we find that the slope and normalization of the star-forming main sequence in TNG50 are quantitatively consistent with values derived by fitting observations from 3D-HST with the Prospector Bayesian inference framework. The previous offsets of 0.2–1 dex between observed and simulated main-sequence normalizations are resolved when using the updated masses and SFRs from Prospector. The scatter is generically smaller in TNG50 than in 3D-HST for more massive galaxies with M* > 10^10 M⊙, by ∼10–40 per cent, after accounting for observational uncertainties. When comparing resolved star formation, we also find good agreement between TNG50 and 3D-HST: average specific star formation rate (sSFR) radial profiles of galaxies at all masses and radii below, on, and above the SFMS are similar in both normalization and shape. Most noteworthy, massive galaxies with M* > 10^10.5 M⊙, which have fallen below the SFMS due to ongoing quenching, exhibit a clear central SFR suppression in both TNG50 and 3D-HST. In contrast, the original Illustris simulation and a variant TNG run without black hole kinetic wind feedback do not reproduce the central SFR profile suppression seen in the data. In TNG, inside-out quenching is due to the supermassive black hole (SMBH) feedback model operating at low accretion rates.
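The slope, normalization, and scatter being compared reduce to a linear fit of log SFR against log stellar mass for star-forming galaxies. The sketch below uses mock data and is not the Prospector or TNG50 analysis chain; the pivot mass of 10^10 M⊙ and the mock parameter values are assumptions.

```python
import numpy as np

# Mock star-forming galaxies at z ~ 1:
# log_mstar = log10(stellar mass / Msun), log_sfr = log10(SFR / (Msun/yr)).
rng = np.random.default_rng(0)
log_mstar = rng.uniform(9.0, 11.0, 500)
log_sfr = 0.8 * (log_mstar - 10.0) + 1.0 + rng.normal(0.0, 0.3, 500)  # mock SFMS

# Fit: log SFR = slope * (log M* - 10) + normalization
slope, norm = np.polyfit(log_mstar - 10.0, log_sfr, 1)

# Scatter: dispersion of the residuals about the fitted relation, in dex.
scatter = np.std(log_sfr - (slope * (log_mstar - 10.0) + norm))

print(f"slope = {slope:.2f}, normalization at 1e10 Msun = {norm:.2f} dex, scatter = {scatter:.2f} dex")
```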
  5. Previous moderate- and high-temperature geothermal resource assessments of the western United States utilized weight-of-evidence and logistic regression methods to estimate resource favorability, but these analyses relied upon some expert decisions. While expert decisions can add confidence to aspects of the modeling process by ensuring that only reasonable models are employed, they also introduce human bias into assessments. This bias presents a source of error that may affect the performance of the models and the resulting resource estimates. Our study aims to reduce expert input through robust data-driven analyses and better-suited data science techniques, with the goals of saving time, reducing bias, and improving predictive ability. We present six favorability maps for geothermal resources in the western United States created using two strategies applied to three modern machine learning algorithms (logistic regression, support-vector machines, and XGBoost). To provide a direct comparison to previous assessments, we use the same input data as the 2008 U.S. Geological Survey (USGS) conventional moderate- to high-temperature geothermal resource assessment. The six new favorability maps required far less expert decision-making but broadly agree with the previous assessment. Although the 2008 assessment employed linear methods, the non-linear machine learning algorithms (i.e., support-vector machines and XGBoost) produced greater agreement with the previous assessment than the linear machine learning algorithm (i.e., logistic regression). It is not surprising that geothermal systems depend on non-linear combinations of features, and we postulate that the expert decisions during the 2008 assessment accounted for system non-linearities. Substantial challenges to applying machine learning algorithms to predict geothermal resource favorability include severe class imbalance (i.e., there are very few known geothermal systems compared to the large area considered) and the fact that, while known geothermal systems provide positive labels, all other sites have an unknown status (i.e., they are unlabeled) rather than a negative label (i.e., a known/proven absence of a geothermal resource). We address both challenges through a custom undersampling strategy that can be used with any algorithm, and we evaluate the resulting models using F1 scores.
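A generic version of the imbalance handling described here (undersampling the unlabeled cells around the rare positives and scoring with F1) might look like the sketch below. The data are synthetic, only scikit-learn's logistic regression is shown, and the random undersampling is a simple stand-in rather than the paper's custom strategy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Synthetic stand-in data: rows are map cells, columns are evidence layers
# (e.g. heat flow, faults, seismicity). Labels: 1 = known geothermal system,
# 0 = unlabeled cell (NOT a confirmed absence).
rng = np.random.default_rng(42)
X = rng.normal(size=(20000, 6))
score = X[:, 0] + 0.5 * X[:, 1]                        # mock favorability signal
y = (score > np.quantile(score, 0.995)).astype(int)    # ~0.5% known systems: severe imbalance

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Generic random undersampling of the unlabeled class: keep every positive and
# sample a fixed number of unlabeled cells per positive as provisional negatives.
pos = np.flatnonzero(y_tr == 1)
unl = np.flatnonzero(y_tr == 0)
neg = rng.choice(unl, size=10 * len(pos), replace=False)
keep = np.concatenate([pos, neg])

clf = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])

# F1 on held-out cells balances precision and recall for the rare positive class.
print("F1:", f1_score(y_te, clf.predict(X_te)))
```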