
Title: Circularity in fisheries data weakens real world prediction
Abstract The systematic substitution of direct observational data with synthesized data derived from models during the stock assessment process has emerged as a low-cost alternative to direct data collection efforts. What is not widely appreciated, however, is how the use of such synthesized data can lead to overestimates of predictive skill when forecasting recruitment is part of the assessment process. Using a global database of stock assessments, we show that Standard Fisheries Models (SFMs) can successfully predict synthesized data based on presumed stock-recruitment relationships; however, they are generally less skillful at predicting observational data that are either raw or minimally filtered (denoised without using explicit stock-recruitment models). Additionally, we find that an equation-free approach that does not presume a specific stock-recruitment relationship is better than SFMs at predicting synthesized data, and moreover it can also predict observational recruitment data very well. Thus, while synthesized datasets are cheaper in the short term, they carry costs that can limit their utility in predicting real world recruitment.
Journal Name:
Scientific Reports
Sponsoring Org:
National Science Foundation
More Like this
  1. Griffith, Gary (Ed.)
    Abstract The stock–recruitment relationship is the basis of any stock prediction and thus fundamental for fishery management. Traditional parametric stock–recruitment models often poorly fit empirical data, nevertheless they are still the rule in fish stock assessment procedures. We here apply a multi-model approach to predict recruitment of 20 Atlantic cod (Gadus morhua) stocks as a function of adult biomass and environmental variables. We compare the traditional Ricker model with two non-parametric approaches: (i) the stochastic cusp model from catastrophe theory and (ii) multivariate simplex projections, based on attractor state-space reconstruction. We show that the performance of each model is contingent on the historical dynamics of individual stocks, and that stocks which experienced abrupt and state-dependent dynamics are best modelled using non-parametric approaches. These dynamics are pervasive in Western stocks highlighting a geographical distinction between cod stocks, which have implications for their recovery potential. Furthermore, the addition of environmental variables always improved the models’ predictive power indicating that they should be considered in stock assessment and management routines. Using our multi-model approach, we demonstrate that we should be more flexible when modelling recruitment and tailor our approaches to the dynamical properties of each individual stock.
  2. Understanding population dynamics is essential for achieving sustainable and productive fisheries. However, estimating recruitment in a stock assessment model involves the challenging task of identifying a self-sustaining population, which often includes representing complex geographic structure. A review of several case studies demonstrated that alternative stock assessment models can influence estimates of recruitment. Incorporating spatial population structure and connectivity into stock assessment models changed the perception of recruitment events for a wide diversity of fisheries, but the degree to which estimates were impacted depended on movement rates and relative stock sizes. For multiple population components, estimates of strong recruitment events and the productivity of smaller population units were often more sensitive to connectivity assumptions. Simulation testing, conditioned on these case studies, suggested that accurately accounting for population structure, either in management unit definitions or stock assessment model structure, improved recruitment estimates. An understanding of movement dynamics improved estimation of connected sub-populations. The challenge of representing geographic structure in stock assessment emphasizes the importance of defining self-sustaining management units to justify a unit-stock assumption.
  3. Previous moderate- and high-temperature geothermal resource assessments of the western United States utilized weight-of-evidence and logistic regression methods to estimate resource favorability, but these analyses relied upon some expert decisions. While expert decisions can add confidence to aspects of the modeling process by ensuring only reasonable models are employed, expert decisions also introduce human bias into assessments. This bias presents a source of error that may affect the performance of the models and resulting resource estimates. Our study aims to reduce expert input through robust data-driven analyses and better-suited data science techniques, with the goals of saving time, reducing bias, and improving predictive ability. We present six favorability maps for geothermal resources in the western United States created using two strategies applied to three modern machine learning algorithms (logistic regression, support-vector machines, and XGBoost). To provide a direct comparison to previous assessments, we use the same input data as the 2008 U.S. Geological Survey (USGS) conventional moderate- to high-temperature geothermal resource assessment. The six new favorability maps required far less expert decision-making, but broadly agree with the previous assessment. Despite the fact that the 2008 assessment results employed linear methods, the non-linear machine learning algorithms (i.e., support-vector machines and XGBoost) produced greater agreement with the previous assessment than the linear machine learning algorithm (i.e., logistic regression). It is not surprising that geothermal systems depend on non-linear combinations of features, and we postulate that the expert decisions during the 2008 assessment accounted for system non-linearities.
Substantial challenges to applying machine learning algorithms to predict geothermal resource favorability include severe class imbalance (i.e., there are very few known geothermal systems compared to the large area considered), and while there are known geothermal systems (i.e., positive labels), all other sites have an unknown status (i.e., they are unlabeled), instead of receiving a negative label (i.e., the known/proven absence of a geothermal resource). We address both challenges through a custom undersampling strategy that can be used with any algorithm and then evaluated using F1 scores.
  4. Background Mobile health technology has demonstrated the ability of smartphone apps and sensors to collect data pertaining to patient activity, behavior, and cognition. It also offers the opportunity to understand how everyday passive mobile metrics such as battery life and screen time relate to mental health outcomes through continuous sensing. Impulsivity is an underlying factor in numerous physical and mental health problems. However, few studies have been designed to help us understand how mobile sensors and self-report data can improve our understanding of impulsive behavior. Objective The objective of this study was to explore the feasibility of using mobile sensor data to detect and monitor self-reported state impulsivity and impulsive behavior passively via a cross-platform mobile sensing application. Methods We enrolled 26 participants who were part of a larger study of impulsivity to take part in a real-world, continuous mobile sensing study over 21 days on both Apple operating system (iOS) and Android platforms. The mobile sensing system (mPulse) collected data from call logs, battery charging, and screen checking. To validate the model, we used mobile sensing features to predict common self-reported impulsivity traits, objective mobile behavioral and cognitive measures, and ecological momentary assessment (EMA) of state impulsivity and constructs related to impulsive behavior (ie, risk-taking, attention, and affect). Results Overall, the findings suggested that passive measures of mobile phone use such as call logs, battery charging, and screen checking can predict different facets of trait and state impulsivity and impulsive behavior. For impulsivity traits, the models significantly explained variance in sensation seeking, planning, and lack of perseverance traits but failed to explain motor, urgency, lack of premeditation, and attention traits.
Passive sensing features from call logs, battery charging, and screen checking were particularly useful in explaining and predicting trait-based sensation seeking. On a daily level, the model successfully predicted objective behavioral measures such as present bias in delay discounting tasks, commission and omission errors in a cognitive attention task, and total gains in a risk-taking task. Our models also predicted daily EMA questions on positivity, stress, productivity, healthiness, and emotion and affect. Perhaps most intriguingly, the model failed to predict daily EMA designed to measure previous-day impulsivity using face-valid questions. Conclusions The study demonstrated the potential for developing trait and state impulsivity phenotypes and detecting impulsive behavior from everyday mobile phone sensors. Limitations of the current research and suggestions for building more precise passive sensing models are discussed. Trial Registration NCT03006653;
  5. Abstract

    The radius of maximum wind (Rmax) in a tropical cyclone governs the footprint of hazards, including damaging wind, surge, and rainfall. However, Rmax is an inconstant quantity that is difficult to observe directly and is poorly resolved in reanalyses and climate models. In contrast, outer wind radii are much less sensitive to such issues. Here we present a simple empirical model for predicting Rmax from the radius of 34-kt (1 kt ≈ 0.51 m s⁻¹) wind (R17.5ms). The model only requires as input quantities that are routinely estimated operationally: maximum wind speed, R17.5ms, and latitude. The form of the empirical model takes advantage of our physical understanding of tropical cyclone radial structure and is trained on the Extended Best Track database from the North Atlantic 2004–20. Results are similar for the TC-OBS database. The physics reduces the relationship between the two radii to a dependence on two physical parameters, while the observational data enables an optimal estimate of the quantitative dependence on those parameters. The model performs substantially better than existing operational methods for estimating Rmax. The model reproduces the observed statistical increase in Rmax with latitude and demonstrates that this increase is driven by the increase in R17.5ms with latitude. Overall, the model offers a simple and fast first-order prediction of Rmax that can be used operationally and in risk models.

    Significance Statement

    If we can better predict the area of strong winds in a tropical cyclone, we can better prepare for its potential impacts. This work develops a simple model to predict the radius where the strongest winds in a tropical cyclone are located. The model is simple and fast and more accurate than existing models, and it also helps us to understand what causes this radius to vary in time, from storm to storm, and at different latitudes. It can be used in both operational forecasting and models of tropical cyclone hazard risk.
