Search for: All records

Award ID contains: 2046175


  1. Free, publicly-accessible full text available June 16, 2026
  2. Free, publicly-accessible full text available June 16, 2026
  3. We investigate the impact of low-rank interference on the problem of distinguishing between two seabed types using ambient sound as an acoustic source. The resulting frequency-domain snapshots follow a zero-mean, circularly-symmetric Gaussian distribution, where each seabed type has a unique covariance matrix. Detecting changes in the seabed type across distinct spatial locations can be formulated as a two-sample hypothesis test for equality of covariance, for which Box's M-test is the classical solution. Interference sources such as passing ships result in additive noise with a low-rank covariance that can reduce the performance of hypothesis testing. We first present a method to construct a worst-case interference field, making hypothesis testing as difficult as possible. We then provide an alternating optimization procedure to recover the interference-free covariance matrix. Experiments on synthetic data show that the optimized interferer can greatly reduce hypothesis testing performance, while our recovery method perfectly eliminates this interference for a sufficiently small interference rank. On real data from the New England Shelf Break Acoustics experiment, we show that our approach successfully mitigates interference, allowing for accurate hypothesis testing and improving bottom loss estimation. 
    Free, publicly-accessible full text available February 1, 2026
  4. This article presents a theoretical analysis of optimally distinguishing among environmental parameters from ocean ambient sound. Recent approaches to this problem either focus on parameter estimation or attempt to classify the environment into one of many known types through machine learning. This classification problem is framed as one of hypothesis testing on the received ambient sound snapshots. The resulting test depends on the Kullback-Leibler divergence (KLD) between the distributions corresponding to different environments or sediment types. Analysis of the KLD shows the dependence on the signal-to-noise ratio, the underlying signal subspace, and the distribution of eigenvalues of the respective covariance matrices. This analysis provides insights into both when and why successful hypothesis testing is possible. Experiments demonstrate that our analysis provides insight as to why certain environmental parameters are more difficult to distinguish than others. Experiments on sediment types from the Naval Oceanographic Office Bottom Sediment type database show that certain types are indistinguishable for a given array configuration. Further, the KLD can be used to provide a quantitative alternative to examining bottom loss curves to predict array processing performance. 
  5. We train five models using two machine learning (ML) regression algorithms (i.e., linear regression and XGBoost) to predict hydrothermal upflow in the Great Basin. Feature data are extracted from datasets supporting the INnovative Geothermal Exploration through Novel Investigations Of Undiscovered Systems project (INGENIOUS). The label data (the reported convective signals) are extracted from measured thermal gradients in wells by comparing the total estimated heat flow at the wells to the modeled background conductive heat flow. That is, the reported convective signal is the difference between the background conductive heat flow and the well heat flow. The reported convective signals contain outliers that may affect upflow prediction, so the influence of outliers is tested by constructing models for two cases: 1) using all the data (i.e., -91 to 11,105 mW/m²), and 2) truncating the range of labels to include only reported convective signals between -25 and 200 mW/m². Because hydrothermal systems are sparse, models that predict high convective signal in smaller areas better match the natural frequency of hydrothermal systems. Early results demonstrate that XGBoost outperforms linear regression. For XGBoost using the truncated range of labels, half of the high reported signals are within < 3 % of the highest predictions. For XGBoost using the entire range of labels, half of the high reported signals are in < 13 % of the highest predictions. While this implies that the truncated regression is superior, the all-data model better predicts the locations of power-producing systems (i.e., the operating power plants are in a smaller fraction of the study area given by the highest predictions). Even though the models generally predict greater hydrothermal upflow for higher reported convective signals than for lower reported convective signals, both XGBoost models consistently underpredict the magnitude of higher signals. This behavior is attributed to low resolution/granularity of input features compared with the scale of a hydrothermal upflow zone (a few km or less across). Trouble estimating exact values while still reliably predicting high versus low convective signals suggests that a future strategy such as ranked ordinal regression (e.g., classifying into ordered bins for low, medium, high, and very high convective signal) might fit better models, since doing so reduces problems introduced by outliers while preserving the property of larger versus smaller signals.
  6. Recent advances in machine learning (ML) identifying areas favorable to hydrothermal systems indicate that the resolution of feature data remains a subject of necessary improvement before ML can reliably produce better models. Herein, we consider the value of adding new features or replacing other, low-value features with new input features in existing ML pipelines. Our previous work identified stress and seismicity as having less value than the other feature types (i.e., heat flow, distance to faults, and distance to magmatic activity) for the 2008 USGS hydrothermal energy assessment; hence, a fundamental question is whether the addition of new but partially correlated features will improve resulting models for hydrothermal favorability. Therefore, we add new maps for shear strain rate and dilation strain rate to fit logistic regression and XGBoost models, resulting in new 7-feature models that are compared to the old 5-feature models. Because these new features share a degree of correlation with the original relatively uninformative stress and seismicity features, we also consider replacement of the two lower-value features with the two new features, creating new 5-feature models. Adding the new features improves the predictive skill of the new 7-feature model over that of the old 5-feature model; however, that improvement is not statistically significant because the new features are correlated with the old features and, consequently, the new features do not present considerable new information. However, the new 5-feature XGBoost model has a statistically significant increase in predictive skill for known positives over the old 5-feature model at p = 0.06. This improved performance is due to the lower-dimensional feature space of the former than that of the latter. In higher-dimensional feature space, relationships between features and the presence or absence of hydrothermal systems are harder to discern (i.e., the 7-feature model likely suffers from the “curse of dimensionality”).
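The ambient-sound entries above (items 3 and 4) model each snapshot as zero-mean, circularly-symmetric complex Gaussian, with one covariance matrix per seabed type, and item 4 ties hypothesis testing performance to the Kullback-Leibler divergence (KLD) between those distributions. For two such distributions with covariances Σ0 and Σ1 of size n, the standard closed form is KLD = tr(Σ1⁻¹Σ0) − n + ln det Σ1 − ln det Σ0. The following is a minimal sketch of that formula only, not the authors' code; the function name `complex_gaussian_kld` and the example matrices are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def complex_gaussian_kld(cov0, cov1):
    """KLD from CN(0, cov0) to CN(0, cov1), in nats.

    Both inputs are assumed to be Hermitian positive-definite matrices
    of the same size (hypothetical stand-ins for per-seabed covariances).
    """
    cov0 = np.asarray(cov0, dtype=complex)
    cov1 = np.asarray(cov1, dtype=complex)
    n = cov0.shape[0]
    # tr(cov1^{-1} cov0), computed with a linear solve instead of an explicit inverse.
    trace_term = np.trace(np.linalg.solve(cov1, cov0)).real
    # slogdet returns (sign, log|det|); the log-determinant is the second entry.
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return float(trace_term - n + logdet1 - logdet0)


# Illustrative 2x2 covariances (made-up values, not data from the listed papers).
cov_a = np.array([[2.0, 0.5 + 0.5j],
                  [0.5 - 0.5j, 1.0]])
cov_b = np.eye(2)

kld_same = complex_gaussian_kld(cov_a, cov_a)  # identical distributions
kld_ab = complex_gaussian_kld(cov_a, cov_b)    # distinct distributions
print(kld_same, kld_ab)
```

A divergence near zero means the two covariances are hard to tell apart from snapshots, which matches the abstract's remark that some sediment types are indistinguishable for a given array configuration. The KLD is not symmetric, so `complex_gaussian_kld(cov_a, cov_b)` and `complex_gaussian_kld(cov_b, cov_a)` generally differ.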