Title: Predicting large hydrothermal systems
We train five models using two machine learning (ML) regression algorithms (linear regression and XGBoost) to predict hydrothermal upflow in the Great Basin. Feature data are extracted from datasets supporting the INnovative Geothermal Exploration through Novel Investigations Of Undiscovered Systems (INGENIOUS) project. The label data (the reported convective signals) are derived from thermal gradients measured in wells by comparing the total estimated heat flow at each well to the modeled background conductive heat flow. That is, the reported convective signal is the difference between the background conductive heat flow and the well heat flow. The reported convective signals contain outliers that may affect upflow prediction, so the influence of outliers is tested by constructing models for two cases: 1) using all the data (i.e., -91 to 11,105 mW/m²), and 2) truncating the range of labels to include only reported convective signals between -25 and 200 mW/m². Because hydrothermal systems are sparse, models that predict high convective signal in smaller areas better match the natural frequency of hydrothermal systems. Early results demonstrate that XGBoost outperforms linear regression. For XGBoost using the truncated range of labels, half of the high reported signals fall within the < 3% of the study area with the highest predictions. For XGBoost using the entire range of labels, half of the high reported signals fall within the < 13% of the study area with the highest predictions. Although this suggests that the truncated-label model is superior, the all-data model better predicts the locations of power-producing systems (i.e., the operating power plants lie in a smaller fraction of the study area given by the highest predictions). Even though the models generally predict greater hydrothermal upflow for higher reported convective signals than for lower ones, both XGBoost models consistently underpredict the magnitude of higher signals. We attribute this behavior to the coarse resolution of the input features relative to the scale of a hydrothermal upflow zone (a few km or less across). The difficulty of estimating exact values while still reliably distinguishing high from low convective signals suggests that a future strategy such as ranked ordinal regression (e.g., classifying into ordered bins for low, medium, high, and very high convective signal) might yield better models, since it reduces problems introduced by outliers while preserving the ordering of larger versus smaller signals.
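The label construction and the ranking-based evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the sign convention (well heat flow minus background, inferred from the reported label range), the treatment of every sample as an equal-area cell, and the bin edges are all assumptions made for illustration.

```python
import numpy as np

def convective_signal(well_heat_flow, background_heat_flow):
    """Label: well heat flow minus modeled background conductive heat flow (mW/m^2).

    The sign convention is an assumption inferred from the reported label range.
    """
    return np.asarray(well_heat_flow, float) - np.asarray(background_heat_flow, float)

def truncation_mask(signal, low=-25.0, high=200.0):
    """Boolean mask keeping only labels inside [low, high], as in the truncated case."""
    signal = np.asarray(signal, float)
    return (signal >= low) & (signal <= high)

def fraction_for_capture(predictions, is_high, capture=0.5):
    """Smallest top-ranked fraction of samples containing `capture` of the high labels.

    Assumes every sample stands for an equal share of the study area.
    """
    predictions = np.asarray(predictions, float)
    is_high = np.asarray(is_high, bool)
    if not is_high.any():
        raise ValueError("no high-signal samples to capture")
    order = np.argsort(-predictions, kind="stable")   # highest predictions first
    hits = np.cumsum(is_high[order])                  # high labels found so far
    needed = np.ceil(capture * is_high.sum())
    k = np.searchsorted(hits, needed) + 1             # samples needed to reach target
    return k / len(predictions)

def ordinal_bins(signal, edges=(25.0, 100.0, 200.0)):
    """Map signals to ordered classes 0..len(edges); the edges here are hypothetical."""
    return np.digitize(np.asarray(signal, float), edges)
```

A smaller value from `fraction_for_capture` means the model concentrates its highest predictions around the high reported signals, which is the sense in which the abstract compares the truncated and all-data models.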
Award ID(s):
2046175
PAR ID:
10536406
Publisher / Repository:
2023 Geothermal Rising Conference
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent advances in machine learning (ML) for identifying areas favorable to hydrothermal systems indicate that the resolution of feature data must improve before ML can reliably produce better models. Herein, we consider the value of adding new features, or of replacing low-value features with new input features, in existing ML pipelines. Our previous work identified stress and seismicity as having less value than the other feature types (i.e., heat flow, distance to faults, and distance to magmatic activity) for the 2008 USGS hydrothermal energy assessment; hence, a fundamental question is whether adding new but partially correlated features will improve the resulting models of hydrothermal favorability. Therefore, we add new maps of shear strain rate and dilation strain rate to fit logistic regression and XGBoost models, resulting in new 7-feature models that are compared to the old 5-feature models. Because these new features share a degree of correlation with the original, relatively uninformative stress and seismicity features, we also consider replacing the two lower-value features with the two new features, creating new 5-feature models. Adding the new features improves the predictive skill of the new 7-feature model over that of the old 5-feature model; however, that improvement is not statistically significant because the new features are correlated with the old features and, consequently, do not contribute considerable new information. In contrast, the new 5-feature XGBoost model has a statistically significant increase in predictive skill for known positives over the old 5-feature model at p = 0.06. The new 5-feature model also outperforms the 7-feature model, which we attribute to its lower-dimensional feature space. In higher-dimensional feature space, relationships between features and the presence or absence of hydrothermal systems are harder to discern (i.e., the 7-feature model likely suffers from the “curse of dimensionality”).
  2. Le, Khanh N.Q. (Ed.)
    In current clinical settings, pain is typically measured through a patient’s self-report. This subjective pain assessment results in suboptimal treatment plans, over-prescription of opioids, and drug-seeking behavior among patients. In the present study, we explored machine learning models for automatic, objective pain intensity estimation using inputs from physiological sensors. This study uses the BioVid Heat Pain Dataset. We extracted features from Electrodermal Activity (EDA), Electrocardiogram (ECG), and Electromyogram (EMG) signals collected from study participants subjected to heat pain. We built different machine learning models, including Linear Regression, Support Vector Regression (SVR), Neural Networks, and Extreme Gradient Boosting, for continuous-value pain intensity estimation. We then identified the physiological sensor, feature set, and machine learning model that give the best predictive performance. We found that EDA is the most information-rich sensor for continuous pain intensity prediction. A set of only 3 features from EDA signals using the SVR model gave an average performance of 0.93 mean absolute error (MAE) and 1.16 root mean square error (RMSE) for the subject-independent model, and 0.92 MAE and 1.13 RMSE for the subject-dependent model. The MAE achieved with this signal-feature-model combination is less than 1 unit on a 0 to 4 continuous pain scale, which is smaller than the MAE achieved by the methods reported in the literature. These results demonstrate that it is possible to estimate a patient’s pain intensity using a computationally inexpensive machine learning model with 3 statistical features from the EDA signal, which can be collected from a wrist biosensor. This method paves the way to developing a wearable pain measurement device.
  3. To identify superior thermal contacts to graphene, we implement a high-throughput methodology that systematically explores the Ni−Pd alloy composition spectrum and the effect of Cr adhesion layer thickness on thermal interface conductance with monolayer graphene. Frequency domain thermoreflectance measurements of two independently prepared Ni−Pd/Cr/graphene/SiO2 samples identify a maximum metal/graphene/SiO2 junction thermal interface conductance of 114 ± (39, 25) MW/m² K and 113 ± (33, 22) MW/m² K at ∼10 at. % Pd in Ni, nearly double the highest reported value for pure metals and 3 times that of pure Ni or Pd. The presence of Cr, at any thickness, suppresses this maximum. Although the origin of the peak is unresolved, we find that it correlates with a region of the Ni−Pd phase diagram that exhibits a miscibility gap. Cross-sectional imaging by high-resolution transmission electron microscopy identifies striations in the alloy at this particular composition, consistent with separation into multiple phases. Through this work, we draw attention to alloys in the search for better contacts to two-dimensional materials for next-generation devices.
  4. SUMMARY Geothermal heat flow beneath the Greenland and Antarctic ice sheets is an important boundary condition for ice sheet dynamics, but is rarely measured directly and therefore is inferred indirectly from proxies (e.g. seismic structure, magnetic Curie depth, surface topography). We seek to improve the understanding of the relationship between heat flow and one such proxy, seismic structure, and determine how well heat flow data can be predicted from the structure (the characterization problem). We also seek to quantify the extent to which this relationship can be extrapolated from one continent to another (the transportability problem). To address these problems, we use direct heat flow observations and new seismic structural information in the contiguous United States and Europe, and construct three Machine Learning models of the relationship with different levels of complexity (Linear Regression, Decision Tree and Random Forest). We compare these models in terms of their interpretability, the predicted heat flow accuracy within a continent and the accuracy of the extrapolation between Europe and the United States. The Random Forest and Decision Tree models are the most accurate within a continent, while the Linear Regression and Decision Tree models are the most accurate upon extrapolation between continents. The Decision Tree model uniquely illuminates the regional variations of the relationship between heat flow and seismic structure. From the Decision Tree model, uppermost mantle shear wave speed, crustal shear wave speed and Moho depth together explain more than half of the observed heat flow variations in both the United States [$$r^2 \approx 0.6$$ (coefficient of determination), $$\mathrm{RMSE} \approx 8\, {\rm mW}\,{\rm m}^{-2}$$ (Root Mean Squared Error)] and Europe ($$r^2 \approx 0.5, \mathrm{RMSE} \approx 13\, {\rm mW}\,{\rm m}^{-2}$$), such that uppermost mantle shear wave speed is the most important. Extrapolating the U.S.-trained models to Europe reasonably predicts the geographical distribution of heat flow [$$\rho = 0.48$$ (correlation coefficient)], but not the absolute amplitude of the variations ($$r^2 = 0.17$$), similarly from Europe to the United States ($$\rho = 0.66, r^2 = 0.24$$). The deterioration of accuracy upon extrapolation is caused by differences between the continents in how seismic structure is imaged, the heat flow data and intrinsic crustal radiogenic heat production. Our methods have the potential to improve the reliability and resolution of heat flow inferences across Antarctica, and the validation and cross-validation procedures we present can be applied to heat flow proxies other than seismic structure, which may help resolve inconsistencies between existing subglacial heat flow values inferred using different proxies.
  5. Sea ice regulates heat exchange between the ocean and atmosphere in Earth’s polar regions. The thermal conductivity of sea ice governs this exchange, and is a key parameter in climate modelling. However, it is challenging to measure and predict due to its sensitive dependence on temperature, salinity and brine microstructure. Moreover, as temperature increases, sea ice becomes permeable, and fluid can flow through the porous microstructure. While models for thermal diffusion through sea ice have been obtained, advective contributions to transport have not been considered theoretically. Here, we homogenize a multiscale advection–diffusion equation that models thermal transport through porous sea ice when fluid flow is present. We consider two-dimensional models of convective flow and use an integral representation to derive bounds on the thermal conductivity as a function of the Péclet number. These bounds guarantee enhancement in the thermal conductivity due to the added flow. Further, we relate the Péclet number to temperature, making these bounds useful for global climate models. Our analytic approach offers a mathematical theory which can not only improve predictions of atmosphere–ice–ocean heat exchanges in climate models, but can provide a theoretical framework for a range of problems involving advection–diffusion processes in various fields of application. 