- Proceedings: Forty Seventh Workshop on Geothermal Reservoir Engineering
- Sponsoring Org:
- National Science Foundation
More Like this
Predicting Geothermal Favorability in the Western United States by Using Machine Learning: Addressing Challenges and Developing Solutions
Previous moderate- and high-temperature geothermal resource assessments of the western United States utilized weight-of-evidence and logistic regression methods to estimate resource favorability, but these analyses relied upon some expert decisions. While expert decisions can add confidence to aspects of the modeling process by ensuring only reasonable models are employed, expert decisions also introduce human bias into assessments. This bias presents a source of error that may affect the performance of the models and resulting resource estimates. Our study aims to reduce expert input through robust data-driven analyses and better-suited data science techniques, with the goals of saving time, reducing bias, and improving predictive ability. We present six favorability maps for geothermal resources in the western United States created using two strategies applied to three modern machine learning algorithms (logistic regression, support-vector machines, and XGBoost). To provide a direct comparison to previous assessments, we use the same input data as the 2008 U.S. Geological Survey (USGS) conventional moderate- to high-temperature geothermal resource assessment. The six new favorability maps required far less expert decision-making, but broadly agree with the previous assessment. Despite the fact that the 2008 assessment results employed linear methods, the non-linear machine learning algorithms (i.e., support-vector machines and XGBoost) produced greater agreement with the previous assessment than the linear…
Automated Analysis of the US Drought Monitor Maps With Machine Learning and Multiple Drought Indicators
The US Drought Monitor (USDM) is a hallmark in real-time drought monitoring and assessment, as it was developed by multiple agencies to provide an accurate and timely assessment of drought conditions in the US on a weekly basis. The map is built from multiple physical indicators as well as reported observations from local contributors; human analysts then combine the information and produce the drought map using their best judgement. Because human subjectivity is included in the production of the USDM maps, the procedure is not fully quantitative and is difficult for other entities to reproduce. In this study, we developed a framework to automatically generate the maps through a machine learning approach by predicting the drought categories across the domain of study. A persistence model served as the baseline model for comparison in the framework. Three machine learning algorithms (logistic regression, random forests, and support vector machines) were combined with four different groups of input data, giving a total of 12 configurations for the prediction of drought categories. Finally, all the configurations were evaluated against the baseline model to select the best-performing option. The results showed that our proposed framework could reproduce the drought…
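The experimental layout described above (three algorithms crossed with four input groups, judged against a persistence baseline) can be sketched as follows. This is only an illustration under stated assumptions: the input-group labels are hypothetical placeholders because the abstract does not name them, the persistence model is assumed to repeat the previous week's category, and the weekly series is made up.

```python
from itertools import product

# Algorithm names come from the abstract. The input-group labels are
# hypothetical placeholders: the abstract does not name the four groups.
ALGORITHMS = ["logistic_regression", "random_forest", "support_vector_machine"]
INPUT_GROUPS = ["group_1", "group_2", "group_3", "group_4"]


def build_configurations(algorithms, input_groups):
    """Return every (algorithm, input group) pairing."""
    return list(product(algorithms, input_groups))


def persistence_forecast(categories):
    """Assumed persistence baseline: predict each week as the previous week.

    The first week has no predecessor, so the returned predictions line up
    with categories[1:].
    """
    return categories[:-1]


def accuracy(predicted, observed):
    """Fraction of positions where the predicted category matches."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


configurations = build_configurations(ALGORITHMS, INPUT_GROUPS)

# Made-up weekly drought categories for a single grid cell.
weekly = [0, 0, 1, 1, 2, 2, 2, 1]
baseline_pred = persistence_forecast(weekly)
baseline_acc = accuracy(baseline_pred, weekly[1:])
```

Each of the 12 configurations would be scored with the same metric as the baseline, so a configuration is only worth keeping if it beats simply repeating last week's map.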
Machine learning (ML) methods, such as artificial neural networks (ANN), k-nearest neighbors (kNN), random forests (RF), support vector machines (SVM), and boosted decision trees (DTs), may offer stronger predictive performance than more traditional, parametric methods, such as linear regression, multiple linear regression, and logistic regression (LR), for specific mapping and modeling tasks. However, this increased performance is often accompanied by increased model complexity and decreased interpretability, resulting in critiques of their “black box” nature, which highlights the need for algorithms that can offer both strong predictive performance and interpretability. This is especially true when the global model and predictions for specific data points need to be explainable in order for the model to be of use. Explainable boosting machines (EBMs), an augmentation and refinement of generalized additive models (GAMs), have been proposed as an empirical modeling method that offers both interpretable results and strong predictive performance. The trained model can be graphically summarized as a set of functions relating each predictor variable to the dependent variable along with heat maps representing interactions between selected pairs of predictor variables. In this study, we assess EBMs for predicting the likelihood or probability of slope failure occurrence based on digital terrain characteristics in…
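The "set of functions plus pairwise heat maps" summary corresponds to the additive form usually written for a GAM with pairwise interactions. The notation below is an assumption introduced for illustration (the abstract defines no symbols): g is a link function (the logit for a binary outcome such as slope failure), f_j are the per-predictor shape functions, and f_ij are the interaction terms for a selected set S of predictor pairs.

```latex
g\bigl(\mathbb{E}[y \mid x]\bigr)
  = \beta_0
  + \sum_{j} f_j(x_j)
  + \sum_{(i,j) \in S} f_{ij}(x_i, x_j)
```

Because the terms are summed, each f_j can be plotted on its own as a curve and each f_ij as a heat map, which is what makes both the global model and an individual prediction inspectable.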
Abstract STUDY QUESTION
Can we derive adequate models to predict the probability of conception among couples actively trying to conceive?
Leveraging data collected from female participants in a North American preconception cohort study, we developed models to predict pregnancy with performance of ∼70% in the area under the receiver operating characteristic curve (AUC).
WHAT IS KNOWN ALREADY
Earlier work has focused primarily on identifying individual risk factors for infertility. Several predictive models have been developed in subfertile populations, with relatively low discrimination (AUC: 59–64%).
STUDY DESIGN, SIZE, DURATION
Study participants were female, aged 21–45 years, residents of the USA or Canada, not using fertility treatment, and actively trying to conceive at enrollment (2013–2019). Participants completed a baseline questionnaire at enrollment and follow-up questionnaires every 2 months for up to 12 months or until conception. We used data from 4133 participants with no more than one menstrual cycle of pregnancy attempt at study entry.
PARTICIPANTS/MATERIALS, SETTING, METHODS
On the baseline questionnaire, participants reported data on sociodemographic factors, lifestyle and behavioral factors, diet quality, medical history and selected male partner characteristics. A total of 163 predictors were considered in this study. We implemented regularized logistic regression, support vector machines, neural networks and gradient boosted decision…
MAIN RESULTS AND THE ROLE OF CHANCE
Model I and II AUCs were 70% and 66%, respectively, in parsimonious models, and the concordance index for Model III was 63%. The predictors that were positively associated with pregnancy in all models were: having previously breastfed an infant and using multivitamins or folic acid supplements. The predictors that were inversely associated with pregnancy in all models were: female age, female BMI and history of infertility. Among nulligravid women with no history of infertility, the most important predictors were: female age, female BMI, male BMI, use of a fertility app, attempt time at study entry and perceived stress.
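For readers less familiar with the AUC figures quoted above, the sketch below shows one common way to compute it for a binary outcome: as the fraction of (positive, negative) pairs in which the positive case receives the higher predicted score, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from the study.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted probabilities and outcomes (1 = conceived, 0 = did not).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc_from_scores(scores, labels)
```

On this reading, an AUC of 70% means the model ranks a randomly chosen participant who conceived above a randomly chosen participant who did not in about 70% of such pairs; 50% corresponds to chance.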
LIMITATIONS, REASONS FOR CAUTION
Reliance on self-reported predictor data could have introduced misclassification, which would likely be non-differential with respect to the pregnancy outcome given the prospective design. In addition, we cannot be certain that all relevant predictor variables were considered. Finally, though we validated the models using split-sample replication techniques, we did not conduct an external validation study.
WIDER IMPLICATIONS OF THE FINDINGS
Given a wide range of predictor data, machine learning algorithms can be leveraged to analyze epidemiologic data and predict the probability of conception with discrimination that exceeds earlier work.
STUDY FUNDING/COMPETING INTEREST(S)
The research was partially supported by the U.S. National Science Foundation (under grants DMS-1664644, CNS-1645681 and IIS-1914792) and the National Institutes of Health (under grants R01 GM135930 and UL54 TR004130). In the last 3 years, L.A.W. has received in-kind donations for primary data collection in PRESTO from FertilityFriend.com, Kindara.com, Sandstone Diagnostics and Swiss Precision Diagnostics. L.A.W. also serves as a fibroid consultant to AbbVie, Inc. The other authors declare no competing interests.
TRIAL REGISTRATION NUMBER
Optimizing Automated Kriging to Improve Spatial Interpolation of Monthly Rainfall over Complex Terrain
Gridded monthly rainfall estimates can be used for a number of research applications, including hydrologic modeling and weather forecasting. Automated interpolation algorithms, such as the “autoKrige” function in R, can produce gridded rainfall estimates that validate well but produce unrealistic spatial patterns. In this work, an optimized geostatistical kriging approach is used to interpolate relative rainfall anomalies, which are then combined with long-term means to develop the gridded estimates. The optimization consists of the following: 1) determining the most appropriate offset (constant) to use when log-transforming data; 2) eliminating poor quality data prior to interpolation; 3) detecting erroneous maps using a machine learning algorithm; and 4) selecting the most appropriate parameterization scheme for fitting the model used in the interpolation. Results of this effort include a 30-yr (1990–2019), high-resolution (250-m) gridded monthly rainfall time series for the state of Hawai‘i. Leave-one-out cross validation (LOOCV) is performed using an extensive network of 622 observation stations. LOOCV results are in good agreement with observations (R² = 0.78; MAE = 55 mm month⁻¹; 1.4%); however, predictions can underestimate high rainfall observations (bias = 34 mm month⁻¹; −1%) due to a well-known smoothing effect that occurs with kriging. This research highlights the fact that…
Significance Statement
A new method is developed to map rainfall in Hawai‘i using an optimized geostatistical kriging approach. A machine learning technique is used to detect erroneous rainfall maps and several conditions are implemented to select the optimal parameterization scheme for fitting the model used in the kriging interpolation. A key finding is that optimization of the interpolation approach is necessary because maps may validate well but have unrealistic spatial patterns. This approach demonstrates how, with a moderate amount of data, a low-level machine learning algorithm can be trained to evaluate and classify an unrealistic map output.
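The anomaly-based workflow and the validation statistics mentioned above can be illustrated with a small sketch. It rests on several assumptions that the abstract does not confirm: the relative anomaly is taken to be the observation divided by the long-term mean, the offset is added before the logarithm, and bias is defined as predicted minus observed. The kriging step itself is not implemented, and all function names and numbers are hypothetical.

```python
import math


def to_log_anomaly(rain_mm, mean_mm, offset):
    """Log-transform a relative anomaly (assumed here: observation / mean).

    The offset keeps the logarithm defined for months with zero rainfall.
    """
    return math.log(rain_mm / mean_mm + offset)


def from_log_anomaly(log_anomaly, mean_mm, offset):
    """Invert the transform and recombine with the long-term mean."""
    anomaly = math.exp(log_anomaly) - offset
    # Clamp tiny negative values caused by floating-point round-off.
    return max(anomaly, 0.0) * mean_mm


def mean_absolute_error(predicted, observed):
    """Average absolute difference between prediction and observation."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def mean_bias(predicted, observed):
    """Average of (predicted - observed); this sign convention is assumed."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


# Made-up station values (mm per month) and long-term means.
observed = [0.0, 50.0, 200.0]
long_term_mean = [100.0, 100.0, 100.0]
offset = 0.1  # hypothetical; the abstract says the offset is tuned

transformed = [to_log_anomaly(o, m, offset)
               for o, m in zip(observed, long_term_mean)]
recovered = [from_log_anomaly(t, m, offset)
             for t, m in zip(transformed, long_term_mean)]

# Stand-in for leave-one-out predictions at the same stations.
loocv_pred = [10.0, 60.0, 160.0]
example_mae = mean_absolute_error(loocv_pred, observed)
example_bias = mean_bias(loocv_pred, observed)
```

In a full workflow the interpolation would operate on the transformed values, and each station's leave-one-out prediction would be back-transformed before the error statistics are computed.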