Previous moderate- and high-temperature geothermal resource assessments of the western United States utilized weight-of-evidence and logistic regression methods to estimate resource favorability, but these analyses relied upon some expert decisions. While expert decisions can add confidence to aspects of the modeling process by ensuring only reasonable models are employed, expert decisions also introduce human bias into assessments. This bias presents a source of error that may affect the performance of the models and resulting resource estimates. Our study aims to reduce expert input through robust data-driven analyses and better-suited data science techniques, with the goals of saving time, reducing bias, and improving predictive ability. We present six favorability maps for geothermal resources in the western United States created using two strategies applied to three modern machine learning algorithms (logistic regression, support-vector machines, and XGBoost). To provide a direct comparison to previous assessments, we use the same input data as the 2008 U.S. Geological Survey (USGS) conventional moderate- to high-temperature geothermal resource assessment. The six new favorability maps required far less expert decision-making, but broadly agree with the previous assessment. Although the 2008 assessment employed linear methods, the non-linear machine learning algorithms (i.e., support-vector machines and XGBoost) produced greater agreement with the previous assessment than the linear machine learning algorithm (i.e., logistic regression). It is not surprising that geothermal systems depend on non-linear combinations of features, and we postulate that the expert decisions during the 2008 assessment accounted for system non-linearities.
Applying machine learning algorithms to predict geothermal resource favorability poses two substantial challenges. The first is severe class imbalance (i.e., there are very few known geothermal systems compared to the large area considered). The second is that, while known geothermal systems provide positive labels, all other sites have an unknown status (i.e., they are unlabeled) rather than a negative label (i.e., the known/proven absence of a geothermal resource). We address both challenges through a custom undersampling strategy that can be used with any algorithm, and we evaluate the results using F1 scores.
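As a rough illustration of the idea of undersampling unlabeled sites and scoring with F1, the following minimal sketch uses plain random undersampling. It is not the authors' custom strategy, whose details are not given here; the function names, the `ratio` parameter, and the toy label counts are all hypothetical.

```python
import random

def undersample_unlabeled(labels, ratio=1.0, seed=0):
    """Return sorted indices of all positives plus a random subset of unlabeled sites.

    labels: list with 1 for a known geothermal system, 0 for an unlabeled site.
    ratio: unlabeled sites drawn per positive (hypothetical parameter).
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    unlabeled = [i for i, y in enumerate(labels) if y == 0]
    n_draw = min(len(unlabeled), int(round(ratio * len(positives))))
    # Sampled unlabeled sites are treated as provisional negatives (an assumption).
    sampled = rng.sample(unlabeled, n_draw)
    return sorted(positives + sampled)

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data: 5 known systems among 1000 sites (made-up numbers).
labels = [1] * 5 + [0] * 995
train_idx = undersample_unlabeled(labels, ratio=2.0, seed=42)
toy_f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

Because the subset is drawn independently of the classifier, a sampler of this kind can sit in front of any algorithm, which matches the algorithm-agnostic property described above.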
Automated Analysis of the US Drought Monitor Maps With Machine Learning and Multiple Drought Indicators
The US Drought Monitor (USDM) is a hallmark of real-time drought monitoring and assessment: it was developed by multiple agencies to provide an accurate and timely weekly assessment of drought conditions in the US. The map is built from multiple physical indicators and reported observations from local contributors, which human analysts then combine, using their best judgment, to produce the drought map. Because human subjectivity enters the production of the USDM maps, the procedure is not fully quantitative and is difficult for other entities to reproduce. In this study, we developed a framework that automatically generates the maps through a machine learning approach by predicting the drought categories across the study domain. A persistence model served as the baseline for comparison. Three machine learning algorithms (logistic regression, random forests, and support vector machines) were combined with four different groups of input data, for a total of 12 configurations used to predict drought categories. Finally, all configurations were evaluated against the baseline model to select the best-performing option. The results showed that our proposed framework could reproduce the drought maps to a near-perfect level with the support vector machines algorithm and the group 4 data. The remaining findings of this study are: 1) using the previous week's drought data as a predictor played an important role in achieving high prediction scores; 2) the nonlinear models (random forests and support vector machines) performed better overall than the logistic regression models; and 3) by borrowing information from neighboring grid cells, we could compensate for the lack of training data in grid cells with insufficient historical USDM data, particularly for extreme and exceptional drought conditions.
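To make the evaluation setup concrete, here is a minimal sketch of a persistence baseline and the 12-configuration grid. It assumes that "persistence" means predicting each week's category as the previous week's category, and that categories are coded as integers (0 for no drought, 1 to 5 for D0 to D4). The group names, function names, and the toy series are hypothetical and do not come from the study.

```python
from itertools import product

# Three algorithms crossed with four input-data groups gives 12 configurations.
ALGORITHMS = ["logistic_regression", "random_forest", "svm"]
DATA_GROUPS = ["group_1", "group_2", "group_3", "group_4"]  # placeholder names
CONFIGURATIONS = list(product(ALGORITHMS, DATA_GROUPS))

def persistence_forecast(weekly_categories):
    """Predict each week's category as the previous week's observed category."""
    return weekly_categories[:-1]

def accuracy(y_true, y_pred):
    """Fraction of weeks where the predicted category equals the observed one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy weekly series for a single grid cell (made-up values).
series = [0, 0, 1, 1, 2, 2, 2, 1]
observed = series[1:]                         # weeks 2..8
baseline_pred = persistence_forecast(series)  # weeks 1..7 shifted forward
baseline_acc = accuracy(observed, baseline_pred)
```

Each of the 12 configurations would then be scored on the same weeks and kept only if it beats `baseline_acc`. Since drought categories change slowly from week to week, persistence is a demanding baseline, which is consistent with the finding that the previous week's drought data is an important predictor.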
- Award ID(s): 2006633
- PAR ID: 10358728
- Date Published:
- Journal Name: Frontiers in Big Data
- Volume: 4
- ISSN: 2624-909X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- In this study, optical and microwave satellite observations are integrated to estimate soil moisture at the same spatial resolution as the optical sensors (5 km here) and applied to drought analysis in the continental United States. A refined model is proposed that includes auxiliary data such as soil texture, topography, surface types, and accumulated precipitation, in addition to the Normalized Difference Vegetation Index (NDVI) and Land Surface Temperature (LST) used in the traditional universal triangle method. The proposed soil moisture model using accumulated precipitation shows close agreement with the U.S. Drought Monitor (USDM) spatial patterns. Currently, the USDM provides a weekly map. Recently, the concept of “flash” drought has emerged. To obtain drought maps on a daily basis, LST is derived from microwave observations, downscaled to the same resolution as the thermal infrared LST product, and used to fill the gaps caused by clouds in the optical LST data. With the integrated daily LST available under nearly all weather conditions, daily soil moisture can be estimated at a higher spatial resolution than that traditionally derived from passive microwave sensors. Drought maps based on soil moisture anomalies can therefore be obtained on a daily basis, making flash drought analysis and monitoring possible.
- In this study, optical and microwave satellite observations are integrated to estimate soil moisture at high spatial resolution and applied to drought analysis in the continental United States. A refined model is proposed to estimate soil moisture (SM) with auxiliary data such as soil texture, topography, surface types, and accumulated precipitation, in addition to the Normalized Difference Vegetation Index and Land Surface Temperature (LST) used in the traditional universal triangle method. The proposed SM model using accumulated precipitation shows close agreement with the U.S. Drought Monitor (USDM) spatial patterns. Currently, the USDM provides a weekly map. Recently, the concept of “flash” drought has emerged. To obtain drought maps on a daily basis, LST is derived from microwave observations, downscaled to the same resolution as the thermal infrared LST product, and used to fill the gaps caused by clouds in the optical LST data. With the integrated daily LST available under nearly all weather conditions, daily soil moisture can be estimated at relatively high spatial resolution. Drought maps based on soil moisture anomalies can therefore be obtained at high spatial resolution on a daily basis, making flash drought analysis and monitoring possible.
- Soil moisture (SM) and evapotranspiration (ET) are key variables of the terrestrial water cycle with a strong relationship. This study examines remotely sensed soil moisture and evapotranspiration data assimilation (DA) with the aim of improving drought monitoring. Although numerous efforts have gone into assimilating satellite soil moisture observations into land surface models to improve their predictive skill, little attention has been given to the combined use of soil moisture and evapotranspiration to better characterize hydrologic fluxes. In this study, we assimilate two remotely sensed datasets, namely, Soil Moisture Operational Product System (SMOPS) and MODIS evapotranspiration (MODIS16 ET), at 1-km spatial resolution, into the VIC land surface model by means of an evolutionary particle filter method. To achieve this, a fully parallelized framework based on model and domain decomposition using a parallel divide-and-conquer algorithm was implemented. The findings show improvement in soil moisture predictions from multivariate assimilation of both ET and SM compared to univariate scenarios. In addition, monthly and weekly drought maps are produced using the updated root-zone soil moisture percentiles over the Apalachicola–Chattahoochee–Flint basin in the southeastern United States. The model-based estimates are then compared against the corresponding U.S. Drought Monitor (USDM) archive maps. The results are consistent with the USDM maps during the winter and spring seasons in terms of drought extent; however, the drought severity was found to be slightly higher according to the DA method. Comparing different assimilation scenarios showed that ET assimilation results in wetter conditions compared to the open-loop and univariate SM DA scenarios. The multivariate DA then combines the effects of the two variables and provides an in-between condition.