Title: Automated Analysis of the US Drought Monitor Maps With Machine Learning and Multiple Drought Indicators
The US Drought Monitor (USDM) is a hallmark of real-time drought monitoring and assessment: it was developed by multiple agencies to provide an accurate and timely weekly assessment of drought conditions in the US. Each map is built from multiple physical indicators as well as reported observations from local contributors, which human analysts then combine, using their best judgement, to produce the drought map. Because this process involves human subjectivity, it is not a fully transparent quantitative procedure that other entities can reproduce. In this study, we developed a framework that automatically generates the maps through a machine learning approach by predicting drought categories across the study domain. A persistence model served as the baseline for comparison in the framework. Three machine learning algorithms (logistic regression, random forests, and support vector machines), each combined with four different groups of input data for a total of 12 configurations, were used to predict drought categories. All configurations were then evaluated against the baseline model to select the best-performing option. The results showed that the proposed framework could reproduce the drought maps at a near-perfect level with the support vector machine algorithm and the group 4 data. Other findings of this study include: 1) using the previous week's drought data as a predictor played an important role in achieving high prediction scores; 2) the nonlinear models (random forests and support vector machines) performed better overall than the logistic regression models; and 3) by borrowing information from neighboring grid cells, we could compensate for the lack of training data in grid cells with insufficient historical USDM data, particularly for extreme and exceptional drought conditions.
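The evaluation design summarized above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid is a hypothetical 2-D list of weekly USDM categories (encoded here as -1 for no drought and 0 to 4 for D0 to D4), the four input groups are placeholders, and `configurations`, `persistence_forecast`, `accuracy`, and `neighbor_cells` are names introduced only for this example. The persistence baseline carries last week's map forward unchanged; the neighbor window shows one assumed way to pool training samples for cells with few historical D3/D4 weeks.

```python
from itertools import product

# Hypothetical encoding: -1 = no drought, 0..4 = USDM categories D0..D4.
ALGORITHMS = ["logistic_regression", "random_forest", "svm"]
INPUT_GROUPS = [1, 2, 3, 4]  # placeholders for the four input-data groups


def configurations():
    """Enumerate every (algorithm, input group) pair: 3 x 4 = 12 configurations."""
    return list(product(ALGORITHMS, INPUT_GROUPS))


def persistence_forecast(previous_week):
    """Baseline: predict that each cell keeps last week's drought category."""
    return [row[:] for row in previous_week]


def accuracy(predicted, observed):
    """Fraction of grid cells whose predicted category matches the observed one."""
    pairs = [
        (p, o)
        for prow, orow in zip(predicted, observed)
        for p, o in zip(prow, orow)
    ]
    return sum(p == o for p, o in pairs) / len(pairs)


def neighbor_cells(i, j, n_rows, n_cols, radius=1):
    """Cells within `radius` of (i, j), including (i, j) itself.

    Pooling training samples from these cells is one assumed way to borrow
    information for data-poor cells; the square window shape is hypothetical.
    """
    return [
        (r, c)
        for r in range(max(0, i - radius), min(n_rows, i + radius + 1))
        for c in range(max(0, j - radius), min(n_cols, j + radius + 1))
    ]


if __name__ == "__main__":
    week_1 = [[-1, 0, 1], [0, 2, 3], [1, 3, 4]]
    week_2 = [[-1, 0, 1], [1, 2, 3], [1, 4, 4]]
    baseline = persistence_forecast(week_1)
    print(len(configurations()))                 # 12
    print(round(accuracy(baseline, week_2), 3))  # 0.778 (7 of 9 cells unchanged)
    print(len(neighbor_cells(0, 0, 3, 3)))       # 4 (corner cell has 3 neighbors)
```

In a full pipeline, each of the 12 trained configurations would be scored with the same metric on held-out weeks and kept only if it beats the persistence score.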
Award ID(s): 2006633
PAR ID: 10358728
Journal Name: Frontiers in Big Data
Volume: 4
ISSN: 2624-909X
Sponsoring Org: National Science Foundation
More like this
  1. Previous moderate- and high-temperature geothermal resource assessments of the western United States used weight-of-evidence and logistic regression methods to estimate resource favorability, but these analyses relied on some expert decisions. While expert decisions can add confidence to aspects of the modeling process by ensuring that only reasonable models are employed, they also introduce human bias into assessments. This bias is a source of error that may affect the performance of the models and the resulting resource estimates. Our study aims to reduce expert input through robust data-driven analyses and better-suited data science techniques, with the goals of saving time, reducing bias, and improving predictive ability. We present six favorability maps for geothermal resources in the western United States, created by applying two strategies to each of three modern machine learning algorithms (logistic regression, support-vector machines, and XGBoost). To allow a direct comparison with previous assessments, we use the same input data as the 2008 U.S. Geological Survey (USGS) conventional moderate- to high-temperature geothermal resource assessment. The six new favorability maps required far less expert decision-making, yet broadly agree with the previous assessment. Although the 2008 assessment employed linear methods, the non-linear machine learning algorithms (i.e., support-vector machines and XGBoost) agreed more closely with it than the linear machine learning algorithm (i.e., logistic regression) did. It is not surprising that geothermal systems depend on non-linear combinations of features, and we postulate that the expert decisions made during the 2008 assessment accounted for these system non-linearities.
Substantial challenges to applying machine learning algorithms to predict geothermal resource favorability include severe class imbalance (i.e., there are very few known geothermal systems compared to the large area considered) and the lack of negative labels: while known geothermal systems provide positive labels, all other sites have an unknown status (i.e., they are unlabeled) rather than a negative label (i.e., the known/proven absence of a geothermal resource). We address both challenges with a custom undersampling strategy that can be used with any algorithm, and we evaluate the results using F1 scores.
  3. In this study, optical and microwave satellite observations are integrated to estimate soil moisture at the same spatial resolution as the optical sensors (5 km here), and the estimates are applied to drought analysis in the continental United States. A new refined model is proposed that includes auxiliary data such as soil texture, topography, surface type, and accumulated precipitation, in addition to the Normalized Difference Vegetation Index (NDVI) and Land Surface Temperature (LST) used in the traditional universal triangle method. The proposed soil moisture model using accumulated precipitation was found to agree closely with the spatial patterns of the U.S. Drought Monitor (USDM). The USDM currently provides weekly maps, while the concept of “flash” drought has emerged more recently. To obtain daily drought maps, LST is derived from microwave observations, downscaled to the resolution of the thermal infrared LST product, and used to fill cloud-induced gaps in the optical LST data. With integrated daily LST available under nearly all weather conditions, daily soil moisture can be estimated at a higher spatial resolution than is traditionally achieved with passive microwave sensors, so drought maps based on soil moisture anomalies can be produced daily, making flash drought analysis and monitoring possible.
  4. In this study, optical and microwave satellite observations are integrated to estimate soil moisture at high spatial resolution, and the estimates are applied to drought analysis in the continental United States. A new refined model is proposed to estimate soil moisture (SM) using auxiliary data such as soil texture, topography, surface type, and accumulated precipitation, in addition to the Normalized Difference Vegetation Index and Land Surface Temperature (LST) used in the traditional universal triangle method. The proposed SM model using accumulated precipitation was found to agree closely with the spatial patterns of the U.S. Drought Monitor (USDM). The USDM currently provides weekly maps, while the concept of “flash” drought has emerged more recently. To obtain daily drought maps, LST is derived from microwave observations, downscaled to the resolution of the thermal infrared LST product, and used to fill cloud-induced gaps in the optical LST data. With integrated daily LST available under nearly all weather conditions, daily soil moisture can be estimated at relatively high spatial resolution, so drought maps based on soil moisture anomalies can be produced daily at high spatial resolution, making flash drought analysis and monitoring possible.
  5. We present a novel data set for drought in the continental US (CONUS), built to enable computationally efficient spatio-temporal statistical and probabilistic models of drought. We converted drought data obtained from the widely used US Drought Monitor (USDM) from its native geo-referenced polygon format to a regular 0.5-degree grid. We merged in known environmental drivers of drought, including data from the North American Land Data Assimilation System (NLDAS-2), US Geological Survey (USGS) streamflow records, and National Oceanic and Atmospheric Administration (NOAA) teleconnection indices. The resulting data set permits statistical and probabilistic modeling of drought with explicit spatial and/or temporal dependence. Such models could be used to forecast drought, with uncertainty, at short-range, seasonal to sub-seasonal, and inter-annual timescales, extending the reach and value of the current US Drought Outlook from the National Weather Service Climate Prediction Center. This data product is the first common gridded data set that includes the critical variables used to inform assessments of hydrological and meteorological drought.