Title: Application of Machine Learning for Predicting Building Energy Use at Different Temporal and Spatial Resolution under Climate Change in USA
Given the urgency of climate change, fast and reliable methods are essential for understanding urban building energy use, in a sector that accounts for 40% of total energy use in the USA. Although machine learning (ML) methods show promise and are less difficult to develop, discrepancies in methods, results, and recommendations have emerged that require attention. Existing research also shows inconsistencies in how climate change models are integrated into energy modeling. To address these challenges, four models, random forest (RF), extreme gradient boosting (XGBoost), single regression tree, and multiple linear regression (MLR), were developed using the Commercial Building Energy Consumption Survey dataset to predict energy use intensity (EUI) under heating and cooling degree days projected by the Intergovernmental Panel on Climate Change (IPCC) across the USA during the 21st century. The RF model performed best, reducing the mean absolute error by 4%, 11%, and 12% compared to XGBoost, single regression tree, and MLR, respectively. Moreover, using the RF model for climate change analysis showed that office buildings' EUI will increase by between 8.9% and 63.1% compared to the 2012 baseline for different geographic regions between 2030 and 2080. One region is projected to experience an EUI reduction of almost 1.5%. Finally, good data enhance the predictive ability of ML; therefore, comprehensive regional building datasets are crucial to assess the counteraction of building energy use in the face of climate change at finer spatial scales.
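The two quantities the abstract relies on, mean absolute error (MAE) comparisons between models and heating/cooling degree days, can be illustrated with a minimal sketch. This is not the authors' pipeline. The function names, the 65 °F base temperature, the definition of "relative reduction" as a percentage of the baseline model's MAE, and all numeric values below are assumptions chosen for illustration only.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def relative_mae_reduction(mae_model, mae_baseline):
    """Percent by which mae_model is lower than mae_baseline (assumed definition)."""
    return 100.0 * (mae_baseline - mae_model) / mae_baseline


def degree_days(daily_mean_temps_f, base_f=65.0):
    """Return (heating, cooling) degree days from daily mean temperatures in °F.

    The 65 °F base is an assumption for this sketch, not a value from the abstract.
    """
    hdd = sum(max(base_f - t, 0.0) for t in daily_mean_temps_f)
    cdd = sum(max(t - base_f, 0.0) for t in daily_mean_temps_f)
    return hdd, cdd


# Made-up EUI observations and predictions from two hypothetical models.
observed = [80.0, 95.0, 60.0, 110.0]
pred_a = [78.0, 99.0, 63.0, 105.0]
pred_b = [75.0, 101.0, 66.0, 102.0]

mae_a = mean_absolute_error(observed, pred_a)       # 3.5
mae_b = mean_absolute_error(observed, pred_b)       # 6.25
reduction = relative_mae_reduction(mae_a, mae_b)    # about 44 percent

# Four made-up daily mean temperatures in °F.
hdd, cdd = degree_days([50.0, 65.0, 80.0, 72.0])    # (15.0, 22.0)
```

Under this reading, a statement such as "RF reduced the MAE by 4% compared to XGBoost" corresponds to `relative_mae_reduction(mae_rf, mae_xgb)` being about 4. The record does not say which base temperature or which exact formula the study used.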
Sponsoring Org: National Science Foundation
More Like this
  1. Basal area is a key measure of forest stocking and an important proxy of forest productivity in the face of climate change. Black walnut (Juglans nigra) is one of the most valuable timber species in North America. However, little is known about how the stocking of black walnut would change with different bioclimatic conditions under climate change. In this study, we projected the current and future basal area of black walnut. We trained different machine learning models using more than 1.4 million tree records from 10,162 Forest Inventory and Analysis (FIA) sample plots and 42 spatially explicit bioclimate and other environmental attributes. We selected random forests (RF) as the final model to estimate the basal area of black walnut under climate change because RF had a higher coefficient of determination (R²), lower root mean square error (RMSE), and lower mean absolute error (MAE) than the other two models (XGBoost and linear regression). The most important variables to predict basal area were the mean annual temperature and precipitation, potential evapotranspiration, topology, and human footprint. Under two emission scenarios (Representative Concentration Pathway 4.5 and 8.5), the RF model projected that black walnut stocking would increase in the northern part of the current range in the USA by 2080, with a potential shift of species distribution range, although uncertainty still exists due to unpredictable events, including extreme abiotic (heat, drought) and biotic (pests, disease) occurrences. Our models can be adapted to other hardwood tree species to predict tree changes in basal area based on future climate scenarios.
  2. Abstract. In the past decades, data-driven machine-learning (ML) models have emerged as promising tools for short-term streamflow forecasting. Among other qualities, the popularity of ML models for such applications is due to their relative ease of implementation, less strict distributional assumptions, and competitive computational and predictive performance. Despite the encouraging results, most applications of ML for streamflow forecasting have been limited to watersheds in which rainfall is the major source of runoff. In this study, we evaluate the potential of random forests (RFs), a popular ML method, to make streamflow forecasts at 1 d of lead time at 86 watersheds in the Pacific Northwest. These watersheds cover diverse climatic conditions and physiographic settings and exhibit varied contributions of rainfall and snowmelt to their streamflow. Watersheds are classified into three hydrologic regimes based on the timing of center-of-annual flow volume: rainfall-dominated, transient, and snowmelt-dominated. RF performance is benchmarked against naïve and multiple linear regression (MLR) models and evaluated using four criteria: coefficient of determination, root mean squared error, mean absolute error, and Kling–Gupta efficiency (KGE). Model evaluation scores suggest that the RF performs better in snowmelt-driven watersheds compared to rainfall-driven watersheds. The largest improvements in forecasts compared to benchmark models are found among rainfall-driven watersheds. RF performance deteriorates with increases in catchment slope and soil sandiness. We note disagreement between two popular measures of RF variable importance and recommend jointly considering these measures with the physical processes under study. These and other results presented provide new insights for effective application of RF-based streamflow forecasting.
  3. Abstract
    Excessive phosphorus (P) applications to croplands can contribute to eutrophication of surface waters through surface runoff and subsurface (leaching) losses. We analyzed leaching losses of total dissolved P (TDP) from no-till corn, hybrid poplar (Populus nigra × P. maximowiczii), switchgrass (Panicum virgatum), miscanthus (Miscanthus giganteus), native grasses, and restored prairie, all planted in 2008 on former cropland in Michigan, USA. All crops except corn (13 kg P ha⁻¹ year⁻¹) were grown without P fertilization. Biomass was harvested at the end of each growing season except for poplar. Soil water at 1.2 m depth was sampled weekly to biweekly for TDP determination during March–November 2009–2016 using tension lysimeters. Soil test P (STP; 0–25 cm depth) was measured every autumn. Soil water TDP concentrations were usually below levels where eutrophication of surface waters is frequently observed (> 0.02 mg L⁻¹) but often higher than in deep groundwater or nearby streams and lakes. Rates of P leaching, estimated from measured concentrations and modeled drainage, did not differ statistically among cropping systems across years; 7-year cropping system means ranged from 0.035 to 0.072 kg P ha⁻¹ year⁻¹ with large interannual variation. Leached P was positively related to STP, which decreased over the 7 years in all systems. These results indicate that both P-fertilized and unfertilized cropping systems may ...
  4. Abstract. Climate warming will cause mountain snowpacks to melt earlier, reducing summer streamflow and threatening water supplies and ecosystems. Quantifying how sensitive streamflow timing is to climate change and where it is most sensitive remain key questions. Physically based hydrological models are often used for this purpose; however, they have embedded assumptions that translate into uncertain hydrological projections that need to be quantified and constrained to provide reliable inferences. The purpose of this study is to evaluate differences in projected end-of-century changes to streamflow timing between a new empirical model based on diel (daily) streamflow cycles and regional land surface simulations across the mountainous western USA. We develop an observational technique for detecting streamflow responses to snowmelt using diel cycles of incoming solar radiation and streamflow to detect when snowmelt occurs. We measure the date of the 20th percentile of snowmelt days (DOS20) across 31 western USA watersheds affected by snow, as a proxy for the beginning of snowmelt-initiated streamflow. Historic DOS20 varies from mid-January to late May among our sites, with warmer basins having earlier snowmelt-mediated streamflow. Mean annual DOS20 strongly correlates with the dates of 25 % and 50 % annual streamflow volume (DOQ25 and DOQ50, both R² = 0.85), suggesting that a 1 d earlier DOS20 corresponds with a 1 d earlier DOQ25 and 0.7 d earlier DOQ50. Empirical projections of future DOS20 based on a stepwise multiple linear regression across sites and years under the RCP8.5 scenario for the late 21st century show that DOS20 will occur on average 11 ± 4 d earlier per 1 °C of warming. However, DOS20 in colder watersheds (mean November–February air temperature, TNDJF < −8 °C) is on average 70 % more sensitive to climate change than in warmer watersheds (TNDJF > 0 °C). Moreover, empirical projections of DOQ25 and DOQ50 based on DOS20 are about four and two times more sensitive to climate change, respectively, than those simulated by a state-of-the-art land surface model (NoahMP-WRF) under the same scenario. Given the importance of changes in streamflow timing for water resources, and the significant discrepancies found in projected streamflow sensitivity, snowmelt detection methods such as DOS20 based on diel streamflow cycles may help to constrain model parameters, improve hydrological predictions, and inform process understanding.
  5. Machine learning (ML) methods, such as artificial neural networks (ANN), k-nearest neighbors (kNN), random forests (RF), support vector machines (SVM), and boosted decision trees (DTs), may offer stronger predictive performance than more traditional, parametric methods, such as linear regression, multiple linear regression, and logistic regression (LR), for specific mapping and modeling tasks. However, this increased performance is often accompanied by increased model complexity and decreased interpretability, resulting in critiques of their "black box" nature, which highlights the need for algorithms that can offer both strong predictive performance and interpretability. This is especially true when the global model and predictions for specific data points need to be explainable in order for the model to be of use. Explainable boosting machines (EBMs), an augmentation and refinement of generalized additive models (GAMs), have been proposed as an empirical modeling method that offers both interpretable results and strong predictive performance. The trained model can be graphically summarized as a set of functions relating each predictor variable to the dependent variable, along with heat maps representing interactions between selected pairs of predictor variables. In this study, we assess EBMs for predicting the likelihood or probability of slope failure occurrence based on digital terrain characteristics in four separate Major Land Resource Areas (MLRAs) in the state of West Virginia, USA, and compare the results to those obtained with LR, kNN, RF, and SVM. EBM provided predictive accuracies comparable to RF and SVM and better than LR and kNN. The generated functions and visualizations for each predictor variable and included interactions between pairs of predictor variables, the estimation of variable importance based on average mean absolute scores, and the scores provided for each predictor variable for new predictions add interpretability, but additional work is needed to quantify how these outputs may be impacted by variable correlation, inclusion of interaction terms, and large feature spaces. Further exploration of EBM is merited for geohazard mapping and modeling in particular and spatial predictive mapping and modeling in general, especially when the value or use of the resulting predictions would be greatly enhanced by improved interpretability globally and availability of prediction explanations at each cell or aggregating unit within the mapped or modeled extent.