Title: Improving Low‐Cloud Fraction Prediction Through Machine Learning
Abstract: In this study, we evaluated the performance of machine learning (ML) models (XGBoost) in predicting low‐cloud fraction (LCF), compared to two generations of the community atmospheric model (CAM5 and CAM6) and ERA5 reanalysis data, each having a different cloud scheme. ML models show a substantial enhancement in predicting LCF regarding root mean squared errors and correlation coefficients. The good performance is consistent across the full spectrums of atmospheric stability and large‐scale vertical velocity. Employing an explainable ML approach, we revealed the importance of including the amount of available moisture in ML models for representing spatiotemporal variations in LCF in the midlatitudes. Also, ML models demonstrated marked improvement in capturing the LCF variations during the stratocumulus‐to‐cumulus transition (SCT). This study suggests ML models' great potential to address the longstanding issues of “too few” low clouds and “too rapid” SCT in global climate models.
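The abstract compares predictions by root mean squared error (RMSE) and correlation coefficient. As a minimal sketch of how these two scores are typically computed, the Python below uses only the standard library. The function names (`rmse`, `pearson_r`) and all sample values are hypothetical illustrations, not the study's code or data, and the Pearson form of the correlation coefficient is an assumption, since the abstract does not say which form was used.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return covariance / math.sqrt(var_p * var_o)


# Made-up low-cloud fractions (0 to 1), for illustration only.
observed = [0.20, 0.45, 0.60, 0.80]
model_a = [0.25, 0.40, 0.65, 0.75]  # hypothetical predictions close to observations
model_b = [0.10, 0.20, 0.30, 0.40]  # hypothetical predictions with a low bias

for name, predicted in (("model_a", model_a), ("model_b", model_b)):
    print(name, "RMSE:", rmse(predicted, observed), "r:", pearson_r(predicted, observed))
```

A lower RMSE indicates smaller typical errors, while a correlation closer to 1 indicates that the prediction tracks the observed variations more faithfully. The two scores can disagree, for example when a prediction follows the observed pattern but carries a systematic bias.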
Award ID(s):
2126098
PAR ID:
10579003
Publisher / Repository:
DOI PREFIX: 10.1029
Journal Name:
Geophysical Research Letters
Volume:
51
Issue:
15
ISSN:
0094-8276
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract This study leveraged a Lagrangian framework to examine the evolution of stratocumulus clouds under cold and warm advections (CADV and WADV) in the Community Earth System Model 2 (CESM2) against observations. We found that CESM2 simulates too rapid a decline in low‐cloud fraction (LCF) and cloud liquid water path (CLWP) under CADV conditions, while it aligns more closely with observed LCF under WADV conditions but overestimates the increase in CLWP. Employing an explainable machine learning approach, we found that the too rapid decreases in LCF and CLWP under CADV conditions are related to overestimated drying effects induced by sea surface temperature, whereas the substantial increase in CLWP under WADV conditions is associated with overestimated moistening effects due to free‐tropospheric moisture and surface winds. Our findings suggest that the overestimated drying effects of sea surface temperature on cloud properties might be one of the crucial causes of the high equilibrium climate sensitivity in CESM2.
  2. Abstract The Southern Ocean is covered by a large amount of clouds with high cloud albedo. However, as reported by previous climate model intercomparison projects, underestimated cloudiness and overestimated absorption of solar radiation (ASR) over the Southern Ocean lead to substantial biases in climate sensitivity. The present study revisits this long-standing issue and explores the uncertainty sources in the latest CMIP6 models. We employ 10-year satellite observations to evaluate cloud radiative effect (CRE) and cloud physical properties in five CMIP6 models that provide comprehensive output of cloud, radiation, and aerosol. The simulated longwave, shortwave, and net CRE at the top of atmosphere in CMIP6 are comparable with the CERES satellite observations. Total cloud fraction (CF) is also reasonably simulated in CMIP6, but the comparison of liquid cloud fraction (LCF) reveals marked biases in spatial pattern and seasonal variations. The discrepancies between the CMIP6 models and the MODIS satellite observations become even larger in other cloud macro- and micro-physical properties, including liquid water path (LWP), cloud optical depth (COD), and cloud effective radius, as well as aerosol optical depth (AOD). However, the large underestimation of both LWP and cloud effective radius (regional means ∼20% and 11%, respectively) results in a relatively smaller bias in COD, and the impacts of the biases in COD and LCF also cancel each other out, leaving CRE and ASR reasonably predicted in CMIP6. An error estimation framework is employed, and the different signs of the sensitivity errors and biases from CF and LWP corroborate the notion that there are compensating errors in the modeled shortwave CRE. Further correlation analyses of the geospatial patterns reveal that CF is the most relevant factor in determining CRE in observations, while the modeled CRE is too sensitive to LWP and COD. The relationships between cloud effective radius, LWP, and COD are also analyzed to explore the possible uncertainty sources in different models. Our study calls for more rigorous calibration of detailed cloud physical properties for future climate model development and climate projection.
  3. Abstract. There has been a growing concern that most climate models predict precipitation that is too frequent, likely due to a lack of reliable subgrid variability and vertical variations in microphysical processes in low-level warm clouds. In this study, the warm-cloud physics parameterizations in the single-column configurations of NCAR Community Atmospheric Model versions 6 and 5 (SCAM6 and SCAM5, respectively) are evaluated using ground-based and airborne observations from the Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) Aerosol and Cloud Experiments in the Eastern North Atlantic (ACE-ENA) field campaign near the Azores islands during 2017–2018. The 8-month single-column model (SCM) simulations show that both SCAM6 and SCAM5 can generally reproduce marine boundary layer cloud structure, major macrophysical properties, and their transition. The improvement in warm-cloud properties from Community Atmospheric Model 5 to 6 (CAM5 to CAM6) physics can be found through comparison with the observations. Meanwhile, both physical schemes underestimate cloud liquid water content, cloud droplet size, and rain liquid water content but overestimate surface rainfall. Modeled cloud condensation nuclei (CCN) concentrations are comparable with aircraft-observed ones in the summer but are overestimated by a factor of 2 in winter, largely due to the biases in the long-range transport of anthropogenic aerosols like sulfate. We also test the newly recalibrated autoconversion and accretion parameterizations that account for vertical variations in droplet size. Compared to the observations, more significant improvement is found in SCAM5 than in SCAM6. This result is likely explained by the introduction of subgrid variations in cloud properties in CAM6 cloud microphysics, which further suppresses the scheme's sensitivity to individual warm-rain microphysical parameters. The predicted cloud susceptibilities to CCN perturbations in CAM6 are within a reasonable range, indicating significant progress since CAM5, which produces an aerosol indirect effect that is too strong. The present study emphasizes the importance of understanding biases in cloud physics parameterizations by combining SCM with in situ observations.
  4. Abstract. Remote sensing measurements have been widely used to estimate the planetary boundary layer height (PBLHT). Each remote sensing approach offers unique strengths and faces different limitations. In this study, we use machine learning (ML) methods to produce a best-estimate PBLHT (PBLHT-BE-ML) by integrating four PBLHT estimates derived from remote sensing measurements at the Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) Southern Great Plains (SGP) observatory. Three ML models – random forest (RF) classifier, RF regressor, and light gradient-boosting machine (LightGBM) – were trained on a dataset from 2017 to 2023 that included radiosonde, various remote sensing PBLHT estimates, and atmospheric meteorological conditions. Evaluations indicated that PBLHT-BE-ML from all three models improved alignment with the PBLHT derived from radiosonde data (PBLHT-SONDE), with LightGBM demonstrating the highest accuracy under both stable and unstable boundary layer conditions. Feature analysis revealed that the most influential input features at the SGP site were the PBLHT estimates derived from (a) potential temperature profiles retrieved using Raman lidar (RL) and atmospheric emitted radiance interferometer (AERI) measurements (PBLHT-THERMO), (b) vertical velocity variance profiles from Doppler lidar (PBLHT-DL), and (c) aerosol backscatter profiles from micropulse lidar (PBLHT-MPL). The trained models were then used to predict PBLHT-BE-ML at a temporal resolution of 10 min, effectively capturing the diurnal evolution of PBLHT and its significant seasonal variations, with the largest diurnal variation observed over summer at the SGP site. We applied these trained models to data from the ARM Eastern Pacific Cloud Aerosol Precipitation Experiment (EPCAPE) field campaign (EPC), where the PBLHT-BE-ML, particularly with the LightGBM model, demonstrated improved accuracy against PBLHT-SONDE. Analyses of model performance at both the SGP and EPC sites suggest that expanding the training dataset to include various surface types, such as ocean and ice-covered areas, could further enhance ML model performance for PBLHT estimation across varied geographic regions.
  5. Advances in Machine Learning (ML) have sparked a growing demand for ML-as-a-Service: developers train ML models and publish them in the cloud as online services to provide low-latency inference at scale. The key challenge of ML model serving is to meet the response-time Service-Level Objectives (SLOs) of inference workloads while minimizing the serving cost. In this paper, we tackle the dual challenge of SLO compliance and cost effectiveness with MArk (Model Ark), a general-purpose inference serving system built in Amazon Web Services (AWS). MArk employs three design choices tailor-made for inference workloads. First, MArk dynamically batches requests and opportunistically serves them using expensive hardware accelerators (e.g., GPU) for an improved performance-cost ratio. Second, instead of relying on feedback control scaling or over-provisioning to serve dynamic workloads, which can be too slow or too expensive for inference serving, MArk employs predictive autoscaling to hide the provisioning latency at low cost. Third, given the stateless nature of inference serving, MArk exploits flexible yet costly serverless instances to cover the occasional load spikes that are hard to predict. We evaluated the performance of MArk using several state-of-the-art ML models trained in popular frameworks including TensorFlow, MXNet, and Keras. Compared with the premier industrial ML serving platform SageMaker, MArk reduces the serving cost by up to 7.8× while achieving even better latency performance.