Abstract. Lightning is affected by many factors, many of which are not routinely measured, well understood, or accounted for in physical models. Several commonly used machine learning (ML) models have been applied to analyze the relationship between Atmospheric Radiation Measurement (ARM) data and lightning data from the Earth Networks Total Lightning Network (ENTLN) in order to identify important variables affecting lightning occurrence in the vicinity of the Southern Great Plains (SGP) ARM site during the summer months (June, July, August and September) of 2012 to 2020. Testing various ML models, we found that the random forest model is the best predictor among common classifiers. When convective clouds were detected, it predicts lightning occurrence with an accuracy of 76.9 % and an area under the curve (AUC) of 0.850. Using this model, we further ranked the variables in terms of their effectiveness in nowcasting lightning and identified geometric cloud thickness, rain rate and convective available potential energy (CAPE) as the most effective predictors. The contrast in meteorological variables between no-lightning and frequent-lightning periods was examined for hours with CAPE values conducive to thunderstorm formation. Besides the variables considered for the ML models, surface variables and mid-altitude variables (e.g., equivalent potential temperature and minimum equivalent potential temperature, respectively) have statistically significant contrasts between no-lightning and frequent-lightning hours. For example, the minimum equivalent potential temperature from 700 to 500 hPa is significantly lower during frequent-lightning hours compared with no-lightning hours. 
Finally, a notable positive relationship between the intracloud (IC) flash fraction and the square root of CAPE (√CAPE) was found, suggesting that stronger updrafts increase the height of the electrification zone, resulting in fewer flashes reaching the surface and consequently a greater IC flash fraction.
                            Machine Learning based investigation of the variables affecting summertime lightning frequency over the Southern Great Plains
                        
                    
    
            Abstract. Lightning is affected by many factors, many of which are not routinely measured, well understood, or accounted for in physical models. Machine learning (ML) excels at exploring and revealing complex relationships between meteorological variables such as those measured at the Southern Great Plains (SGP) Atmospheric Radiation Measurement (ARM) site, which provides an unprecedented level of detail on atmospheric conditions and clouds. Several commonly used ML models have been applied to analyse the relationship between ARM data and lightning data from the Earth Networks Total Lightning Network (ENTLN) in order to identify important variables affecting lightning occurrence in the vicinity of the SGP site during the summers (June, July, August and September) of 2012 to 2020. Testing various ML models, we found that the random forest model is the best predictor among common classifiers. It predicted lightning occurrence with an accuracy of 76.9 % and an area under the curve (AUC) of 0.850. Using this model, we further ranked the variables in terms of their effectiveness in predicting lightning and identified geometric cloud thickness, rain rate and convective available potential energy (CAPE) as the most effective predictors. The contrast in meteorological variables between no-lightning and frequent-lightning periods was examined for hours with CAPE values conducive to thunderstorm formation. Besides the variables considered for the ML models, surface variables such as equivalent potential temperature and mid-altitude variables such as minimum equivalent potential temperature show a large contrast between no-lightning and frequent-lightning hours. Finally, a notable positive relationship between the intra-cloud (IC) flash fraction and the square root of CAPE was found, suggesting that stronger updrafts increase the height of the electrification zone, resulting in fewer flashes reaching the surface and consequently a greater IC flash fraction.
                            - Award ID(s):
- 2126098
- PAR ID:
- 10629552
- Publisher / Repository:
- EGU
- Date Published:
- Subject(s) / Keyword(s):
- Aerosol lightning
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            Abstract. Remote sensing measurements have been widely used to estimate the planetary boundary layer height (PBLHT). Each remote sensing approach offers unique strengths and faces different limitations. In this study, we use machine learning (ML) methods to produce a best-estimate PBLHT (PBLHT-BE-ML) by integrating four PBLHT estimates derived from remote sensing measurements at the Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) Southern Great Plains (SGP) observatory. Three ML models – random forest (RF) classifier, RF regressor, and light gradient-boosting machine (LightGBM) – were trained on a dataset from 2017 to 2023 that included radiosonde, various remote sensing PBLHT estimates, and atmospheric meteorological conditions. Evaluations indicated that PBLHT-BE-ML from all three models improved alignment with the PBLHT derived from radiosonde data (PBLHT-SONDE), with LightGBM demonstrating the highest accuracy under both stable and unstable boundary layer conditions. Feature analysis revealed that the most influential input features at the SGP site were the PBLHT estimates derived from (a) potential temperature profiles retrieved using Raman lidar (RL) and atmospheric emitted radiance interferometer (AERI) measurements (PBLHT-THERMO), (b) vertical velocity variance profiles from Doppler lidar (PBLHT-DL), and (c) aerosol backscatter profiles from micropulse lidar (PBLHT-MPL). The trained models were then used to predict PBLHT-BE-ML at a temporal resolution of 10 min, effectively capturing the diurnal evolution of PBLHT and its significant seasonal variations, with the largest diurnal variation observed over summer at the SGP site. We applied these trained models to data from the ARM Eastern Pacific Cloud Aerosol Precipitation Experiment (EPCAPE) field campaign (EPC), where the PBLHT-BE-ML, particularly with the LightGBM model, demonstrated improved accuracy against PBLHT-SONDE. 
Analyses of model performance at both the SGP and EPC sites suggest that expanding the training dataset to include various surface types, such as ocean and ice-covered areas, could further enhance ML model performance for PBLHT estimation across varied geographic regions.
- 
            Abstract. An intimate knowledge of aerosol transport is essential in reducing the uncertainty of the impacts of aerosols on cloud development. Data sets from the U.S. Department of Energy (DOE) Atmospheric Radiation Measurement platform in the Southern Great Plains region (ARM‐SGP) and the National Aeronautics and Space Administration (NASA) Modern‐Era Retrospective Analysis for Research and Applications, version 2 (MERRA‐2), showed seasonal increases in aerosol loading and total carbon concentration during the spring and summer months (2008–2016), which were attributed to fire activity and smoke transport within North America. The monthly mean MERRA‐2 surface carbonaceous aerosol mass concentration and ARM‐SGP total carbon products were strongly correlated (R = 0.82, p < 0.01) along with a moderate correlation with the ARM‐SGP cloud condensation nuclei (NCCN) product (0.5, p ~ 0.1). The monthly mean ARM‐SGP total carbon and NCCN products were strongly correlated (0.7, p ~ 0.01). An additional product denoting fire number and coverage taken from the National Interagency Fire Center (NIFC) showed a moderate correlation with the MERRA‐2 carbonaceous product (0.45, p < 0.01) during the 1981–2016 warm season months (March–September). With respect to meteorological conditions, the correlation between the NIFC fire product and MERRA‐2 850‐hPa isobaric height anomalies was lower (0.26, p ~ 0.13) due to the variability in the frequency, intensity, and number of fires in North America. An observed increase in the isobaric height anomaly during the past decade may lead to frequent synoptic ridging and drier conditions with more fires, thereby potentially impacting cloud/precipitation processes and decreasing air quality.
- 
            A multi-variable investigation of thunderstorm environments in two distinct geographic regions is conducted to assess the aerosol and thermodynamic environments surrounding thunderstorm initiation. Twelve years of cloud-to-ground (CG) lightning flash data are used to reconstruct thunderstorms occurring within a 225 km radius centered on the Washington, D.C. and Kansas City metropolitan regions. A total of 196,836 and 310,209 thunderstorms were identified for Washington, D.C. and Kansas City, MO, respectively. Hourly meteorological and aerosol data were then merged with the thunderstorm event database. Evidence suggests that warm-season thunderstorm environments in benign synoptic conditions differ considerably in thermodynamics, aerosol properties, and aerosol concentrations between the Washington, D.C. and Kansas City regions. However, thunderstorm intensity, as measured by flash counts, appears to be regulated by similar thermodynamic-aerosol relationships despite the differences in their ambient environments. When examining thunderstorm initiation environments, there exist statistically significant, positive relationships between convective available potential energy (CAPE) and flash counts. Aerosol concentration also appears to be a more important quantity than particle size for lightning augmentation.
- 
            Abstract. There is a need for long-term observations of cloud and precipitation fall speeds in validating and improving rainfall forecasts from climate models. To this end, the U.S. Department of Energy Atmospheric Radiation Measurement (ARM) user facility Southern Great Plains (SGP) site at Lamont, Oklahoma, hosts five ARM Doppler lidars that can measure cloud and aerosol properties. In particular, the ARM Doppler lidars record Doppler spectra that contain information about the fall speeds of cloud and precipitation particles. However, due to bandwidth and storage constraints, the Doppler spectra are not routinely stored. This calls for the automation of cloud and rain detection in ARM Doppler lidar data so that the spectral data in clouds can be selectively saved and further analyzed. During the ARMing the Edge field experiment, a Waggle node capable of performing machine learning applications in situ was deployed at the ARM SGP site for this purpose. In this paper, we develop and test four algorithms for the Waggle node to automatically classify ARM Doppler lidar data. We demonstrate that supervised learning using a ResNet50-based classifier will classify 97.6% of the clear-air images and 94.7% of cloudy images correctly, outperforming traditional peak detection methods. We also show that a convolutional autoencoder paired with k-means clustering identifies 10 clusters in the ARM Doppler lidar data. Three clusters correspond to mostly clear conditions with scattered high clouds, and seven others correspond to cloudy conditions with varying cloud-base heights.
 An official website of the United States government