
Title: Automated Identification of Characteristic Droplet Size Distributions in Stratocumulus Clouds Utilizing a Data Clustering Algorithm

Droplet-level interactions in clouds are often parameterized by a modified gamma fitted to a “global” droplet size distribution. Do “local” droplet size distributions of relevance to microphysical processes look like these average distributions? This paper describes an algorithm to search and classify characteristic size distributions within a cloud. The approach combines hypothesis testing, specifically, the Kolmogorov–Smirnov (KS) test, and a widely used class of machine learning algorithms for identifying clusters of samples with similar properties: density-based spatial clustering of applications with noise (DBSCAN) is used as the specific example for illustration. The two-sample KS test does not presume any specific distribution, is parameter free, and avoids biases from binning. Importantly, the number of clusters is not an input parameter of the DBSCAN-type algorithms but is independently determined in an unsupervised fashion. As implemented, it works on an abstract space from the KS test results, and hence spatial correlation is not required for a cluster. The method is explored using data obtained from the Holographic Detector for Clouds (HOLODEC) deployed during the Aerosol and Cloud Experiments in the Eastern North Atlantic (ACE-ENA) field campaign. The algorithm identifies evidence of the existence of clusters of nearly identical local size distributions. It is found that cloud segments have as few as one and as many as seven characteristic size distributions. To validate the algorithm’s robustness, it is tested on a synthetic dataset and successfully identifies the predefined distributions at plausible noise levels. The algorithm is general and is expected to be useful in other applications, such as remote sensing of cloud and rain properties.

Significance Statement

A typical cloud can have billions of drops spread over tens or hundreds of kilometers in space. Keeping track of the sizes, positions, and interactions of all of these droplets is impractical, and, as such, information about the relative abundance of large and small drops is typically quantified with a “size distribution.” Droplets in a cloud interact locally, however, so this work is motivated by the question of whether the cloud droplet size distribution is different in different parts of a cloud. A new method, based on hypothesis testing and machine learning, determines how many different size distributions are contained in a given cloud. This is important because the size distribution describes processes such as cloud droplet growth and light transmission through clouds.
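The combination of pairwise two-sample KS comparisons and density-based clustering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the KS results are mapped into the clustering space, so the sketch assumes the KS statistic itself serves as a pairwise distance. The synthetic gamma-distributed "droplet diameters", the helper names `ks_statistic` and `dbscan_precomputed`, and the values of `eps` and `min_samples` are all hypothetical choices made for illustration.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # empirical CDF of a at x
        fb = bisect_right(b, x) / len(b)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


def dbscan_precomputed(dist, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    # Each point counts as its own neighbor because dist[i][i] == 0.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


random.seed(0)
# Hypothetical local samples of droplet diameters drawn from two
# different gamma distributions (assumed values, not campaign data).
samples = [[random.gammavariate(4.0, 2.0) for _ in range(300)] for _ in range(6)]
samples += [[random.gammavariate(12.0, 1.5) for _ in range(300)] for _ in range(6)]

# Assumption: the KS statistic is used directly as the pairwise distance.
dist = [[ks_statistic(a, b) for b in samples] for a in samples]

labels = dbscan_precomputed(dist, eps=0.2, min_samples=3)
n_clusters = len(set(labels) - {-1})  # number of clusters is an output, not an input
print(labels, n_clusters)
```

Note that the number of clusters is never passed in; it falls out of `eps` and `min_samples`, and no spatial coordinates of the samples are used. In practice, library routines such as `scipy.stats.ks_2samp` and `sklearn.cluster.DBSCAN` with `metric="precomputed"` could replace the two helpers above.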

Journal Name: Artificial Intelligence for the Earth Systems
American Meteorological Society
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Precipitation efficiency and optical properties of clouds, both central to determining Earth's weather and climate, depend on the size distribution of cloud particles. In this work theoretical expressions for cloud droplet size distribution shape are evaluated using measurements from controlled experiments in a convective‐cloud chamber. The experiments are a unique opportunity to constrain theory because they are in steady‐state and because the initial and boundary conditions are well characterized compared to typical atmospheric measurements. Three theoretical distributions obtained from a Langevin drift‐diffusion approach to cloud formation via stochastic condensation are tested: (a) stochastic condensation with a constant removal time‐scale; (b) stochastic condensation with a size‐dependent removal time‐scale; (c) droplet growth in a fixed supersaturation condition and with size‐dependent removal. In addition, a similar Weibull distribution that can be obtained from the drift‐diffusion approach, as well as from mechanism‐independent probabilistic arguments (e.g., maximum entropy), is tested as a fourth hypothesis. Statistical techniques such as the χ² test, sum of squared errors of prediction, and residual analysis are employed to judge relative success or failure of the theoretical distributions to describe the experimental data. An extensive set of cloud droplet size distributions is measured under different aerosol injection rates. Five different aerosol injection rates are run for size‐selected aerosol particles, and six aerosol injection rates are run for broad‐distribution, polydisperse aerosol particles. In relative terms, the most favourable agreement with the measurements is given by the expression for stochastic condensation with size‐dependent droplet removal rate. However, even this optimal distribution breaks down for broad aerosol size distributions, primarily due to deviations from the measured large‐droplet tail. A possible explanation for the deviation is the Ostwald ripening effect coupled with deactivation/activation in polluted cloud conditions.

  2. Abstract

    Data collected with a holographic instrument [Holographic Detector for Clouds (HOLODEC)] on board the High-Performance Instrumented Airborne Platform for Environmental Research Gulfstream-V (HIAPER GV) aircraft from marine stratocumulus clouds during the Cloud System Evolution in the Trades (CSET) field project are examined for spatial uniformity. During one flight leg at 1190 m altitude, 1816 consecutive holograms were taken, which were approximately 40 m apart with individual hologram dimensions of 1.16 cm × 0.68 cm × 12.0 cm and with droplet concentrations of up to 500 cm⁻³. Unlike earlier studies, minimally intrusive data processing (e.g., bypassing calculation of number concentrations, binning, and parametric fitting) is used to test for spatial uniformity of clouds on intra- and interhologram spatial scales (a few centimeters and 40 m, respectively). As a means to test this, measured droplet count fluctuations are normalized with the expected standard deviation from theoretical Poisson distributions, which signifies randomness. Despite the absence of trends in the mean concentration, it is found that the null hypothesis of spatial uniformity on both spatial scales can be rejected with compelling statistical confidence. Monte Carlo simulations suggest that weak clustering explains this signature. These findings also hold for size-resolved analysis but with less certainty. Clustering of droplets caused by, for example, entrainment and turbulence, is size dependent and is likely to influence key processes such as droplet growth and thus cloud lifetime.

  3. Abstract

    Entrainment and associated mixing (i.e., entrainment‐mixing) have been shown to impact drop size distributions. However, most past studies have focused on warm clouds and have not considered the impacts on mixed phase clouds (i.e., those containing liquid and ice particles). This study characterizes the impacts of entrainment‐mixing on mixed phase cloud properties over the Southern Ocean using in situ observations. By taking advantage of strong correlations between droplet clustering and entrainment‐mixing, a clustering metric is used as a proxy to assess the degree of mixing. This maximizes the available sample size for a statistical analysis of entrainment‐mixing impacts on mixed phase properties. A positive relationship is found between the magnitude of droplet clustering and large ice concentrations (those with maximum dimensions greater than ∼300 μm), suggesting entrainment‐mixing enhances the Wegener‐Bergeron‐Findeisen (WBF) process. Particle size distributions are averaged over different ranges of liquid (liquid water content (LWC)) to total water content (TWC) ratio. Since the ratio is expected to transition from 1 to 0 during glaciation, differences in the distributions provide insight into the relation of entrainment‐mixing to mixed phase cloud evolution. Mixed phase samples with the greatest large ice concentrations occur at LWC/TWC < 0.4 in low clustering regions. However, these samples are relatively few, whereas high clustering regions have a greater frequency of samples with LWC/TWC < 0.4. This suggests sublimation/vapor sinks associated with entrainment can counteract the enhanced WBF. In high clustering regions, distributions of small droplets are relatively constant and large droplets (>30 μm) are preferentially removed as LWC/TWC transitions from 1 to 0.

  4. Abstract

    Recent advances in hail trajectory modeling regularly produce datasets containing millions of hail trajectories. Because hail growth within a storm cannot be entirely separated from the structure of the trajectories producing it, a method to condense the multidimensionality of the trajectory information into a discrete number of features analyzable by humans is necessary. This article presents a three-dimensional trajectory clustering technique that is designed to group trajectories that have similar updraft-relative structures and orientations. The new technique is an application of a two-dimensional method common in the data mining field. Hail trajectories (or “parent” trajectories) are partitioned into segments before they are clustered using a modified version of the density-based spatial clustering of applications with noise (DBSCAN) method. Parent trajectories with segments that are members of at least two common clusters are then grouped into parent trajectory clusters before output. This multistep method has several advantages. Hail trajectories with structural similarities along only portions of their length, e.g., sourced from different locations around the updraft before converging to a common pathway, can still be grouped. However, the physical information inherent in the full length of the trajectory is retained, unlike methods that cluster trajectory segments alone. The conversion of trajectories to an updraft-relative space also allows trajectories separated in time to be clustered. Once the final output trajectory clusters are identified, a method for calculating a representative trajectory for each cluster is proposed. Cluster distributions of hailstone and environmental characteristics at each time step in the representative trajectory can also be calculated.

    Significance Statement

    To understand how a storm produces large hail, we need to understand the paths that hailstones take in a storm when growing. We can simulate these paths using computer models. However, the millions of hailstones in a simulated storm create millions of paths, which are hard to analyze. This article describes a machine learning method that groups together hailstone paths based on how similar their three-dimensional structures look. It will let hail scientists analyze hailstone pathways in storms more easily, and therefore better understand how hail growth happens.

  5. Abstract

    Tropical anvil clouds have a profound impact on Earth's weather and climate. Their role in Earth's energy balance and hydrologic cycle is heavily modulated by the vertical structure of the microphysical properties for various hydrometeors in these clouds and their dependence on the ambient environmental conditions. Accurate representations of the variability and covariability of such vertical structures are key to both the satellite remote sensing of cloud and precipitation and numerical modeling of weather and climate, which remain a challenge. This study presents a new method to combine vertically resolved observations from CloudSat radar reflectivity and Cloud‐Aerosol Lidar and Infrared Pathfinder Satellite Observation cloud masks with probability distributions of cloud microphysical properties and the ambient atmospheric conditions from detailed in situ measurements on tropical anvils sampled during the National Aeronautics and Space Administration TC4 (Tropical Composition, Cloud and Climate Coupling) mission. We focus on the microphysical properties of the vertical distribution of ice water content, particle size distributions, and effective sizes for different hydrometeors, including ice particles and supercooled liquid droplets. Results from this method are compared with those from in situ data alone and various CloudSat/Cloud‐Aerosol Lidar and Infrared Pathfinder Satellite Observation cloud retrievals. The sampling limitation of the field experiment and algorithm limitations in the current retrievals are highlighted, especially for the liquid cloud particles, while a generally good agreement with ice cloud microphysical properties is seen from different methods. While the method presented in this study is applied to tropical anvil clouds observed during TC4, it can be readily employed to study a broad range of ice clouds sampled by various field campaigns.
