Title: S-SOM v1.0: a structural self-organizing map algorithm for weather typing
Abstract. This study proposes a novel structural self-organizing map (S-SOM) algorithm for synoptic weather typing. A novel feature of the S-SOM compared with traditional SOMs is its ability to deal with input data that have spatial or temporal structure. Specifically, the search scheme for the best matching unit (BMU) in an S-SOM is built on a structural similarity (S-SIM) index rather than the traditional Euclidean distance (ED). S-SIM enables the BMU search to account for spatial correlation between weather states, such as the locations of highs or lows, which is impossible when using ED. The S-SOM's performance is evaluated by multiple demo simulations of clustering weather patterns over Japan using ERA-Interim sea-level pressure data. The results show the S-SOM's superiority over a standard SOM with ED (or ED-SOM) in two respects: clustering quality, based on silhouette analysis, and topological preservation, based on topological error. The better performance of S-SOM versus ED-SOM is consistent across different tests and node-size configurations. S-SOM also performs better than a SOM using the Pearson correlation coefficient (or COR-SOM), though the difference is less pronounced than against ED-SOM.
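The BMU search idea from the abstract can be sketched as follows. This is a minimal, hedged example: it uses a single-window (global) SSIM rather than the paper's exact S-SIM formulation, and compares it against the Euclidean-distance search of an ED-SOM. The function names and toy node array are illustrative, not the authors' implementation.

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    # Single-window (global) SSIM between two 2-D fields.
    # The paper's S-SIM index may use local windows and weighting;
    # c1/c2 are small stabilizing constants (assumed values).
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def find_bmu(sample, nodes, measure="ssim"):
    # nodes: array of shape (n_nodes, H, W) holding SOM reference fields.
    if measure == "ssim":
        scores = [global_ssim(sample, n) for n in nodes]
        return int(np.argmax(scores))   # most structurally similar node
    dists = [np.linalg.norm(sample - n) for n in nodes]
    return int(np.argmin(dists))        # closest node (ED-SOM behavior)
```

Note the sign flip: SSIM is a similarity (maximize), while ED is a distance (minimize), which is why the two branches use argmax and argmin respectively.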
Award ID(s):
1739705
NSF-PAR ID:
10310476
Author(s) / Creator(s):
Date Published:
Journal Name:
Geoscientific Model Development
Volume:
14
Issue:
4
ISSN:
1991-9603
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned continuous vector representations are inefficient for large-scale similarity search, which often involves finding nearest neighbors measured by distance or similarity in a continuous vector space. In this article, we propose a search-efficient binary network embedding algorithm called BinaryNE to learn a binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations using a stochastic gradient descent-based online learning algorithm. The learned binary encoding not only reduces the memory needed to represent each node, but also allows fast bit-wise comparisons that support faster node similarity search than Euclidean or other distance measures. Extensive experiments and comparisons demonstrate that BinaryNE not only delivers more than 25 times faster search speed, but also provides comparable or better search quality than traditional continuous-vector-based network embedding methods. The binary codes learned by BinaryNE also render competitive performance on node classification and node clustering tasks. The source code of the BinaryNE algorithm is available at https://github.com/daokunzhang/BinaryNE.
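The fast bit-wise comparison motivating BinaryNE can be illustrated with a toy Hamming-distance search. The integer-packed code format and the `nearest_by_hamming` helper here are hypothetical conveniences; BinaryNE itself learns the codes with its three-layer network.

```python
def hamming(a, b):
    # XOR the two codes, then count the set bits: a constant-time
    # bit-wise comparison, versus floating-point Euclidean distance.
    return bin(a ^ b).count("1")

def nearest_by_hamming(query, codes):
    # codes: dict mapping node id -> integer-packed binary code
    # (hypothetical storage format for illustration only).
    return min(codes, key=lambda nid: hamming(query, codes[nid]))
```

For example, with codes `{"a": 0b1010, "b": 0b1011, "c": 0b0101}`, the query `0b1011` matches node "b" at Hamming distance 0.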
  2. Abstract

    There is demand for scalable algorithms capable of clustering and analyzing large time series data. The Kohonen self-organizing map (SOM) is an unsupervised artificial neural network for clustering, visualizing, and reducing the dimensionality of complex data. Like all clustering methods, it requires a measure of similarity between input data (in this work, time series). Dynamic time warping (DTW) is one such measure, and a top performer that accommodates distortions when aligning time series. Despite its popularity in clustering, DTW is limited in practice because its runtime complexity is quadratic in the length of the time series. To address this, we present a new self-organizing map for clustering TIME Series, called SOMTimeS, which uses DTW as the distance measure. The method has accuracy similar to other DTW-based clustering algorithms, yet scales better and runs faster. The computational performance stems from the pruning of unnecessary DTW computations during the SOM's training phase. For comparison, we implement a similar pruning strategy for K-means, and call the latter K-TimeS. SOMTimeS and K-TimeS pruned 43% and 50% of the total DTW computations, respectively. Pruning effectiveness, accuracy, execution time, and scalability are evaluated using 112 benchmark time series datasets from the UC Riverside classification archive; the results show that, for similar accuracy, SOMTimeS and K-TimeS achieve a 1.8× speed-up on average, with rates varying between 1× and 18× depending on the dataset. We also apply SOMTimeS to a healthcare study of patient-clinician serious illness conversations to demonstrate the algorithm's utility with complex, temporally sequenced natural language.

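A minimal sketch of quadratic DTW with one simple pruning idea, early abandoning against a best-so-far distance: if every entry in a completed row of the cost matrix already exceeds the current best, the true distance cannot improve on it. SOMTimeS's actual pruning strategy may use tighter lower bounds; this version is illustrative only.

```python
import numpy as np

def dtw(x, y, best_so_far=np.inf):
    # Classic O(n*m) DTW between two 1-D series, with early abandoning:
    # once a full row of the accumulated-cost matrix exceeds
    # best_so_far**2 (costs are squared until the final sqrt),
    # the computation is pruned.
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        if D[i, 1:].min() > best_so_far ** 2:
            return np.inf   # abandoned: cannot beat the best distance so far
    return np.sqrt(D[n, m])
```

In a clustering loop, `best_so_far` would be the distance to the best candidate node found so far, so most full DTW computations can be cut short.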
  3. Abstract

    Weather regime based stochastic weather generators (WR‐SWGs) have recently been proposed as a tool to better understand multi‐sector vulnerability to deeply uncertain climate change. WR‐SWGs can distinguish and simulate different types of climate change that have varying degrees of uncertainty in future projections, including thermodynamic changes (e.g., rising temperatures, Clausius‐Clapeyron scaling of extreme precipitation) and dynamic changes (e.g., shifting circulation and storm tracks). These models require the accurate identification of WRs that are representative of both historical and plausible future patterns of atmospheric circulation, while preserving the complex space–time variability of weather processes. This study proposes a novel framework to identify such WRs based on WR‐SWG performance over a broad geographic area and applies this framework to a case study in California. We test two components of WR‐SWG design, including the method used for WR identification (Hidden Markov Models (HMMs) vs. K‐means clustering) and the number of WRs. For different combinations of these components, we assess performance of a multi‐site WR‐SWG using 14 metrics across 13 major California river basins during the cold season. Results show that performance is best using a small number of WRs (4–5) identified using an HMM. We then juxtapose the number of WRs selected based on WR‐SWG performance against the number of regimes identified using metastability analysis of atmospheric fields. Results show strong agreement in the number of regimes between the two approaches, suggesting that the use of metastable regimes could inform WR‐SWG design. We conclude with a discussion of the potential to expand this framework for additional WR‐SWG design parameters and spatial scales.

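The K-means option for regime identification can be sketched as below: cluster flattened daily circulation fields into k regimes. This toy version, unlike the HMM alternative, ignores day-to-day regime persistence, and its deterministic initialization is an illustrative simplification (not k-means++ or the study's setup).

```python
import numpy as np

def kmeans_regimes(fields, k, n_iter=50):
    # fields: array (n_days, H, W) of daily circulation fields.
    # Flatten each day to a vector and run plain Lloyd's K-means.
    X = fields.reshape(len(fields), -1).astype(float)
    # Deterministic, spread-out initialization (illustrative only).
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Assign each day to its nearest regime centroid.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Recompute centroids from their assigned days.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers.reshape((k,) + fields.shape[1:])
```

An HMM-based alternative would replace the independent nearest-centroid assignment with a transition matrix over hidden regime states, which is what lets it capture regime persistence.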
  4.
    Heat loss quantification (HLQ) is an essential step in improving a building’s thermal performance and optimizing its energy usage. While this problem is well studied in the literature, most existing studies are either qualitative or minimally driven quantitative studies that rely on localized building envelope points and are thus not suitable for automated solutions in energy audit applications. This research work attempts to fill this knowledge gap by utilizing intensive thermal data (on the order of 100,000-plus images) and constitutes a relatively new area of analysis in energy audit applications. Specifically, we demonstrate a novel process using deep-learning methods to segment more than 100,000 thermal images collected from an unmanned aerial system (UAS). To quantify the heat loss for a building envelope, multiple stages of computation need to be performed: object detection (using Mask R-CNN/Faster R-CNN), estimating the surface temperature (using two clustering methods), and finally calculating the overall heat transfer coefficient (e.g., the U-value). The proposed model was applied to eleven academic campuses across the state of North Dakota. The preliminary findings indicate that Mask R-CNN outperformed other instance segmentation models, with an mIoU of 73% for facades, 55% for windows, 67% for roofs, 24% for doors, and 11% for HVAC units. Two clustering methods, namely K-means and threshold-based clustering (TBC), were deployed to estimate surface temperatures, with TBC providing more consistent estimates across all times of the day than K-means. Our analysis demonstrated that thermal efficiency depended not only on the accurate acquisition of thermal images but also on other factors, such as the building geometry and seasonal weather parameters (e.g., outside/inside building temperatures, wind, time of day, and indoor heating/cooling conditions). Finally, the resultant U-values of various building envelopes were compared with recommendations from the American Society of Heating, Refrigerating, and Air-Conditioning Engineers (ASHRAE) building standards.
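The final U-value stage can be sketched with a common exterior-thermography formulation: radiative plus convective heat flux at the outside surface, divided by the indoor-outdoor temperature difference. This is a hedged sketch; the study's exact formulation, emissivity, and convection coefficient choices may differ.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def u_value(t_surface, t_out, t_in, emissivity=0.9, h_conv=4.0):
    # All temperatures in kelvin. emissivity and h_conv (convective
    # coefficient, W m^-2 K^-1) are assumed illustrative defaults.
    q_rad = emissivity * SIGMA * (t_surface ** 4 - t_out ** 4)  # radiative flux
    q_conv = h_conv * (t_surface - t_out)                       # convective flux
    return (q_rad + q_conv) / (t_in - t_out)                    # W m^-2 K^-1
```

With a surface 2 K above a 0 °C outdoor air temperature and a 20 °C interior, this yields a U-value on the order of 0.8 W m⁻² K⁻¹, in the range one would then compare against ASHRAE recommendations.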
  5. The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques to handle data streams, these approaches either require an initial label set or rely on specialized design parameters. The overlap among classes and the labeling of data streams constitute other major challenges for classifying data streams. In this paper, we propose a clustering-based data stream classification framework to handle non-stationary data streams without utilizing an initial label set. A density-based stream clustering procedure is used to capture novel concepts with a dynamic threshold, and an effective active label querying strategy is introduced to continuously learn the new concepts from the data streams. The sub-cluster structure of each cluster is explored to handle the overlap among classes. Experimental results and quantitative comparison studies reveal that the proposed method provides performance that is statistically better than or comparable to that of existing methods.

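A toy version of the stream-clustering idea: each arriving point either joins a nearby cluster or opens a new one whose label would then be actively queried. The fixed radius here stands in for the paper's dynamic density threshold, and the sub-cluster modeling used to handle class overlap is omitted; the class and attribute names are illustrative.

```python
import numpy as np

class StreamClusterer:
    # Minimal distance-threshold stream clustering sketch (illustrative;
    # the paper's method is density-based with a dynamic threshold).
    def __init__(self, radius=1.0):
        self.radius = radius
        self.centers = []   # cluster centroids
        self.counts = []    # points absorbed per cluster
        self.queries = []   # stream indices where a label would be queried

    def observe(self, i, x):
        x = np.asarray(x, float)
        if self.centers:
            d = [np.linalg.norm(x - c) for c in self.centers]
            j = int(np.argmin(d))
            if d[j] <= self.radius:
                # Absorb into the nearest cluster: incremental centroid update.
                self.counts[j] += 1
                self.centers[j] += (x - self.centers[j]) / self.counts[j]
                return j
        # Potential novel concept: open a cluster and query its label.
        self.centers.append(x.copy())
        self.counts.append(1)
        self.queries.append(i)
        return len(self.centers) - 1
```

On the stream [0,0], [0.1,0], [5,5], [5.1,5], [0.05,0.05] with radius 1, only the first point of each concept triggers a label query, mirroring how active querying keeps labeling cost low.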