Creators/Authors contains: "Yan, Da"


  1. Free, publicly-accessible full text available November 27, 2024
  2. The prediction of stable crystal structures is an important part of designing solid-state crystalline materials with desired properties. Recent advances in structural feature representations and generative neural networks promise the ability to efficiently create new stable structures to use for inverse design and to search for materials with tailored functionalities. 
    Free, publicly-accessible full text available July 3, 2024
  3. Finding frequent subgraph patterns in a big graph is an important problem with many applications such as classifying chemical compounds and building indexes to speed up graph queries. Since this problem is NP-hard, some recent parallel systems have been developed to accelerate the mining. However, they often have a huge memory cost, very long running time, suboptimal load balancing, and possibly inaccurate results. In this paper, we propose an efficient system called T-FSM for parallel mining of frequent subgraph patterns in a big graph. T-FSM adopts a novel task-based execution engine design to ensure high concurrency, bounded memory consumption, and effective load balancing. It also supports a new anti-monotonic frequentness measure called Fraction-Score, which is more accurate than the widely used MNI measure. Our experiments show that T-FSM is orders of magnitude faster than state-of-the-art systems for frequent subgraph pattern mining. Our system code has been released at 
    Free, publicly-accessible full text available May 26, 2024
  4. The k-core of a graph is the largest induced subgraph in which every vertex has degree at least k. The problem of k-core decomposition finds the k-cores of a graph for all valid values of k, and it has many applications such as network analysis, computational biology, and graph visualization. Currently, there are two types of parallel algorithms for k-core decomposition: (1) degree-based vertex peeling, and (2) iterative h-index refinement. There are, however, few studies on accelerating k-core decomposition using GPUs. In this paper, we propose a highly optimized peeling algorithm on a GPU, and compare it with possible implementations on top of think-like-a-vertex graph-parallel GPU systems as well as existing serial and parallel k-core decomposition algorithms on CPUs. Extensive experiments show that our GPU algorithm is the overall winner in both time and space. Our source code is released at 
    Free, publicly-accessible full text available April 1, 2024
  5. Free, publicly-accessible full text available January 1, 2024
  6. Flood inundation mapping from Earth imagery plays a vital role in rapid disaster response and national water forecasting. However, the problem is non-trivial due to significant imagery noise and obstacles, complex spatial dependency on 3D terrains, spatial non-stationarity, and high computational cost. Existing machine learning approaches are mostly terrain-unaware and are prone to producing spurious results due to imagery noise and obstacles, requiring significant post-processing effort. Recently, several terrain-aware methods were proposed that incorporate complex spatial dependency (e.g., water flow directions on 3D terrains), but they assume that the inferred flood surface level is spatially stationary, making them insufficient for a large heterogeneous geographic area. To address these limitations, this paper proposes a novel spatial learning framework called hidden Markov forest, which decomposes a large heterogeneous area into local stationary zones, represents spatial dependency on 3D terrains via zonal trees (forest), and jointly infers the class map in different zonal trees with spatial regularization. We design efficient inference algorithms based on dynamic programming and multi-resolution filtering. Evaluations on real-world datasets show that our method outperforms baselines and our proposed computational refinement significantly reduces the time cost. 
    Free, publicly-accessible full text available January 1, 2024
  7. Given raster imagery features and imperfect vector training labels with registration uncertainty, this paper studies a deep learning framework that can quantify and reduce the registration uncertainty of training labels while simultaneously training neural network parameters. The problem is important in broad applications such as streamline classification on Earth imagery or tissue segmentation on medical imagery, whereby annotating precise vector labels is expensive and time-consuming. However, the problem is challenging due to the gap between the vector representation of class labels and the raster representation of image features, and the need for training neural networks with uncertain label locations. Existing research on uncertain training labels often focuses on uncertainty in label class semantics or characterizes label registration uncertainty at the pixel level (not contiguous vectors). To fill the gap, this paper proposes a novel learning framework that explicitly quantifies vector labels' registration uncertainty. We propose a registration-uncertainty-aware loss function and design an iterative uncertainty reduction algorithm that re-estimates the posterior distribution of true vector label locations based on a Gaussian process. Evaluations on real-world datasets in National Hydrography Dataset refinement show that the proposed approach significantly outperforms several baselines in both registration uncertainty estimation and classification performance. 
  8. Given earth imagery with spectral features on a terrain surface, this paper studies surface segmentation based on both explanatory features and surface topology. The problem is important in many spatial and spatiotemporal applications such as flood extent mapping in hydrology. The problem is uniquely challenging for several reasons: first, the size of earth imagery on a terrain surface is often much larger than the input of popular deep convolutional neural networks; second, there exists topological structure dependency between pixel classes on the surface, and such dependency can follow an unknown and non-linear distribution; third, there are often limited training labels. Existing methods for earth imagery segmentation often divide the imagery into patches and consider the elevation as an additional feature channel. These methods do not fully incorporate the spatial topological structural constraint within and across surface patches and thus often show poor results, especially when training labels are limited. Existing methods on semi-supervised and unsupervised learning for earth imagery often focus on learning representation without explicitly incorporating surface topology. In contrast, we propose a novel framework that explicitly models the topological skeleton of a terrain surface with a contour tree from computational topology, which is guided by the physical constraint (e.g., water flow direction on terrains). Our framework consists of two neural networks: a convolutional neural network (CNN) to learn spatial contextual features on a 2D image grid, and a graph neural network (GNN) to learn the statistical distribution of physics-guided spatial topological dependency on the contour tree. The two models are co-trained via variational EM. Evaluations on the real-world flood mapping datasets show that the proposed models outperform baseline methods in classification accuracy, especially when training labels are limited. 
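
As an illustration of the degree-based vertex peeling mentioned in item 4 above, the following is a minimal serial sketch in Python. It is not the GPU algorithm from that paper; it only shows the basic idea of repeatedly removing a vertex of minimum remaining degree and recording the largest degree seen so far as its core number. The function names `core_numbers` and `k_core_vertices` are hypothetical, and vertex labels are assumed to be mutually comparable (e.g., all integers).

```python
import heapq
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected graph given as edge pairs.

    Serial peeling sketch: always remove a vertex of minimum remaining degree.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)

    core = {}
    k = 0  # largest minimum degree seen so far during peeling
    while heap:
        d, v = heapq.heappop(heap)
        if v in core or d != degree[v]:
            continue  # stale heap entry, skip it
        k = max(k, d)
        core[v] = k
        for w in adj[v]:
            if w not in core:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
    return core


def k_core_vertices(edges, k):
    """Vertices of the k-core: those whose core number is at least k."""
    return {v for v, c in core_numbers(edges).items() if c >= k}


if __name__ == "__main__":
    # A triangle (1, 2, 3) with a pendant vertex 4 attached to vertex 3.
    example = [(1, 2), (2, 3), (1, 3), (3, 4)]
    print(core_numbers(example))
    print(k_core_vertices(example, 2))
```

The heap with lazy deletion keeps the sketch short; implementations tuned for speed typically use degree-indexed buckets instead, which avoids the logarithmic factor.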