Creators/Authors contains: "Sang, Huiyan"

  1. Arctic sea ice extent (SIE) has drawn increasing attention from scientists in recent years because of its rapid decline in the Boreal summer and early fall. SIE is derived from remote-sensing data and is both a lagging and a leading indicator of climate change. To characterize the decline in SIE at a local level, we use remote-sensing data at 25 km resolution to fit a spatio-temporal logistic autoregressive model of sea-ice evolution in the Arctic region. The model incorporates the previous year's ice/water binary observations at nearby grid cells in an autoregressive manner, with autoregressive coefficients that vary in both space and time. Using the model-based estimates of ice/water probabilities in the Arctic region, we propose several graphical summaries that visualize spatio-temporal changes in Arctic sea ice beyond what the single SIE time series can show. In ever-higher latitude bands, we observe a consistently declining temporal trend in early-fall sea ice. We also observe a clear decline and contraction in the distribution of sea ice between 70°N and 75°N; of most concern, this may reflect the future behavior of sea ice at ever-higher latitudes under climate change.

     
  2. Random partition models are widely used in Bayesian methods for various clustering tasks, such as mixture models, topic models, and community detection problems. While the number of clusters induced by random partition models has been studied extensively, another important model property, the balancedness of the partition, has been largely neglected. We formulate a framework to define and theoretically study the balancedness of exchangeable random partition models by analyzing how a model assigns probabilities to partitions with different levels of balancedness. We demonstrate that the "rich-get-richer" characteristic of many popular existing random partition models is an inevitable consequence of two common assumptions: product-form exchangeability and projectivity. We propose a principled way to compare the balancedness of random partition models, which clarifies which models work well, and which do not, for different applications. We also introduce "rich-get-poorer" random partition models and illustrate their application to entity resolution tasks.
  3. The COVID-19 pandemic has limited people's visits to public places because of social distancing and shelter-in-place orders. According to Google's community mobility reports, some countries showed a decrease in park visitation during the pandemic, while others showed an increase. Although government responses played a significant role in this variation, little is known about changes in park visitation and the park attributes associated with those changes. We therefore examined the associations between park characteristics and percent changes in park visitation in Harris County, TX, across three periods: before, during, and after Harris County's shelter-in-place order. We used SafeGraph's point-of-interest data to extract weekly park visitation counts for the Harris County area; this dataset included the size of each park and its weekly number of visits from 2 March to 31 May 2020. We also measured park characteristics, including greenness density (using the normalized difference vegetation index), park type (mini, neighborhood, community, regional/metropolitan), presence of sidewalks and bikeways, sidewalk and bikeway quantity, and bikeway quality. Park visitation decreased after the shelter-in-place order was issued and increased after it was lifted. Linear regression models indicated that the higher a park's greenness density, the smaller its decrease in visitation during the shelter-in-place period relative to the period before the order. A similar relationship appeared for the period after the order. More sidewalks were associated with a smaller increase in visitation after the shelter-in-place order. These findings can guide planners and designers in creating parks that promote public visitation during pandemics and potentially benefit people's physical and mental health.
  4. Structured point process data harvested from various platforms poses new challenges to the machine learning community. To cluster repeatedly observed marked point processes, we propose a novel mixture model of multi-level marked point processes for identifying potential heterogeneity in the observed data. Specifically, we study a matrix whose entries are marked log-Gaussian Cox processes and cluster rows of such a matrix. An efficient semi-parametric Expectation-Solution (ES) algorithm combined with functional principal component analysis (FPCA) of point processes is proposed for model estimation. The effectiveness of the proposed framework is demonstrated through simulation studies and real data analyses. 
  5. In unconventional reservoirs, optimal completion controls are essential to improving well productivity and reducing costs. In this article, we propose a statistical model to investigate associations between shale oil production and completion parameters (e.g., completion lateral length, total proppant, number of hydraulic fracturing stages), while accounting for the influence of spatially heterogeneous geological conditions on hydrocarbon production. We develop a non-parametric regression method that combines a generalized additive model with a fused LASSO regularization for geological homogeneity pursuit, and we present an alternating augmented Lagrangian method for estimating the model parameters. The novelty and advantages of our method over published ones are that (1) it can control for or remove heterogeneous non-completion effects, and (2) it can account for and analyze the interactions among the completion parameters. We apply our method to a real case from a US onshore field in the Permian Basin and show how our model accounts for the interactions between completion parameters. Our results provide key findings on how completion parameters affect oil production, which can lead to optimal well completion designs.

     
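A minimal sketch of the kind of model described in item 1, under a simplified specification that is an assumption and not taken from the paper: the log-odds of ice at a grid cell this year is an intercept, plus one coefficient times that cell's own ice/water value last year, plus one coefficient times the average of its four nearest neighbors last year. All three coefficients are passed in as spatially varying arrays for a single year. The function names, the four-neighbor stencil, and the zero padding at the grid edge are illustrative choices.

```python
import numpy as np


def ice_probability(prev_ice, intercept, coef_self, coef_neigh):
    """P(ice this year) per grid cell under a simplified logistic autoregression.

    prev_ice: 2D array of last year's observations (1 = ice, 0 = water).
    intercept, coef_self, coef_neigh: 2D arrays of the same shape holding
    spatially varying coefficients for one year (hypothetical parameterization).
    """
    y = prev_ice.astype(float)
    # Mean of the four nearest neighbors; cells outside the grid count as water
    # (zero padding is an assumption made for this sketch).
    padded = np.pad(y, 1, mode="constant")
    neigh_mean = (
        padded[:-2, 1:-1]    # neighbor above
        + padded[2:, 1:-1]   # neighbor below
        + padded[1:-1, :-2]  # neighbor to the left
        + padded[1:-1, 2:]   # neighbor to the right
    ) / 4.0
    log_odds = intercept + coef_self * y + coef_neigh * neigh_mean
    return 1.0 / (1.0 + np.exp(-log_odds))  # inverse logit


def expected_ice_area(prob, cell_km=25.0):
    """Expected ice-covered area (km^2): sum of cell probabilities times cell area."""
    return float(prob.sum() * cell_km ** 2)
```

Summing the cell probabilities, as in `expected_ice_area`, gives one model-based counterpart to a single extent number, while the probability map itself is what supports local, latitude-band summaries.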
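The "rich-get-richer" behavior discussed in item 2 is easiest to see in the Chinese restaurant process (CRP), a standard exchangeable and projective random partition model. It is used here only as a familiar example; the paper's "rich-get-poorer" models are not reproduced. Under the CRP with concentration parameter alpha, the next item joins an existing cluster of size n_k with probability n_k / (n + alpha) and starts a new cluster with probability alpha / (n + alpha), so larger clusters attract new members in proportion to their size. The function names below are hypothetical.

```python
import random


def crp_predictive(cluster_sizes, alpha):
    """CRP probabilities for the next item: one per existing cluster, then 'new cluster'."""
    n = sum(cluster_sizes)
    denom = n + alpha
    probs = [size / denom for size in cluster_sizes]  # proportional to cluster size
    probs.append(alpha / denom)                        # open a new cluster
    return probs


def sample_crp(n_items, alpha, rng):
    """Sequentially seat n_items under the CRP and return the resulting cluster sizes."""
    sizes = []
    for _ in range(n_items):
        probs = crp_predictive(sizes, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(sizes):
            sizes.append(1)   # new cluster
        else:
            sizes[k] += 1     # join existing cluster k
    return sizes
```

For example, `crp_predictive([3, 1], 1.0)` assigns probabilities 0.6, 0.2, and 0.2: the cluster of three is three times as likely to grow as the singleton. Draws from `sample_crp(1000, 1.0, random.Random(0))` typically show a few large clusters and a tail of small ones, which is the unbalanced behavior item 2 attributes to product-form exchangeability plus projectivity.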
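Item 4 builds on log-Gaussian Cox processes (LGCPs). The following minimal sketch covers only that building block, with no marks, no matrix or multi-level structure, and no ES or FPCA estimation. It simulates an LGCP on [0, 1] by discretizing the log-intensity, a Gaussian process with an assumed squared-exponential covariance, onto a grid and drawing Poisson counts in each bin. The grid size, mean, variance, and length-scale defaults are illustrative assumptions.

```python
import numpy as np


def simulate_lgcp(rng, n_bins=200, mu=3.0, sigma=1.0, length_scale=0.1):
    """Approximate draw from a 1D log-Gaussian Cox process on [0, 1].

    Returns (sorted event times, latent log-intensity at the bin midpoints).
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    # Squared-exponential covariance for the latent Gaussian process (assumed kernel).
    diff = mids[:, None] - mids[None, :]
    cov = sigma ** 2 * np.exp(-0.5 * (diff / length_scale) ** 2)
    # Small diagonal jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(n_bins))
    log_intensity = mu + chol @ rng.standard_normal(n_bins)
    # Given the intensity, counts in disjoint bins are independent Poisson draws.
    width = 1.0 / n_bins
    counts = rng.poisson(np.exp(log_intensity) * width)
    # Place each bin's events uniformly within that bin.
    events = [rng.uniform(lo, lo + width, size=c) for lo, c in zip(edges[:-1], counts)]
    return np.sort(np.concatenate(events)), log_intensity
```

Repeating such draws for each cell of a matrix, with cluster-specific mean functions, would give the kind of data the clustering model in item 4 targets. That extension is not shown here.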
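Item 5 pairs a generalized additive model with a fused LASSO penalty for geological homogeneity pursuit, estimated with an alternating augmented Lagrangian method. The sketch below isolates only the fusion step under strong simplifying assumptions. There are no completion covariates, each well's response is treated as a noisy observation of a location effect theta_i, and wells are linked by a user-supplied edge list (for example, spatial neighbors). It minimizes 0.5 * ||y - theta||^2 + lam * sum over edges (i, j) of |theta_i - theta_j| with standard scaled ADMM, one form of augmented Lagrangian method. It is not the paper's algorithm, and the function names are hypothetical.

```python
import numpy as np


def incidence_matrix(edges, n):
    """Row r of D encodes theta_i - theta_j for edge r = (i, j)."""
    D = np.zeros((len(edges), n))
    for r, (i, j) in enumerate(edges):
        D[r, i] = 1.0
        D[r, j] = -1.0
    return D


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def fused_lasso_admm(y, edges, lam, rho=1.0, n_iter=2000):
    """Scaled ADMM for min 0.5||y - theta||^2 + lam * ||D theta||_1 (split z = D theta)."""
    y = np.asarray(y, dtype=float)
    D = incidence_matrix(edges, len(y))
    A = np.eye(len(y)) + rho * D.T @ D  # fixed system matrix for the theta-update
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])            # scaled dual variable
    for _ in range(n_iter):
        theta = np.linalg.solve(A, y + rho * D.T @ (z - u))  # quadratic step
        d_theta = D @ theta
        z = soft_threshold(d_theta + u, lam / rho)           # sparsify edge differences
        u = u + d_theta - z                                  # dual update
    return theta
```

On a chain of six wells with responses `[1, 1, 1, 5, 5, 5]` and `lam=0.5`, the two groups stay separate but shrink toward each other (to about 1.167 and 4.833). With `lam=10.0`, all six fuse to the common mean 3. Because the penalty sets neighboring differences exactly to zero, wells end up grouped into regions that share one effect, which is the homogeneity-pursuit idea in item 5.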