Search for: All records

Award ID contains: 1901099


  1. Free, publicly-accessible full text available March 1, 2026
  2. Adams, Benjamin; Griffin, Amy L; Scheider, Simon; McKenzie, Grant (Ed.)
    Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem is important for taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across the food chain), spatial pathology (e.g., immunotherapy for cancer), and retail. The problem is computationally challenging due to the exponential number of candidate co-location patterns generated by the taxonomy. Most approaches for co-location pattern detection overlook the hierarchical relationships among spatial features, and the statistical significance of the detected patterns is not always considered, leading to potential false discoveries. This paper introduces two methods for incorporating taxonomies and assessing the statistical significance of co-location patterns. The baseline approach iteratively checks the significance of co-locations between leaf nodes or their ancestors in the taxonomy. The advanced approach uses the Benjamini-Hochberg procedure to control the false discovery rate, reducing the risk of false discoveries while maintaining the power to detect true co-location patterns. Experimental evaluation and case study results show the effectiveness of the approach.
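For readers unfamiliar with the Benjamini-Hochberg procedure named in this abstract, the following is a minimal sketch of the generic step-up rule only. It is not the paper's taxonomy-aware miner; the function name `benjamini_hochberg` and the sample p-values are hypothetical and chosen purely for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Generic Benjamini-Hochberg step-up rule (illustrative sketch).

    Returns a list of booleans, one per input p-value, that is True
    where the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per candidate co-location pattern.
decisions = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2], alpha=0.05)
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value meets a later threshold.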
  3. Calibration of automotive engines to ensure compliance with emission regulations is a critical phase in product development. Control of engine-out particulate emissions, which directly impact the environment and public health, is particularly important. Detailed physics-based models are typically used to gain a rich understanding of the complex physical phenomena that drive the soot particle formation in an engine cylinder. However, such models often fail to correctly represent the highly dynamic nature of the underlying mechanisms under transient combustion conditions. Moreover, most physics-based models were initially developed for diesel engine applications and their applicability to gasoline engines remains questionable due to differences in flame structure and fuel-wall interactions. Black-box models have been previously proposed to predict engine-out soot emissions, but their lack of physical interpretability is an unsolved drawback. To address these limitations, we present a physics-aware twin-model machine learning framework to predict and analyze engine-out soot mass from a gasoline direct injection (GDI) engine. The framework combines a physics-based model with a bagging-type ensemble learning model that both maintains high accuracy and allows physical interpretation of results without using computationally intensive high-fidelity models. This work shows why a one-model-fits-all approach fails in the case of predicting soot emissions due to clustered co-occurrences of operating conditions that cause non-compliant behavior. We compare the performance of the proposed framework with that of the standalone baseline model and a feed-forward deep neural network. Using WLTP data from a 2.0L naturally aspirated GDI engine, the proposed framework predicts engine-out soot mass with an improvement of 29% in the R² value and 21% in the root mean squared error from the baseline physics-based model, without compromising physical interpretability. These improvements are significant enough to warrant further framework development with additional engine datasets.
  4. Mapping of spatial hotspots, i.e., regions with significantly higher rates of generating cases of certain events (e.g., disease or crime cases), is an important task in diverse societal domains, including public health, public safety, transportation, agriculture, environmental science, and so on. Clustering techniques required by these domains differ from traditional clustering methods due to the high economic and social costs of spurious results (e.g., false alarms of crime clusters). As a result, statistical rigor is needed explicitly to control the rate of spurious detections. To address this challenge, techniques for statistically-robust clustering (e.g., scan statistics) have been extensively studied by the data mining and statistics communities. In this survey, we present an up-to-date and detailed review of the models and algorithms developed by this field. We first present a general taxonomy for statistically-robust clustering, covering key steps of data and statistical modeling, region enumeration and maximization, and significance testing. We further discuss different paradigms and methods within each of the key steps. Finally, we highlight research gaps and potential future directions, which may serve as a stepping stone in generating new ideas and thoughts in this growing field and beyond. 
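As one concrete instance of the scan statistics mentioned in this abstract, the sketch below computes the log-likelihood ratio used by the widely known Poisson spatial scan statistic for a single candidate region. It is an illustration under stated assumptions, not the survey's own formulation: the function name `poisson_scan_llr` and the numbers in the example are hypothetical, and the expected count is assumed to be scaled so that expected counts over the whole study area sum to the total observed cases.

```python
import math


def poisson_scan_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio of one candidate region (Poisson model sketch).

    cases_in    : observed cases inside the region
    expected_in : expected cases inside the region under the null
                  (assumed 0 < expected_in < total_cases)
    total_cases : observed cases in the whole study area
    """
    c, b, total = cases_in, expected_in, total_cases

    # Only regions with more cases than expected count as hotspot candidates.
    if c <= b:
        return 0.0

    inside = c * math.log(c / b)
    # The outside term vanishes when every case falls inside the region.
    outside = (total - c) * math.log((total - c) / (total - b)) if total > c else 0.0
    return inside + outside


# Hypothetical region: 20 observed cases where 10 were expected, out of 100.
score = poisson_scan_llr(20, 10.0, 100)
```

In a full pipeline this score would be maximized over the enumerated candidate regions, and the significance of the maximum would typically be assessed by Monte Carlo simulation under the null hypothesis.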
  5. Beecham, Roger; Long, Jed A; Smith, Dianna; Zhao, Qunshan; Wise, Sarah (Ed.)
    Given a set S of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs (C, r_{g}) such that C is a statistically significant regional-colocation pattern in r_{g}. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner [Subhankar et al., 2022] that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM), which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost.
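The Bonferroni correction named in this abstract can be sketched generically as follows. This shows only the textbook correction, which tests each of m hypotheses at level alpha / m and thereby controls the family-wise error rate; it is not the MultComp-RCM algorithm itself. The function name `bonferroni_reject` and the sample p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Textbook Bonferroni correction (illustrative sketch).

    Each of the m hypotheses is tested at level alpha / m. Returns a list
    of booleans that is True where the null hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-test significance level
    return [p <= threshold for p in p_values]


# Hypothetical p-values, e.g. one per candidate (pattern, region) pair.
decisions = bonferroni_reject([0.001, 0.012, 0.03, 0.2], alpha=0.05)
```

With four tests and alpha = 0.05 the per-test threshold is 0.0125, so only the first two hypothetical p-values lead to rejection.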
  6. The eco-toll estimation problem quantifies the expected environmental cost (e.g., energy consumption, exhaust emissions) for a vehicle to travel along a path. This problem is important for societal applications such as eco-routing, which aims to find paths with the lowest exhaust emissions or energy need. The challenges of this problem are threefold: (1) the dependence of a vehicle's eco-toll on its physical parameters; (2) the lack of access to data with eco-toll information; and (3) the influence of contextual information (i.e., the connections of adjacent segments in the path) on the eco-toll of road segments. Prior work on eco-toll estimation has mostly relied on purely data-driven approaches and has high estimation errors given the limited training data. To address these limitations, we propose a novel Eco-toll estimation Physics-informed Neural Network framework (Eco-PiNN) using three novel ideas, namely, (1) a physics-informed decoder that integrates the physical laws governing vehicle dynamics into the network, (2) an attention-based contextual information encoder, and (3) a physics-informed regularization to reduce overfitting. Experiments on real-world heavy-duty truck data show that the proposed method can greatly improve the accuracy of eco-toll estimation compared with state-of-the-art methods. The full version of the paper can be accessed at https://arxiv.org/abs/2301.05739
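To give a sense of what "physical laws governing vehicle dynamics" can look like in this setting, here is a minimal sketch of a standard longitudinal road-load model for the energy needed at the wheels over one road segment. This is an assumption made for illustration and is not taken from the Eco-PiNN paper: the function name `tractive_energy_joules`, the constant-speed-per-segment simplification, and all default coefficients are hypothetical.

```python
import math

GRAVITY = 9.81  # m/s^2


def tractive_energy_joules(mass_kg, speed_mps, accel_mps2, grade_rad,
                           distance_m, c_rr=0.007, c_d=0.6,
                           frontal_area_m2=10.0, air_density=1.2):
    """Energy at the wheels for one segment under a road-load model.

    Sums inertial, grade, rolling-resistance, and aerodynamic-drag forces,
    then multiplies by segment length. All default coefficients are
    placeholder values, not measured truck parameters.
    """
    inertial = mass_kg * accel_mps2
    grade = mass_kg * GRAVITY * math.sin(grade_rad)
    rolling = c_rr * mass_kg * GRAVITY * math.cos(grade_rad)
    drag = 0.5 * air_density * c_d * frontal_area_m2 * speed_mps ** 2

    force = inertial + grade + rolling + drag
    # Negative net force (braking or downhill) is clipped to zero here;
    # regenerative recovery is deliberately ignored in this sketch.
    return max(force, 0.0) * distance_m


# Hypothetical segment: 15 t truck, 20 m/s, level road, 500 m long.
energy = tractive_energy_joules(15000.0, 20.0, 0.0, 0.0, 500.0)
```

A model of this kind makes the dependence on physical parameters such as mass explicit, which is the first challenge the abstract lists.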