skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, September 13 until 2:00 AM ET on Saturday, September 14 due to maintenance. We apologize for the inconvenience.


Search for: All records

Award ID contains: 1901099

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available June 2, 2025
  2. Free, publicly-accessible full text available October 6, 2024
  3. Calibration of automotive engines to ensure compliance with emission regulations is a critical phase in product development. Control of engine-out particulate emissions, which directly impact the environment and public health, is particularly important. Detailed physics-based models are typically used to gain a rich understanding of the complex physical phenomena that drive the soot particle formation in an engine cylinder. However, such models often fail to correctly represent the highly dynamic nature of the underlying mechanisms under transient combustion conditions. Moreover, most physics-based models were initially developed for diesel engine applications and their applicability to gasoline engines remains questionable due to differences in flame structure and fuel-wall interactions. Black-box models have been previously proposed to predict engine-out soot emissions, but their lack of physical interpretability is an unsolved drawback. To address these limitations, we present a physics-aware twin-model machine learning framework to predict and analyze engine-out soot mass from a gasoline direct injection (GDI) engine. The framework combines a physics-based model with a bagging-type ensemble learning model that both maintains high accuracy and allows physical interpretation of results without using computationally intensive high-fidelity models. This work shows why a one-model-fits-all approach fails in the case of predicting soot emissions due to clustered co-occurrences of operating conditions that cause non-compliant behavior. We compare the performance of the proposed framework with that of the standalone baseline model and a feed-forward deep neural network. Using WLTP data from a 2.0L naturally aspirated GDI engine, the proposed framework predicts engine-out soot mass with an improvement of 29% in the R2 value and 21% in the root mean squared error from the baseline physics-based model, without compromising physical interpretability. These improvements are significant enough to warrant further framework development with additional engine datasets.

     
    more » « less
  4. Mapping of spatial hotspots, i.e., regions with significantly higher rates of generating cases of certain events (e.g., disease or crime cases), is an important task in diverse societal domains, including public health, public safety, transportation, agriculture, environmental science, and so on. Clustering techniques required by these domains differ from traditional clustering methods due to the high economic and social costs of spurious results (e.g., false alarms of crime clusters). As a result, statistical rigor is needed explicitly to control the rate of spurious detections. To address this challenge, techniques for statistically-robust clustering (e.g., scan statistics) have been extensively studied by the data mining and statistics communities. In this survey, we present an up-to-date and detailed review of the models and algorithms developed by this field. We first present a general taxonomy for statistically-robust clustering, covering key steps of data and statistical modeling, region enumeration and maximization, and significance testing. We further discuss different paradigms and methods within each of the key steps. Finally, we highlight research gaps and potential future directions, which may serve as a stepping stone in generating new ideas and thoughts in this growing field and beyond. 
    more » « less
  5. Beecham, Roger ; Long, Jed A ; Smith, Dianna ; Zhao, Qunshan ; Wise, Sarah (Ed.)
    Given a set S of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs such that C is a statistically significant regional-colocation pattern in r_{g}. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner [Subhankar et. al, 2022] that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM) which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost. 
    more » « less
  6. The eco-toll estimation problem quantifies the expected environmental cost (e.g., energy consumption, exhaust emissions) for a vehicle to travel along a path. This problem is important for societal applications such as eco-routing, which aims to find paths with the lowest exhaust emissions or energy need. The challenges of this problem are threefold: (1) the dependence of a vehicle's eco-toll on its physical parameters; (2) the lack of access to data with eco-toll information; and (3) the influence of contextual information (i.e. the connections of adjacent segments in the path) on the eco-toll of road segments. Prior work on eco-toll estimation has mostly relied on pure data-driven approaches and has high estimation errors given the limited training data. To address these limitations, we propose a novel Eco-toll estimation Physics-informed Neural Network framework (Eco-PiNN) using three novel ideas, namely, (1) a physics-informed decoder that integrates the physical laws governing vehicle dynamics into the network, (2) an attention-based contextual information encoder, and (3) a physics-informed regularization to reduce overfitting. Experiments on real-world heavy-duty truck data show that the proposed method can greatly improve the accuracy of eco-toll estimation compared with state-of-the-art methods. *The full version of the paper can be accessed at https://arxiv.org/abs/2301.05739 
    more » « less
  7. Cluster detection is important and widely used in a variety of applications, including public health, public safety, transportation, and so on. Given a collection of data points, we aim to detect density-connected spatial clusters with varying geometric shapes and densities, under the constraint that the clusters are statistically significant. The problem is challenging, because many societal applications and domain science studies have low tolerance for spurious results, and clusters may have arbitrary shapes and varying densities. As a classical topic in data mining and learning, a myriad of techniques have been developed to detect clusters with both varying shapes and densities (e.g., density-based, hierarchical, spectral, or deep clustering methods). However, the vast majority of these techniques do not consider statistical rigor and are susceptible to detecting spurious clusters formed as a result of natural randomness. On the other hand, scan statistic approaches explicitly control the rate of spurious results, but they typically assume a single “hotspot” of over-density and many rely on further assumptions such as a tessellated input space. To unite the strengths of both lines of work, we propose a statistically robust formulation of a multi-scale DBSCAN, namely Significant DBSCAN+, to identify significant clusters that are density connected. As we will show, incorporation of statistical rigor is a powerful mechanism that allows the new Significant DBSCAN+ to outperform state-of-the-art clustering techniques in various scenarios. We also propose computational enhancements to speed-up the proposed approach. Experiment results show that Significant DBSCAN+ can simultaneously improve the success rate of true cluster detection (e.g., 10–20% increases in absolute F1 scores) and substantially reduce the rate of spurious results (e.g., from thousands/hundreds of spurious detections to none or just a few across 100 datasets), and the acceleration methods can improve the efficiency for both clustered and non-clustered data. 
    more » « less