Title: A Review of Data Analytic Applications in Road Traffic Safety. Part 1: Descriptive and Predictive Modeling
This part of the review aims to reduce the start-up burden of data collection and descriptive analytics for statistical modeling and route optimization of risk associated with motor vehicles. From a data-driven bibliometric analysis, we show that the literature is divided into two disparate research streams: (a) predictive or explanatory models that attempt to understand and quantify crash risk based on different driving conditions, and (b) optimization techniques that focus on minimizing crash risk through route/path-selection and rest-break scheduling. Translation of research outcomes between these two streams is limited. To overcome this issue, we present publicly available high-quality data sources (different study designs, outcome variables, and predictor variables) and descriptive analytic techniques (data summarization, visualization, and dimension reduction) that can be used to achieve safer-routing and provide code to facilitate data collection/exploration by practitioners/researchers. Then, we review the statistical and machine learning models used for crash risk modeling. We show that (near) real-time crash risk is rarely considered, which might explain why the optimization models (reviewed in Part 2) have not capitalized on the research outcomes from the first stream.
Award ID(s):
1635927
PAR ID:
10212199
Journal Name:
Sensors
Volume:
20
Issue:
4
ISSN:
1424-8220
Page Range / eLocation ID:
1107
Sponsoring Org:
National Science Foundation
More Like this
  1. In the first part of the review, we observed that there exists a significant gap between the predictive and prescriptive models pertaining to crash risk prediction and minimization, respectively. In this part, we review and categorize the optimization/prescriptive analytic models that focus on minimizing crash risk. Although the majority of works in this segment of the literature are related to hazardous materials (hazmat) trucking problems, we show that (with some exceptions) many can also be utilized in non-hazmat scenarios. To highlight the effect of the crash risk prediction model on the accumulated risk obtained from the prescriptive model, we present a simulated example where we utilize four risk indicators (obtained from logistic regression, Poisson regression, XGBoost, and a neural network) in the k-shortest path algorithm. From our example, we demonstrate two major takeaways: (a) the shortest path may not always result in the lowest crash risk, and (b) a similarity in overall predictive performance may not always translate to similar outcomes from the prescriptive models. Based on the review and example, we highlight several avenues for future research.
  2. Bioenergy is widely considered a sustainable alternative to fossil fuels. However, large‐scale applications of biomass‐based energy products are limited due to challenges related to feedstock variability, conversion economics, and supply chain reliability. Artificial intelligence (AI), an emerging concept, has been applied to bioenergy systems in recent decades to address those challenges. This paper reviewed 164 articles published between 2005 and 2019 that applied different AI techniques to bioenergy systems. This review focuses on identifying the unique capabilities of various AI techniques in addressing bioenergy‐related research challenges and improving the performance of bioenergy systems. Specifically, we characterized AI studies by their input variables, output variables, AI techniques, dataset size, and performance. We examined AI applications throughout the life cycle of bioenergy systems. We identified four areas in which AI has been applied most often: (1) the prediction of biomass properties, (2) the prediction of process performance of biomass conversion, including different conversion pathways and technologies, (3) the prediction of biofuel properties and the performance of bioenergy end‐use systems, and (4) supply chain modeling and optimization. Based on the review, AI is particularly useful in generating data that are hard to measure directly, improving traditional models of biomass conversion and biofuel end‐uses, and overcoming the challenges of traditional computing techniques for bioenergy supply chain design and optimization. For future research, efforts are needed to develop standardized and practical procedures for selecting AI techniques and determining training data samples, to enhance data collection, documentation, and sharing across bioenergy‐related areas, and to explore the potential of AI in supporting the sustainable development of bioenergy systems from holistic perspectives.
  3. Researchers often frame quantitative research as objective, but every step in data collection and analysis can bias findings in often unexamined ways. In this investigation, we examined how the process of selecting variables to include in regression models (model specification) can bias findings about inequities in science and math student outcomes. We identified the four most commonly used methods for model specification in discipline-based education research about equity: a priori, statistical significance, variance explained, and information criterion. Using a quantitative critical perspective that blends statistical theory with critical theory, we reanalyzed the data from a prior publication (Van Dusen & Nissen, 2020) using each of the four methods and compared the findings from each. We concluded that using information criterion produced models that best aligned with our quantitative critical perspective’s emphasis on intersectionality and models with more accurate coefficients and uncertainties. Based on these findings, we recommend researchers use information criterion for specifying models about inequities in STEM student outcomes. 
  4. Cancer is an umbrella term that includes a wide spectrum of disease severity, from cancers that are malignant, metastatic, and aggressive to benign lesions with very low potential for progression or death. The ability to prognosticate patient outcomes would facilitate management of various malignancies: patients whose cancer is likely to advance quickly would receive necessary treatment that is commensurate with the predicted biology of the disease. Former prognostic models based on clinical variables (age, gender, cancer stage, tumor grade, etc.), though helpful, cannot account for genetic differences, molecular etiology, tumor heterogeneity, and important host biological mechanisms. Therefore, recent prognostic models have shifted toward the integration of complementary information available in both molecular data and clinical variables to better predict patient outcomes: vital status (overall survival), metastasis (metastasis-free survival), and recurrence (progression-free survival). In this article, we review 20 survival prediction approaches that integrate multi-omics and clinical data to predict patient outcomes. We discuss their strategies for modeling survival time (continuous and discrete), the incorporation of molecular measurements and clinical variables into risk models (clinical and multi-omics data), how to cope with censored patient records, the effectiveness of data integration techniques, prediction methodologies, model validation, and assessment metrics. The goal is to inform life scientists of available resources, and to provide a complete review of important building blocks in survival prediction. At the same time, we thoroughly describe the pros and cons of each methodology, and discuss in depth the outstanding challenges that need to be addressed in future method development.
  5. Although fragility function development for structures is a mature field, it has recently thrived on new algorithms propelled by machine learning (ML) methods along with heightened emphasis on functions tailored for community- to regional-scale application. This article seeks to critically assess the implications of adopting alternative traditional and emerging fragility modeling practices within seismic risk and resilience quantification to guide future analyses that span from the structure to infrastructure network scale. For example, this article probes the similarities and differences in traditional and ML techniques for demand modeling, discusses the shift from one-parameter to multiparameter fragility models, and assesses the variations in fragility outcomes via statistical distance concepts. Moreover, the previously unexplored influence of these practices on a range of performance measures (e.g. conditional probability of damage, risk of losses to individual structures, portfolio risks, and network recovery trajectories) is systematically evaluated via the posed statistical distance metrics. To this end, case studies using bridges and transportation networks are leveraged to systematically test the implications of alternative seismic fragility modeling practices. The results show that, contrary to the classically adopted archetype fragilities, parameterized ML-based models achieve similar results on individual risk metrics compared to structure-specific fragilities, promising to improve portfolio fragility definitions, deliver satisfactory risk and resilience outcomes at different scales, and pinpoint structures whose poor performance extends to the global network resilience estimates. Using flexible fragility models to depict heterogeneous portfolios is expected to support dynamic decisions that may take place at different scales, space, and time, throughout infrastructure systems. 