Title: Kernel Ridge Regression in Predicting Railway Crossing Accidents
Abstract: Expanding on the insights from our initial investigation into railway accident patterns, this paper examines how well machine learning can forecast potential accident trends at railway crossings. Focusing on critical factors such as “Highway User Position” and “Equipment Involved,” we fit Kernel Ridge Regression (KRR) models tailored to distinct clusters, as well as a global model for the entire dataset. These models, trained on historical data, discern patterns and correlations that might elude traditional statistical methods. Certain clusters, despite limited data points, show remarkably low Root Mean Squared Error (RMSE) values between predictions and real data, indicating superior model performance. However, other clusters hint at potential overfitting, given the disparities between model predictions and actual data. Conversely, clusters with vast datasets underperform compared to the global model, suggesting intricate interactions within the data that might challenge the model’s capabilities. The performance differences across clusters emphasize the value of specialized, cluster-specific models in capturing the intricacies of each dataset segment. This study underscores the efficacy of KRR in predicting future railway crossing incidents, supporting the implementation of data-driven strategies in public safety.
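The comparison described in the abstract (one KRR model per cluster versus a single global model, scored by RMSE) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the cluster names "A" and "B", the RBF kernel, and the values of `lam` and `gamma` are hypothetical, and none of it reproduces the paper's dataset, features, or settings. The sketch uses only the Python standard library; in practice a library implementation such as scikit-learn's `KernelRidge` would be the usual choice.

```python
import math
import random

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_fit(X, y, lam=0.1, gamma=0.5):
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def krr_predict(X_train, alpha, X_new, gamma=0.5):
    """Prediction f(x) = sum_i alpha_i * k(x_i, x)."""
    return [sum(a * rbf_kernel(xt, x, gamma) for a, xt in zip(alpha, X_train))
            for x in X_new]

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def make_cluster(rng, f, n):
    """Synthetic cluster: one feature in [0, 5], noisy target f(x)."""
    xs = [[rng.uniform(0.0, 5.0)] for _ in range(n)]
    ys = [f(x[0]) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def compare_models(seed=0):
    """Return {cluster: (cluster_model_rmse, global_model_rmse)} on held-out data."""
    rng = random.Random(seed)
    # Hypothetical clusters with different underlying trends.
    trends = {"A": math.sin, "B": lambda x: 2.0 - 0.5 * x}
    train = {k: make_cluster(rng, f, 20) for k, f in trends.items()}
    test = {k: make_cluster(rng, f, 10) for k, f in trends.items()}
    # Global model: pooled training data, no cluster label as a feature.
    X_all = [x for k in train for x in train[k][0]]
    y_all = [y for k in train for y in train[k][1]]
    alpha_all = krr_fit(X_all, y_all)
    out = {}
    for k in trends:
        Xk, yk = train[k]
        alpha_k = krr_fit(Xk, yk)  # cluster-specific model
        Xt, yt = test[k]
        out[k] = (rmse(yt, krr_predict(Xk, alpha_k, Xt)),
                  rmse(yt, krr_predict(X_all, alpha_all, Xt)))
    return out

results = compare_models()
for name, (cluster_rmse, global_rmse) in results.items():
    print(f"cluster {name}: cluster-model RMSE={cluster_rmse:.3f}, "
          f"global-model RMSE={global_rmse:.3f}")
```

Comparing the two RMSE values per cluster on held-out data is one way to spot the patterns the abstract mentions: a cluster model that fits its training data closely but scores poorly on held-out points suggests overfitting, while a cluster where the global model scores better suggests the cluster-specific model is not capturing the structure in that segment.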
Award ID(s):
2112650
PAR ID:
10591478
Publisher / Repository:
American Society of Mechanical Engineers
ISBN:
978-0-7918-8777-6
Format(s):
Medium: X
Location:
Columbia, South Carolina, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract This study employs graph mining and spectral clustering to analyze patterns in railway crossing accidents, utilizing a comprehensive dataset from the US Department of Transportation. By constructing a graph of implicit relationships between railway companies based on shared accident localities, we apply spectral clustering to identify distinct clusters of companies with similar accident patterns. This offers nuanced insight into the underlying structure of these incidents. Our results indicate that “Highway User Position” and “Equipment Involved” play pivotal roles in accident clustering, while temporal elements like “Date” and “Time” exert a diminished impact. This research not only sheds light on potential accident causation factors but also sets the stage for subsequent predictive safety analyses. It aims to serve as a cornerstone for future studies that aspire to leverage advanced data-driven techniques for improving railway crossing safety protocols. 
  2. Bacteriophages are being widely harnessed as an alternative to antibiotics due to the global emergence of drug-resistant pathogens. To guide the usage of these bactericidal agents, characterization of their host specificity is vital—however, host range information remains limited for many bacteriophages. This is particularly the case for bacteriophages infecting the Microbacterium genus, despite their importance in agriculture, biomedicine, and biotechnology. Here, we elucidate the phylogenomic relationships between 125 Microbacterium cluster EA bacteriophages—including members from 11 sub-clusters (EA1 to EA11)—and infer their putative host ranges using insights from codon usage bias patterns as well as predictions from both exploratory and confirmatory computational methods. Our computational analyses suggest that cluster EA bacteriophages have a shared infection history across the Microbacterium clade. Interestingly, bacteriophages of all sub-clusters exhibit codon usage preference patterns that resemble those of bacterial strains different from ones used for isolation, suggesting that they might be able to infect additional hosts. Furthermore, host range predictions indicate that certain sub-clusters may be better suited in prospective biotechnological and medical applications such as phage therapy. 
  3. Abstract Background: MacArthur and Wilson's theory of island biogeography has been a foundation for obtaining testable predictions from models of community assembly and for developing models that integrate across scales and disciplines. Historically, however, these developments have focused on integration across ecological and macroevolutionary scales and on predicting patterns of species richness, abundance distributions, trait data and/or phylogenies. The distribution of genetic variation across species within a community is an emerging pattern that contains signatures of past population histories, which might provide an historical lens for the study of contemporary communities. As intraspecific genetic diversity data become increasingly available at the scale of entire communities, there is an opportunity to integrate microevolutionary processes into our models, moving towards development of a genetic theory of island biogeography. Motivation/goal: We aim to promote the development of process‐based biodiversity models that predict community genetic diversity patterns together with other community‐scale patterns. To this end, we review models of ecological, microevolutionary and macroevolutionary processes that are best suited to the creation of unified models, and the patterns that these predict. We then discuss ongoing and potential future efforts to unify models operating at different organizational levels, with the goal of predicting multidimensional community‐scale data including a genetic component. Main conclusions: Our review of the literature shows that despite recent efforts, further methodological developments are needed, not only to incorporate the genetic component into existing island biogeography models, but also to unify processes across scales of biological organization. To catalyse these developments, we outline two potential ways forward, adopting either a top‐down or a bottom‐up approach. Finally, we highlight key ecological and evolutionary questions that might be addressed by unified models including a genetic component and establish hypotheses about how processes across scales might impact patterns of community genetic diversity.
  4. Abstract Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps. We explore the performance and capabilities of these models, focusing on their ability to predict reaction pathways and recapitulate the roles of catalysts and reagents. Additionally, we demonstrate the potential of mechanistic models in predicting impurities, often overlooked by conventional models. We conclude by evaluating the generalizability of mechanistic models to new reaction types, revealing challenges related to dataset diversity, consecutive predictions, and violations of atom conservation. 
  5. In recent years, the utilization of machine learning algorithms and advancements in unmanned aerial vehicle (UAV) technology have caused significant shifts in remote sensing practices. In particular, the integration of machine learning with physical models and their application in UAV–satellite data fusion have emerged as two prominent approaches for the estimation of vegetation biochemistry. This study evaluates the performance of five machine learning regression algorithms (MLRAs) for the mapping of crop canopy chlorophyll at the Kellogg Biological Station (KBS) in Michigan, USA, across three scenarios: (1) application to Landsat 7, RapidEye, and PlanetScope satellite images; (2) application to UAV–satellite data fusion; and (3) integration with the PROSAIL radiative transfer model (hybrid methods PROSAIL + MLRAs). The results indicate that the majority of the five MLRAs utilized in UAV–satellite data fusion perform better than the five PROSAIL + MLRAs. The general trend suggests that the integration of satellite data with UAV-derived information, including the normalized difference red-edge index (NDRE), canopy height model, and leaf area index (LAI), significantly enhances the performance of MLRAs. The UAV–RapidEye dataset exhibits the highest coefficient of determination (R²) and the lowest root mean square errors (RMSE) when employing kernel ridge regression (KRR) and Gaussian process regression (GPR) (R² = 0.89 and 0.89 and RMSE = 8.99 µg/cm² and 9.65 µg/cm², respectively). Similar performance is observed for the UAV–Landsat and UAV–PlanetScope datasets (R² = 0.86 and 0.87 for KRR, respectively). For the hybrid models, the maximum performance is attained with the Landsat data using KRR and GPR (R² = 0.77 and 0.51 and RMSE = 33.10 µg/cm² and 42.91 µg/cm², respectively), followed by R² = 0.75 and RMSE = 39.78 µg/cm² for the PlanetScope data upon integrating partial least squares regression (PLSR) into the hybrid model. Across all hybrid models, the RapidEye data yield the most stable performance, with the R² ranging from 0.45 to 0.71 and RMSE ranging from 19.16 µg/cm² to 33.07 µg/cm². The study highlights the importance of synergizing UAV and satellite data, which enables the effective monitoring of canopy chlorophyll in small agricultural lands.