Title: Kernel Ridge Regression in Predicting Railway Crossing Accidents
Abstract Building on our initial investigation into railway accident patterns, this paper examines how well machine learning can forecast accident trends at railway crossings. Focusing on critical factors such as “Highway User Position” and “Equipment Involved,” we fit Kernel Ridge Regression (KRR) models tailored to distinct clusters, as well as a global model for the entire dataset. These models, trained on historical data, discern patterns and correlations that might elude traditional statistical methods. Certain clusters, despite limited data points, show remarkably low Root Mean Squared Error (RMSE) values between predictions and real data, indicating superior model performance. Other clusters hint at potential overfitting, given the disparities between model predictions and actual data. Conversely, clusters with large datasets underperform compared to the global model, suggesting intricate interactions within the data that might challenge the model’s capabilities. These differences in performance across clusters emphasize the value of specialized, cluster-specific models in capturing the intricacies of each dataset segment. This study underscores the efficacy of KRR in predicting future railway crossing incidents, supporting the implementation of data-driven strategies in public safety.
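The comparison described in the abstract (one KRR model per cluster versus a single global model, scored by RMSE) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the RBF kernel, the values of `lam` and `gamma`, the one-dimensional synthetic data, and all function names. It uses only the Python standard library and solves the standard KRR system (K + λI)α = y directly.

```python
import math
import random

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2). Kernel choice is an assumption."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def fit_krr(X, y, lam=1e-2, gamma=1.0):
    """Fit KRR by solving (K + lam * I) alpha = y; lam and gamma are illustrative."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return {"X": X, "alpha": solve_linear_system(K, y), "gamma": gamma}

def predict_krr(model, X_new):
    """Predict f(x) = sum_i alpha_i * k(x_i, x) for each new point."""
    return [sum(a * rbf_kernel(xt, x, model["gamma"])
                for a, xt in zip(model["alpha"], model["X"])) for x in X_new]

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def make_cluster(rng, cluster_id, n):
    """Synthetic stand-in data: each cluster follows a different (made-up) trend."""
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 3.0)
        signal = math.sin(2.0 * x) if cluster_id == 0 else 0.5 * x - 1.0
        rows.append(([x], signal + rng.gauss(0.0, 0.05)))
    return rows

def compare_cluster_vs_global(seed=0, n_per_cluster=30, n_test=10):
    """Return {cluster_id: (cluster_model_rmse, global_model_rmse)} on held-out rows."""
    rng = random.Random(seed)
    data = {c: make_cluster(rng, c, n_per_cluster) for c in (0, 1)}
    train = {c: rows[n_test:] for c, rows in data.items()}
    test = {c: rows[:n_test] for c, rows in data.items()}
    pooled = [row for rows in train.values() for row in rows]
    global_model = fit_krr([x for x, _ in pooled], [y for _, y in pooled])
    results = {}
    for c in (0, 1):
        local_model = fit_krr([x for x, _ in train[c]], [y for _, y in train[c]])
        X_test = [x for x, _ in test[c]]
        y_test = [y for _, y in test[c]]
        results[c] = (rmse(y_test, predict_krr(local_model, X_test)),
                      rmse(y_test, predict_krr(global_model, X_test)))
    return results

if __name__ == "__main__":
    for cluster_id, (local_err, global_err) in compare_cluster_vs_global().items():
        print(f"cluster {cluster_id}: cluster RMSE={local_err:.3f}, global RMSE={global_err:.3f}")
```

In this toy setup the global model sees only the pooled feature values and has no cluster label, so any gap between the two RMSE columns reflects the synthetic design rather than the paper's data. With real crossing records, the regularization strength and kernel width would normally be chosen by cross-validation within each cluster.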
Award ID(s):
2112650
PAR ID:
10591478
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
American Society of Mechanical Engineers
Date Published:
ISBN:
978-0-7918-8777-6
Format(s):
Medium: X
Location:
Columbia, South Carolina, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract This study employs graph mining and spectral clustering to analyze patterns in railway crossing accidents, utilizing a comprehensive dataset from the US Department of Transportation. By constructing a graph of implicit relationships between railway companies based on shared accident localities, we apply spectral clustering to identify distinct clusters of companies with similar accident patterns. This offers nuanced insight into the underlying structure of these incidents. Our results indicate that “Highway User Position” and “Equipment Involved” play pivotal roles in accident clustering, while temporal elements like “Date” and “Time” exert a diminished impact. This research not only sheds light on potential accident causation factors but also sets the stage for subsequent predictive safety analyses. It aims to serve as a cornerstone for future studies that aspire to leverage advanced data-driven techniques for improving railway crossing safety protocols. 
  2. Bacteriophages are being widely harnessed as an alternative to antibiotics due to the global emergence of drug-resistant pathogens. To guide the usage of these bactericidal agents, characterization of their host specificity is vital—however, host range information remains limited for many bacteriophages. This is particularly the case for bacteriophages infecting the Microbacterium genus, despite their importance in agriculture, biomedicine, and biotechnology. Here, we elucidate the phylogenomic relationships between 125 Microbacterium cluster EA bacteriophages—including members from 11 sub-clusters (EA1 to EA11)—and infer their putative host ranges using insights from codon usage bias patterns as well as predictions from both exploratory and confirmatory computational methods. Our computational analyses suggest that cluster EA bacteriophages have a shared infection history across the Microbacterium clade. Interestingly, bacteriophages of all sub-clusters exhibit codon usage preference patterns that resemble those of bacterial strains different from ones used for isolation, suggesting that they might be able to infect additional hosts. Furthermore, host range predictions indicate that certain sub-clusters may be better suited in prospective biotechnological and medical applications such as phage therapy. 
  3. Abstract Machine learning (ML) has been applied to space weather problems with increasing frequency in recent years, driven by an influx of in-situ measurements and a desire to improve modeling and forecasting capabilities throughout the field. Space weather originates from solar perturbations and is comprised of the resulting complex variations they cause within the numerous systems between the Sun and Earth. These systems are often tightly coupled and not well understood. This creates a need for skillful models with knowledge about the confidence of their predictions. One example of such a dynamical system highly impacted by space weather is the thermosphere, the neutral region of Earth’s upper atmosphere. Our inability to forecast it has severe repercussions in the context of satellite drag and computation of probability of collision between two space objects in low Earth orbit (LEO) for decision making in space operations. Even with (assumed) perfect forecast of model drivers, our incomplete knowledge of the system results in often inaccurate thermospheric neutral mass density predictions. Continuing efforts are being made to improve model accuracy, but density models rarely provide estimates of confidence in predictions. In this work, we propose two techniques to develop nonlinear ML regression models to predict thermospheric density while providing robust and reliable uncertainty estimates: Monte Carlo (MC) dropout and direct prediction of the probability distribution, both using the negative logarithm of predictive density (NLPD) loss function. We show the performance capabilities for models trained on both local and global datasets. We show that the NLPD loss provides similar results for both techniques but the direct probability distribution prediction method has a much lower computational cost. 
For the global model regressed on the Space Environment Technologies High Accuracy Satellite Drag Model (HASDM) density database, we achieve errors of approximately 11% on independent test data with well-calibrated uncertainty estimates. Using an in-situ CHAllenging Minisatellite Payload (CHAMP) density dataset, models developed using both techniques provide test error on the order of 13%. The CHAMP models—on validation and test data—are within 2% of perfect calibration for the twenty prediction intervals tested. We show that this model can also be used to obtain global density predictions with uncertainties at a given epoch. 
  4. In recent years, the utilization of machine learning algorithms and advancements in unmanned aerial vehicle (UAV) technology have caused significant shifts in remote sensing practices. In particular, the integration of machine learning with physical models and their application in UAV–satellite data fusion have emerged as two prominent approaches for the estimation of vegetation biochemistry. This study evaluates the performance of five machine learning regression algorithms (MLRAs) for the mapping of crop canopy chlorophyll at the Kellogg Biological Station (KBS) in Michigan, USA, across three scenarios: (1) application to Landsat 7, RapidEye, and PlanetScope satellite images; (2) application to UAV–satellite data fusion; and (3) integration with the PROSAIL radiative transfer model (hybrid methods PROSAIL + MLRAs). The results indicate that the majority of the five MLRAs utilized in UAV–satellite data fusion perform better than the five PROSAIL + MLRAs. The general trend suggests that the integration of satellite data with UAV-derived information, including the normalized difference red-edge index (NDRE), canopy height model, and leaf area index (LAI), significantly enhances the performance of MLRAs. The UAV–RapidEye dataset exhibits the highest coefficient of determination (R²) and the lowest root mean square errors (RMSE) when employing kernel ridge regression (KRR) and Gaussian process regression (GPR) (R² = 0.89 and 0.89 and RMSE = 8.99 µg/cm² and 9.65 µg/cm², respectively). Similar performance is observed for the UAV–Landsat and UAV–PlanetScope datasets (R² = 0.86 and 0.87 for KRR, respectively). For the hybrid models, the maximum performance is attained with the Landsat data using KRR and GPR (R² = 0.77 and 0.51 and RMSE = 33.10 µg/cm² and 42.91 µg/cm², respectively), followed by R² = 0.75 and RMSE = 39.78 µg/cm² for the PlanetScope data upon integrating partial least squares regression (PLSR) into the hybrid model. 
Across all hybrid models, the RapidEye data yield the most stable performance, with the R² ranging from 0.45 to 0.71 and RMSE ranging from 19.16 µg/cm² to 33.07 µg/cm². The study highlights the importance of synergizing UAV and satellite data, which enables the effective monitoring of canopy chlorophyll in small agricultural lands. 
  5. Abstract Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps. We explore the performance and capabilities of these models, focusing on their ability to predict reaction pathways and recapitulate the roles of catalysts and reagents. Additionally, we demonstrate the potential of mechanistic models in predicting impurities, often overlooked by conventional models. We conclude by evaluating the generalizability of mechanistic models to new reaction types, revealing challenges related to dataset diversity, consecutive predictions, and violations of atom conservation. 