Abstract This study employs graph mining and spectral clustering to analyze patterns in railway crossing accidents, utilizing a comprehensive dataset from the US Department of Transportation. By constructing a graph of implicit relationships between railway companies based on shared accident localities, we apply spectral clustering to identify distinct clusters of companies with similar accident patterns. This offers nuanced insight into the underlying structure of these incidents. Our results indicate that “Highway User Position” and “Equipment Involved” play pivotal roles in accident clustering, while temporal elements like “Date” and “Time” exert a diminished impact. This research not only sheds light on potential accident causation factors but also sets the stage for subsequent predictive safety analyses. It aims to serve as a cornerstone for future studies that aspire to leverage advanced data-driven techniques for improving railway crossing safety protocols.
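The graph-construction and clustering steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the company names, localities, and edge-weight rule (number of shared localities) are hypothetical, and it performs only a two-way spectral partition using the Fiedler vector of the unnormalized graph Laplacian, computed by power iteration with the standard library. A full analysis would more likely rely on a library routine such as scikit-learn's `SpectralClustering`.

```python
# Hypothetical toy data: each company maps to the set of localities
# where it has recorded crossing accidents.
accident_localities = {
    "A": {"loc1", "loc2", "loc3"},
    "B": {"loc2", "loc3"},
    "C": {"loc3", "loc4"},
    "D": {"loc4", "loc5", "loc6"},
    "E": {"loc5", "loc6"},
}


def build_adjacency(localities):
    """Weighted adjacency matrix: edge weight = number of shared localities."""
    names = sorted(localities)
    n = len(names)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(localities[names[i]] & localities[names[j]])
            W[i][j] = W[j][i] = float(shared)
    return names, W


def fiedler_vector(W, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of
    L = D - W via power iteration on (c*I - L), projecting out the
    constant vector at every step. Assumes a connected graph."""
    n = len(W)
    degrees = [sum(row) for row in W]
    c = 2.0 * max(degrees) + 1.0  # exceeds the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant eigenvector
        w = [
            (c - degrees[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def spectral_bipartition(localities):
    """Split companies into two clusters by the sign of the Fiedler vector."""
    names, W = build_adjacency(localities)
    v = fiedler_vector(W)
    return {name: (1 if value >= 0 else 0) for name, value in zip(names, v)}


labels = spectral_bipartition(accident_localities)
print(labels)
```

In this toy graph, companies A, B, and C are tightly linked through shared localities, as are D and E, so the sign split separates the two groups. Which group receives label 0 or 1 is arbitrary.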
Kernel Ridge Regression in Predicting Railway Crossing Accidents
Abstract Expanding on the insights from our initial investigation into railway accident patterns, this paper delves deeper into the predictive capabilities of machine learning to forecast potential accident trends in railway crossings. Focusing on critical factors such as “Highway User Position” and “Equipment Involved,” we integrate Kernel Ridge Regression (KRR) models tailored to distinct clusters, as well as a global model for the entire dataset. These models, trained on historical data, discern patterns and correlations that might elude traditional statistical methods. Our findings are compelling: certain clusters, despite limited data points, showcase remarkably low Root Mean Squared Error (RMSE) values between predictions and real data, indicating superior model performance. However, certain clusters hint at potential overfitting, given the disparities between model predictions and actual data. Conversely, clusters with vast datasets underperform compared to the global model, suggesting intricate interactions within the data that might challenge the model’s capabilities. The performance nuances across clusters emphasize the value of specialized, cluster-specific models in capturing the intricacies of each dataset segment. This study underscores the efficacy of KRR in predicting future railway crossing incidents, fostering the implementation of data-driven strategies in public safety.
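The comparison between cluster-specific and global KRR models can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: the single numeric feature, the two synthetic cluster trends, the RBF kernel, and the values of `lam` and `gamma` are all hypothetical, and the solver is a plain Gaussian elimination so that the example needs only the standard library. In practice a library estimator such as scikit-learn's `KernelRidge` would be the usual choice.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_krr(X, y, lam=0.1, gamma=0.5):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, return a predictor."""
    n = len(X)
    K = [
        [rbf_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    alpha = solve_linear_system(K, y)
    return lambda x: sum(a * rbf_kernel(xi, x, gamma) for a, xi in zip(alpha, X))


def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def compare_models():
    """Fit one KRR model per cluster and one global model; return both RMSEs."""
    # Hypothetical clusters that follow different trends over the same feature.
    trends = {"cluster_0": lambda x: 2.0 * x, "cluster_1": lambda x: 10.0 - x}
    train_x = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    test_x = [[0.5], [1.5], [2.5], [3.5]]
    train = {k: (train_x, [f(x[0]) for x in train_x]) for k, f in trends.items()}
    test = {k: (test_x, [f(x[0]) for x in test_x]) for k, f in trends.items()}

    # The global model pools all training points and ignores cluster labels.
    global_X = [x for X, _ in train.values() for x in X]
    global_y = [t for _, y in train.values() for t in y]
    global_model = fit_krr(global_X, global_y)

    cluster_errors, global_errors = [], []
    for name in trends:
        model = fit_krr(*train[name])
        X_test, y_test = test[name]
        cluster_errors += [model(x) - t for x, t in zip(X_test, y_test)]
        global_errors += [global_model(x) - t for x, t in zip(X_test, y_test)]
    return rmse(cluster_errors), rmse(global_errors)


cluster_rmse, global_rmse = compare_models()
print(f"cluster-specific RMSE: {cluster_rmse:.3f}, global RMSE: {global_rmse:.3f}")
```

The toy clusters are constructed to follow opposing trends, so a pooled model that cannot tell them apart is at a disadvantage here. Real data need not behave this way; the abstract itself reports that large clusters underperformed the global model.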
- Award ID(s): 2112650
- PAR ID: 10591478
- Publisher / Repository: American Society of Mechanical Engineers
- Date Published:
- ISBN: 978-0-7918-8777-6
- Format(s): Medium: X
- Location: Columbia, South Carolina, USA
- Sponsoring Org: National Science Foundation
More Like this
-
Prompt engineering emerges as an innovative computational methodology for advancing AI-driven sustainable technologies. This study introduces a novel taxonomical prompt engineering framework utilizing advanced natural language processing to systematically analyze large language models (LLMs) for intelligent safety data extraction. Using a comprehensive corpus of 300 OSHA accident narratives focused on fall-from-height accidents, we developed an AI-driven prompt strategy for semantic classification and contextual analysis, directly addressing efficiency and sustainability challenges in construction safety. The machine learning optimization demonstrated outstanding taxonomic performance by achieving binary classification with zero false negatives for fall-related accidents. Empirical data extraction revealed that falls predominantly occurred on roofs, ladders, and scaffolds, with a substantive "Other" category capturing architectural complexity. Critically, the computational model exhibited probabilistic restraint, strategically abstaining from speculative height extrapolation in 45.67% of ambiguous instances across all categories in the entire dataset, a pivotal characteristic in safety-critical computational linguistics. This framework improves safety documentation, enabling quicker, targeted interventions and reducing rework, material waste, and redundant inspections caused by unclear accident data. By minimizing project delays, inefficient labor cycles, and overuse of emergency resources, the approach supports operational efficiency and advances sustainable construction practices through better use of time, materials, and workforce capacity. Comparative statistical analyses unveiled correlational patterns between fall protection reporting and height data completeness, indicating systemic reporting variabilities.
Through a scalable and context-aware framework for advanced occupational risk analysis, our findings demonstrate the potential of AI and prompt engineering in enhancing data interpretive capabilities, with broader implications for sustainable technological innovation.
-
Bacteriophages are being widely harnessed as an alternative to antibiotics due to the global emergence of drug-resistant pathogens. To guide the usage of these bactericidal agents, characterization of their host specificity is vital; however, host range information remains limited for many bacteriophages. This is particularly the case for bacteriophages infecting the Microbacterium genus, despite their importance in agriculture, biomedicine, and biotechnology. Here, we elucidate the phylogenomic relationships between 125 Microbacterium cluster EA bacteriophages, including members from 11 sub-clusters (EA1 to EA11), and infer their putative host ranges using insights from codon usage bias patterns as well as predictions from both exploratory and confirmatory computational methods. Our computational analyses suggest that cluster EA bacteriophages have a shared infection history across the Microbacterium clade. Interestingly, bacteriophages of all sub-clusters exhibit codon usage preference patterns that resemble those of bacterial strains different from ones used for isolation, suggesting that they might be able to infect additional hosts. Furthermore, host range predictions indicate that certain sub-clusters may be better suited in prospective biotechnological and medical applications such as phage therapy.
-
Abstract Background: MacArthur and Wilson's theory of island biogeography has been a foundation for obtaining testable predictions from models of community assembly and for developing models that integrate across scales and disciplines. Historically, however, these developments have focused on integration across ecological and macroevolutionary scales and on predicting patterns of species richness, abundance distributions, trait data and/or phylogenies. The distribution of genetic variation across species within a community is an emerging pattern that contains signatures of past population histories, which might provide an historical lens for the study of contemporary communities. As intraspecific genetic diversity data become increasingly available at the scale of entire communities, there is an opportunity to integrate microevolutionary processes into our models, moving towards development of a genetic theory of island biogeography. Motivation/goal: We aim to promote the development of process‐based biodiversity models that predict community genetic diversity patterns together with other community‐scale patterns. To this end, we review models of ecological, microevolutionary and macroevolutionary processes that are best suited to the creation of unified models, and the patterns that these predict. We then discuss ongoing and potential future efforts to unify models operating at different organizational levels, with the goal of predicting multidimensional community‐scale data including a genetic component. Main conclusions: Our review of the literature shows that despite recent efforts, further methodological developments are needed, not only to incorporate the genetic component into existing island biogeography models, but also to unify processes across scales of biological organization. To catalyse these developments, we outline two potential ways forward, adopting either a top‐down or a bottom‐up approach.
Finally, we highlight key ecological and evolutionary questions that might be addressed by unified models including a genetic component and establish hypotheses about how processes across scales might impact patterns of community genetic diversity.
-
Abstract Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps. We explore the performance and capabilities of these models, focusing on their ability to predict reaction pathways and recapitulate the roles of catalysts and reagents. Additionally, we demonstrate the potential of mechanistic models in predicting impurities, often overlooked by conventional models. We conclude by evaluating the generalizability of mechanistic models to new reaction types, revealing challenges related to dataset diversity, consecutive predictions, and violations of atom conservation.

