Title: Improving RNA Branching Predictions: Advances and Limitations
Minimum free energy prediction of RNA secondary structures is based on the Nearest Neighbor Thermodynamics Model. While such predictions are typically good, the accuracy can vary widely even for short sequences, and the branching thermodynamics are an important factor in this variance. Recently, the simplest model for multiloop energetics, a linear function of the number of branches and unpaired nucleotides, was found to be the best. Subsequently, a parametric analysis demonstrated that per-family accuracy can be improved by changing the weightings in this linear function. However, the extent of improvement was not known due to the ad hoc method used to find the new parameters. Here we develop a branch-and-bound algorithm that finds the set of optimal parameters with the highest average accuracy for a given set of sequences. Our analysis shows that the previous ad hoc parameters are nearly optimal for tRNA and 5S rRNA sequences on both training and testing sets. Moreover, cross-family improvement is possible but more difficult because competing parameter regions favor different families. The results also indicate that restricting the unpaired nucleotide penalty to small values is warranted. This reduction makes analyzing longer sequences using the present techniques more feasible.
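As a minimal sketch of the linear multiloop model the abstract refers to, the penalty can be written as an offset plus one weight per unpaired nucleotide and one weight per branch. The names `MultiloopParams` and `multiloop_energy`, the symbols `a`, `b`, `c`, and the numeric values below are hypothetical illustrations, not the parameters studied in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiloopParams:
    """Weights of the linear multiloop penalty (hypothetical names and units)."""
    a: float  # constant offset charged once per multiloop
    b: float  # weight per unpaired nucleotide in the loop
    c: float  # weight per branching helix


def multiloop_energy(params: MultiloopParams, unpaired: int, branches: int) -> float:
    """Return a + b * unpaired + c * branches for a single multiloop."""
    if unpaired < 0 or branches < 0:
        raise ValueError("counts must be non-negative")
    return params.a + params.b * unpaired + params.c * branches


# Illustrative values only (assumed, not taken from the paper).
no_unpaired_penalty = MultiloopParams(a=3.0, b=0.0, c=0.5)
small_unpaired_penalty = MultiloopParams(a=3.0, b=0.25, c=0.5)

# The same loop (4 unpaired nucleotides, 3 branches) scored under both settings.
print(multiloop_energy(no_unpaired_penalty, unpaired=4, branches=3))     # 4.5
print(multiloop_energy(small_unpaired_penalty, unpaired=4, branches=3))  # 5.5
```

Changing the three weights shifts how strongly branched structures are penalized relative to unbranched ones, which is why different weightings can favor different predicted structures for the same sequence.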
Award ID(s):
1815832 1815044
PAR ID:
10292289
Journal Name:
Genes
Volume:
12
Issue:
4
ISSN:
2073-4425
Page Range / eLocation ID:
469
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies a security problem for a class of cloud-connected multi-agent systems, where autonomous agents coordinate via a combination of short-range ad-hoc communication links and long-range cloud services. We consider a simplified model for the dynamics of a cloud-connected multi-agent system and attacks, where the states evolve according to linear time-invariant impulsive dynamics, and attacks are modeled as exogenous inputs designed by an omniscient attacker that alter the continuous and impulsive updates. We propose a definition of attack detectability, characterize the existence of stealthy attacks as a function of the system parameters and attack properties, and design a family of undetectable attacks. We illustrate our results on a cloud-based surveillance example.
  2. Grammar induction, the task of learning a set of syntactic rules from minimally annotated training data, provides a means of exploring the longstanding question of whether humans rely on innate knowledge to acquire language. Of the various formalisms available for grammar induction, categorial grammars provide an appealing option due to their transparent interface between syntax and semantics. However, to obtain competitive results, previous categorial grammar inducers have relied on shortcuts such as part-of-speech annotations or an ad hoc bias term in the objective function to ensure desirable branching behavior. We present a categorial grammar inducer that eliminates both shortcuts: it learns from raw data, and does not rely on a biased objective function. This improvement is achieved through a novel stochastic process used to select the set of available syntactic categories. On a corpus of English child-directed speech, the model attains a recall-homogeneity of 0.48, a large improvement over previous categorial grammar inducers. 
  3. Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder, posing a growing public health challenge. Traditional machine learning models for AD prediction have relied on single omics data or phenotypic assessments, limiting their ability to capture the disease’s molecular complexity and resulting in poor performance. Recent advances in high-throughput multi-omics have provided deeper biological insights. However, due to the scarcity of paired omics datasets, existing multi-omics AD prediction models rely on unpaired omics data, where different omics profiles are combined without being derived from the same biological sample, leading to biologically less meaningful pairings and less accurate predictions. To address these issues, we propose UnCOT-AD, a novel deep learning framework for Unpaired Cross-Omics Translation enabling effective multi-omics integration for AD prediction. Our method introduces the first-ever cross-omics translation model trained on unpaired omics datasets, using two coupled Variational Autoencoders and a novel cycle consistency mechanism to ensure accurate bidirectional translation between omics types. We integrate adversarial training to ensure that the generated omics profiles are biologically realistic. Moreover, we employ contrastive learning to capture the disease-specific patterns in latent space to make the cross-omics translation more accurate and biologically relevant. We rigorously validate UnCOT-AD on both cross-omics translation and AD prediction tasks. Results show that UnCOT-AD empowers multi-omics-based AD prediction by combining real omics profiles with corresponding omics profiles generated by our cross-omics translation module and achieves state-of-the-art performance in accuracy and robustness. Source code is available at https://github.com/abrarrahmanabir/UnCOT-AD
  4. Emerging connected and autonomous vehicles (CAVs) challenge ad hoc wireless multi-hop communications with their mobility, large scale, and new data acquisition and computing patterns. Named Data Networking (NDN) is suitable for such vehicular ad hoc networks due to its information-centric networking approach. However, flooding interest packets in ad hoc NDN can lead to a broadcast storm. Existing solutions either increase the number of redundant interest packets or require global knowledge about data producers. In this paper, a Location-Based Deferred Broadcast (LBDB) scheme is introduced to improve the efficiency and performance of interest broadcast in ad hoc NDN. The scheme takes advantage of location information to set up timers when rebroadcasting an interest. LBDB is implemented in the V-NDN network architecture using the ndnSIM simulator and compared in simulation with several existing protocols. The results show that LBDB improves the overhead, the average number of hops, and the delay while maintaining an average satisfaction ratio when compared with several other broadcast schemes. The improvement can help offer timely data acquisition for quick responses in emergent CAV application situations.
  5. In a previous study, we introduced a new computational protocol to accurately predict the index of refraction (RI) of organic polymers using a combination of first-principles and data modeling. This protocol is based on the Lorentz–Lorenz equation and involves the calculation of static polarizabilities and number densities of oligomer sequences, which are extrapolated to the polymer limit. We chose to compute the polarizabilities within the density functional theory (DFT) framework using the PBE0/def2-TZVP-D3 model chemistry. While this ad hoc choice proved remarkably successful, it is also relatively expensive from a computational perspective. It represents the bottleneck step in the overall RI modeling protocol, thus limiting its utility for virtual high-throughput screening studies, in which efficiency is essential. For polymers that exhibit late-onset extensivity, the employed linear extrapolation scheme can require demanding calculations on long-oligomer sequences, thus becoming another bottleneck. In the work presented here, we benchmark DFT model chemistries to identify approaches that optimize the balance between accuracy and efficiency for this application domain. We compare results for conjugated and non-conjugated polymers, augment our original extrapolation approach with a non-linear option, analyze how the polarizability errors propagate into the RI predictions, and offer guidance for method selection. 