Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Marshall, Christopher W. (Ed.)ABSTRACT Identification of genes encoding β-lactamases (BLs) from short-read sequences remains challenging due to the high frequency of shared amino acid functional domains and motifs in proteins encoded by BL genes and related non-BL gene sequences. Divergent BL homologs can be frequently missed during similarity searches, which has important practical consequences for monitoring antibiotic resistance. To address this limitation, we built ROCker models that targeted broad classes (e.g., class A, B, C, and D) and individual families (e.g., TEM) of BLs and challenged them with mock 150-bp- and 250-bp-read data sets of known composition. ROCker identifies most-discriminant bit score thresholds in sliding windows along the sequence of the target protein sequence and hence can account for nondiscriminative domains shared by unrelated proteins. BL ROCker models showed a 0% false-positive rate (FPR), a 0% to 4% false-negative rate (FNR), and an up-to-50-fold-higher F1 score [2 × precision × recall/(precision + recall)] compared to alternative methods, such as similarity searches using BLASTx with various e-value thresholds and BL hidden Markov models, or tools like DeepARG, ShortBRED, and AMRFinder. The ROCker models and the underlying protein sequence reference data sets and phylogenetic trees for read placement are freely available through http://enve-omics.ce.gatech.edu/data/rocker-bla . Applicationmore »Free, publicly-accessible full text available June 28, 2023
-
Dense subgraph detection is a fundamental building block for a va- riety of applications. Most of the existing methods aim to discover dense subgraphs within either a single network or a multi-view network while ignoring the informative node dependencies across multiple layers of networks in a complex system. To date, it largely remains a daunting task to detect dense subgraphs on multi-layered networks. In this paper, we formulate the problem of dense sub- graph detection on multi-layered networks based on cross-layer consistency principle. We further propose a novel algorithm Des- tine based on projected gradient descent with the following ad- vantages. First, armed with the cross-layer dependencies, Destine is able to detect significantly more accurate and meaningful dense subgraphs at each layer. Second, it scales linearly w.r.t. the num- ber of links in the multi-layered network. Extensive experiments demonstrate the efficacy of the proposed Destine algorithm in various cases.
-
Networks (i.e., graphs) are often collected from multiple sources and platforms, such as social networks extracted from multiple online platforms, team-specific collaboration networks within an organization, and inter-dependent infrastructure networks, etc. Such networks from different sources form the multi-networks, which can exhibit the unique patterns that are invisible if we mine the individual network separately. However, compared with single-network mining, multi-network mining is still under-explored due to its unique challenges. First ( multi-network models ), networks under different circumstances can be modeled into a variety of models. How to properly build multi-network models from the complex data? Second ( multi-network mining algorithms ), it is often nontrivial to either extend single-network mining algorithms to multi-networks or design new algorithms. How to develop effective and efficient mining algorithms on multi-networks? The objectives of this tutorial are to: (1) comprehensively review the existing multi-network models, (2) elaborate the techniques in multi-network mining with a special focus on recent advances, and (3) elucidate open challenges and future research directions. We believe this tutorial could be beneficial to various application domains, and attract researchers and practitioners from data mining as well as other interdisciplinary fields.
-
Cann, Isaac (Ed.)ABSTRACT Arsenic (As) metabolism genes are generally present in soils, but their diversity, relative abundance, and transcriptional activity in response to different As concentrations remain unclear, limiting our understanding of the microbial activities that control the fate of an important environmental pollutant. To address this issue, we applied metagenomics and metatranscriptomics to paddy soils showing a gradient of As concentrations to investigate As resistance genes ( ars ) including arsR , acr3 , arsB , arsC , arsM , arsI , arsP , and arsH as well as energy-generating As respiratory oxidation ( aioA ) and reduction ( arrA ) genes. Somewhat unexpectedly, the relative DNA abundances and diversities of ars , aioA , and arrA genes were not significantly different between low and high (∼10 versus ∼100 mg kg −1 ) As soils. Compared to available metagenomes from other soils, geographic distance rather than As levels drove the different compositions of microbial communities. Arsenic significantly increased ars gene abundance only when its concentration was higher than 410 mg kg −1 . In contrast, metatranscriptomics revealed that relative to low-As soils, high-As soils showed a significant increase in transcription of ars and aioA genes, which are induced by arsenite, themore »
-
Network alignment plays an important role in a variety of applications. Many traditional methods explicitly or implicitly assume the alignment consistency which might suffer from over-smoothness, whereas some recent embedding based methods could somewhat embrace the alignment disparity by sampling negative alignment pairs. However, under different or even competing designs of negative sampling distributions, some methods advocate positive correlation which could result in false negative samples incorrectly violating the alignment consistency, whereas others champion negative correlation or uniform distribution to sample nodes which may contribute little to learning meaningful embeddings. In this paper, we demystify the intrinsic relationships behind various network alignment methods and between these competing design principles of sampling. Specifically, in terms of model design, we theoretically reveal the close connections between a special graph convolutional network model and the traditional consistency based alignment method. For model training, we quantify the risk of embedding learning for network alignment with respect to the sampling distributions. Based on these, we propose NeXtAlign which strikes a balance between alignment consistency and disparity. We conduct extensive experiments that demonstrate the proposed method achieves significant improvements over the state-of-the-arts.
-
Species retaining ancestral features, such as species called living fossils, are often regarded as less derived than their sister groups, but such discussions are usually based on qualitative enumeration of conserved traits. This approach creates a major barrier, especially when quantifying the degree of phenotypic evolution or degree of derivedness, since it focuses only on commonly shared traits, and newly acquired or lost traits are often overlooked. To provide a potential solution to this problem, especially for inter-species comparison of gene expression profiles, we propose a new method named “derivedness index” to quantify the degree of derivedness. In contrast to the conservation-based approach, which deals with expressions of commonly shared genes among species being compared, the derivedness index also considers those that were potentially lost or duplicated during evolution. By applying our method, we found that the gene expression profiles of penta-radial phases in echinoderm tended to be more highly derived than those of the bilateral phase. However, our results suggest that echinoderms may not have experienced much larger modifications to their developmental systems than chordates, at least at the transcriptomic level. In vertebrates, we found that the mid-embryonic and organogenesis stages were generally less derived than the earlier ormore »
-
Ranking on networks plays an important role in many high-impact applications, including recommender systems, social network analysis, bioinformatics and many more. In the age of big data, a recent trend is to address the variety aspect of network ranking. Among others, two representative lines of research include (1) heterogeneous information network with different types of nodes and edges, and (2) network of networks with edges at different resolutions. In this paper, we propose a new network model named Network of Heterogeneous Information Networks (NeoHIN for short) that is capable of simultaneously modeling both different types of nodes/edges, and different edge resolutions. We further propose two new ranking algorithms on NeoHIN based on the cross-domain consistency principle. Experiments on synthetic and real-world networks show that our proposed algorithms are (1) effective, which outperform other existing methods, and (2) efficient, without additional time cost per iteration to their counterparts.
-
Network alignment is a fundamental task in many high-impact applications. Most of the existing approaches either explicitly or implicitly consider the alignment matrix as a linear transformation to map one network to another, and might overlook the complicated alignment relationship across networks. On the other hand, node representation learning based alignment methods are hampered by the incomparability among the node representations of different networks. In this paper, we propose a unified semi-supervised deep model (ORIGIN) that simultaneously finds the non-rigid network alignment and learns node representations in multiple networks in a mutually beneficial way. The key idea is to learn node representations by the effective graph convolutional networks, which subsequently enable us to formulate network alignment as a point set alignment problem. The proposed method offers two distinctive advantages. First (node representations), unlike the existing graph convolutional networks that aggregate the node information within a single network, we can effectively aggregate the auxiliary information from multiple sources, achieving far-reaching node representations. Second (network alignment), guided by the highquality node representations, our proposed non-rigid point set alignment approach overcomes the bottleneck of the linear transformation assumption. We conduct extensive experiments that demonstrate the proposed non-rigid alignment method is (1) effective, outperformingmore »