skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on May 19, 2026

Title: An Introduction to Network Analysis in Plant Biology
This beginner’s guide is intended for plant biologists new to network analysis. Here, we introduce key concepts and resources for researchers interested in incorporating network analysis into research, either as a stand-alone component for generating hypotheses or as a framework for examining and visualizing experimental results. Network analysis provides a powerful tool to predict gene functions. Advances in and reduced costs for systems biology techniques, such as genomics, transcriptomics, and proteomics, have generated abundant -omics data for plants; however, the functional annotation of plant genes lags. Therefore, predictions from network analysis can be a starting point to annotate genes and ultimately elucidate genotype-phenotype relationships. In this paper, we introduce networks and compare network-building resources available for plant biologists, including databases and software for network analysis. We then compare four databases available for plant biologists in more detail: AraNet, GeneMANIA, ATTED-II, and STRING. AraNet, and GeneMANIA are functional association networks, ATTED-II is a gene coexpression database, and STRING is a protein-protein interaction database. AraNet, and ATTED-II are plant-specific databases that can analyze multiple plant species, whereas GeneMANIA builds networks for Arabidopsis thaliana and non-plant species, and STRING for multiple species. Finally, we compare the performance of the four databases in predicting known and probable gene functions of the A. thaliana Nuclear Factor-Y (NF-Y) genes. We conclude that plant biologists have an invaluable resource in these databases and discuss how users can decide which type of database to use depending on their research question.  more » « less
Award ID(s):
2048410
PAR ID:
10637943
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Authorea
Date Published:
Subject(s) / Keyword(s):
NUCLEAR FACTOR-Y, Network analysis, AraNet, GeneMANIA, ATTED-II, STRING
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract MotivationProtein function prediction, based on the patterns of connection in a protein–protein interaction (or association) network, is perhaps the most studied of the classical, fundamental inference problems for biological networks. A highly successful set of recent approaches use random walk-based low-dimensional embeddings that tend to place functionally similar proteins into coherent spatial regions. However, these approaches lose valuable local graph structure from the network when considering only the embedding. We introduce GLIDER, a method that replaces a protein–protein interaction or association network with a new graph-based similarity network. GLIDER is based on a variant of our previous GLIDE method, which was designed to predict missing links in protein–protein association networks, capturing implicit local and global (i.e. embedding-based) graph properties. ResultsGLIDER outperforms competing methods on the task of predicting GO functional labels in cross-validation on a heterogeneous collection of four human protein–protein association networks derived from the 2016 DREAM Disease Module Identification Challenge, and also on three different protein–protein association networks built from the STRING database. We show that this is due to the strong functional enrichment that is present in the local GLIDER neighborhood in multiple different types of protein–protein association networks. Furthermore, we introduce the GLIDER graph neighborhood as a way for biologists to visualize the local neighborhood of a disease gene. As an application, we look at the local GLIDER neighborhoods of a set of known Parkinson’s Disease GWAS genes, rediscover many genes which have known involvement in Parkinson’s disease pathways, plus suggest some new genes to study. Availability and implementationAll code is publicly available and can be accessed here: https://github.com/kap-devkota/GLIDER. Supplementary informationSupplementary data are available at Bioinformatics online. 
    more » « less
  2. Abstract Identifying genes that interact to confer a biological function to an organism is one of the main goals of functional genomics. High‐throughput technologies for assessment and quantification of genome‐wide gene expression patterns have enabled systems‐level analyses to infer pathways or networks of genes involved in different functions under many different conditions. Here, we leveraged the publicly available, information‐rich RNA‐Seq datasets of the model plantArabidopsis thalianato construct a gene co‐expression network, which was partitioned into clusters or modules that harbor genes correlated by expression. Gene ontology and pathway enrichment analyses were performed to assess functional terms and pathways that were enriched within the different gene modules. By interrogating the co‐expression network for genes in different modules that associate with a gene of interest, diverse functional roles of the gene can be deciphered. By mapping genes differentially expressing under a certain condition inArabidopsisonto the co‐expression network, we demonstrate the ability of the network to uncover novel genes that are likely transcriptionally active but prone to be missed by standard statistical approaches due to their falling outside of the confidence zone of detection. To our knowledge, this is the firstA. thalianaco‐expression network constructed using the entire mRNA‐Seq datasets (>20,000) available at the NCBI SRA database. The developed network can serve as a useful resource for theArabidopsisresearch community to interrogate specific genes of interest within the network, retrieve the respective interactomes, decipher gene modules that are transcriptionally altered under certain condition or stage, and gain understanding of gene functions. 
    more » « less
  3. Abstract Background Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are high in resource and time consumption. Protein–protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can be a limitation for their usage in predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions and is better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed based on the anatomy ontology annotations that were obtained from the experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions. Results According to evaluations of candidate gene prediction performance tested under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks, which were semantically improved PPI networks, showed better performances by having higher area under the curve values for receiver operating characteristic and precision-recall curves than PPI networks for both zebrafish and mouse. Conclusion Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improved the candidate gene prediction accuracy and optimized them for predicting candidate genes for anatomical entities. 
    more » « less
  4. To identify sets of genes that exhibit similar expression characteristics, co-expression networks were constructed from transcriptome datasets that were obtained from plant samples at various stages of growth and development or treated with diverse biotic, abiotic, and other environmental stresses. In addition, co-expression network analysis can provide deeper insights into gene regulation when combined with transcriptomics. The coordination and integration of all these complex networks to deduce gene regulation are major challenges for plant biologists. Python and R have emerged as major tools for managing complex scientific data over the past decade. In this study, we describe a reproducible protocol POTFUL (pant co-expression transcription factor regulators), implemented in Python 3, for integrating co-expression and transcription factor target protein networks to infer gene regulation. 
    more » « less
  5. Abstract FatPlants, an open-access, web-based database, consolidates data, annotations, analysis results, and visualizations of lipid-related genes, proteins, and metabolic pathways in plants. Serving as a minable resource, FatPlants offers a user-friendly interface for facilitating studies into the regulation of plant lipid metabolism and supporting breeding efforts aimed at increasing crop oil content. This web resource, developed using data derived from our own research, curated from public resources, and gleaned from academic literature, comprises information on known fatty-acid-related proteins, genes, and pathways in multiple plants, with an emphasis on Glycine max, Arabidopsis thaliana, and Camelina sativa. Furthermore, the platform includes machine-learning based methods and navigation tools designed to aid in characterizing metabolic pathways and protein interactions. Comprehensive gene and protein information cards, a Basic Local Alignment Search Tool search function, similar structure search capacities from AphaFold, and ChatGPT-based query for protein information are additional features. Database URL: https://www.fatplants.net/ 
    more » « less