NSF PAR Search | NSF Public Access Repository


Applications of 3D Zernike Descriptors in Protein Structure Comparison

https://doi.org/10.2142/biophys.65.201

KAGAYA, Yuki; KIHARA, Daisuke (January 2025, Seibutsu Butsuri)

Full Text Available
SHREC 2025: Protein surface shape retrieval including electrostatic potential

https://doi.org/10.1016/j.cag.2025.104394

Yacoub, Taher; Depenveiller, Camille; Tatsuma, Atsushi; Barisin, Tin; Rusakov, Eugen; Göbel, Udo; Peng, Yuxu; Deng, Shiqiang; Kagaya, Yuki; Park, Joon Hong; et al (November 2025, Computers & Graphics)

Free, publicly-accessible full text available November 1, 2026
Genetic adaptation despite high gene flow in a range‐expanding population

https://doi.org/10.1111/mec.17511

Lee, Andy; Daniels, Benjamin N; Hemstrom, William; López, Cataixa; Kagaya, Yuki; Kihara, Daisuke; Davidson, Jean M; Toonen, Robert J; White, Crow; Christie, Mark R (August 2024, Molecular Ecology)

Abstract Signals of natural selection can be quickly eroded in high gene flow systems, curtailing efforts to understand how and when genetic adaptation occurs in the ocean. This long‐standing, unresolved topic in ecology and evolution has renewed importance because changing environmental conditions are driving range expansions that may necessitate rapid evolutionary responses. One example occurs in Kellet's whelk (Kelletia kelletii), a common subtidal gastropod with an ~40‐ to 60‐day pelagic larval duration that expanded their biogeographic range northwards in the 1970s by over 300 km. To test for genetic adaptation, we performed a series of experimental crosses with Kellet's whelk adults collected from their historical (HxH) and recently expanded range (ExE), and conducted RNA‐Seq on offspring that we reared in a common garden environment. We identified 2770 differentially expressed genes (DEGs) between 54 offspring samples with either only historical range (HxH offspring) or expanded range (ExE offspring) ancestry. Using SNPs called directly from the DEGs, we assigned samples of known origin back to their range of origin with unprecedented accuracy for a marine species (92.6% and 94.5% for HxH and ExE offspring, respectively). The SNP with the highest predictive importance occurred on triosephosphate isomerase (TPI), an essential metabolic enzyme involved in cold stress response. TPI was significantly upregulated and contained a non‐synonymous mutation in the expanded range. Our findings pave the way for accurately identifying patterns of dispersal, gene flow and population connectivity in the ocean by demonstrating that experimental transcriptomics can reveal mechanisms for how marine organisms respond to changing environmental conditions.
Full Text Available
Domain-PFP allows protein function prediction using function-aware domain embedding representations

https://doi.org/10.1038/s42003-023-05476-9

Ibtehaz, Nabil; Kagaya, Yuki; Kihara, Daisuke (October 2023, Communications Biology)

Abstract Domains are functional and structural units of proteins that govern various biological functions performed by the proteins. Therefore, the characterization of domains in a protein can serve as a proper functional representation of proteins. Here, we employ a self-supervised protocol to derive functionally consistent representations for domains by learning domain-Gene Ontology (GO) co-occurrences and associations. The domain embeddings we constructed turned out to be effective in performing actual function prediction tasks. Extensive evaluations showed that protein representations using the domain embeddings are superior to those of large-scale protein language models in GO prediction tasks. Moreover, the new function prediction method built on the domain embeddings, named Domain-PFP, substantially outperformed the state-of-the-art function predictors. Additionally, Domain-PFP demonstrated competitive performance in the CAFA3 evaluation, achieving overall the best performance among the top teams that participated in the assessment.
ContactPFP: Protein Function Prediction Using Predicted Contact Information

https://doi.org/10.3389/fbinf.2022.896295

Kagaya, Yuki; Flannery, Sean T.; Jain, Aashish; Kihara, Daisuke (June 2022, Frontiers in Bioinformatics)

Computational function prediction is one of the most important problems in bioinformatics as elucidating the function of genes is a central task in molecular biology and genomics. Most of the existing function prediction methods use protein sequences as the primary source of input information because the sequence is the most available information for query proteins. There are attempts to consider other attributes of query proteins. Among these attributes, the three-dimensional (3D) structure of proteins is known to be very useful in identifying the evolutionary relationship of proteins, from which functional similarity can be inferred. Here, we report a novel protein function prediction method, ContactPFP, which uses predicted residue-residue contact maps as input structural features of query proteins. Although 3D structure information is known to be useful, it has not been routinely used in function prediction because the 3D structure is not experimentally determined for many proteins. In ContactPFP, we overcome this limitation by using residue-residue contact prediction, which has become increasingly accurate due to rapid development in the protein structure prediction field. ContactPFP takes a query protein sequence as input and uses predicted residue-residue contact as a proxy for the 3D protein structure. To characterize how predicted contacts contribute to function prediction accuracy, we compared the performance of ContactPFP with several well-established sequence-based function prediction methods. The comparative study revealed the advantages and weaknesses of ContactPFP compared to contemporary sequence-based methods. There were many cases where it showed higher prediction accuracy. We examined factors that affected the accuracy of ContactPFP using several illustrative cases that highlight the strength of our method.
Full Text Available
Analyzing effect of quadruple multiple sequence alignments on deep learning based protein inter-residue distance prediction

https://doi.org/10.1038/s41598-021-87204-z

Jain, Aashish; Terashi, Genki; Kagaya, Yuki; Maddhuri Venkata Subramaniya, Sai Raghavendra; Christoffer, Charles; Kihara, Daisuke (December 2021, Scientific Reports)
Abstract Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. We show that combining four MSAs of different E-value cutoffs improved the model prediction performance as compared to single E-value MSA features. A further improvement was observed when an attention layer was used and even more when additional prediction tasks of bond angle predictions were added. The improvement in distance predictions was successfully transferred to achieve better protein tertiary structure modeling.
Full Text Available
Protein contact map refinement for improving structure prediction using generative adversarial networks

https://doi.org/10.1093/bioinformatics/btab220

Maddhuri Venkata Subramaniya, Sai Raghavendra; Terashi, Genki; Jain, Aashish; Kagaya, Yuki; Kihara, Daisuke (March 2021, Bioinformatics)
Valencia, Alfonso (Ed.)
Abstract Motivation: Protein structure prediction remains one of the most important problems in computational biology and biophysics. In the past few years, protein residue–residue contact prediction has undergone substantial improvement, which has made it a critical driving force for successful protein structure prediction. Boosting the accuracy of contact predictions has, therefore, become the forefront of protein structure prediction. Results: We show a novel contact map refinement method, ContactGAN, which uses Generative Adversarial Networks (GAN). ContactGAN was able to make a significant improvement over predictions made by recent contact prediction methods when tested on three datasets including protein structure modeling targets in CASP13 and CASP14. We show improvement of precision in contact prediction, which translated into improvement in the accuracy of protein tertiary structure models. On the other hand, observed improvement over trRosetta was relatively small, reasons for which are discussed. ContactGAN will be a valuable addition in the structure prediction pipeline to achieve an extra gain in contact prediction accuracy. Availability and implementation: https://github.com/kiharalab/ContactGAN. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
Impact of AlphaFold on structure prediction of protein complexes: The CASP15‐CAPRI experiment

https://doi.org/10.1002/prot.26609

Lensink, Marc F; Brysbaert, Guillaume; Raouraoua, Nessim; Bates, Paul A; Giulini, Marco; Honorato, Rodrigo V; van Noort, Charlotte; Teixeira, Joao M C; Bonvin, Alexandre M J J; Kong, Ren; et al (October 2023, Proteins: Structure, Function, and Bioinformatics)

Abstract We present the results for CAPRI Round 54, the 5th joint CASP‐CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homo‐trimers, 13 heterodimers including 3 antibody–antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High‐quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2‐Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integration of advanced modeling tools, enabled top performing groups to exceed the performance of a standard AlphaFold2‐Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.
