Abstract With the progress of structural biology, the Protein Data Bank (PDB) has witnessed rapid accumulation of experimentally solved protein structures. Since many structures are determined with purification and crystallization additives that are unrelated to a protein's in vivo function, it is nontrivial to identify the subset of protein–ligand interactions that are biologically relevant. We developed the BioLiP2 database (https://zhanggroup.org/BioLiP) to extract biologically relevant protein–ligand interactions from the PDB database. BioLiP2 assesses the functional relevance of the ligands by geometric rules and experimental literature validations. The ligand binding information is further enriched with other function annotations, including Enzyme Commission numbers, Gene Ontology terms, catalytic sites, and binding affinities collected from other databases and a manual literature survey. Compared to its predecessor BioLiP, BioLiP2 offers significantly greater coverage of nucleic acid-protein interactions, and interactions involving large complexes that are unavailable in PDB format. BioLiP2 also integrates cutting-edge structural alignment algorithms with state-of-the-art structure prediction techniques, which for the first time enables composite protein structure and sequence-based searching and significantly enhances the usefulness of the database in structure-based function annotations. With these new developments, BioLiP2 will continue to be an important and comprehensive database for docking, virtual screening, and structure-based protein function analyses. 
                        more » 
                        « less   
                    
                            
                            Students authoring molecular case studies as a partial course‐based undergraduate research experience (CURE) for lab instruction
                        
                    
    
            Abstract Understanding the relationship between protein structure and function is a core‐learning goal in biochemistry. Students often struggle to visualize proteins as three‐dimensional objects that interact with other molecules to affect its biochemical consequences. We describe here a partial course‐based undergraduate research experiences that has students exploring protein structure and function hands‐on while authoring a molecular case study intended for others to use. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1827011
- PAR ID:
- 10359940
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Biochemistry and Molecular Biology Education
- Volume:
- 49
- Issue:
- 6
- ISSN:
- 1470-8175
- Page Range / eLocation ID:
- p. 853-855
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Abstract MotivationGiven a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, the computational cost of performing pairwise structural alignment against all structures in PDB is prohibitively expensive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and only computing similarity between vectors. As a notable example, FragBag represents each protein by a ‘bag of fragments’, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Despite being efficient, the accuracy of FragBag is unsatisfactory because its backbone fragment library may not be optimally constructed and long-range interacting patterns are omitted. ResultsHere we present a new approach to learning effective structural motif presentations using deep learning. We develop DeepFold, a deep convolutional neural network model to extract structural motif features of a protein structure. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs. Availability and implementationhttps://github.com/largelymfs/DeepFoldmore » « less
- 
            Abstract Predicting protein function from protein sequence, structure, interaction, and other relevant information is important for generating hypotheses for biological experiments and studying biological systems, and therefore has been a major challenge in protein bioinformatics. Numerous computational methods had been developed to advance protein function prediction gradually in the last two decades. Particularly, in the recent years, leveraging the revolutionary advances in artificial intelligence (AI), more and more deep learning methods have been developed to improve protein function prediction at a faster pace. Here, we provide an in‐depth review of the recent developments of deep learning methods for protein function prediction. We summarize the significant advances in the field, identify several remaining major challenges to be tackled, and suggest some potential directions to explore. The data sources and evaluation metrics widely used in protein function prediction are also discussed to assist the machine learning, AI, and bioinformatics communities to develop more cutting‐edge methods to advance protein function prediction.more » « less
- 
            Abstract Deep learning techniques have significantly advanced the field of protein structure prediction. LOMETS3 (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is a new generation meta-server approach to template-based protein structure prediction and function annotation, which integrates newly developed deep learning threading methods. For the first time, we have extended LOMETS3 to handle multi-domain proteins and to construct full-length models with gradient-based optimizations. Starting from a FASTA-formatted sequence, LOMETS3 performs four steps of domain boundary prediction, domain-level template identification, full-length template/model assembly and structure-based function prediction. The output of LOMETS3 contains (i) top-ranked templates from LOMETS3 and its component threading programs, (ii) up to 5 full-length structure models constructed by L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm) optimization, (iii) the 10 closest Protein Data Bank (PDB) structures to the target, (iv) structure-based functional predictions, (v) domain partition and assembly results, and (vi) the domain-level threading results, including items (i)–(iii) for each identified domain. LOMETS3 was tested in large-scale benchmarks and the blind CASP14 (14th Critical Assessment of Structure Prediction) experiment, where the overall template recognition and function prediction accuracy is significantly beyond its predecessors and other state-of-the-art threading approaches, especially for hard targets without homologous templates in the PDB. Based on the improved developments, LOMETS3 should help significantly advance the capability of broader biomedical community for template-based protein structure and function modelling.more » « less
- 
            Abstract Biochemistry is about structure and function, but it is also about data and this is where computers come in. From my time as a graduate student and post doc, whenever I encountered data I thought, “I can work this up by hand, but I think a computer could do a better job.” Since that time, I have been working at the interface of biochemistry and computers, by attracting talented students and collaborating with colleagues with complementary skills. This has resulted in several exciting projects: a simulation of 2D electrophoresis and tandem mass spectrometry, the human visualization project, and two different programs that enable biochemists to search protein structures for enzyme active sites: ProMOL (promol.org) and Moltimate (moltimate.appspot.com). The human side of software development for education involved finding the right students and colleagues, communicating effectively across disciplines, building and managing effective teams and the importance of serendipity throughout the process.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
