From Unsupervised Multi-Instance Learning to Identification of Near-Native Protein Structures

Alam, Fardina; Shehu, Amarda

doi:10.29007/pjcf

A major challenge in computational biology is recognizing one or more biologically active/native tertiary protein structures among the thousands of physically realistic structures generated by template-free protein structure prediction algorithms. Clustering structures based on structural similarity remains a popular approach. However, clustering organizes structures into groups and does not directly provide a mechanism for selecting individual structures for prediction. In this paper, we provide a few algorithms for this selection problem. We approach the problem under unsupervised multi-instance learning and address it in three stages: first organizing structures into bags, then identifying relevant bags, and finally drawing individual structures/instances from these bags. We present both non-parametric and parametric algorithms for drawing individual instances. In the latter, parameters are trained on training data and evaluated on testing data via rigorous metrics.
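The three stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify how bags are formed, how relevant bags are identified, or how instances are drawn. The sketch assumes, purely for illustration, that each structure is a pair of a coordinate tuple and an energy score, that bags come from a greedy leader clustering with a hypothetical `radius` threshold, that the `k` largest bags are the relevant ones, and that the lowest-energy instance is drawn from each bag (a non-parametric rule). All function and parameter names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two coordinate tuples.

    Assumption: a stand-in for a structural dissimilarity measure.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def organize_into_bags(structures, radius):
    """Stage 1 (assumed): greedy leader clustering.

    Each structure joins the first bag whose leader lies within `radius`;
    otherwise it starts a new bag. Bags hold indices into `structures`.
    """
    leaders = []
    bags = []
    for i, (coords, _energy) in enumerate(structures):
        for b, leader in enumerate(leaders):
            if distance(coords, structures[leader][0]) <= radius:
                bags[b].append(i)
                break
        else:
            # No existing leader is close enough: open a new bag.
            leaders.append(i)
            bags.append([i])
    return bags


def select_relevant_bags(bags, k):
    """Stage 2 (assumed): treat the k largest bags as the relevant ones."""
    return sorted(bags, key=len, reverse=True)[:k]


def draw_instances(structures, bags):
    """Stage 3 (assumed): draw the lowest-energy instance from each bag."""
    return [min(bag, key=lambda i: structures[i][1]) for bag in bags]


def select_structures(structures, radius=1.0, k=2):
    """Run the three stages and return indices of the selected structures."""
    bags = organize_into_bags(structures, radius)
    relevant = select_relevant_bags(bags, k)
    return draw_instances(structures, relevant)
```

Calling `select_structures(structures, radius=1.0, k=2)` on a list of `(coords, energy)` pairs returns one index per selected bag. A parametric variant, as the abstract mentions, would replace the fixed rule in `draw_instances` with one whose parameters are fitted on training data.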

Award ID(s): 1763233 1900061 1821154

PAR ID: 10164978

Author(s) / Creator(s): Alam, Fardina; Shehu, Amarda

Date Published: 2020-03-11

Journal Name: EPiC Series in Computing

Volume: 70

ISSN: 2398-7340

Page Range / eLocation ID: 59 to 48

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.29007/pjcf
