Extraction of protein-protein interactions using natural language processing based pattern matching

Yu, Kaixian; Zhao, Tingting; Zhao, Peixiang; Zhang, Jinfeng

Citation Details

A significant part of our knowledge consists of relationships between two terms. However, most of this information is documented as unstructured text in various forms, such as books, online articles, and webpages. Extracting this information and storing it in a structured database could help people use it more conveniently. In this study, we propose a novel approach to extracting relationship information based on Natural Language Processing (NLP) and a graph-theoretic algorithm. Our method, Grammatical Relationship Graph for Triplets (GRGT), extracts three layers of information: the pairs of terms that have a certain relationship, the type of that relationship, and the direction of that relationship. GRGT works on a grammatical graph obtained by parsing the sentence with NLP. Patterns are extracted from the graph as the shortest paths among the words of interest. We designed a decision tree to perform the pattern matching. GRGT was applied to extract protein-protein interactions (PPIs) from biomedical literature and obtained better precision than the best-performing method in the literature. Beyond extracting PPIs, our method could be easily extended to extract relationship information between other bioentities.
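
The shortest-path step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' GRGT implementation: the toy sentence, the hand-written dependency edges, and the helper names `build_graph` and `shortest_path` are all assumptions made for illustration, and a real pipeline would obtain the edges from a dependency parser. The sketch only shows how the words lying between two protein mentions in a grammatical graph might be collected as a candidate pattern.

```python
from collections import deque

# Hypothetical, hand-written dependency edges (head, dependent, label) for the
# toy sentence "ProteinA directly phosphorylates ProteinB in yeast".
# These are illustrative assumptions, not output from any real parser.
EDGES = [
    ("phosphorylates", "ProteinA", "nsubj"),
    ("phosphorylates", "directly", "advmod"),
    ("phosphorylates", "ProteinB", "dobj"),
    ("phosphorylates", "yeast", "nmod"),
]

def build_graph(edges):
    """Build an undirected adjacency map from (head, dependent, label) edges."""
    graph = {}
    for head, dep, _label in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; return the words on a shortest path, or None."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

graph = build_graph(EDGES)
path = shortest_path(graph, "ProteinA", "ProteinB")
print(path)
```

In this toy case the path passes through the verb that links the two protein mentions, which is the kind of word sequence a downstream pattern matcher (a decision tree, in the abstract's description) could then classify.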

Award ID(s): 1743142

PAR ID: 10057535

Author(s) / Creator(s): Yu, Kaixian; Zhao, Tingting; Zhao, Peixiang; Zhang, Jinfeng

Date Published: 2017-01-01

Journal Name: 2017 IEEE International Conference on Bioinformatics and Biomedicine

Page Range / eLocation ID: 1292-1295

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.
