Classification of DNA Sequences: Performance Evaluation of Multiple Machine Learning Methods

Wang, Yiren; Khandelwal, Vikram; Das, Arindam K.; Anantram, M.P.

doi:10.1109/NANO54668.2022.9928773


Polymerase chain reaction (PCR) has long been the mainstay of genetic sequencing and identification. Irrespective of whether short-read or long-read technologies are adopted, PCR methods are generally time-consuming and expensive. Recently, an all-electronic approach, the so-called Single Molecule Break Junction (SMBJ) method, has been proposed as a possible alternative to PCR. In this article, we evaluate the performance of four different classifier models on the current signatures of ten short strand sequences, including a pair that differs by a single mismatch. We find that a gradient boosted tree classifier model achieves accuracies ranging from approximately 96% for molecules differing by a single mismatch to 99.5% otherwise.

Award ID(s): 1807391

PAR ID: 10467293

Author(s) / Creator(s): Wang, Yiren; Khandelwal, Vikram; Das, Arindam K.; Anantram, M.P.

Publisher / Repository: IEEE

Date Published: 2022-07-04

ISSN: 1944-9380

ISBN: 978-1-6654-5225-0

Page Range / eLocation ID: 333 to 336

Format(s): Medium: X

Location: Palma de Mallorca, Spain

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/NANO54668.2022.9928773
