Creators/Authors contains: "Zelikovsky, Alexander"


Benchmarking machine learning robustness in Covid-19 genome sequence classification

https://doi.org/10.1038/s41598-023-31368-3

Ali, Sarwan ; Sahoo, Bikram ; Zelikovsky, Alexander ; Chen, Pin-Yu ; Patterson, Murray ( December 2023 , Scientific Reports)

Abstract: The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 genome (millions of sequences and counting). This amount of data, while being orders of magnitude beyond the capacity of traditional approaches to understanding the diversity, dynamics, and evolution of viruses, is nonetheless a rich resource for machine learning (ML) approaches as alternatives for extracting such important information from these data. It is hence of utmost importance to design a framework for testing and benchmarking the robustness of these ML models. This paper makes the first effort (to our knowledge) to benchmark the robustness of ML models by simulating biological sequences with errors. In this paper, we introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio. We show from experiments on a wide array of ML models that some simulation-based approaches with different perturbation budgets are more robust (and accurate) than others for specific embedding methods to certain noise simulations on the input sequences. Our benchmarking framework may assist researchers in properly assessing different ML models and help them understand the behavior of the SARS-CoV-2 virus or avoid possible future pandemics.
Free, publicly-accessible full text available December 1, 2024
Special Issue, Part I 18th International Symposium on Bioinformatics Research and Applications (ISBRA 2022)

https://doi.org/10.1089/cmb.2023.29095.az

Cai, Zhipeng ; Skums, Pavel ; Zelikovsky, Alexander ( August 2023 , Journal of Computational Biology)

Free, publicly-accessible full text available August 1, 2024
Assessing the Resilience of Machine Learning Classification Algorithms on SARS-CoV-2 Genome Sequences Generated with Long-Read Specific Errors

https://doi.org/10.3390/biom13060934

Sahoo, Bikram ; Ali, Sarwan ; Chen, Pin-Yu ; Patterson, Murray ; Zelikovsky, Alexander ( June 2023 , Biomolecules)

The emergence of third-generation single-molecule sequencing (TGS) technology has revolutionized the generation of long reads, which are essential for genome assembly and have been widely employed in sequencing the SARS-CoV-2 virus during the COVID-19 pandemic. Although long-read sequencing has been crucial in understanding the evolution and transmission of the virus, the high error rate associated with these reads can lead to inadequate genome assembly and downstream biological interpretation. In this study, we evaluate the accuracy and robustness of machine learning (ML) models using six different embedding techniques on SARS-CoV-2 error-incorporated genome sequences. Our analysis includes two types of error-incorporated genome sequences: those generated using simulation tools to emulate error profiles of long-read sequencing platforms and those generated by introducing random errors. We show that the spaced k-mers embedding method achieves high accuracy in classifying error-free SARS-CoV-2 genome sequences, and the spaced k-mers and weighted k-mers embedding methods are highly accurate in predicting error-incorporated sequences. The fixed-length vectors generated by these methods contribute to the high accuracy achieved. Our study provides valuable insights for researchers to effectively evaluate ML models and gain a better understanding of the approach for accurate identification of critical SARS-CoV-2 genome sequences.
Free, publicly-accessible full text available June 1, 2024
17th International Symposium on Bioinformatics Research and Applications (ISBRA 2021)

https://doi.org/10.1089/cmb.2022.29070.zc

Cai, Zhipeng ; Skums, Pavel ; Zelikovsky, Alexander ( October 2022 , Journal of Computational Biology)

Full Text Available
Special Issue: 16th International Symposium on Bioinformatics Research and Applications (ISBRA 2020)

https://doi.org/10.1089/cmb.2021.29041.zc

Cai, Zhipeng ; Skums, Pavel ; Porozov, Yuri ; Zelikovsky, Alexander ( August 2021 , Journal of Computational Biology)

Full Text Available
Special Issue: 16th International Symposium on Bioinformatics Research and Applications (ISBRA 2020)

https://doi.org/10.1089/cmb.2021.29038.zc

Cai, Zhipeng ; Skums, Pavel ; Porozov, Yuri ; Zelikovsky, Alexander ( July 2021 , Journal of Computational Biology)

Full Text Available
Preface Special Issue: 15th International Symposium on Bioinformatics Research and Applications (ISBRA 2019)

https://doi.org/10.1089/cmb.2019.29024.zc

Cai, Zhipeng ; Skums, Pavel ; Zelikovsky, Alexander ( February 2020 , Journal of Computational Biology)
Full Text Available