-
Graph-based genome representations have proven to be a powerful tool in genomic analysis due to their ability to encode variations found in multiple haplotypes and capture population genetic diversity. Such graphs also unavoidably contain paths that switch between haplotypes (i.e., recombinant paths) and thus do not fully match any of the constituent haplotypes. The number of such recombinant paths grows combinatorially with path length, causing inefficiencies and false positives when mapping reads. In this paper, we study the problem of finding reduced haplotype-aware genome graphs that incorporate only a selected subset of variants, yet contain paths corresponding to all α-long substrings of the input haplotypes (i.e., non-recombinant paths) with at most δ mismatches. Solving this problem optimally, i.e., minimizing the number of variants selected, was previously shown to be NP-hard. Here, we first establish several inapproximability results for finding haplotype-aware reduced variation graphs of optimal size. We then present an integer linear programming (ILP) formulation for solving the problem, and experimentally demonstrate that it is a computationally feasible approach for real-world problems and provides far superior reduction compared to prior approaches.
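As a concrete illustration of the kind of ILP formulation mentioned above, the following is a minimal sketch in Python using the PuLP solver, restricted to the SNP case where dropping a variant costs one mismatch in every α-window covering it. The toy haplotypes, the ALPHA/DELTA values, and the window enumeration are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of the variant-selection ILP described above, using PuLP.
# Assumptions (not from the paper): variants are SNPs identified by their
# positions, each haplotype is a list of the variant positions it carries,
# and dropping a SNP costs exactly one mismatch in every alpha-window
# covering it. Names like `haplotypes`, ALPHA, DELTA are illustrative.
import pulp

ALPHA, DELTA = 100, 1          # window length and mismatch budget
haplotypes = [                 # toy data: variant positions per haplotype
    [5, 40, 90, 130],
    [5, 60, 90, 170],
]
variants = sorted({p for hap in haplotypes for p in hap})

prob = pulp.LpProblem("variant_selection", pulp.LpMinimize)
x = {v: pulp.LpVariable(f"x_{v}", cat="Binary") for v in variants}

# Objective: select as few variants as possible.
prob += pulp.lpSum(x.values())

# For every alpha-long window of every haplotype, the variants it carries
# that we drop (1 - x_v) must not exceed the mismatch budget delta.
for hap in haplotypes:
    for start in range(0, max(hap) + 1):
        window = [p for p in hap if start <= p < start + ALPHA]
        if window:
            prob += pulp.lpSum(1 - x[p] for p in window) <= DELTA

prob.solve(pulp.PULP_CBC_CMD(msg=False))
selected = [v for v in variants if x[v].value() == 1]
print("selected variants:", selected)
```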
-
Abstract The coronavirus pandemic has already caused severe problems for humanity and the economy. The exact impact of the COVID-19 pandemic is still unknown, and economists and financial advisers are exploring all possible scenarios to mitigate the risks arising from the pandemic. An intriguing question is whether, and to what extent, this pandemic and its impacts resemble other catastrophic events of the past, such as the 2009 Great Recession. This paper addresses this question by analyzing official public announcements and statements issued by federal authorities such as the Federal Reserve. More specifically, we measure the similarity of consecutive statements issued by the Federal Reserve during the 2009 Great Recession and the COVID-19 pandemic using natural language processing techniques. Furthermore, we explore the use of document embedding representations of the statements in a more complex task: clustering. Our analysis shows that, using an advanced NLP document-embedding technique such as Doc2Vec, we can detect a 10.8% difference in the similarities of Federal Open Market Committee (FOMC) statements issued during the Great Recession (2007–2009) and the COVID-19 pandemic. Finally, the results of our clustering exercise show that the document embedding representations of the statements are suitable for more complex tasks, which provides a basis for future applications of state-of-the-art natural language processing techniques using the FOMC post-meeting statements as the dataset.
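As an illustration of the similarity measurement described in the abstract, the following is a minimal sketch using gensim's Doc2Vec. The toy statements, vector size, and epoch count are assumptions; the paper's actual corpus of FOMC post-meeting statements is not reproduced here.

```python
# A minimal sketch of Doc2Vec-based similarity of consecutive statements,
# using gensim. The toy statements and hyperparameters are illustrative
# assumptions, not the paper's experimental configuration.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

statements = [
    "The Committee decided to lower the target range for the federal funds rate.",
    "The Committee decided to maintain the target range for the federal funds rate.",
    "Economic activity has continued to expand at a moderate pace.",
]
corpus = [TaggedDocument(simple_preprocess(s), [i]) for i, s in enumerate(statements)]

model = Doc2Vec(vector_size=50, min_count=1, epochs=40)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

# Cosine similarity between consecutive statements, mirroring the setup above.
for i in range(len(statements) - 1):
    sim = model.dv.similarity(i, i + 1)
    print(f"similarity(statement {i}, statement {i + 1}) = {sim:.3f}")
```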
-
Abstract Motivation Variation graph representations are projected to either replace or supplement conventional single-genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants now exist for many species, and it is natural to ask which among these are crucial to circumvent reference bias during read mapping. Results In this work, we propose a novel mathematical framework for variant selection, by casting it in terms of minimizing variation graph size subject to preserving paths of length α with at most δ differences. This framework leads to a rich set of problems based on the types of variants [e.g. single nucleotide polymorphisms (SNPs), indels or structural variants (SVs)] and on whether the goal is to minimize the number of positions at which variants are listed or the total number of variants listed. We classify the computational complexity of these problems and, where feasible, provide efficient algorithms along with their software implementation. We empirically evaluate the magnitude of graph reduction achieved in human chromosome variation graphs using multiple α and δ parameter values corresponding to short- and long-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% of SNPs and 73% of SVs can be safely excluded from the human chromosome 1 variation graph. The graph size reduction can benefit downstream pan-genome analysis. Availability and implementation https://github.com/AT-CG/VF. Supplementary information Supplementary data are available at Bioinformatics online.
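To make the preservation criterion concrete, the following is a minimal Python sketch of the underlying feasibility check for the SNP case: a retained subset is safe if no α-long window of any haplotype accumulates more than δ mismatches from dropped variants. This is an illustrative reading of the framework, not the paper's algorithm; see the repository above for the actual implementation.

```python
# A minimal sketch (an assumption, not the paper's algorithm) of the
# feasibility check behind the framework above: a set of retained variants
# is "safe" if every alpha-long window of every haplotype contains at most
# delta dropped variants, i.e., a read drawn from that window still matches
# a graph path with at most delta differences. SNPs with unit mismatch cost
# are assumed; indels and SVs would need edit-distance-aware costs.
from bisect import bisect_left, bisect_right

def is_safe(haplotypes, retained, alpha, delta):
    """haplotypes: list of sorted SNP-position lists; retained: set of kept positions."""
    for hap in haplotypes:
        dropped = [p for p in hap if p not in retained]  # sorted, since hap is sorted
        # A worst-case window can be assumed to start at a dropped position.
        for p in dropped:
            lo = bisect_left(dropped, p)
            hi = bisect_right(dropped, p + alpha - 1)
            if hi - lo > delta:
                return False
    return True

# Toy usage with alpha = 100, delta = 1.
haps = [[5, 40, 90, 130], [5, 60, 90, 170]]
print(is_safe(haps, retained={90}, alpha=100, delta=1))         # False: window [5, 104] drops both 5 and 40
print(is_safe(haps, retained={5, 40, 90}, alpha=100, delta=1))  # True
```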
-
Machine and deep learning-based algorithms are emerging approaches to prediction problems in time series, and these techniques have been shown to produce more accurate results than conventional regression-based modeling. In particular, it has been reported that artificial Recurrent Neural Networks (RNNs) with memory, such as Long Short-Term Memory (LSTM), outperform Autoregressive Integrated Moving Average (ARIMA) models by a large margin. LSTM-based models incorporate additional "gates" for the purpose of memorizing longer sequences of input data. The major question is whether the gates incorporated in the LSTM architecture already offer a good prediction, or whether additional training of the data would further improve it. Bidirectional LSTMs (BiLSTMs) enable additional training by traversing the input data twice (i.e., left-to-right and then right-to-left). The research question of interest is then whether BiLSTM, with its additional training capability, outperforms regular unidirectional LSTM. This paper reports a behavioral analysis and comparison of BiLSTM and LSTM models. The objective is to explore to what extent additional layers of training would be beneficial for tuning the involved parameters. The results show that the additional training of data, and thus BiLSTM-based modeling, offers better predictions than regular LSTM-based models. More specifically, BiLSTM models were observed to provide better predictions than both ARIMA and LSTM models. It was also observed that BiLSTM models reach equilibrium much more slowly than LSTM-based models.
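To make the architectural difference concrete, the following is a minimal Keras sketch contrasting a unidirectional LSTM with its bidirectional counterpart on a synthetic series. The data, window length, layer sizes, and epoch count are illustrative assumptions, not the paper's experimental setup.

```python
# A minimal sketch (illustrative, not the paper's experiments) of the
# LSTM vs. BiLSTM comparison described above, using Keras.
import numpy as np
import tensorflow as tf

# Build a toy univariate series and frame it as supervised sliding windows.
series = np.sin(np.linspace(0, 40, 800)).astype("float32")
WINDOW = 20
X = np.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])[..., None]
y = series[WINDOW:]

def make_model(bidirectional: bool) -> tf.keras.Model:
    lstm = tf.keras.layers.LSTM(32)
    # The only difference: the Bidirectional wrapper also traverses the
    # input sequence right-to-left, doubling the learned representation.
    layer = tf.keras.layers.Bidirectional(lstm) if bidirectional else lstm
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(WINDOW, 1)),
        layer,
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

for name, flag in [("LSTM", False), ("BiLSTM", True)]:
    model = make_model(flag)
    hist = model.fit(X, y, epochs=5, batch_size=32, verbose=0)
    print(f"{name}: final training MSE = {hist.history['loss'][-1]:.5f}")
```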
-
Forecasting time series data is an important subject in economics, business, and finance. Traditionally, several techniques exist to effectively forecast the next lag of a time series, such as univariate Autoregressive (AR) models, univariate Moving Average (MA) models, Simple Exponential Smoothing (SES), and, most notably, Autoregressive Integrated Moving Average (ARIMA) with its many variations. In particular, the ARIMA model has demonstrated strong precision and accuracy in predicting the next lags of a time series. With the recent advancement in the computational power of computers and, more importantly, the development of more advanced machine learning algorithms and approaches such as deep learning, new algorithms have been developed to analyze and forecast time series data. The research question investigated in this article is whether and how newly developed deep learning-based algorithms for forecasting time series data, such as Long Short-Term Memory (LSTM), are superior to the traditional algorithms. The empirical studies conducted and reported in this article show that deep learning-based algorithms such as LSTM outperform traditional algorithms such as the ARIMA model. More specifically, the average reduction in error rates obtained by LSTM was between 84 and 87 percent when compared with ARIMA, indicating the superiority of LSTM over ARIMA. Furthermore, it was noticed that the number of training passes, known as "epochs" in deep learning, had no effect on the performance of the trained forecast model and exhibited truly random behavior.
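For concreteness, the following is a minimal sketch of an ARIMA baseline of the kind compared in the article, using statsmodels. The synthetic random-walk series, the (1, 1, 1) order, and the train/test split are illustrative assumptions, not the article's data or configuration.

```python
# A minimal sketch (illustrative, not the article's setup) of an ARIMA
# baseline forecast evaluated by RMSE on a held-out horizon.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Toy series: a random walk with drift, a common ARIMA test case.
series = np.cumsum(0.1 + rng.normal(size=200))
train, test = series[:180], series[180:]

# Fit ARIMA(p=1, d=1, q=1) and forecast the held-out horizon.
model = ARIMA(train, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=len(test))

rmse = np.sqrt(np.mean((forecast - test) ** 2))
print(f"ARIMA(1,1,1) RMSE over {len(test)} held-out steps: {rmse:.4f}")
```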