This content will become publicly available on May 2, 2023
- Wahl, Lindi
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- PLOS Genetics
- Page Range or eLocation-ID:
- Sponsoring Org:
- National Science Foundation
More Like this
Abstract Background Investigation of outbreaks to identify the primary case is crucial for the interruption and prevention of transmission of infectious diseases. These individuals may have a higher risk of participating in near future transmission events when compared to the other patients in the outbreak, so directing more transmission prevention resources towards these individuals is a priority. Although the genetic characterization of intra-host viral populations can aid the identification of transmission clusters, it is not trivial to determine the directionality of transmissions during outbreaks, owing to complexity of viral evolution. Here, we present a new computational framework, PYCIVO: primary case inference in viral outbreaks. This framework expands upon our earlier work in development of QUENTIN, which builds a probabilistic disease transmission tree based on simulation of evolution of intra-host hepatitis C virus (HCV) variants between cases involved in direct transmission during an outbreak. PYCIVO improves upon QUENTIN by also adding a custom heterogeneity index and identifying the scenario when the primary case may have not been sampled. Results These approaches were validated using a set of 105 sequence samples from 11 distinct HCV transmission clusters identified during outbreak investigations, in which the primary case was epidemiologically verified. Both models canmore »
Assessing cost-effectiveness of hepatitis C testing pathways in Georgia using the Hep C Testing CalculatorAbstract The cost of testing can be a substantial contributor to hepatitis C virus (HCV) elimination program costs in many low- and middle-income countries such as Georgia, resulting in the need for innovative and cost-effective strategies for testing. Our objective was to investigate the most cost-effective testing pathways for scaling-up HCV testing in Georgia. We developed a Markov-based model with a lifetime horizon that simulates the natural history of HCV, and the cost of detection and treatment of HCV. We then created an interactive online tool that uses results from the Markov-based model to evaluate the cost-effectiveness of different HCV testing pathways. We compared the current standard-of-care (SoC) testing pathway and four innovative testing pathways for Georgia. The SoC testing was cost-saving compared to no testing, but all four new HCV testing pathways further increased QALYs and decreased costs. The pathway with the highest patient follow-up, due to on-site testing, resulted in the highest discounted QALYs (123 QALY more than the SoC) and lowest costs ($127,052 less than the SoC) per 10,000 persons screened. The current testing algorithm in Georgia can be replaced with a new pathway that is more effective while being cost-saving.
Cost-Effectiveness and Value-of-Information Analysis Using Machine Learning–Based Metamodeling: A Case of Hepatitis C Treatment
Metamodels can address some of the limitations of complex simulation models by formulating a mathematical relationship between input parameters and simulation model outcomes. Our objective was to develop and compare the performance of a machine learning (ML)–based metamodel against a conventional metamodeling approach in replicating the findings of a complex simulation model.
We constructed 3 ML-based metamodels using random forest, support vector regression, and artificial neural networks and a linear regression-based metamodel from a previously validated microsimulation model of the natural history hepatitis C virus (HCV) consisting of 40 input parameters. Outcomes of interest included societal costs and quality-adjusted life-years (QALYs), the incremental cost-effectiveness (ICER) of HCV treatment versus no treatment, cost-effectiveness analysis curve (CEAC), and expected value of perfect information (EVPI). We evaluated metamodel performance using root mean squared error (RMSE) and Pearson’s R2on the normalized data.
The R2values for the linear regression metamodel for QALYs without treatment, QALYs with treatment, societal cost without treatment, societal cost with treatment, and ICER were 0.92, 0.98, 0.85, 0.92, and 0.60, respectively. The corresponding R2values for our ML-based metamodels were 0.96, 0.97, 0.90, 0.95, and 0.49 for support vector regression; 0.99, 0.83, 0.99, 0.99, and 0.82 for artificial neural network; and 0.99,more »
ML-based metamodels generally outperformed traditional linear regression metamodels at replicating results from complex simulation models, with random forest metamodels performing best.
Decision-analytic models are frequently used by policy makers and other stakeholders to assess the impact of new medical technologies and interventions. However, complex models can impose limitations on conducting probabilistic sensitivity analysis and value-of-information analysis, and may not be suitable for developing online decision-support tools. Metamodels, which accurately formulate a mathematical relationship between input parameters and model outcomes, can replicate complex simulation models and address the above limitation. The machine learning–based random forest model can outperform linear regression in replicating the findings of a complex simulation model. Such a metamodel can be used for conducting cost-effectiveness and value-of-information analyses or developing online decision support tools.
Comparative Analysis of Within-Host Mutation Patterns and Diversity of Hepatitis C Virus Subtypes 1a, 1b, and 3aUnderstanding within-host evolution is critical for predicting viral evolutionary outcomes, yet such studies are currently lacking due to difficulty involving human subjects. Hepatitis C virus (HCV) is an RNA virus with high mutation rates. Its complex evolutionary dynamics and extensive genetic diversity are demonstrated in over 67 known subtypes. In this study, we analyzed within-host mutation frequency patterns of three HCV subtypes, using a large number of samples obtained from treatment-naïve participants by next-generation sequencing. We report that overall mutation frequency patterns are similar among subtypes, yet subtype 3a consistently had lower mutation frequencies and nucleotide diversity, while subtype 1a had the highest. We found that about 50% of genomic sites are highly conserved across subtypes, which are likely under strong purifying selection. We also compared within-host and between-host selective pressures, which revealed that Hyper Variable Region 1 within hosts was under positive selection, but was under slightly negative selection between hosts, which indicates that many mutations created within hosts are removed during the transmission bottleneck. Examining the natural prevalence of known resistance-associated variants showed their consistent existence in the treatment-naïve participants. These results provide insights into the differences and similarities among HCV subtypes that may be used to developmore »
Data-driven supervised learning of a viral protease specificity landscape from deep sequencing and molecular simulations
Biophysical interactions between proteins and peptides are key determinants of molecular recognition specificity landscapes. However, an understanding of how molecular structure and residue-level energetics at protein−peptide interfaces shape these landscapes remains elusive. We combine information from yeast-based library screening, next-generation sequencing, and structure-based modeling in a supervised machine learning approach to report the comprehensive sequence−energetics−function mapping of the specificity landscape of the hepatitis C virus (HCV) NS3/4A protease, whose function—site-specific cleavages of the viral polyprotein—is a key determinant of viral fitness. We screened a library of substrates in which five residue positions were randomized and measured cleavability of ∼30,000 substrates (∼1% of the library) using yeast display and fluorescence-activated cell sorting followed by deep sequencing. Structure-based models of a subset of experimentally derived sequences were used in a supervised learning procedure to train a support vector machine to predict the cleavability of 3.2 million substrate variants by the HCV protease. The resulting landscape allows identification of previously unidentified HCV protease substrates, and graph-theoretic analyses reveal extensive clustering of cleavable and uncleavable motifs in sequence space. Specificity landscapes of known drug-resistant variants are similarly clustered. The described approach should enable the elucidation and redesign of specificity landscapes of a wide varietymore »