Optimizing genomic sampling for demographic and epidemiological inference with Markov decision processes

Rasmussen, David A (ORCID: 0000-0001-9457-7561); Bursell, Madeline G; Burkhart, Frank

doi:10.1093/genetics/iyaf244

Abstract: Inferences from population genomic data provide valuable insights into the demographic history of a population. Likewise, in genomic epidemiology, pathogen genomic data provide key insights into epidemic dynamics and potential sources of transmission. Yet it remains challenging to predict what information genomic data will provide about variables of interest, and how different sampling strategies will affect the quality of downstream inferences. As a result, population genomics and related fields such as phylodynamics and phylogeography largely lack theory to guide decisions on how best to sample individuals for genomic sequencing. By adopting a sequential decision-making framework based on Markov decision processes, we model how sampling interacts with a population’s demographic history to shape the ancestral or genealogical relationships of sampled individuals. By considering these ancestral relationships probabilistically, we can use Markov decision processes to predict the expected value of sampling in terms of information gained about estimated variables. This in turn allows us to efficiently explore and identify optimal sampling strategies, even when the informational value of sampling depends on past or future sampling events. To illustrate our framework, we develop Markov decision processes for three common demographic and epidemiological inference problems: estimating population growth rates, minimizing the transmission distance between sampled individuals, and estimating migration rates between subpopulations. In each case, the Markov decision process allows us to identify optimal sampling strategies that maximize the information gained from genomic data while minimizing the associated costs of sampling.
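To make the general idea of a sampling Markov decision process concrete, the following is a minimal toy sketch and not the authors' model. It assumes a finite horizon, a binary action at each time step (sequence one individual or skip), a per-sample cost, and a hypothetical diminishing-returns information function log(1 + n) over the number n of informative samples collected so far. The names `solve_sampling_mdp`, `p_informative`, `cost`, and `info` are illustrative assumptions; none of them come from the paper. The optimal policy is found by backward induction.

```python
import math
from functools import lru_cache


def solve_sampling_mdp(horizon, p_informative, cost, info=math.log1p):
    """Backward induction for a toy finite-horizon sampling MDP.

    State:  (t, n) = current time step and informative samples so far.
    Action: 1 = sequence one individual at time t, 0 = skip.
    p_informative[t] is the (assumed) probability that a sample taken
    at time t turns out to be informative.
    Returns a function value(t, n) -> (expected net value, best action).
    """

    @lru_cache(maxsize=None)
    def value(t, n):
        if t == horizon:
            return 0.0, None  # no further decisions after the horizon
        skip = value(t + 1, n)[0]
        p = p_informative[t]
        gain = info(n + 1) - info(n)  # marginal information of one more sample
        sample = (
            -cost
            + p * (gain + value(t + 1, n + 1)[0])
            + (1 - p) * value(t + 1, n)[0]
        )
        # Ties are broken in favor of skipping.
        return (sample, 1) if sample > skip else (skip, 0)

    return value


value = solve_sampling_mdp(horizon=3, p_informative=[1.0, 1.0, 1.0], cost=0.5)
best_value, _ = value(0, 0)
print(round(best_value, 4))
```

Because the marginal information of each additional sample shrinks while the cost stays fixed, the optimal plan in this toy setting takes only as many samples as pay for themselves. The value of sampling now also depends on what can still be sampled later, which is the kind of dependence on past and future sampling events that the abstract describes.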
