

Title: A weak‐signal‐assisted procedure for variable selection and statistical inference with an informative subsample
Abstract

This paper is motivated by an HIV‐1 drug resistance study in which we encounter three analytical challenges: analyzing data with an informative subsample, accounting for weak signals, and detecting important signals while also conducting statistical inference. We start with an initial estimation method that adopts a penalized pairwise conditional likelihood approach for variable selection. This initial estimator accounts for the informative subsample. To account for the effect of weak signals, we use the key idea of partial ridge regression. We also propose a one‐step estimation method for each of the signal coefficients and then construct confidence intervals accordingly. We apply the proposed method to the Stanford HIV‐1 drug resistance study and compare the results with existing approaches. We also conduct comprehensive simulation studies to demonstrate the superior performance of our proposed method.
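
The partial ridge idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes an ordinary least-squares loss in place of the pairwise conditional likelihood, takes the set of selected (strong-signal) indices as given, and applies a ridge penalty only to the remaining coefficients. The function name `partial_ridge`, the tuning parameter `lam`, and the simulated data are all hypothetical.

```python
import numpy as np


def partial_ridge(X, y, selected, lam):
    """Least squares with a ridge penalty on the non-selected coefficients only.

    Solves  min_b ||y - X b||^2 + lam * sum_{j not in selected} b_j^2
    via the closed form (X'X + lam * D)^{-1} X'y, where D is diagonal with
    0 for selected indices and 1 otherwise. Illustrative assumption: squared
    error loss, not the pairwise conditional likelihood of the paper.
    """
    p = X.shape[1]
    penalty = np.ones(p)
    # Selected (strong-signal) coefficients are left unpenalized.
    penalty[np.asarray(selected, dtype=int)] = 0.0
    A = X.T @ X + lam * np.diag(penalty)
    return np.linalg.solve(A, X.T @ y)


if __name__ == "__main__":
    # Hypothetical data: two strong signals, two weak signals, one null.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    beta = np.array([2.0, -1.5, 0.1, 0.05, 0.0])
    y = X @ beta + rng.normal(size=200)
    # Treat the first two variables as the selected strong signals.
    print(partial_ridge(X, y, selected=[0, 1], lam=10.0))
```

In this sketch the weak-signal coefficients are shrunk rather than set to zero, so they can still contribute to the fit while the selected coefficients are estimated without a penalty.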

 
Award ID(s):
2019461
NSF-PAR ID:
10450068
Publisher / Repository:
Oxford University Press
Journal Name:
Biometrics
Volume:
77
Issue:
3
ISSN:
0006-341X
Format(s):
Medium: X Size: p. 996-1010
Sponsoring Org:
National Science Foundation
More Like this
  1. AIDS is a syndrome caused by HIV. During the progression of AIDS, a patient's immune system is weakened, which increases the patient's susceptibility to infections and diseases. Although antiretroviral drugs can effectively suppress HIV, the virus mutates very quickly and can become resistant to treatment. In addition, through mutations the virus can become resistant to other treatments not currently being used, which is known in the clinical research community as cross-resistance. Since a single HIV strain can be resistant to multiple drugs, this problem is naturally represented as a multilabel classification problem. Given this multilabel relationship, traditional single-label classification methods often fail to effectively identify the drug resistances that may develop after a particular virus mutation. In this work, we propose a novel multilabel Robust Sample Specific Distance (RSSD) method to identify multiclass HIV drug resistance. Our method is novel in that it can illustrate the relative strength of the drug resistance of a reverse transcriptase (RT) sequence against a given drug nucleoside analog and learn the distance metrics for all the drug resistances. To learn the proposed RSSDs, we formulate a learning objective that maximizes the ratio of the summations of a number of ℓ1-norm distances, which is difficult to solve in general. To solve this optimization problem, we derive an efficient, nongreedy iterative algorithm with rigorously proved convergence. Our new method has been verified on a public HIV type 1 drug resistance data set with over 600 RT sequences and five nucleoside analogs. We compared our method against several state-of-the-art multilabel classification methods, and the experimental results have demonstrated the effectiveness of our proposed method.
  2. Acquired immunodeficiency syndrome (AIDS) is a syndrome caused by the human immunodeficiency virus (HIV). During the progression of AIDS, a patient’s immune system is weakened, which increases the patient’s susceptibility to infections and diseases. Although antiretroviral drugs can effectively suppress HIV, the virus mutates very quickly and can become resistant to treatment. In addition, through mutations the virus can become resistant to other treatments not currently being used, which is known in the clinical research community as cross-resistance. Since a single HIV strain can be resistant to multiple drugs, this problem is naturally represented as a multi-label classification problem. Given this multi-class relationship, traditional single-label classification methods usually fail to effectively identify the drug resistances that may develop after a particular virus mutation. In this paper, we propose a novel multi-label Robust Sample Specific Distance (RSSD) method to identify multi-class HIV drug resistance. Our method is novel in that it can illustrate the relative strength of the drug resistance of a reverse transcriptase (RT) sequence against a given drug nucleoside analogue and learn the distance metrics for all the drug resistances. To learn the proposed RSSDs, we formulate a learning objective that maximizes the ratio of the summations of a number of ℓ1-norm distances, which is difficult to solve in general. To solve this optimization problem, we derive an efficient, non-greedy, iterative algorithm with rigorously proved convergence. Our new method has been verified on a public HIV-1 drug resistance data set with over 600 RT sequences and five nucleoside analogues. We compared our method against other state-of-the-art multi-label classification methods, and the experimental results have demonstrated the effectiveness of our proposed method.
  3. Summary

    This article investigates a generalized semiparametric varying-coefficient model for longitudinal data that can flexibly model three types of covariate effects: time-constant effects, time-varying effects, and covariate-varying effects. Different link functions can be selected to provide a rich family of models for longitudinal data. The model assumes that the time-varying effects are unspecified functions of time and the covariate-varying effects are parametric functions of an exposure variable specified up to a finite number of unknown parameters. The estimation procedure is developed using local linear smoothing and profile weighted least squares estimation techniques. Hypothesis testing procedures are developed to test the parametric functions of the covariate-varying effects. The asymptotic distributions of the proposed estimators are established. A working formula for bandwidth selection is discussed and examined through simulations. Our simulation study shows that the proposed methods have satisfactory finite sample performance. The proposed methods are applied to the ACTG 244 clinical trial of HIV infected patients being treated with Zidovudine to examine the effects of antiretroviral treatment switching before and after HIV develops the T215Y/F drug resistance mutation. Our analysis shows benefits of treatment switching to the combination therapies as compared to continuing with ZDV monotherapy before and after developing the 215-mutation.

     
  4. Drug-resistant HIV-1 has become a growing concern in clinical practice and public health. Although combination antiretroviral therapy can contribute massively to the suppression of viral loads in patients with HIV-1, it cannot lead to viral eradication. Continuing viral replication during sub-optimal therapy (due to poor adherence or other reasons) may lead to the accumulation of drug resistance mutations, resulting in an increased risk of disease progression. Many studies also suggest that events occurring during the early stage of HIV-1 infection (i.e., the first few hours to days following HIV exposure) may determine whether the infection can be successfully established. However, the numbers of infected cells and viruses during the early stage are extremely low, and stochasticity may play a critical role in dictating the fate of infection. In this paper, we use stochastic models to investigate viral infection and the emergence of drug resistance of HIV-1. The stochastic model is formulated as a continuous-time Markov chain (CTMC), which is derived from an ordinary differential equation model proposed by Kitayimbwa et al. that includes both forward and backward mutations. An analytic estimate of the probability of clearance of HIV infection in the CTMC model near the infection-free equilibrium is obtained by a multitype branching process approximation. The analytical predictions are validated by numerical simulations. Unlike the deterministic dynamics, where the basic reproduction number $ \mathcal{R}_0 $ serves as a sharp threshold parameter (i.e., the disease dies out if $ \mathcal{R}_0 < 1 $ and persists if $ \mathcal{R}_0 > 1 $), the stochastic models indicate that there is always a positive probability for HIV infection to be eradicated in patients. In the presence of antiretroviral therapy, our results show that the chance of clearance of the infection tends to increase, although drug resistance is likely to emerge.

     
  5. Abstract

    HIV-1 viral transcription persists in patients despite antiretroviral treatment, potentially due to intermittent HIV-1 LTR activation. While several mathematical models have been explored in the context of LTR-protein interactions, in this work, for the first time, an HIV-1 LTR model featuring repressed, intermediate, and activated LTR states is integrated with the generation of long (env) and short (TAR) RNAs and proteins (Tat, Pr55, and p24) in T-cells and macrophages using both cell lines and infected primary cells. This type of extended modeling framework allows us to compare and contrast the behavior of these two cell types. We demonstrate that they exhibit unique LTR dynamics, which ultimately result in differences in the magnitude of viral products generated. One of the distinctive features of this work is that it relies on experimental data in reaction rate computations. Two RNA transcription rates from the activated promoter states are fit by comparison of experimental data to model predictions. Fitting to the data also provides estimates for the degradation/exit rates for long and short viral RNA. Our experimentally generated data are in reasonable agreement with the model for the T-cell as well as the macrophage population, which gives strong evidence in support of using the proposed integrated modeling paradigm. Sensitivity analysis performed using the Latin hypercube sampling method confirms the robustness of the model with respect to small parameter perturbations. Finally, incorporation of a transcription inhibitor (F07#13) into the governing equations demonstrates how the model can be used to assess drug efficacy. Collectively, our model indicates transcriptional differences between latently HIV-1 infected T-cells and macrophages and provides a novel platform to study various transcriptional dynamics leading to latency or activation in numerous cell types and physiological conditions.

     