
Creators/Authors contains: "Rao, Arvind"


  1. Free, publicly-accessible full text available December 1, 2024
  2. Abstract

    Integrative analyses based on statistically relevant associations between genomics and a wealth of intermediary phenotypes (such as imaging) provide vital insights into their clinical relevance in terms of disease mechanisms. However, uncertainty estimates for the resulting integrative models are unreliable unless inference accurately accounts for how these associations were selected. In this paper, we develop selection-aware Bayesian methods that (1) counteract the impact of model selection bias through a “selection-aware posterior” in a flexible class of integrative Bayesian models after promising variables are selected via ℓ1-regularized algorithms, and (2) strike a balance in the inevitable trade-off between the quality of model selection and inferential power when the same data set is used for both selection and uncertainty estimation. Central to our methodological development is a carefully constructed conditional likelihood function which, combined with a reparameterization mapping, provides tractable updates when gradient-based Markov chain Monte Carlo (MCMC) sampling is used to estimate uncertainties from the selection-aware posterior. Applying our methods to a radiogenomic analysis, we successfully recover several important gene pathways and estimate uncertainties for their associations with patient survival times.

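
The pipeline described above begins by selecting promising variables with an ℓ1-regularized algorithm. The following is a minimal sketch of that selection step only. It assumes a plain lasso fitted by cyclic coordinate descent; the function names `soft_threshold` and `lasso_select` and the synthetic data are illustrative, not the authors' code. The sketch does not implement the selection-aware posterior, the conditional likelihood, or the MCMC sampler. Naively refitting a model on `selected` with the same data would ignore the selection event, which is the bias the paper's method is designed to correct.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_select(X, y, lam, n_iter=500, tol=1e-8):
    """Cyclic coordinate descent for (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes the columns of X and the response y are centered, so no
    intercept is fitted. Returns the coefficients and the indices of the
    nonzero ("selected") variables.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # x_j^T x_j / n for each column
    resid = y.astype(float).copy()         # residual y - X beta, with beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                   # all-zero column (e.g. constant after centering)
            # Correlation of column j with the partial residual that excludes j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid -= X[:, j] * delta   # keep the residual in sync with beta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta, np.flatnonzero(beta)

if __name__ == "__main__":
    # Synthetic illustration only: three true signals among ten candidates.
    rng = np.random.default_rng(0)
    n, p = 200, 10
    X = rng.standard_normal((n, p))
    beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
    y = X @ beta_true + 0.5 * rng.standard_normal(n)
    X, y = X - X.mean(axis=0), y - y.mean()
    beta_hat, selected = lasso_select(X, y, lam=0.2)
    print("selected variables:", selected)
```

The demo centers `X` and `y` first because the sketch fits no intercept.
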
  3. PURPOSE Lehmann et al. have identified four molecular subtypes of triple-negative breast cancer (TNBC), namely basal-like 1 (BL1), BL2, mesenchymal (M), and luminal androgen receptor, as well as an immunomodulatory (IM) gene expression signature modifier. Our group previously showed that the response of TNBC to neoadjuvant systemic chemotherapy (NST) differs by molecular subtype, but whether NST affects the subtype itself was unknown. Here, we tested the hypothesis that TNBC subtypes can change after NST in patients who do not achieve a pathologic complete response. In cases in which the subtype changed, we also determined whether epithelial-to-mesenchymal transition (EMT) had occurred.

    MATERIALS AND METHODS From the Pan-Pacific TNBC Consortium data set, which contains TNBC patient samples from four countries, we examined 64 matched pairs of formalin-fixed, paraffin-embedded pre- and post-NST tumor samples. The TNBC subtype was determined using the TNBCtype-IM assay. We analyzed a partial EMT gene expression scoring metric using mRNA data.

    RESULTS Of the 64 matched pairs, 36 (56%) showed a change in TNBC subtype after NST. The most frequent change was from the BL1 to the M subtype (38%). No tumors changed from M to BL1. The IM signature was positive in 14 (22%) patients before NST and in eight (12.5%) patients after NST. The EMT score increased after NST in 28 (78%) of the 36 patients whose subtype changed, versus 39% of the 28 patients whose subtype did not change (P = .002254).

    CONCLUSION To our knowledge, we report for the first time that the TNBC molecular subtype and IM signature frequently change after NST. Our results also suggest that NST promotes EMT. These findings may lead to innovative adjuvant therapy strategies for TNBC cases with residual tumor after NST.
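
The percentages in the RESULTS paragraph can be checked directly against the reported counts. The short sketch below, with illustrative helper names, recomputes them exactly using `fractions.Fraction`. The unchanged-subtype group is reported only as 39% of 28 patients, so `counts_consistent_with` lists which whole counts round to that figure. This is an inference from the rounding, not a number stated in the abstract. The 38% BL1-to-M figure is omitted because its denominator is not stated. The P value is not recomputed because the statistical test is not named.

```python
from fractions import Fraction

# Counts exactly as reported in the abstract (64 matched pre-/post-NST pairs).
reported = {
    "subtype changed after NST": (36, 64),
    "IM signature positive before NST": (14, 64),
    "IM signature positive after NST": (8, 64),
    "EMT score increased (changed-subtype group)": (28, 36),
}

def percent(k, n):
    """Exact percentage k / n * 100 as a Fraction, so rounding is explicit."""
    return Fraction(100 * k, n)

def counts_consistent_with(pct, n):
    """All counts k in 0..n whose percentage rounds to the whole number pct."""
    return [k for k in range(n + 1) if round(percent(k, n)) == pct]

if __name__ == "__main__":
    for label, (k, n) in reported.items():
        print(f"{label}: {k}/{n} = {float(percent(k, n)):.2f}%")
    # The unchanged-subtype group is reported only as 39% of 28 patients.
    print("counts out of 28 that round to 39%:", counts_consistent_with(39, 28))
```

Python's `round` uses round-half-to-even, which matters only for exact ties such as 12.5%.
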
  4. Summary

    We propose a curve-based Riemannian geometric approach for general shape-based statistical analyses of tumours obtained from radiologic images. A key component of the framework is a suitable metric that enables comparisons of tumour shapes, provides tools for computing descriptive statistics and performing principal component analysis on the space of tumour shapes, and allows for a rich class of continuous deformations of a tumour shape. The utility of the framework is illustrated through specific statistical tasks on a data set of radiologic images of patients diagnosed with glioblastoma multiforme, a malignant brain tumour with poor prognosis. In particular, our analysis identifies two patient clusters with very different survival, subtype and genomic characteristics. Furthermore, we demonstrate that adding tumour shape information to survival models containing clinical and genomic variables significantly increases predictive power.

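
The abstract does not specify the metric. One widely used choice in elastic shape analysis of curves is the square-root velocity function (SRVF), q(t) = f'(t) / sqrt(|f'(t)|), under which the plain L2 distance between SRVFs corresponds to an elastic metric on the original curves. The sketch below assumes that representation, which may differ from the paper's construction, and computes only the unaligned distance. A full shape distance would also optimize over rotations and reparameterizations and remove scale; those steps are omitted here. All function names are illustrative.

```python
import numpy as np

def srvf(curve):
    """Square-root velocity function q(t) = f'(t) / sqrt(|f'(t)|) of a sampled curve.

    curve: array of shape (T, d) holding points f(t) at T uniformly spaced
    parameter values t in [0, 1] (d = 2 for a planar tumour outline).
    """
    T = curve.shape[0]
    deriv = np.gradient(curve, 1.0 / (T - 1), axis=0)   # finite-difference f'(t)
    speed = np.linalg.norm(deriv, axis=1)
    return deriv / np.sqrt(speed + 1e-12)[:, None]       # small constant avoids 0/0

def trapezoid(values, t):
    """Trapezoidal rule on a uniform grid t."""
    dt = t[1] - t[0]
    return float(dt * (values.sum() - 0.5 * (values[0] + values[-1])))

def srvf_distance(curve1, curve2):
    """L2 distance between two SRVFs, with NO rotation or reparameterization alignment."""
    q1, q2 = srvf(curve1), srvf(curve2)
    t = np.linspace(0.0, 1.0, q1.shape[0])
    return float(np.sqrt(trapezoid(np.sum((q1 - q2) ** 2, axis=1), t)))

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 1000)
    circle = np.column_stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
    # The integral of |q|^2 equals the curve length, so this is close to 2*pi.
    print("length via SRVF:", trapezoid(np.sum(srvf(circle) ** 2, axis=1), t))
    print("distance to a shifted copy:", srvf_distance(circle, circle + 5.0))
    print("distance to a radius-2 circle:", srvf_distance(circle, 2.0 * circle))
```

The integral of |q|^2 equals the curve length, so the first printed value should be close to 2π. A translated copy lies at essentially zero distance because the SRVF depends only on derivatives.
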
  5. Abstract

    The prognostic and therapeutic value of the tumor microenvironment (TME) in various cancer types is of major interest. Characterization of the TME often relies on a small representative tissue sample; however, whether such a sample adequately captures the components of the TME is not yet known. Here, we used immunohistochemical (IHC) staining and 7-color multiplex staining to evaluate CD8 (cluster of differentiation 8), CD68, PD-L1 (programmed death-ligand 1), CD34, FAP (fibroblast activation protein), and cytokeratin in 220 tissue cores from 26 high-grade serous ovarian cancer samples. Comparisons were drawn between a larger tumor specimen and smaller core biopsies according to the number and location (central vs. peripheral tumor) of the biopsies. The correlation between marker-specific cell subsets in the larger tumor versus the smaller cores was stronger with two core biopsies than with one and was not further strengthened by additional biopsies. Moreover, this correlation was consistently strong regardless of whether the biopsy was taken from the center or the periphery of the original tumor sample. These findings could have a substantial impact on longitudinal assessment for biomarker detection in clinical trials.

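
The comparison of a larger specimen with one or more cores can be sketched as a correlation between per-specimen marker values and the average over k cores. This assumes Pearson correlation and simple averaging of cores, neither of which the abstract specifies. It uses synthetic numbers only, and `core_vs_whole_correlation` is a hypothetical helper, not the study's analysis code.

```python
import numpy as np

def core_vs_whole_correlation(whole, cores, k):
    """Pearson correlation between whole-specimen values and the mean of the first k cores.

    whole: shape (n_samples,), e.g. density of marker-positive cells per specimen.
    cores: shape (n_samples, n_cores), the same measurement in each core biopsy.
    """
    whole = np.asarray(whole, dtype=float)
    cores = np.asarray(cores, dtype=float)
    if not 1 <= k <= cores.shape[1]:
        raise ValueError("k must be between 1 and the number of cores")
    core_mean = cores[:, :k].mean(axis=1)        # average over the first k cores
    return float(np.corrcoef(whole, core_mean)[0, 1])

if __name__ == "__main__":
    # Synthetic, illustrative data: 26 specimens, 4 noisy cores each.
    rng = np.random.default_rng(0)
    whole = rng.gamma(shape=2.0, scale=50.0, size=26)
    cores = whole[:, None] + rng.normal(0.0, 40.0, size=(26, 4))
    for k in range(1, 5):
        print(k, round(core_vs_whole_correlation(whole, cores, k), 3))
```

Taking the first k columns treats the cores as arbitrarily ordered. A fuller analysis might also stratify cores by central versus peripheral location, as the study does.
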
  6. Summary

    Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations in which these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework handles continuous data that are not normal and data of mixed continuous and discrete nature, while still inferring a sparse conditional sign independence structure among the observed data. Extensive simulation comparisons with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach.

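
A Gaussian scale mixture multiplies a Gaussian vector by a random positive scale. This produces heavy-tailed marginals while keeping a sparse precision matrix as the structural parameter. The code below is a minimal generative sketch that assumes Gamma mixing, which yields a multivariate Student-t; the abstract does not state which mixing law is used. It draws such data from a chain-structured precision matrix and checks that the marginals are heavier-tailed than Gaussian. It does not implement the paper's Bayesian inference, and the function names are illustrative.

```python
import numpy as np

def sample_gaussian_scale_mixture(precision, nu, n, rng):
    """Draw n rows x = z / sqrt(u) with z ~ N(0, precision^{-1}) and u ~ Gamma(nu/2, rate nu/2).

    With this Gamma mixing law the result is multivariate Student-t with nu
    degrees of freedom; other mixing laws give other Gaussian scale mixtures.
    """
    p = precision.shape[0]
    cov = np.linalg.inv(precision)
    z = rng.multivariate_normal(np.zeros(p), cov, size=n)
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)   # rate nu/2 == scale 2/nu
    return z / np.sqrt(u)[:, None]     # one shared scale per row (observation)

def excess_kurtosis(v):
    """Sample excess kurtosis: about 0 for Gaussian data, positive for heavy tails."""
    v = v - v.mean()
    return float(np.mean(v ** 4) / np.mean(v ** 2) ** 2 - 3.0)

if __name__ == "__main__":
    # Chain-structured precision: the zero in entry (0, 2) encodes the sparse
    # graph 0 - 1 - 2 (no direct edge between variables 0 and 2).
    omega = np.array([[2.0, -1.0, 0.0],
                      [-1.0, 2.0, -1.0],
                      [0.0, -1.0, 2.0]])
    rng = np.random.default_rng(0)
    x = sample_gaussian_scale_mixture(omega, nu=5.0, n=100_000, rng=rng)
    print("excess kurtosis of x[:, 0]:", excess_kurtosis(x[:, 0]))
```

The scale u is shared across the coordinates of each observation. As a result, the zero in entry (0, 2) of the precision matrix no longer implies full conditional independence of x0 and x2 given x1. This is consistent with the abstract's focus on a conditional sign independence structure.
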
  7. Biobanks linked to electronic health records provide rich resources for health‐related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large‐scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis‐generating studies of disease‐treatment, disease‐exposure, and disease‐gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank‐based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank‐based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.

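
Non-random recruitment is one of the sampling issues raised above; the Michigan Genomics Initiative and UK Biobank, for example, use different recruitment mechanisms. One standard remedy is inverse probability weighting, in which each participant is weighted by the reciprocal of their probability of being recruited. It is shown here as an illustrative sketch, not as the review's recommended method. The selection probabilities are assumed to be known inputs, although in practice they must be estimated. `ipw_prevalence` and the toy data are hypothetical.

```python
import numpy as np

def ipw_prevalence(has_condition, selection_prob):
    """Hajek (self-normalized) inverse-probability-weighted prevalence estimate.

    has_condition: 0/1 indicator for each biobank participant.
    selection_prob: each participant's probability of having been recruited,
        assumed known or estimated from an external reference population.
    """
    y = np.asarray(has_condition, dtype=float)
    pi = np.asarray(selection_prob, dtype=float)
    if np.any((pi <= 0.0) | (pi > 1.0)):
        raise ValueError("selection probabilities must lie in (0, 1]")
    w = 1.0 / pi                          # under-recruited people count more
    return float(np.sum(w * y) / np.sum(w))

if __name__ == "__main__":
    # Hypothetical toy example: cases were twice as likely to be recruited.
    y = np.array([1, 1, 0, 0, 0, 0])
    pi = np.array([0.5, 0.5, 0.25, 0.25, 0.25, 0.25])
    print("naive prevalence:", y.mean())
    print("IPW prevalence:  ", ipw_prevalence(y, pi))
```

In the toy example, cases are over-represented, so the weighted prevalence (0.2) is lower than the naive sample proportion (about 0.33).
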