    Advances in high-throughput technology have made it possible to characterize a wide variety of epigenetic modifications and noncoding RNAs across the genome that are involved in disease pathogenesis through the regulation of gene expression. The high dimensionality of both the epigenetic/noncoding RNA data and the gene expression data makes it challenging to identify the important regulators of genes. Conducting a univariate test for each possible regulator–gene pair incurs a serious multiple-comparison burden, and direct application of regularization methods to select regulator–gene pairs is computationally infeasible. Applying fast screening to reduce the dimension before regularization is more efficient and stable than applying regularization methods alone.


    We propose a novel screening method based on robust partial correlation to detect epigenetic and noncoding RNA regulators of gene expression over the whole genome, a problem that involves both high-dimensional predictors and high-dimensional responses. Compared to existing screening methods, our method is conceptually innovative in that it reduces the dimension of both the predictors and the responses, and screens at both the node (regulators or genes) and edge (regulator–gene pairs) levels. We develop data-driven procedures to determine the conditional sets and the optimal screening threshold, and implement a fast iterative algorithm. Simulations and applications to long noncoding RNA and microRNA regulation in kidney cancer and DNA methylation regulation in Glioblastoma Multiforme illustrate the validity and advantage of our method.
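    As a rough illustration of edge-level screening by partial correlation, the sketch below residualizes regulators and genes on a conditional set and keeps regulator–gene pairs whose residual correlation exceeds a threshold. It is a minimal sketch under stated assumptions, not the method described above: it uses ordinary least squares and Pearson correlation rather than a robust estimator, takes the conditional set `Z` and the `threshold` as given rather than choosing them from the data, and does no node-level screening or iteration. The function names are hypothetical.

```python
import numpy as np

def residualize(V, Z):
    """Residuals of V (vector or matrix of columns) after least-squares
    regression on the conditional set Z, with an intercept."""
    design = np.column_stack([np.ones(len(V)), Z])
    coef, *_ = np.linalg.lstsq(design, V, rcond=None)
    return V - design @ coef

def partial_corr(x, y, Z):
    """Pearson partial correlation of x and y given Z (not a robust estimator)."""
    return float(np.corrcoef(residualize(x, Z), residualize(y, Z))[0, 1])

def screen_pairs(X, Y, Z, threshold):
    """Return (regulator, gene) index pairs whose absolute partial
    correlation given Z exceeds a user-supplied threshold (assumed fixed here)."""
    RX = residualize(X, Z)
    RY = residualize(Y, Z)
    # Standardize residual columns so the cross-product is a correlation matrix.
    RX = (RX - RX.mean(axis=0)) / RX.std(axis=0)
    RY = (RY - RY.mean(axis=0)) / RY.std(axis=0)
    R = RX.T @ RY / X.shape[0]
    rows, cols = np.nonzero(np.abs(R) > threshold)
    return [(int(j), int(k)) for j, k in zip(rows, cols)]
```

    In this toy setup, a pair that is correlated only through a shared confounder in `Z` has a large marginal correlation but a partial correlation near zero, so it is screened out, while a directly related pair is retained.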

  3. Detection of prognostic factors associated with patients’ survival outcomes helps gain insights into a disease and guide treatment decisions. The rapid advancement of high-throughput technologies has yielded plentiful genomic biomarkers as candidate prognostic factors, but most are of limited use in clinical application. As the price of the technology drops over time, many genomic studies are conducted to explore a common scientific question in different cohorts in order to identify more reproducible and credible biomarkers. However, new challenges arise from heterogeneity in study populations and designs when multiple studies are analyzed jointly. For example, patients from different cohorts show different demographic characteristics and risk profiles. Existing high-dimensional variable selection methods for survival analysis are restricted to single-study analysis. We propose a novel Cox-model-based two-stage variable selection method called “Cox-TOTEM” to detect survival-associated biomarkers common to multiple genomic studies. Simulations showed that our method greatly improved the sensitivity of variable selection compared to separate applications of existing methods to each study, especially when the signals are weak or when the studies are heterogeneous. An application of our method to TCGA transcriptomic data identified essential survival-associated genes related to the common disease mechanism of five Pan-Gynecologic cancers.
  4. Streams in the southeastern United States Coastal Plains serve as an essential source of energy and nutrients for important estuarine ecosystems, and dissolved organic matter (DOM) exported from these streams can have profound impacts on the biogeochemical and ecological functions of fluvial networks. Here, we examined hydrological and temperature controls of DOM during low-flow periods in a forested stream located within the Coastal Plain physiographic region of Alabama, USA. We analyzed DOM by combining dissolved organic carbon (DOC) analysis, fluorescence excitation–emission matrices with parallel factor analysis (EEM-PARAFAC), and microbial degradation experiments. Four fluorescence components were identified: terrestrial humic-like DOM, microbial humic-like DOM, tyrosine-like DOM, and tryptophan-like DOM. Humic-like DOM accounted for ~70% of total fluorescence, and biodegradation experiments showed that it was less bioreactive than protein-like DOM, which accounted for ~30% of total fluorescence. This observation indicates that fluorescent DOM (FDOM) was controlled primarily by soil inputs and not substantially influenced by instream production and processing, suggesting that the bulk of FDOM in these streams is transported to downstream environments with limited in situ modification. Linear regression and redundancy analysis models identified that the seasonal variations in DOM were dictated primarily by hydrology and temperature. Overall, high discharge and shallow flow paths led to the enrichment of less-degraded DOM with higher percentages of microbial humic-like and tyrosine-like compounds, whereas high temperatures favored the accumulation of high-aromaticity, high-molecular-weight, terrestrial, humic-like compounds in stream water. The fluxes of DOC and the four fluorescence components were driven primarily by water discharge. Thus, the instantaneous exports of both refractory humic-like DOM and reactive protein-like DOM were higher in wetter seasons (winter and spring). As high temperatures and severe precipitation are projected to become more prominent in the southeastern U.S. due to climate change, our findings have important implications for future changes in the amount, source, and composition of DOM in Coastal Plain streams and the associated impacts on downstream carbon and nutrient supplies and water quality.
