Title: Online two‐way estimation and inference via linear mixed‐effects models
In this article, we tackle the estimation and inference problem of analyzing distributed streaming data that is collected continuously over multiple data sites. We propose an online two‐way approach via linear mixed‐effects models. We explicitly model the site‐specific effects as random‐effect terms, and tackle both between‐site heterogeneity and within‐site correlation. We develop an online updating procedure that does not need to re‐access the previous data and can efficiently update the parameter estimate, when either new data sites, or new streams of sample observations of the existing data sites, become available. We derive the non‐asymptotic error bound for our proposed online estimator, and show that it is asymptotically equivalent to the offline counterpart based on all the raw data. We compare with some key alternative solutions both analytically and numerically, and demonstrate the advantages of our proposal. We further illustrate our method with two data applications.
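To make the "update without re-accessing previous data" idea concrete, here is a minimal sketch. It is not the authors' estimator. It assumes a random-intercept model y_ij = x_ijᵀβ + u_i + ε_ij whose variance components σ² (error) and τ² (site intercept) are known. The class and method names are hypothetical. Each site keeps only low-dimensional summaries: n_i, XᵢᵀXᵢ, Xᵢᵀyᵢ, the column sums of Xᵢ, and the sum of yᵢ. Because Vᵢ⁻¹ = (I − cᵢ11ᵀ)/σ² with cᵢ = τ²/(σ² + nᵢτ²), the generalized least squares (GLS) normal equations can be rebuilt from those summaries. A new site and a new batch at an existing site then go through the same update path. With the variance components held fixed, this reproduces the offline GLS fit exactly. The article instead estimates the variance components and establishes asymptotic equivalence and a non-asymptotic error bound, which this sketch does not attempt.

```python
import numpy as np


class OnlineRandomInterceptGLS:
    """Hypothetical online GLS for y_ij = x_ij' beta + u_i + e_ij.

    Assumes u_i ~ N(0, tau2) and e_ij ~ N(0, sigma2) with sigma2 and tau2 KNOWN.
    Only per-site summary statistics are stored, so raw data are never re-read.
    """

    def __init__(self, p, sigma2, tau2):
        self.p = p
        self.sigma2 = sigma2
        self.tau2 = tau2
        self.sites = {}  # site id -> summary statistics for that site

    def update(self, site_id, X, y):
        """Absorb a batch from a new site or a new stream at an existing site."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        s = self.sites.setdefault(site_id, {
            "n": 0,
            "XtX": np.zeros((self.p, self.p)),
            "Xty": np.zeros(self.p),
            "xsum": np.zeros(self.p),  # X_i' 1
            "ysum": 0.0,               # 1' y_i
        })
        s["n"] += len(y)
        s["XtX"] += X.T @ X
        s["Xty"] += X.T @ y
        s["xsum"] += X.sum(axis=0)
        s["ysum"] += float(y.sum())

    def estimate(self):
        """Solve the GLS normal equations rebuilt from the stored summaries."""
        A = np.zeros((self.p, self.p))
        b = np.zeros(self.p)
        for s in self.sites.values():
            # V_i^{-1} = (I - c_i 1 1') / sigma2 with c_i = tau2 / (sigma2 + n_i tau2)
            c = self.tau2 / (self.sigma2 + s["n"] * self.tau2)
            A += (s["XtX"] - c * np.outer(s["xsum"], s["xsum"])) / self.sigma2
            b += (s["Xty"] - c * s["xsum"] * s["ysum"]) / self.sigma2
        return np.linalg.solve(A, b)
```

For example, calling `update("site_a", X1, y1)` and later `update("site_a", X2, y2)` yields the same estimate as a single `update` with the stacked batches, because every stored summary is additive. Summaries are kept per site rather than pooled because cᵢ depends on the site's running total nᵢ. When a site receives a new stream, its contribution to the normal equations therefore has to be recomputed, not merely appended.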
Award ID(s):
2102227
PAR ID:
10376195
Author(s) / Creator(s):
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Statistics in Medicine
Volume:
41
Issue:
25
ISSN:
0277-6715
Page Range / eLocation ID:
p. 5113-5133
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Motivation: Alternative polyadenylation (polyA) sites near the 3′ end of a pre-mRNA create multiple mRNA transcripts with different 3′ untranslated regions (3′ UTRs). The sequence elements of a 3′ UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies in the literature have reported correlations between diseases and the shortening (or lengthening) of 3′ UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools consider RNA secondary structures as a feature to predict polyA sites. Results: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e., most frequently used) polyA site of a gene in a specific tissue, and the relative dominance of two given polyA sites of the same gene. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and for tissue-specific relative and absolute dominant polyA site prediction. Availability and implementation: https://github.com/arefeen/DeepPASTA Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Aromatase (CYP19) catalyzes the last biosynthetic step of estrogens in mammals and is a primary drug target for hormone-related breast cancer. However, treatment with aromatase inhibitors is often associated with adverse effects and drug resistance. In this study, we used virtual screening targeting a predicted cytochrome P450 reductase binding site on aromatase to discover four novel non-steroidal aromatase inhibitors. The inhibitors have potencies comparable to the noncompetitive tamoxifen metabolite, endoxifen. Our two most potent inhibitors, AR11 and AR13, exhibit both mixed-type and competitive-type inhibition. The cytochrome P450 reductase-CYP19 coupling interface likely acts as a transient binding site. Our modeling shows that our inhibitors bind better at different sites near the catalytic site. Our results predict the location of multiple ligand binding sites on aromatase. The combination of modeling and experimental results supports the important role of the reductase binding interface as a low-affinity, promiscuous ligand binding site. Our new inhibitors may be useful as alternative chemical scaffolds that may show different adverse-effect profiles than current clinically used aromatase inhibitors.
  3. Empirical transfer functions (ETFs) between seismic records observed at the surface and at depth are a powerful tool for estimating site effects in earthquake hazard analysis. However, conventional modeling of site amplification, which assumes horizontally polarized shear waves propagating vertically through 1D layered homogeneous media, often predicts the ETFs poorly, particularly where large lateral velocity variations are present. Here, we test whether more accurate site effects can be obtained from theoretical transfer functions (TTFs) extracted from physics-based simulations that naturally incorporate the complex material properties. We select two well-documented downhole sites for our study: the KiK-net site TKCH05 in Japan and the Garner Valley Downhole Array site in southern California. The 3D subsurface geometry at the two sites is estimated from the surface topography near the sites and from shear-wave profiles obtained from borehole logs. By comparing the TTFs to the ETFs at the selected sites, we show how simulations using the calibrated 3D models can significantly improve site amplification estimates compared to 1D model predictions. The primary reason for this improvement is that scattering in the 3D models redirects vertically propagating waves into more realistic, obliquely propagating waves, which alleviates the artificial amplification at nodes in the vertical-incidence response of the corresponding 1D approximations. The results demonstrate the importance of reliable calibration of subsurface structure and material properties in site response studies.
  4. Sedimentary pyrite records are essential for reconstructing paleoenvironmental conditions, but these records may be affected by seasonal fluctuations in oxygen concentration and temperature, which can impact bioturbation, sulfide fluxes, and distributions of sulfide-oxidizing microbes (SOMs). To investigate how seasonal oxygen stress influences surficial (<2 cm) pyrite formation, we measured time‐series concentrations and sulfur isotope (δ34S) compositions of pyrite sulfur along with those of potential precursor compounds at a bioturbated shoal site and an oxygen‐deficient channel site in Chesapeake Bay. We also measured radioisotope depth profiles to estimate sedimentation rates and bioturbation intensities. Results show that net pyrite precipitation was restricted to summer and early autumn at both sites. Pyrite concentration was higher and apparently more responsive to precursor compound concentration at the mildly bioturbated site than at the non‐bioturbated site. This disparity may be driven by differences in the dominant SOM communities between the two sites. Despite this, the sites' similar pyrite δ34S values imply that changes in SOM communities have limited effects on surficial pyrite δ34S values here. However, we found that pyrite δ34S values are consistently and anomalously lower than those of coeval precursor compounds at both sites. A steady‐state model demonstrates that equilibrium position‐specific isotope fractionation (PSIF) effects in the S8‐polysulfide pool can create a 4.3–7.3‰ gap between δ34S values of pyrite and zero‐valent sulfur. This study suggests that SOM communities may have distinct effects on pyrite accumulation in seasonally dynamic systems, and that PSIF in the polysulfide pool may leave an imprint in pyrite isotope records.
  5. Understanding the underlying mechanisms behind protein allostery and non-additivity of substitution outcomes (i.e., epistasis) is critical when attempting to predict the functional impact of mutations, particularly at non-conserved sites. To model these two biological properties, we extend the framework of the Dynamic Coupling Index (DCI), our metric of dynamic coupling between residues, to two new metrics: (i) EpiScore, which quantifies the difference between the residue fluctuation response of a functional site when two other positions are perturbed with random Brownian kicks simultaneously versus individually, capturing the degree of cooperativity of these two positions in modulating the dynamics of the functional site; and (ii) DCIasym, which measures the degree of asymmetry between the residue fluctuation responses of two sites when one or the other is perturbed with a random force. Applying these metrics to four independent systems, we show that EpiScore and DCIasym can capture important biophysical properties of dual-mutant substitution outcomes. We propose that allosteric regulation and the mechanisms underlying non-additive amino acid substitution outcomes (i.e., epistasis) can be understood as emergent properties of an anisotropic network of interactions, in which including the full network of interactions is critical for accurate modeling. Consequently, mutations that drive toward a new function may require a fine balance between functional site asymmetry and the strength of dynamic coupling with the functional sites. These two tools will provide mechanistic insight into both understanding and predicting the outcome of dual mutations.
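The third related record contrasts empirical transfer functions with 1D vertical-incidence predictions, and it attributes part of the mismatch to artificial amplification at nodes. Below is a minimal sketch of both quantities under stated assumptions; the function and variable names are hypothetical. The ETF is taken as a plain ratio of Fourier amplitude spectra from one surface record and one borehole record. Real studies typically smooth the spectra and average over many events, and this sketch omits both steps. The 1D prediction assumes vertically incident SH waves in a single uniform layer between the surface and a downhole sensor at depth H, with damping ξ entered through a complex shear-wave velocity V_s(1 + iξ). In that idealization the surface-to-downhole ratio is |1/cos(2πfH/V_s*)|. It becomes very large near f = (2m + 1)V_s/(4H), the frequencies at which the downhole sensor sits on a node of the standing wave.

```python
import numpy as np


def empirical_tf(surface, borehole, dt):
    """Spectral ratio |FFT(surface)| / |FFT(borehole)| (single event, no smoothing)."""
    surface = np.asarray(surface, dtype=float)
    borehole = np.asarray(borehole, dtype=float)
    freqs = np.fft.rfftfreq(len(surface), d=dt)
    ratio = np.abs(np.fft.rfft(surface)) / np.abs(np.fft.rfft(borehole))
    return freqs, ratio


def tf_1d_downhole(freqs, depth, vs, damping=0.0):
    """|surface / downhole| for vertically incident SH waves in one uniform layer."""
    vs_complex = vs * (1.0 + 1j * damping)  # hysteretic damping via complex velocity
    kh = 2.0 * np.pi * np.asarray(freqs, dtype=float) * depth / vs_complex
    return np.abs(1.0 / np.cos(kh))


# Illustrative layer: H = 100 m, Vs = 400 m/s; nodes at f = (2m + 1) * Vs / (4H)
H, VS = 100.0, 400.0
node_freqs = [(2 * m + 1) * VS / (4 * H) for m in range(3)]
```

With these illustrative values the node frequencies are 1, 3 and 5 Hz. A 1D model predicts sharp peaks at those frequencies, whereas the record reports that obliquely propagating waves in calibrated 3D models remove this artificial amplification.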
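The second related record reports both mixed-type and competitive-type inhibition. For reference, the textbook steady-state rate law for reversible mixed inhibition is v = V_max[S] / (K_m(1 + [I]/K_i) + [S](1 + [I]/K_i′)). Here K_i and K_i′ are the inhibitor dissociation constants for the free enzyme and for the enzyme-substrate complex, respectively. Competitive inhibition is the limit K_i′ → ∞, and K_i = K_i′ gives noncompetitive inhibition. The sketch below encodes only this general model. It is not the kinetic analysis used in the study, and the function names and parameter values are illustrative.

```python
import math


def mixed_inhibition_rate(s, i, vmax, km, ki, ki_prime=math.inf):
    """Initial rate under reversible mixed inhibition (Michaelis-Menten form).

    ki: inhibitor dissociation constant for the free enzyme E.
    ki_prime: inhibitor dissociation constant for the ES complex;
              math.inf recovers purely competitive inhibition.
    """
    alpha = 1.0 + i / ki              # scales the apparent Km
    alpha_prime = 1.0 + i / ki_prime  # scales the apparent Vmax
    return vmax * s / (km * alpha + s * alpha_prime)


def apparent_constants(i, vmax, km, ki, ki_prime=math.inf):
    """Apparent (Vmax, Km) that a Michaelis-Menten fit would report at inhibitor level i."""
    alpha = 1.0 + i / ki
    alpha_prime = 1.0 + i / ki_prime
    return vmax / alpha_prime, km * alpha / alpha_prime
```

The apparent constants show how the two types can be distinguished. A purely competitive inhibitor raises the apparent K_m but leaves V_max unchanged, so saturating substrate overcomes it. Any finite K_i′ also lowers the apparent V_max.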
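The fifth related record builds on the Dynamic Coupling Index, and its metrics perturb residues with random forces and measure the fluctuation response at other sites. DCI-style analyses typically rely on linear perturbation response: a force vector F produces displacements ΔR = H⁺F, where H⁺ is the pseudo-inverse of an elastic network Hessian. The sketch below shows that linear-response machinery with a standard anisotropic network model (ANM) built from Cα coordinates. `pair_nonadditivity` is a hypothetical, simplified stand-in for the EpiScore idea: it compares the response at site k to kicks at i and j applied together with the responses to each kick applied alone. It is not the published EpiScore or DCIasym definition. The cutoff, spring constant, and Gaussian kick distribution are all assumptions.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Anisotropic network model Hessian (3N x 3N) from C-alpha coordinates."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for a in range(n):
        for b in range(a + 1, n):
            d = coords[b] - coords[a]
            dist2 = float(d @ d)
            if dist2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / dist2  # off-diagonal super-element
            hess[3*a:3*a+3, 3*b:3*b+3] = block
            hess[3*b:3*b+3, 3*a:3*a+3] = block
            hess[3*a:3*a+3, 3*a:3*a+3] -= block      # diagonal = -sum of off-diagonals
            hess[3*b:3*b+3, 3*b:3*b+3] -= block
    return hess


def anm_covariance(hess, n_zero=6):
    """Pseudo-inverse of H, dropping the 6 rigid-body zero modes explicitly."""
    vals, vecs = np.linalg.eigh(hess)
    vals, vecs = vals[n_zero:], vecs[:, n_zero:]
    return (vecs / vals) @ vecs.T


def response_at(cov, site, forces):
    """|delta R| at residue `site` for a 3N force vector, via delta R = H+ F."""
    delta_r = cov @ forces
    return np.linalg.norm(delta_r[3*site:3*site+3])


def pair_nonadditivity(cov, i, j, k, n_kicks=200, seed=0):
    """Hypothetical EpiScore-like quantity: mean response at k to kicks on i and j
    applied together, minus the sum of the mean responses to each kick alone."""
    rng = np.random.default_rng(seed)
    n3 = cov.shape[0]
    together, alone = 0.0, 0.0
    for _ in range(n_kicks):
        fi, fj = np.zeros(n3), np.zeros(n3)
        fi[3*i:3*i+3] = rng.normal(size=3)  # random kick on residue i
        fj[3*j:3*j+3] = rng.normal(size=3)  # random kick on residue j
        together += response_at(cov, k, fi + fj)
        alone += response_at(cov, k, fi) + response_at(cov, k, fj)
    return (together - alone) / n_kicks
```

The response is linear and the same kicks are reused in both terms, so the triangle inequality keeps this quantity at or below zero. Its magnitude indicates how strongly the two perturbations partially cancel at site k. A published metric may normalize or sample differently.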