Title: Online two‐way estimation and inference via linear mixed‐effects models
In this article, we tackle the estimation and inference problem of analyzing distributed streaming data that is collected continuously over multiple data sites. We propose an online two‐way approach via linear mixed‐effects models. We explicitly model the site‐specific effects as random‐effect terms, and tackle both between‐site heterogeneity and within‐site correlation. We develop an online updating procedure that does not need to re‐access the previous data and can efficiently update the parameter estimate, when either new data sites, or new streams of sample observations of the existing data sites, become available. We derive the non‐asymptotic error bound for our proposed online estimator, and show that it is asymptotically equivalent to the offline counterpart based on all the raw data. We compare with some key alternative solutions both analytically and numerically, and demonstrate the advantages of our proposal. We further illustrate our method with two data applications.
Award ID(s):
2102227
PAR ID:
10376195
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Statistics in Medicine
Volume:
41
Issue:
25
ISSN:
0277-6715
Page Range / eLocation ID:
p. 5113-5133
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Motivation: Alternative polyadenylation (polyA) sites near the 3′ end of a pre-mRNA create multiple mRNA transcripts with different 3′ untranslated regions (3′ UTRs). The sequence elements of a 3′ UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies in the literature have reported the correlation between diseases and the shortening (or lengthening) of 3′ UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools consider RNA secondary structures as a feature to predict polyA sites. Results: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e. frequently used) polyA site of a gene in a specific tissue and relative dominance when two polyA sites of the same gene are given. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and tissue-specific relative and absolute dominant polyA site prediction. Availability and implementation: https://github.com/arefeen/DeepPASTA. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Aromatase (CYP19) catalyzes the last biosynthetic step of estrogens in mammals and is a primary drug target for hormone-related breast cancer. However, treatment with aromatase inhibitors is often associated with adverse effects and drug resistance. In this study, we used virtual screening targeting a predicted cytochrome P450 reductase binding site on aromatase to discover four novel non-steroidal aromatase inhibitors. The inhibitors have potencies comparable to the noncompetitive tamoxifen metabolite, endoxifen. Our two most potent inhibitors, AR11 and AR13, exhibit both mixed-type and competitive-type inhibition. The cytochrome P450 reductase-CYP19 coupling interface likely acts as a transient binding site. Our modeling shows that our inhibitors bind better at different sites near the catalytic site. Our results predict the location of multiple ligand binding sites on aromatase. The combination of modeling and experimental results supports the important role of the reductase binding interface as a low affinity, promiscuous ligand binding site. Our new inhibitors may be useful as alternative chemical scaffolds that may show different adverse effects profiles than current clinically used aromatase inhibitors.
  3. Abstract Recent advancements in neurotechnology enable precise spatiotemporal patterns of microstimulations with single-cell resolution. The choice of perturbation sites must satisfy two key criteria: efficacy in evoking significant responses and selectivity for the desired target effects. This choice is currently based on laborious trial-and-error procedures, unfeasible for sequences of multi-site stimulations. Efficient methods to design complex perturbation patterns are urgently needed. Can we design a spatiotemporal pattern of stimulation to steer neural activity and behavior towards a desired target? We outline a method for achieving this goal in two steps. First, we identify the most effective perturbation sites, or hubs, only based on short observations of spontaneous neural activity. Second, we provide an efficient method to design multi-site stimulation patterns by combining approaches from nonlinear dynamical systems, control theory and data-driven methods. We demonstrate the feasibility of our approach using multi-site stimulation patterns in recurrent network models.
  4. ABSTRACT Empirical transfer functions (ETFs) between seismic records observed at the surface and depth represent a powerful tool to estimate site effects for earthquake hazard analysis. However, conventional modeling of site amplification, with assumptions of horizontally polarized shear waves propagating vertically through 1D layered homogeneous media, often poorly predicts the ETFs, particularly where large lateral variations of velocity are present. Here, we test whether more accurate site effects can be obtained from theoretical transfer functions (TTFs) extracted from physics-based simulations that naturally incorporate the complex material properties. We select two well-documented downhole sites (the KiK-net site TKCH05 in Japan and the Garner Valley site, Garner Valley Downhole Array, in southern California) for our study. The 3D subsurface geometry at the two sites is estimated by means of the surface topography near the sites and information from the shear-wave profiles obtained from borehole logs. By comparing the TTFs to ETFs at the selected sites, we show how simulations using the calibrated 3D models can significantly improve site amplification estimates as compared to 1D model predictions. The primary reason for this improvement in 3D models is redirection of scattering from vertically propagating to more realistic obliquely propagating waves, which alleviates artificial amplification at nodes in the vertical-incidence response of corresponding 1D approximations, resulting in improvement of site effect estimation. The results demonstrate the importance of reliable calibration of subsurface structure and material properties in site response studies.
  5. Understanding the underlying mechanisms behind protein allostery and non-additivity of substitution outcomes (i.e., epistasis) is critical when attempting to predict the functional impact of mutations, particularly at non-conserved sites. In an effort to model these two biological properties, we extend the framework of our metric to calculate dynamic coupling between residues, the Dynamic Coupling Index (DCI) to two new metrics: (i) EpiScore, which quantifies the difference between the residue fluctuation response of a functional site when two other positions are perturbed with random Brownian kicks simultaneously versus individually to capture the degree of cooperativity of these two other positions in modulating the dynamics of the functional site and (ii) DCIasym, which measures the degree of asymmetry between the residue fluctuation response of two sites when one or the other is perturbed with a random force. Applied to four independent systems, we successfully show that EpiScore and DCIasym can capture important biophysical properties in dual mutant substitution outcomes. We propose that allosteric regulation and the mechanisms underlying non-additive amino acid substitution outcomes (i.e., epistasis) can be understood as emergent properties of an anisotropic network of interactions where the inclusion of the full network of interactions is critical for accurate modeling. Consequently, mutations which drive towards a new function may require a fine balance between functional site asymmetry and strength of dynamic coupling with the functional sites. These two tools will provide mechanistic insight into both understanding and predicting the outcome of dual mutations. 