Title: A Relaxation Approach to Feature Selection for Linear Mixed Effects Models
Linear Mixed-Effects (LME) models are a fundamental tool for modeling correlated data, including cohort studies, longitudinal data analysis, and meta-analysis. Design and analysis of variable selection methods for LMEs is more difficult than for linear regression because LME models are nonlinear. In this article we propose a novel optimization strategy that enables a wide range of variable selection methods for LMEs using both convex and nonconvex regularizers, including 𝓁1, Adaptive-𝓁1, SCAD, and 𝓁0. The computational framework only requires that the proximal operator for each regularizer be readily computable, and the implementation is available in the open-source Python package pysr3, which is consistent with the sklearn standard. The numerical results on simulated data sets indicate that the proposed strategy improves on the state of the art in both accuracy and compute time. The variable selection techniques are also validated on a real example using a data set on bullying victimization. Supplementary materials for this article are available online.
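The abstract notes that the framework only needs a computable proximal operator for each regularizer. As a rough illustration of what that means, the sketch below implements two textbook operators with NumPy: soft-thresholding for the 𝓁1 penalty, and a hard-thresholding projection that keeps the k largest-magnitude entries, which is one common way to handle an 𝓁0 constraint. The function names `prox_l1` and `prox_l0_projection` are hypothetical and chosen for this example; this is not the pysr3 API and not necessarily the exact formulation used in the article.

```python
import numpy as np


def prox_l1(x, lam):
    """Soft-thresholding: proximal operator of lam * ||x||_1 (illustrative helper)."""
    # Shrink every entry toward zero by lam; entries smaller than lam become zero.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def prox_l0_projection(x, k):
    """Projection onto {z : ||z||_0 <= k} (assumed constraint form of the l0 regularizer)."""
    out = np.zeros_like(x)
    if k > 0:
        # Indices of the k entries with the largest absolute value.
        keep = np.argsort(np.abs(x))[-k:]
        out[keep] = x[keep]
    return out


x = np.array([3.0, -0.5, 1.2, -2.0])
soft = prox_l1(x, 1.0)            # shrinks all entries, zeroes the small one
hard = prox_l0_projection(x, 2)   # keeps only the two largest-magnitude entries
print(soft)
print(hard)
```

Soft-thresholding shrinks every surviving coefficient, while the hard-thresholding projection leaves the retained coefficients unchanged; that difference is one reason convex and nonconvex regularizers can select different feature sets.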
Award ID(s):
1908890
PAR ID:
10470298
Author(s) / Creator(s):
Corporate Creator(s):
Editor(s):
Jones, G.; Faming, L.
Publisher / Repository:
Taylor & Francis Online
Date Published:
Journal Name:
Journal of Computational and Graphical Statistics
ISSN:
1061-8600
Page Range / eLocation ID:
1 to 42
Subject(s) / Keyword(s):
feature selection, mixed effects models, nonconvex optimization
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mapped monthly data products of surface ocean acidification indicators from 1998 to 2022 on a 0.25° by 0.25° spatial grid have been developed for eleven U.S. large marine ecosystems (LMEs). The data products were constructed using observations from the Surface Ocean CO₂ Atlas, co-located surface ocean properties, and two types of machine learning algorithms: Gaussian mixture models to organize LMEs into clusters of similar environmental variability and random forest regressions (RFRs) that were trained and applied within each cluster to spatiotemporally interpolate the observational data. The data products, called RFR-LMEs, have been averaged into regional timeseries to summarize the status of ocean acidification in U.S. coastal waters, showing a domain-wide carbon dioxide partial pressure increase of 1.4 ± 0.4 μatm yr⁻¹ and pH decrease of 0.0014 ± 0.0004 yr⁻¹. RFR-LMEs have been evaluated via comparisons to discrete shipboard data, fixed timeseries, and other mapped surface ocean carbon chemistry data products. Regionally averaged timeseries of RFR-LME indicators are provided online through the NOAA National Marine Ecosystem Status web portal.
  2. Gallium‐based liquid metal alloys (GaLMAs) have widespread applications ranging from soft electronics to energy devices and catalysis. GaLMAs can be transformed into liquid metal emulsions (LMEs) to modify their rheology for facile patterning, processing, and material integration for GaLMA‐based device fabrication. One drawback of using LMEs is reduced electrical conductivity owing to the oxides that form on the surface of dispersed liquid metal droplets. LMEs thus need to be activated by coalescing liquid metal droplets into an electrically conductive network, which usually involves techniques that subject the LME to harsh conditions. This study presents a way to coalesce these droplets through a chemical reaction at mild temperatures (T ∼ 80 °C). Chemical activation is enabled by adding halide compounds into the emulsion that chemically etch the oxide skin on the surface of dispersed droplets of eutectic gallium indium (eGaIn). LMEs synthesized with halide activators can achieve electrical conductivities close to bulk liquid metal (2.4 × 10⁴ S cm⁻¹) after being heated. 3D printable chemically coalescing LME ink formulations are optimized by systematically exploring halide activator type and concentration, along with mixing conditions, while maximizing for electrical conductivity, shape retention, and compatibility with direct ink writing (DIW). The utility of this ink is demonstrated in a hybrid 3D printing process to create a battery‐integrated light emitting diode array, followed by a nondestructive low temperature heat activation that produces a functional device.
  3. We study the Bayesian multi-task variable selection problem, where the goal is to select activated variables for multiple related data sets simultaneously. We propose a new variational Bayes algorithm which generalizes and improves the recently developed “sum of single effects” model of Wang et al. (2020a). Motivated by differential gene network analysis in biology, we further extend our method to joint structure learning of multiple directed acyclic graphical models, a problem known to be computationally highly challenging. We propose a novel order MCMC sampler where our multi-task variable selection algorithm is used to quickly evaluate the posterior probability of each ordering. Both simulation studies and real gene expression data analysis are conducted to show the efficiency of our method. Finally, we also prove a posterior consistency result for multi-task variable selection, which provides a theoretical guarantee for the proposed algorithms. Supplementary materials for this article are available online. 
  4. Evolutionarily stable strategy (ESS) analysis pioneered by Maynard Smith and Price took off in part because it often does not require explicit assumptions about the genetics and demography of a population, in contrast to population genetic models. Though this simplicity is useful, it obscures the degree to which ESS analysis applies to populations with more realistic genetics and demography: for example, how does ESS analysis handle complexities such as kin selection, group selection and variable environments when phenotypes are affected by multiple genes? In this paper, I review the history of the ESS concept and show how early uncertainty about the method led to important mathematical theory linking ESS analysis to general population genetic models. I use this theory to emphasize the link between ESS analysis and the concept of invasion fitness. I give examples of how invasion fitness can measure kin selection, group selection and the evolution of linked modifier genes in response to variable environments. The ESSs in these examples depend crucially on demographic and genetic parameters, which highlights how ESS analysis will continue to be an important tool in understanding evolutionary patterns as new models address the increasing abundance of genetic and long-term demographic data in natural populations. This article is part of the theme issue ‘Half a century of evolutionary games: a synthesis of theory, application and future directions’.
  5. Many recent studies have explored remote sensing approaches to facilitate non‐destructive sampling of aboveground biomass (AGB). Lidar platforms (e.g., iPhone and iPad PRO models) have recently made remote sensing technologies widely available and present an alternative to traditional approaches for estimating AGB. Lidar approaches can be completed within a fraction of the time required by many analog methods. However, it is unknown if handheld sensors are capable of accurately predicting AGB or how different modeling techniques affect prediction accuracy. Here, we collected AGB from 0.25‐m² plots (N = 45) from three sites along an elevational gradient within rangelands surrounding Flagstaff, Arizona, USA. Each plot was scanned with a mobile laser scanner (MLS) and iPad before plants were clipped, dried, and weighed. We compared the capability of iPad and MLS sensors to estimate AGB via minimization of model normalized root mean square error (NRMSE). This process was performed on predictor subsets describing structural, spectral, and field‐based characteristics across a suite of modeling approaches including simple linear, stepwise, lasso, and random forest regression. We found that models developed from MLS and iPad data were equally capable of predicting AGB (NRMSE 26.6% and 29.3%, respectively) regardless of the variable subsets considered. We also found that stepwise regression regularly resulted in the lowest NRMSE. Structural variables were consistently selected during each modeling approach, while spectral variables were rarely included. Field‐based variables were important in linear regression models but were not included after variable selection within random forest models. These findings support the notion that remote sensing techniques offer a valid alternative to analog field‐based data collection methods. Together, our results demonstrate that data collected using a more widely available platform will perform similarly to a more costly option and outline a workflow for modeling AGB using remote sensing systems alone.