Title: Confidence regions for the location of response surface optima: the R package OptimaRegion
Statistical inference on the location of the optima (global maxima or minima) is one of the main goals in the area of Response Surface Methodology, with many applications in engineering and science. While there exist previous methods for computing confidence regions on the location of optima, these are for linear models based on a Normal distribution assumption, and do not address specifically the difficulties associated with guaranteeing global optimality. This paper describes distribution-free methods for the computation of confidence regions on the location of the global optima of response surface models. The methods are based on bootstrapping and Tukey's data depth, and therefore their performance does not rely on distributional assumptions about the errors affecting the response. An R language implementation, the package OptimaRegion, is described. Both parametric (quadratic and cubic polynomials in up to 5 covariates) and nonparametric models (thin plate splines in 2 covariates) are supported. A coverage analysis is presented demonstrating the quality of the regions found. The package also contains an R implementation of the Gloptipoly algorithm for the global optimization of polynomial responses subject to bounds.
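The package's exported functions are not reproduced here; the following is a minimal R sketch of the bootstrap-plus-data-depth idea described in the abstract, under the assumption that Tukey depths can be computed with depth::depth() from the CRAN depth package. The synthetic data, bootstrap size, and depth-trimming rule are illustrative, not the package's actual implementation.

    # Sketch: bootstrap the location of a quadratic response surface optimum
    # and keep the deepest (1 - alpha) fraction of bootstrapped optima.
    library(depth)   # assumed CRAN package providing Tukey's data depth

    set.seed(123)
    n  <- 60
    x1 <- runif(n, -2, 2); x2 <- runif(n, -2, 2)
    y  <- 5 - (x1 - 0.5)^2 - (x2 + 0.3)^2 + rnorm(n, sd = 0.4)
    dat <- data.frame(x1, x2, y)

    # Stationary point of a fitted full quadratic: x* = -0.5 * B^{-1} b
    optimum <- function(d) {
      f  <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2), data = d)
      cf <- unname(coef(f))            # order: 1, x1, x2, x1^2, x2^2, x1*x2
      b  <- cf[2:3]
      B  <- matrix(c(cf[4], cf[6] / 2, cf[6] / 2, cf[5]), 2, 2)
      as.numeric(-0.5 * solve(B, b))
    }

    nboot    <- 500
    opt_boot <- t(replicate(nboot, optimum(dat[sample(n, replace = TRUE), ])))

    # Tukey-depth trimming: retain the deepest 95% of bootstrapped optima
    alpha      <- 0.05
    d_vals     <- apply(opt_boot, 1, function(u) depth(u, opt_boot, method = "Tukey"))
    region_pts <- opt_boot[d_vals >= quantile(d_vals, alpha), ]
    plot(region_pts, xlab = "x1", ylab = "x2",
         main = "Approximate 95% confidence region for the optimum location")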
Award ID(s): 1634878
PAR ID: 10029358
Author(s) / Creator(s):
Date Published:
Journal Name: Communications in Statistics - Simulation and Computation
ISSN: 0361-0918
Page Range / eLocation ID: 1 to 21
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Mireji, Paul O (Ed.)
    Mosquito vectors of pathogens (e.g., Aedes, Anopheles, and Culex spp., which transmit dengue, Zika, chikungunya, West Nile, malaria, and others) are of increasing concern for global public health. These vectors are geographically shifting under climate and other anthropogenic changes. As small-bodied ectotherms, mosquitoes are strongly affected by temperature, which causes unimodal responses in mosquito life history traits (e.g., biting rate, adult mortality rate, mosquito development rate, and probability of egg-to-adult survival) that exhibit upper and lower thermal limits and intermediate thermal optima in laboratory studies. However, it remains unknown how mosquito thermal responses measured in laboratory experiments relate to the realized thermal responses of mosquitoes in the field. To address this gap, we leverage thousands of global mosquito occurrences and geospatial satellite data at high spatial resolution to construct machine-learning-based species distribution models, from which vector thermal responses are estimated. We apply methods to restrict models to the relevant mosquito activity season and to conduct ecologically plausible spatial background sampling centered around ecoregions for comparison to mosquito occurrence records. We found that thermal minima estimated from laboratory studies were highly correlated with those from the species distributions (r = 0.87). The thermal optima were less strongly correlated (r = 0.69). For most species, we did not detect thermal maxima from their observed distributions, so we were unable to compare them to laboratory-based estimates. The results suggest that laboratory studies have the potential to be highly transportable to predicting lower thermal limits and thermal optima of mosquitoes in the field. At the same time, lab-based models likely capture physiological limits on mosquito persistence at high temperatures that are not apparent from field-based observational studies but may critically determine mosquito responses to climate warming. Our results indicate that lab-based and field-based studies are highly complementary; performing the analyses in concert can help to more comprehensively understand vector response to climate change.
  2. Abstract: In land surface models (LSMs), the hydraulic properties of the subsurface are commonly estimated according to the texture of soils at the Earth's surface. This approach ignores macropores, fracture flow, heterogeneity, and the effects of variable distribution of water in the subsurface on effective watershed-scale hydraulic variables. Using hydrograph recession analysis, we empirically constrain estimates of watershed-scale effective hydraulic conductivities (K) and effective drainable aquifer storages (S) of all reference watersheds in the conterminous United States for which sufficient streamflow data are available (n = 1,561). Then, we use machine learning methods to model these properties across the entire conterminous United States. Model validation results in high confidence for estimates of log(K) (r² > 0.89; 1% < bias < 9%) and reasonable confidence for S (r² > 0.83; −70% < bias < −18%). Our estimates of effective K are, on average, two orders of magnitude higher than comparable soil-texture-based estimates of average K, confirming the importance of soil structure and preferential flow pathways at the watershed scale. Our estimates of effective S compare favorably with recent global estimates of mobile groundwater and are spatially heterogeneous (5–3,355 mm). Because estimates of S are much lower than the global maximums generally used in LSMs (e.g., 5,000 mm in Noah-MP), they may serve both to limit model spin-up time and to constrain model parameters to more realistic values. These results represent the first attempt to constrain estimates of watershed-scale effective hydraulic variables that are necessary for the implementation of LSMs for the entire conterminous United States.
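    As a rough, hedged illustration of the recession-analysis step described above (not the authors' code), the sketch below fits the standard Brutsaert-Nieber relation -dQ/dt = aQ^b to an invented daily streamflow series; the aquifer-theory expressions that convert (a, b) into effective K and S are not reproduced.

        # Hydrograph recession analysis on a synthetic flow series (mm/day)
        set.seed(42)
        day <- 1:200
        q   <- 10 * exp(-0.05 * day) + abs(rnorm(200, sd = 0.02))

        dq     <- diff(q)                     # day-to-day change in flow
        recess <- which(dq < 0)               # keep only declining (recession) days
        fit    <- lm(log(-dq[recess]) ~ log(q[recess]))

        a <- exp(coef(fit)[1])                # -dQ/dt = a * Q^b
        b <- coef(fit)[2]
        # a and b would then be mapped to watershed-scale effective K and
        # drainable storage S via Boussinesq aquifer solutions (omitted here).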
  3. We consider inference for the parameters of a linear model when the covariates are random and the relationship between response and covariates is possibly non-linear. Conventional inference methods such as z intervals perform poorly in these cases. We propose a double bootstrap-based calibrated percentile method, perc-cal, as a general-purpose CI method which performs very well relative to alternative methods in challenging situations such as these. The superior performance of perc-cal is demonstrated by a thorough, full-factorial design synthetic data study as well as a data example involving the length of criminal sentences. We also provide theoretical justification for the perc-cal method under mild conditions. The method is implemented in the R package "perccal", available through CRAN and coded primarily in C++, to make it easier for practitioners to use. 
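    A hedged sketch of the underlying double-bootstrap calibration idea follows; it is a generic illustration for a regression slope with a random covariate and a mildly non-linear truth, not the perccal package's C++ implementation or its exact API, and the resample sizes are kept small purely for readability.

        # Double-bootstrap calibrated percentile CI for a regression slope
        set.seed(1)
        n   <- 100
        x   <- rnorm(n)
        y   <- 1 + 2 * x + 0.5 * x^2 + rnorm(n)          # non-linear truth
        dat <- data.frame(x, y)
        slope <- function(d) unname(coef(lm(y ~ x, data = d))[2])

        theta_hat <- slope(dat)
        B1 <- 200; B2 <- 100                              # small, for illustration
        outer_est  <- numeric(B1)
        inner_list <- vector("list", B1)
        for (b in seq_len(B1)) {
          d1 <- dat[sample(n, replace = TRUE), ]
          outer_est[b]    <- slope(d1)
          inner_list[[b]] <- replicate(B2, slope(d1[sample(n, replace = TRUE), ]))
        }

        # Calibrate the nominal level: pick lambda so that inner percentile
        # intervals cover theta_hat in (1 - alpha) of the outer resamples.
        alpha  <- 0.05
        covers <- function(inn, lambda) {
          theta_hat >= quantile(inn, lambda / 2) &&
            theta_hat <= quantile(inn, 1 - lambda / 2)
        }
        cov_at <- function(lambda)
          mean(vapply(inner_list, covers, logical(1), lambda = lambda))
        lambdas    <- seq(0.005, 0.20, by = 0.005)
        lambda_cal <- lambdas[which.min(abs(vapply(lambdas, cov_at, numeric(1)) - (1 - alpha)))]

        # Final perc-cal style interval: percentile interval at the calibrated level
        ci <- quantile(outer_est, c(lambda_cal / 2, 1 - lambda_cal / 2))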
  4. Abstract: Adaptive multiple testing with covariates is an important research direction that has gained major attention in recent years. It has been widely recognised that leveraging side information provided by auxiliary covariates can improve the power of false discovery rate (FDR) procedures. Currently, most such procedures are devised with p-values as their main statistics. However, for two-sided hypotheses, the usual data processing step that transforms the primary statistics, known as z-values, into p-values not only leads to a loss of information carried by the main statistics, but can also undermine the ability of the covariates to assist with the FDR inference. We develop a z-value based covariate-adaptive (ZAP) methodology that operates on the intact structural information encoded jointly by the z-values and covariates. It seeks to emulate the oracle z-value procedure via a working model, and its rejection regions significantly depart from those of p-value-based adaptive testing approaches. The key strength of ZAP is that the FDR control is guaranteed with minimal assumptions, even when the working model is misspecified. We demonstrate the state-of-the-art performance of ZAP using both simulated and real data, which shows that the efficiency gain can be substantial in comparison with p-value-based methods. Our methodology is implemented in the R package zap.
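    A two-line illustration of the information loss mentioned above (ordinary R, not code from the zap package): two-sided p-values collapse z-values of opposite sign onto the same value, discarding structure that a z-value based procedure can retain.

        z <- c(-2.5, 2.5)          # z-values with opposite signs
        2 * pnorm(-abs(z))         # identical two-sided p-values: 0.0124 0.0124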
  5. Sun, Xiaoyong (Ed.)
    Convolutional neural network (CNN)-based deep learning (DL) methods have transformed the analysis of geospatial, Earth observation, and geophysical data due to their ability to model spatial context information at multiple scales. Such methods are especially applicable to pixel-level classification or semantic segmentation tasks. A variety of R packages have been developed for processing and analyzing geospatial data. However, there are currently no packages available for implementing geospatial DL in the R language and data science environment. This paper introduces the geodl R package, which supports pixel-level classification applied to a wide range of geospatial or Earth science data that can be represented as multidimensional arrays where each channel or band holds a predictor variable. geodl is built on the torch package, which supports the implementation of DL using the R and C++ languages without the need for installing a Python/PyTorch environment. This greatly simplifies the software environment needed to implement DL in R. Using geodl, geospatial raster-based data with varying numbers of bands, spatial resolutions, and coordinate reference systems are read and processed using the terra package, which makes use of C++ and allows for processing raster grids that are too large to fit into memory. Training loops are implemented with the luz package. The geodl package provides utility functions for creating raster masks or labels from vector-based geospatial data and image chips and associated masks from larger files and extents. It also defines a torch dataset subclass for geospatial data for use with torch dataloaders. UNet-based models are provided with a variety of optional ancillary modules or modifications. Common assessment metrics (i.e., overall accuracy, class-level recalls or producer’s accuracies, class-level precisions or user’s accuracies, and class-level F1-scores) are implemented along with a modified version of the unified focal loss framework, which allows for defining a variety of loss metrics using one consistent implementation and set of hyperparameters. Users can assess models using standard geospatial and remote sensing metrics and methods and use trained models to predict to large spatial extents. This paper introduces the geodl workflow, design philosophy, and goals for future development. 
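    geodl's own functions are not shown here; the sketch below is a minimal, self-contained example of the torch/luz stack the package builds on, with random tensors standing in for image chips and masks, a toy fully convolutional network in place of a UNet, and layer sizes chosen purely for illustration.

        library(torch)
        library(luz)

        # Toy dataset: 3-band 64 x 64 "chips" with binary masks
        chip_ds <- dataset(
          initialize = function(n = 32) {
            self$x <- torch_randn(n, 3, 64, 64)
            self$y <- torch_randint(0, 2, size = c(n, 1, 64, 64))$to(dtype = torch_float())
          },
          .getitem = function(i) list(x = self$x[i, , , ], y = self$y[i, , , ]),
          .length  = function() self$x$size(1)
        )

        # Minimal fully convolutional segmenter (stand-in for a UNet)
        seg_net <- nn_module(
          initialize = function() {
            self$conv1 <- nn_conv2d(3, 16, kernel_size = 3, padding = 1)
            self$conv2 <- nn_conv2d(16, 1, kernel_size = 3, padding = 1)
          },
          forward = function(x) self$conv2(nnf_relu(self$conv1(x)))
        )

        # Training loop handled by luz, as in geodl's workflow
        fitted <- seg_net |>
          setup(loss = nn_bce_with_logits_loss(), optimizer = optim_adam) |>
          fit(dataloader(chip_ds(), batch_size = 8), epochs = 2)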