Systems with both quantitative and qualitative responses are widely encountered in many applications. Design of experiment methods are needed when experiments are conducted to study such systems. Classic experimental design methods are unsuitable here because they often focus on one type of response. In this paper, we develop a Bayesian D-optimal design method for experiments with one continuous and one binary response. Both noninformative and conjugate informative prior distributions on the unknown parameters are considered. The proposed design criterion has meaningful interpretations regarding the D-optimality for the models for both types of responses. An efficient point-exchange search algorithm is developed to construct the local D-optimal designs for given parameter values. Global D-optimal designs are obtained by accumulating the frequencies of the design points in local D-optimal designs, where the parameters are sampled from the prior distributions. The performances of the proposed methods are evaluated through two examples. 
                    This content will become publicly available on January 13, 2026
                            
                            Bootstrap aggregated designs for generalized linear models
                        
                    
    
            Many experiments require modeling a non-Normal response. In particular, count responses and binary responses are quite common. The relationship between the predictors and the response is typically modeled via a Generalized Linear Model (GLM). Finding D-optimal designs for GLMs, which reduce the generalized variance of the model coefficients, is desired. A common approach to finding optimal designs for GLMs is to use a local design, but local designs are vulnerable to parameter misspecification. The focus of this paper is to provide designs for GLMs that are robust to parameter misspecification. This is done by applying a bagging procedure to pilot data, where the results of many locally optimal designs are aggregated to produce an approximate design that reflects the uncertainty in the model coefficients. Results show that the proposed bagging procedure is robust to changes in the underlying model parameters. Furthermore, the proposed designs are shown to be preferable to traditional methods, which may be over-conservative.
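
The bagging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-parameter logistic model, a fixed candidate grid, a ridge-stabilised Newton fit, the classical multiplicative algorithm for the locally D-optimal weights, and simple averaging of those weights as the aggregation step. All function names (`fit_logistic`, `local_d_optimal_weights`, `bagged_design`) and the simulated pilot data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-2):
    """Ridge-stabilised Newton fit of a logistic regression (assumed fitting method)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (y - p) - ridge * beta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def local_d_optimal_weights(F, beta, n_iter=200):
    """Multiplicative algorithm for approximate locally D-optimal weights on a grid."""
    n, p = F.shape
    mu = 1.0 / (1.0 + np.exp(-np.clip(F @ beta, -30.0, 30.0)))
    v = mu * (1.0 - mu)                      # logistic GLM weight at each grid point
    w = np.full(n, 1.0 / n)                  # start from the uniform design
    for _ in range(n_iter):
        M = (F * (w * v)[:, None]).T @ F     # information matrix of the current design
        d = v * np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)
        w = w * d / p                        # sum(w * d) equals p, so w stays normalised
    return w / w.sum()

def bagged_design(x_pilot, y_pilot, grid, n_boot=50, seed=0):
    """Average locally D-optimal weights over bootstrap refits of the pilot data."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x_pilot), x_pilot])
    F = np.column_stack([np.ones_like(grid), grid])
    total = np.zeros(len(grid))
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_pilot), size=len(y_pilot))  # resample with replacement
        total += local_d_optimal_weights(F, fit_logistic(X[idx], y_pilot[idx]))
    return total / n_boot

# Simulated pilot experiment (illustrative values only).
rng = np.random.default_rng(1)
x_pilot = rng.uniform(-3.0, 3.0, size=40)
y_pilot = (rng.uniform(size=40) < 1.0 / (1.0 + np.exp(-(0.5 + x_pilot)))).astype(float)
grid = np.linspace(-3.0, 3.0, 21)
weights = bagged_design(x_pilot, y_pilot, grid)
```

Each bootstrap refit concentrates its design mass on a few grid points; averaging spreads the mass over the points that are good under plausible coefficient values, which is the sense in which the aggregated design reflects coefficient uncertainty.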
        
    
- Award ID(s): 2304767
- PAR ID: 10630838
- Publisher / Repository: International Association for Statistical Computing
- Date Published:
- Journal Name: Journal of Data Science, Statistics, and Visualisation
- Volume: 5
- Issue: 1
- ISSN: 2773-0689
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Background: Ecological communities tend to be spatially structured due to environmental gradients and/or spatially contagious processes such as growth, dispersion and species interactions. Data transformation followed by usage of algorithms such as Redundancy Analysis (RDA) is a fairly common approach in studies searching for spatial structure in ecological communities, despite recent suggestions advocating the use of Generalized Linear Models (GLMs). Here, we compared the performance of GLMs and RDA in describing spatial structure in ecological community composition data. We simulated realistic presence/absence data typical of many β-diversity studies. For model selection we used standard methods commonly used in most studies involving RDA and GLMs. Methods: We simulated communities with known spatial structure, based on three real spatial community presence/absence datasets (one terrestrial, one marine and one freshwater). We used spatial eigenvectors as explanatory variables. We varied the number of non-zero coefficients of the spatial variables and the spatial scales with which these coefficients were associated, and then compared how well the GLM and RDA frameworks retrieved the spatial patterns contained in the simulated communities. We used two different methods for model selection: Forward Selection (FW) for RDA and the Akaike Information Criterion (AIC) for GLMs. The performance of each method was assessed by scoring overall accuracy as the proportion of variables whose inclusion/exclusion status was correct, and by distinguishing which kind of error was observed for each method. We also assessed whether errors in variable selection could affect the interpretation of spatial structure. Results: Overall, GLM with AIC-based model selection (GLM/AIC) performed better than RDA/FW in selecting spatial explanatory variables, although under some simulations the methods performed similarly. In general, RDA/FW performed unpredictably, often retaining too many explanatory variables and selecting variables associated with incorrect spatial scales. The spatial scale of the pattern had a negligible effect on GLM/AIC performance but consistently affected RDA's error rates under almost all scenarios. Conclusion: We encourage the use of GLM/AIC for studies searching for spatial drivers of species presence/absence patterns, since this framework outperformed RDA/FW in situations most likely to be found in natural communities. It is likely that such recommendations extend to other types of explanatory variables.
- 
            Abstract: We propose a new procedure for inference on optimal treatment regimes in the model‐free setting, which does not require specifying an outcome regression model. Existing model‐free estimators for optimal treatment regimes are usually not suitable for the purpose of inference, because they either have nonstandard asymptotic distributions or do not necessarily guarantee consistent estimation of the parameter indexing the Bayes rule due to the use of a surrogate loss. We first study a smoothed robust estimator that directly targets the parameter corresponding to the Bayes decision rule for optimal treatment regime estimation. This estimator is shown to have an asymptotic normal distribution. Furthermore, we verify that a resampling procedure provides asymptotically accurate inference for both the parameter indexing the optimal treatment regime and the optimal value function. A new algorithm is developed to calculate the proposed estimator with substantially improved speed and stability. Numerical results demonstrate the satisfactory performance of the new methods.
- 
            Evans, Robin J.; Shpitser, Ilya (Ed.) We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $$\epsilon$$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $$\epsilon > 0$$. We describe a more natural model of misspecification which only requires the approximation error at each input $$x$$ to be proportional to the suboptimality gap at $$x$$. It captures the intuition that, for optimization problems, near-optimal regions should matter more and we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm, designed for the realizable case, is automatically robust against such gap-adjusted misspecification. It achieves a near-optimal $$\sqrt{T}$$ regret for problems in which the best-known regret is almost linear in the time horizon $$T$$. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
- 
            In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression model with spatial constraints to simultaneously handle spatial non-stationarity, local homogeneity, and outlier contamination. Compared with existing spatial regression models, our proposed model assumes the existence of a few distinct regression models that are estimated based on observations that exhibit similar response-predictor relationships. As such, the proposed model not only accounts for non-stationarity in the spatial trend, but also clusters observations into a few distinct and homogeneous groups. This provides an advantage in interpretation, with a few stationary sub-processes identified that capture the predominant relationships between response and predictor variables. Moreover, the proposed method incorporates robust procedures to handle contamination from both regression outliers and spatial outliers. By doing so, we robustly segment the spatial domain into distinct local regions with similar regression coefficients, and sporadic locations that are purely outliers. A rigorous statistical hypothesis testing procedure has been designed to test the significance of such segmentation. Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method, compared with other robust finite mixture regression, spatial regression and spatial segmentation methods.
 An official website of the United States government
