



Creators/Authors contains: "Xiao, Li"


  1. This article proposes a set of categories, each one representing a particular distillation of important statistical ideas. Each category is labeled a “sense” because we think of these as essential in helping every statistical mind connect in constructive and insightful ways with statistical theory, methodologies, and computation, toward the ultimate goal of building statistical phronesis. The illustration of each sense with statistical principles and methods provides a sensical tour of the conceptual landscape of statistics, as a leading discipline in the data science ecosystem. Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 10 is March 2023.
    Free, publicly-accessible full text available March 7, 2024
  2. This article expands upon my presentation to the panel on “The Radical Prescription for Change” at the 2017 ASA (American Statistical Association) symposium on A World Beyond $p<0.05$. It emphasizes that, to greatly enhance the reliability of (and hence public trust in) statistical and data scientific findings, we need to take a holistic approach. We need to lead by example, incentivize study quality, and inoculate future generations with a profound appreciation of the world of uncertainty and the uncertainty world. The four “radical” proposals in the title, with all their inherent defects and trade-offs, are designed to provoke reactions and actions. First, research methodologies are trustworthy only if they deliver what they promise, even if this means that they have to be overly protective, a necessary trade-off for practicing quality-guaranteed statistics. This guiding principle may compel us to double the variance in some situations, a strategy that also coincides with the call to raise the bar from $p<0.05$ to $p<0.005$ [3]. Second, teaching principled practicality or corner-cutting is a promising strategy to enhance the ability of the scientific community, as well as the general public, to spot (and hence to deter) flawed arguments or findings. A remarkable quick-and-dirty Bayes formula for rare events, which simply divides the prevalence by the sum of the prevalence and the false positive rate (or the total error rate), as featured by the popular radio show Car Talk, illustrates the effectiveness of this strategy. Third, it should be a routine mental exercise to put ourselves in the shoes of those who would be affected by our research findings, in order to combat the tendency to rush to conclusions or overstate confidence in our findings. A pufferfish/selfish test can serve as an effective reminder, and can help to institute the mantra “Thou shalt not sell what thou refuseth to buy” as the most basic professional decency.
Considering personal stakes in our statistical endeavors also points to the concept of behavioral statistics, in the spirit of behavioral economics. Fourth, the current mathematical education paradigm that puts “deterministic first, stochastic second” is likely responsible for the general difficulties with reasoning under uncertainty, a situation that can be improved by introducing the concept of a histogram, or rather a kidstogram, as early as the concept of counting.
  3. Chaudhuri, Kamalika and (Ed.)
    Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties. However, existing samplers for spike-and-slab posteriors incur prohibitive computational costs when the number of variables is large. In this article, we propose Scalable Spike-and-Slab (S$^3$), a scalable Gibbs sampling implementation for high-dimensional Bayesian regression with the continuous spike-and-slab prior of George & McCulloch (1993). For a dataset with $n$ observations and $p$ covariates, S$^3$ has order $\max\{n^2 p_t, np\}$ computational cost at iteration $t$, where $p_t$ never exceeds the number of covariates switching spike-and-slab states between iterations $t$ and $t-1$ of the Markov chain. This improves upon the order $n^2 p$ per-iteration cost of state-of-the-art implementations because, typically, $p_t$ is substantially smaller than $p$. We apply S$^3$ to synthetic and real-world datasets, demonstrating orders-of-magnitude speed-ups over existing exact samplers and significant gains in inferential quality over approximate samplers with comparable cost.
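The quick-and-dirty Bayes formula for rare events described in the second abstract can be checked against the exact Bayes rule. The Python below is a minimal sketch, not taken from the article: the function names and the numbers (prevalence 0.001, sensitivity 0.99, false positive rate 0.05) are hypothetical. It assumes a test whose sensitivity is close to 1 and a rare condition, so that sensitivity × prevalence ≈ prevalence and 1 − prevalence ≈ 1, which reduces the exact posterior to prevalence / (prevalence + false positive rate).

```python
def posterior_exact(prevalence, sensitivity, false_positive_rate):
    """P(condition | positive test) by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def posterior_quick(prevalence, false_positive_rate):
    """Rare-event shortcut: prevalence / (prevalence + false positive rate).

    Assumes sensitivity is near 1 and prevalence is small.
    """
    return prevalence / (prevalence + false_positive_rate)


if __name__ == "__main__":
    prev, sens, fpr = 0.001, 0.99, 0.05   # hypothetical illustrative numbers
    print(f"exact={posterior_exact(prev, sens, fpr):.4f} "
          f"quick={posterior_quick(prev, fpr):.4f}")
    # prints: exact=0.0194 quick=0.0196
```

The two answers are close here because both simplifying assumptions hold; for a common condition or a weak test, the shortcut drifts away from the exact value.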
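The cost argument in the third abstract rests on the fact that, under a continuous spike-and-slab prior, only the covariates that switch states change the $n \times n$ matrix $M = I_n + X D X^\top$ (with $D$ the diagonal matrix of prior variances), so $M$ can be updated at cost $O(n^2 p_t)$ instead of being rebuilt at cost $O(n^2 p)$. The Python below is our own sketch of that idea, not the authors' S$^3$ implementation. Its assumptions are all hypothetical: slab and spike prior variances $\sigma^2\tau_1^2$ and $\sigma^2\tau_0^2$, inclusion probability $q$, an inverse-gamma prior on $\sigma^2$, and the fast Gaussian sampler of Bhattacharya, Chakraborty and Mallick (2016) for the $\beta$ update. It still solves an $n \times n$ system at $O(n^3)$ cost per iteration, which this sketch does not try to avoid.

```python
import numpy as np


def ssvs_gibbs_incremental(X, y, n_iter=500, tau0=0.05, tau1=2.0, q=0.1,
                           a=1.0, b=1.0, seed=0):
    """Gibbs sampler for continuous spike-and-slab linear regression.

    Hypothetical prior settings for illustration:
      beta_j | z_j, sigma2 ~ N(0, sigma2 * tau1^2) if z_j else N(0, sigma2 * tau0^2)
      z_j ~ Bernoulli(q),  sigma2 ~ InvGamma(a/2, b/2)
    Returns indicator draws, per-iteration switch counts, and the
    incrementally maintained matrix M = I_n + X D X^T.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    z = np.ones(p, dtype=bool)              # start with every covariate in the slab
    d = np.where(z, tau1**2, tau0**2)       # diagonal of D (prior variances / sigma2)
    M = np.eye(n) + (X * d) @ X.T           # built once at O(n^2 p) cost
    sigma2 = 1.0
    z_draws, switches = [], []
    for _ in range(n_iter):
        sigma = np.sqrt(sigma2)
        # beta | z, sigma2, y via the Bhattacharya et al. (2016) sampler.
        # X @ u and X.T @ w cost O(np); the solve costs O(n^3).
        u = sigma * np.sqrt(d) * rng.standard_normal(p)
        v = X @ u / sigma + rng.standard_normal(n)
        w = np.linalg.solve(M, y / sigma - v)
        beta = u + sigma * d * (X.T @ w)
        # z_j | beta_j, sigma2: log-odds of slab versus spike density
        log_odds = (np.log(q) - np.log1p(-q) - np.log(tau1 / tau0)
                    - 0.5 * beta**2 / sigma2 * (1.0 / tau1**2 - 1.0 / tau0**2))
        prob_slab = 0.5 * (1.0 + np.tanh(0.5 * log_odds))  # stable logistic
        z_new = rng.random(p) < prob_slab
        # Only switched columns change M: rank-p_t update at O(n^2 p_t) cost
        switched = np.flatnonzero(z_new != z)
        if switched.size:
            d_new = np.where(z_new, tau1**2, tau0**2)
            Xs = X[:, switched]
            M += (Xs * (d_new[switched] - d[switched])) @ Xs.T
            d = d_new
        z = z_new
        # sigma2 | beta, z, y ~ InvGamma((a+n+p)/2, (b + RSS + sum beta_j^2/d_j)/2)
        resid = y - X @ beta
        shape = 0.5 * (a + n + p)
        rate = 0.5 * (b + resid @ resid + np.sum(beta**2 / d))
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        z_draws.append(z.copy())
        switches.append(switched.size)
    return np.array(z_draws), np.array(switches), M
```

Comparing `switches.mean()` with `p` on a given dataset is one way to see whether the number of switching covariates is indeed much smaller than $p$ there.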