Title: Insights into Criteria for Statistical Significance from Signal Detection Analysis
What is the best criterion for determining statistical significance? In psychology, the criterion has been p < .05. This criterion has been criticized since its inception, and the criticisms have been rejuvenated with recent failures to replicate studies published in top psychology journals. Several replacement criteria have been suggested, including reducing the alpha level to .005 or switching to other types of criteria such as Bayes factors or effect sizes. Here, various decision criteria for statistical significance were evaluated using signal detection analysis on the outcomes of simulated data. The signal detection measure of area under the curve (AUC) is a measure of discriminability, with a value of 1 indicating perfect discriminability and 0.5 indicating chance performance. Applied to criteria for statistical significance, it provides an estimate of the decision criterion’s performance in discriminating real effects from null effects. AUCs were high (M = .96, median = .97) for p values, suggesting merit in using p values to discriminate significant effects. AUCs can be used to assess methodological questions such as how much improvement will be gained with increased sample size, how much discriminability will be lost with questionable research practices, and whether it is better to run a single high-powered study or a study plus a replication at lower powers. AUCs were also used to compare performance across p values, Bayes factors, and effect size (Cohen’s d). AUCs were equivalent for p values and Bayes factors and were slightly higher for effect size. Signal detection analysis provides separate measures of discriminability and bias. With respect to bias, the specific thresholds that produced maximum utility depended on sample size, although this dependency was particularly notable for p values and less so for Bayes factors. The application of signal detection theory to the issue of statistical significance highlights the need to focus on both false alarms and misses, rather than false alarms alone.
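The AUC idea in the abstract can be illustrated with a small simulation. The sketch below is not the study's simulation design: the effect size, sample size, number of simulated studies, and all function names are assumptions chosen for illustration. It simulates two-group studies with and without a true effect and estimates how well the test statistic separates them. For a fixed sample size, the two-sided p value is a decreasing function of |t|, so ranking studies by |t| should give the same AUC as ranking them by p value.

```python
import math
import random
import statistics


def abs_t_statistic(group_a, group_b):
    """Absolute two-sample t statistic for equal-sized groups."""
    n = len(group_a)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # With equal group sizes the pooled and Welch standard errors coincide.
    std_err = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / n
    )
    return abs(mean_diff / std_err)


def simulate_scores(effect_size, n_per_group, n_sims, rng):
    """Simulate n_sims studies and return |t| for each one.

    effect_size is the true standardized mean difference (0.0 = null effect).
    """
    scores = []
    for _ in range(n_sims):
        group_a = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        scores.append(abs_t_statistic(group_a, group_b))
    return scores


def auc(signal_scores, noise_scores):
    """Probability that a random real-effect study outscores a random
    null-effect study, counting ties as one half."""
    wins = 0.0
    for s in signal_scores:
        for n in noise_scores:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(signal_scores) * len(noise_scores))


if __name__ == "__main__":
    rng = random.Random(2019)  # fixed seed so the run is repeatable
    # Illustrative (assumed) settings: d = 0.5, 30 participants per group.
    real = simulate_scores(0.5, 30, 300, rng)
    null = simulate_scores(0.0, 30, 300, rng)
    print(f"Estimated AUC: {auc(real, null):.3f}")
```

Rerunning the sketch with a larger `n_per_group` or a different `effect_size` gives a rough feel for how discriminability changes with sample size, which is one of the methodological questions the abstract raises.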
Award ID(s):
1632222
PAR ID:
10107686
Author(s) / Creator(s):
Date Published:
Journal Name:
Meta-Psychology
Volume:
3
ISSN:
2003-2714
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We develop alternative families of Bayes factors for use in hypothesis tests as alternatives to the popular default Bayes factors. The alternative Bayes factors are derived for the statistical analyses most commonly used in psychological research: one-sample and two-sample t tests, regression, and ANOVA analyses. They possess the same desirable theoretical and practical properties as the default Bayes factors and satisfy additional theoretical desiderata while mitigating two features of the default priors that we consider implausible. They can be conveniently computed via an R package that we provide. Furthermore, hypothesis tests based on Bayes factors and those based on significance tests are juxtaposed. This discussion leads to the insight that default Bayes factors as well as the alternative Bayes factors are equivalent to test-statistic-based Bayes factors as proposed by Johnson (Journal of the Royal Statistical Society Series B: Statistical Methodology, 67, 689–701, 2005). We highlight test-statistic-based Bayes factors as a general approach to Bayes-factor computation that is applicable to many hypothesis-testing problems for which an effect-size measure has been proposed and for which test power can be computed.
  2. This study investigated uniform differential item functioning (DIF) detection in response times. We proposed a regression analysis approach with both the working speed and the group membership as independent variables, and logarithm-transformed response times as the dependent variable. Effect size measures such as ΔR² and percentage change in regression coefficients, in conjunction with statistical significance tests, were used to flag DIF items. A simulation study was conducted to assess the performance of three DIF detection criteria: (a) significance test, (b) significance test with ΔR², and (c) significance test with the percentage change in regression coefficients. The simulation study considered factors such as sample size, proportion of the focal group in relation to total sample size, number of DIF items, and the amount of DIF. The results showed that the significance test alone was too strict; using the percentage change in regression coefficients as an effect size measure reduced the flagging rate when the sample size was large, but the effect was inconsistent across different conditions; using ΔR² with the significance test reduced the flagging rate and was fairly consistent. The PISA 2018 data were used to illustrate the performance of the proposed method in a real dataset. Furthermore, we provide guidelines for conducting DIF studies with response times.
  3. The gun embodiment effect is the effect that wielding a gun has on judgments of whether others are also holding a gun. This effect could be responsible for real-world instances when police officers shoot an unarmed person because of the misperception that the person had a gun. The gun embodiment effect is an instance of embodied cognition for which a person’s tool-augmented body affects their judgments. The replication crisis in psychology has raised concern about embodied cognition effects in particular, and the issue of low statistical power applies to the original research on the gun embodiment effect. Thus, the first step was to conduct a high-powered replication. We found a significant gun embodiment effect in participants’ reaction times and in their proportion of correct responses, but not in signal detection measures of bias, as had been originally reported. To help prevent the gun embodiment effect from leading to fatal encounters, it would be useful to know whether individuals with certain traits are less prone to the effect and whether certain kinds of experiences help alleviate the effect. With the new and reliable measure of the gun embodiment effect, we tested for moderation by individual differences related to prior gun experience, attitudes, personality, and factors related to emotion regulation and impulsivity. Despite the variety of these measures, there was little evidence for moderation. The results were more consistent with the idea of the gun embodiment effect being a universal, fixed effect than being a flexible, malleable effect.
  4. Partial correlation coefficients are widely applied in the social sciences to evaluate the relationship between two variables after accounting for the influence of others. In this article, we present Bayes Factor Functions (BFFs) for assessing the presence of partial correlation. BFFs represent Bayes factors derived from test statistics and are expressed as functions of a standardized effect size. While traditional frequentist methods based on p-values have been criticized for their inability to provide cumulative evidence in favor of the true hypothesis, Bayesian approaches are often challenged due to their computational demands and sensitivity to prior distributions. BFFs overcome these limitations and offer summaries of hypothesis tests as alternative hypotheses are varied over a range of prior distributions on standardized effects. They also enable the integration of evidence across multiple studies. 
  5. We describe Bayes factor functions based on the sampling distributions of z, t, χ², and F statistics, using a class of inverse-moment prior distributions to define alternative hypotheses. These non-local alternative prior distributions are centered on standardized effects, which serve as indices for the Bayes factor function. We compare the conclusions drawn from the resulting Bayes factor functions to those drawn from Bayes factors defined using local alternative prior specifications and examine their frequentist operating characteristics. Finally, an application of Bayes factor functions to replicated experimental designs in psychology is provided.