Testing for reviewer anchoring in peer review: A randomized controlled trial

Liu, Ryan; Jecmen, Steven; Conitzer, Vincent; Fang, Fei; Shah, Nihar B

doi:10.1371/journal.pone.0301111

Objective: Peer review frequently follows a process where reviewers first provide initial reviews, authors respond to these reviews, then reviewers update their reviews based on the authors’ response. There is mixed evidence regarding whether this process is useful, including frequent anecdotal complaints that reviewers insufficiently update their scores. In this study, we aim to investigate whether reviewers anchor to their original scores when updating their reviews, which serves as a potential explanation for the lack of updates in reviewer scores.

Design: We design a novel randomized controlled trial to test if reviewers exhibit anchoring. In the experimental condition, participants initially see a flawed version of a paper that is corrected after they submit their initial review, while in the control condition, participants only see the correct version. We take various measures to ensure that in the absence of anchoring, reviewers in the experimental group should revise their scores to be identically distributed to the scores from the control group. Furthermore, we construct the reviewed paper to maximize the difference between the flawed and corrected versions, and employ deception to hide the true experiment purpose.

Results: Our randomized controlled trial consists of 108 researchers as participants. First, we find that our intervention was successful at creating a difference in perceived paper quality between the flawed and corrected versions: Using a permutation test with the Mann-Whitney U statistic, we find that the experimental group’s initial scores are lower than the control group’s scores in both the Evaluation category (Vargha-Delaney A = 0.64, p = 0.0096) and Overall score (A = 0.59, p = 0.058). Next, we test for anchoring by comparing the experimental group’s revised scores with the control group’s scores. We find no significant evidence of anchoring in either the Overall (A = 0.50, p = 0.61) or Evaluation category (A = 0.49, p = 0.61).
The Mann-Whitney U statistic is the number of individual pairwise comparisons across groups in which the value from the specified group is greater, while the Vargha-Delaney A is the same quantity normalized to lie in [0, 1].
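
To make the two statistics and the permutation test concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that ties count as one half of a comparison, that the test is one-sided, and that the p-value uses a "+1" correction; the abstract specifies none of these details. The function names and the example scores are hypothetical and chosen only for illustration.

```python
import random


def mann_whitney_u(x, y):
    """Count pairs (xi, yj) in which xi > yj.

    Assumption: a tie counts as one half of a comparison.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def vargha_delaney_a(x, y):
    """Normalize U by the number of pairs so the result lies in [0, 1]."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


def permutation_p_value(x, y, n_permutations=10000, seed=0):
    """One-sided permutation p-value for 'x tends to be greater than y'.

    Group labels are shuffled repeatedly, and the p-value is the share of
    shuffles whose U is at least as large as the observed U. The +1 in the
    numerator and denominator is an assumed convention that keeps the
    estimate strictly positive.
    """
    rng = random.Random(seed)
    observed = mann_whitney_u(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mann_whitney_u(pooled[:n_x], pooled[n_x:]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical scores, not data from the study.
    control = [7, 6, 8, 5, 7, 6]
    experimental_initial = [5, 4, 6, 5, 3, 6]
    a = vargha_delaney_a(control, experimental_initial)
    p = permutation_p_value(control, experimental_initial, n_permutations=2000)
    print(f"A = {a:.2f}, p = {p:.4f}")
```

Under these assumptions, A = 0.5 means neither group tends to score higher, and values above 0.5 mean the first group passed to the function tends to score higher than the second.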
