

Title: Peer review: Risk and risk tolerance
Peer review, commonly used in grant funding decisions, relies on scientists’ ability to evaluate the quality of research proposals. Such judgments are sometimes beyond reviewers’ discriminatory power and can lead to a reliance on subjective biases, including a preference for lower-risk, incremental projects. However, peer reviewers’ risk tolerance has not been well studied. We conducted a cross-sectional experiment in which peer reviewers evaluated mock primary reviewers’ comments whose level and sources of risks and weaknesses were manipulated. Here we show that, in the mock proposal evaluations, proposal risks predicted reviewers’ scores more strongly than proposal strengths. Risk tolerance was not predictive of scores, but reviewer scoring leniency was predictive of overall and criteria scores. The evaluation of risk dominates reviewers’ assessment of research proposals and is a source of inter-reviewer variability. These results suggest that reviewer scoring variability may be attributable to the interpretation of proposal risks and could benefit from interventions to improve the reliability of reviews. Moreover, because the valuation of risk drives proposal evaluations, it may reduce the chances that risky but highly impactful science is supported.
Award ID(s):
1951132
PAR ID:
10353659
Author(s) / Creator(s):
Editor(s):
Lam, Hon-Ming
Date Published:
Journal Name:
PLOS ONE
Volume:
17
Issue:
8
ISSN:
1932-6203
Page Range / eLocation ID:
e0273813
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

Peer review is integral to the evaluation of grant proposals. Reviewer perceptions and characteristics have received limited study, especially their associations with reviewers’ evaluations. This mixed-methods study analyzed the unstructured comments of 270 experienced peer reviewers after they scored proposals on the basis of mock overall evaluations written by a primary reviewer. Comments were coded for topical content and emotional valence, blind to participants’ characteristics. The most frequent comments concerned the reviewers’ own experiences with peer review and carried negative valence. Additional themes were identified within the content codes, including concerns about favoritism and inappropriate behavior observed in other reviewers. Reviewers who made negative comments gave poorer scores than reviewers who did not. Reviewer mindsets are understudied: negative moods and cognitions may affect reviewers’ overall evaluative severity. Future studies should investigate these associations further.

     
  2. This research study was situated within a peer review mentoring program in which novice reviewers were paired with mentors, former National Science Foundation (NSF) program directors with experience running discipline-based education research (DBER) panels. Whether for a manuscript or a grant proposal, the outcome of peer review can greatly influence academic careers and the impact of research on a field. Yet the criteria on which reviewers base their recommendations, and the processes they follow as they review, are poorly understood. Mentees reviewed three proposals previously submitted to the NSF and drafted pre-panel reviews of the proposals’ intellectual merit and broader impacts, and of their strengths and weaknesses relative to solicitation-specific criteria. After participating in one mock review panel, mentees could revise their pre-panel evaluations based on the panel discussion. Using a lens of transformative learning theory, this study sought to answer the following research questions: 1) What tacit criteria inform recommendations for grant proposal reviews among scholars new to the review process? 2) To what extent do these tacit criteria and the resulting recommendations change after participation in a mock panel review? Using a single case study approach to explore one mock review panel, we conducted document analyses of six mentees’ reviews completed before and after their participation in the panel. Findings suggest that reviewers primarily focus on the positive broader impacts proposed by a study and on the level of detail within a submitted proposal. Although mentees made few changes to their reviews after the mock panel discussion, the changes that were present illustrate that reviewers considered the broader impacts of the proposed studies more deeply.
These results can inform review panel practices as well as approaches to training that support new reviewers in DBER fields.
    more » « less
  4.
Considerable attention has focused on studying reviewer agreement via inter-rater reliability (IRR) as a way to assess the quality of the peer review process. Inspired by a recent study that reported an IRR of zero in the mock peer review of top-quality grant proposals, we use real data from a complete range of submissions to the National Institutes of Health and to the American Institute of Biological Sciences to bring awareness to two important issues with using IRR to assess peer review quality. First, we demonstrate that estimating local IRR from subsets of restricted-quality proposals will likely yield zero estimates under many scenarios. In both data sets, we find that zero local IRR estimates are more likely when subsets of top-quality proposals, rather than bottom-quality proposals, are considered. However, zero estimates from range-restricted data should not be interpreted as indicating arbitrariness in peer review. On the contrary, despite the different scoring scales used by the two agencies, when the complete range of proposals is considered, IRR estimates are above 0.6, which indicates good reviewer agreement. Furthermore, we demonstrate that, with a small number of reviewers per proposal, zero estimates of IRR are possible even when the true value is not zero.
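The final point above can be illustrated with a small simulation. The sketch below is not the study's actual analysis; the one-way random-effects ICC(1) estimator, the variance split (true ICC of 0.3), and the panel sizes are illustrative assumptions. It shows that with only two reviewers per proposal and a modest panel, a non-trivial fraction of estimated IRRs comes out at or below zero even though the true value is positive.

```python
import random

random.seed(1)

def icc_oneway(scores):
    # One-way random-effects ICC(1) from the ANOVA mean squares:
    # between-proposal variance relative to total variance.
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Assumed model: score = proposal quality + reviewer error.
# Quality variance 0.3, error variance 0.7 -> true ICC = 0.3.
TRUE_ICC = 0.3
N_PROPOSALS, N_REVIEWERS, REPS = 25, 2, 2000

nonpositive = 0
for _ in range(REPS):
    panel = []
    for _ in range(N_PROPOSALS):
        quality = random.gauss(0, TRUE_ICC ** 0.5)
        panel.append([quality + random.gauss(0, (1 - TRUE_ICC) ** 0.5)
                      for _ in range(N_REVIEWERS)])
    if icc_oneway(panel) <= 0:
        nonpositive += 1

frac_nonpositive = nonpositive / REPS  # fraction of replications with ICC <= 0
```

Even with a genuinely positive true ICC, the estimator's sampling variability at k = 2 reviewers makes zero or negative estimates plausible in a single panel, which is the caution the abstract raises.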
  5. In the absence of a gold standard for evaluating the quality of peer review, considerable attention has focused on studying reviewer agreement via inter-rater reliability (IRR), which can be thought of as the correlation between the scores that different reviewers give to the same grant proposal. Noting that IRR in grant peer review studies is often estimated from a range-restricted subset of submissions, we use statistical methods and analysis of real peer review data to illustrate the behavior of such local IRR estimates when only a fraction of top-quality submissions is considered. We demonstrate that local IRR estimates are smaller than those obtained from all submissions and that zero local IRR estimates are quite plausible. We note that, from a measurement perspective, when reviewers are asked to differentiate among grant proposals across the whole range of submissions, only IRR measures that correspond to the complete range of submissions are warranted. We recommend against using local IRR estimates in those situations. Moreover, if review scores are intended to differentiate among top proposals, we recommend that peer review administrators and researchers align review procedures with their intended measurement.
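The range-restriction effect described above can also be sketched with simulated data. Again, this is an illustrative assumption rather than the paper's analysis: scores are modeled as proposal quality plus reviewer error (equal unit variances, so the true ICC is 0.5), agreement is estimated with a one-way random-effects ICC(1), and the "local" estimate is computed from only the top decile of proposals by mean score.

```python
import random

random.seed(7)

def icc_oneway(scores):
    # One-way random-effects ICC(1) from the ANOVA mean squares.
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Assumed model: score = proposal quality + reviewer error,
# both with unit variance -> true ICC = 0.5.
N_PROPOSALS, N_REVIEWERS = 500, 3
panel = []
for _ in range(N_PROPOSALS):
    quality = random.gauss(0, 1)
    panel.append([quality + random.gauss(0, 1) for _ in range(N_REVIEWERS)])

# Global IRR estimate from the complete range of proposals.
full_icc = icc_oneway(panel)

# "Local" IRR estimate from only the top 10% of proposals by mean score.
panel.sort(key=lambda row: sum(row) / len(row))
top_decile = panel[-N_PROPOSALS // 10:]
local_icc = icc_oneway(top_decile)
```

Selecting on the observed mean score strips out most of the between-proposal variance, so the local estimate collapses toward (or below) zero while the full-range estimate sits near the true value — the same pattern the abstract reports for real peer review data.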