Title: Local versus global inter-rater reliability for evaluating the internal validity of grant peer review: Considerations of measurement
In the absence of a gold standard for evaluating the quality of peer review, considerable attention has focused on studying reviewer agreement via inter-rater reliability (IRR), which can be thought of as the correlation between the scores that different reviewers give to the same grant proposal. Noting that it is not uncommon for IRR in grant peer review studies to be estimated from a range-restricted subset of submissions, we use statistical methods and analysis of real peer review data to illustrate the behavior of such local IRR estimates when only a fraction of top-quality proposal submissions is considered. We demonstrate that local IRR estimates are smaller than those obtained from all submissions and that zero local IRR estimates are quite plausible. We note that, from a measurement perspective, when reviewers are asked to differentiate among grant proposals across the whole range of submissions, only IRR measures that correspond to the complete range of submissions are warranted. We recommend against using local IRR estimates in those situations. Moreover, if review scores are intended to be used to differentiate among top proposals, we recommend that peer review administrators and researchers align review procedures with their intended measurement.
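As a rough illustration of the range-restriction effect the abstract describes, here is a minimal simulation sketch in Python. All data-generating assumptions (latent-quality model, variance values, sample size, the top-10% cutoff) are ours for illustration and are not taken from the paper; IRR is computed here simply as the Pearson correlation between two reviewers' scores.

```python
# Minimal sketch: how range restriction shrinks inter-rater reliability (IRR).
# All parameter values below are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

n_proposals = 10_000
true_quality = rng.normal(0.0, 1.0, n_proposals)  # latent proposal quality
noise_sd = 0.8                                    # reviewer error SD (assumed)

# Two reviewers score every proposal: score = quality + independent error.
r1 = true_quality + rng.normal(0.0, noise_sd, n_proposals)
r2 = true_quality + rng.normal(0.0, noise_sd, n_proposals)

def irr(a, b):
    """Pearson correlation between two reviewers' scores (a simple IRR)."""
    return np.corrcoef(a, b)[0, 1]

# Global IRR: computed over the complete range of submissions.
print(f"global IRR: {irr(r1, r2):.2f}")

# Local IRR: restrict to the top 10% of proposals by mean review score.
mean_score = (r1 + r2) / 2
top = mean_score >= np.quantile(mean_score, 0.9)
print(f"local IRR (top 10%): {irr(r1[top], r2[top]):.2f}")
```

Under these assumed variances the global IRR is about 0.6, while the local estimate over the top decile collapses toward zero, because selecting on the mean score removes most of the quality variance that the reviewers agree on.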
Award ID(s):
1759825
PAR ID:
10272380
Author(s) / Creator(s):
Date Published:
Journal Name:
2nd International Conference on Peer Review
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Considerable attention has focused on studying reviewer agreement via inter-rater reliability (IRR) as a way to assess the quality of the peer review process. Inspired by a recent study that reported an IRR of zero in the mock peer review of top-quality grant proposals, we use real data from a complete range of submissions to the National Institutes of Health and to the American Institute of Biological Sciences to draw attention to two important issues with using IRR to assess peer review quality. First, we demonstrate that estimating local IRR from quality-restricted subsets of proposals is likely to result in zero estimates under many scenarios. In both data sets, we find that zero local IRR estimates are more likely when subsets of top-quality proposals, rather than bottom-quality proposals, are considered. However, zero estimates from range-restricted data should not be interpreted as indicating arbitrariness in peer review. On the contrary, despite the different scoring scales used by the two agencies, when complete ranges of proposals are considered, IRR estimates are above 0.6, which indicates good reviewer agreement. Second, we demonstrate that, with a small number of reviewers per proposal, zero estimates of IRR are possible even when the true value is not zero. A small-panel sketch follows below.
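    To make the small-panel point concrete, here is a minimal sketch in Python. The variance components, panel size, number of reviewers, and simulation count are illustrative assumptions, not values from the study; the estimator is a standard one-way random-effects ICC(1), which can come out at or below zero by sampling error even when the true reliability is positive.

    ```python
    # Sketch: why small reviewer panels can yield zero IRR estimates even
    # when true reliability is positive. All parameter values are assumed.
    import numpy as np

    rng = np.random.default_rng(1)

    n_proposals, k = 25, 2        # proposals per panel, reviewers per proposal
    sigma_q, sigma_e = 0.5, 1.0   # quality and error SDs (assumed)
    true_icc = sigma_q**2 / (sigma_q**2 + sigma_e**2)  # = 0.2

    def icc1(scores):
        """One-way random-effects ICC(1) from an (n, k) score matrix."""
        n, k = scores.shape
        grand = scores.mean()
        row_means = scores.mean(axis=1)
        msb = k * np.sum((row_means - grand) ** 2) / (n - 1)             # between-proposal
        msw = np.sum((scores - row_means[:, None]) ** 2) / (n * (k - 1))  # within-proposal
        return (msb - msw) / (msb + (k - 1) * msw)

    n_sims = 2000
    zero_or_less = 0
    for _ in range(n_sims):
        quality = rng.normal(0.0, sigma_q, (n_proposals, 1))
        scores = quality + rng.normal(0.0, sigma_e, (n_proposals, k))
        if icc1(scores) <= 0:
            zero_or_less += 1

    print(f"true ICC: {true_icc:.2f}")
    print(f"estimates <= 0 in {100 * zero_or_less / n_sims:.0f}% of simulated panels")
    ```

    With only two reviewers per proposal and a modest panel, a nontrivial share of simulated panels produces an ICC estimate at or below zero despite a true ICC of 0.2, consistent with the abstract's caution against reading zero estimates as evidence of arbitrariness.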
  2. Lam, Hon-Ming (Ed.)
    Peer review, commonly used in grant funding decisions, relies on scientists’ ability to evaluate the quality of research proposals. Such judgments are sometimes beyond reviewers’ discriminatory power and can lead to a reliance on subjective biases, including preferences for lower-risk, incremental projects. However, peer reviewers’ risk tolerance has not been well studied. We conducted a cross-sectional experiment in which peer reviewers evaluated mock primary reviewers’ comments whose level and sources of risks and weaknesses were manipulated. Here we show that, in these mock proposal evaluations, proposal risks predicted reviewers’ scores more strongly than proposal strengths did. Risk tolerance was not predictive of scores, but reviewer scoring leniency was predictive of overall and criteria scores. The evaluation of risks dominates reviewers’ evaluation of research proposals and is a source of inter-reviewer variability. These results suggest that reviewer scoring variability may be attributable to the interpretation of proposal risks and that the process could benefit from interventions to improve the reliability of reviews. Additionally, the valuation of risk drives proposal evaluations and may reduce the chances that risky but highly impactful science is supported.
    This research study was situated within a peer review mentoring program in which novice reviewers were paired with mentors who are former National Science Foundation (NSF) program directors with experience running discipline-based education research (DBER) panels. Whether for a manuscript or a grant proposal, the outcome of peer review can greatly influence academic careers and the impact of research on a field. Yet the criteria upon which reviewers base their recommendations, and the processes they follow as they review, are poorly understood. Mentees reviewed three proposals previously submitted to the NSF and drafted pre-panel reviews addressing the proposals’ intellectual merit and broader impacts, as well as their strengths and weaknesses relative to solicitation-specific criteria. After participating in one mock review panel, mentees could revise their pre-panel evaluations based on the panel discussion. Using the lens of transformative learning theory, this study sought to answer the following research questions: 1) What are the tacit criteria used to inform recommendations for grant proposal reviews among scholars new to the review process? 2) To what extent do these tacit criteria and subsequent recommendations for grant proposal reviews change after participation in a mock panel review? Using a single case study approach to explore one mock review panel, we conducted document analyses of six mentees’ reviews completed before and after their participation in the panel. Findings suggest that reviewers primarily focus on the positive broader impacts proposed by a study and on the level of detail within a submitted proposal. Although mentees made few changes to their reviews after the mock panel discussion, the changes that were made illustrate that reviewers considered the broader impacts of the proposed studies more deeply. These results can inform review panel practices as well as training approaches to support new reviewers in DBER fields.
  4. Cameron, Carrie (Ed.)
    Grant writing is an essential skill for academic and other career success, but providing individual feedback to large numbers of trainees is challenging. In 2014, we launched the Stanford Biosciences Grant Writing Academy to support graduate students and postdocs in writing research proposals. Its core program is a multi-week Proposal Bootcamp designed to increase the feedback writers receive as they develop and refine their proposals. The Proposal Bootcamp consisted of two-hour weekly meetings that included mini-lectures and peer review. Bootcamp participants also attended faculty review workshops to obtain faculty feedback. Postdoctoral trainees were trained and hired as course teaching assistants and facilitated the weekly meetings and review workshops. Over the last six years, the annual Bootcamp has provided 525 doctoral students and postdocs with multi-level (peer and faculty) feedback. Proposals from Bootcamp participants were almost twice as likely to be funded as proposals from non-Bootcamp trainees. Overall, this structured program provided opportunities for feedback from multiple peer and faculty reviewers and increased participants’ confidence in developing and submitting research proposals, while accommodating a large number of participants.