Title: Content validity evidence for new problem-solving measures (PSM3, PSM4, and PSM5)
Instrument development should adhere to the Standards (AERA et al., 2014). “Content-oriented evidence of validation is at the heart of the [validation] process” (AERA et al., 2014, p. 15) and is one of the five sources of validity evidence. The research question for this study is: What is the evidence related to test content for the three instruments called the PSM3, PSM4, and PSM5? The study’s purpose is to describe content validity evidence for new problem-solving measures currently under development. We have previously published validity evidence for problem-solving measures (PSM6, PSM7, and PSM8) that address middle-grades mathematics standards (see Bostic & Sondergeld, 2015; Bostic, Sondergeld, Folger, & Kruse, 2017).
Award ID(s): 1720646
NSF-PAR ID: 10106806
Author(s) / Creator(s):
Date Published:
Journal Name: Proceedings for the 40th Annual Meeting of the North American Chapter of the International Group for the Psychology of Mathematics Education
Page Range / eLocation ID: 1641
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Lischka, A.; Dyer, E.; Jones, R.; Lovett, J.; Strayer, J.; Drown, S. (Ed.)
    Using a test for a purpose it was not intended for can yield misleading results and interpretations, potentially leading to negative consequences of testing (AERA et al., 2014). For example, a mathematics test designed for use with grade 7 students is likely inappropriate for use with grade 3 students. There may be cases when a test can be used with a population related to the intended one; however, the validity evidence and claims must be examined. We explored the use of student measures with preservice teachers (PSTs) in a teacher-education context, and the present study intends to spark a discussion about using some student measures with teachers. The Problem-solving Measures (PSMs) were developed for use with grades 3-8 students. They measure students’ problem-solving performance within the context of the Common Core State Standards for Mathematics (CCSSI, 2010; see Bostic & Sondergeld, 2015; Bostic et al., 2017; Bostic et al., 2021). After their construction, the developers wondered: if students were expected to engage successfully with the PSMs, might future grades 3-8 teachers also demonstrate proficiency?
  2. The Standards for Educational and Psychological Testing were developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA et al., 2014). The Standards specify that assessment developers establish five types of validity evidence: test content, response processes, internal structure, relations to other variables, and consequences of testing (including bias). Relevant to this proposal is consequential validity evidence, which identifies the potential negative impact of testing or bias. Standard 3.1 of the Standards (2014) on fairness in testing states that “those responsible for test development, revision, and administration should design all steps of the testing process to promote valid score interpretations for intended score uses for the widest possible range of individuals and relevant sub-groups in the intended populations” (p. 63). Three types of bias include construct, method, and item bias (Boer et al., 2018). Testing for differential item functioning (DIF) is a standard analysis adopted to detect item bias against a subgroup (Boer et al., 2018). Example subgroups include gender, race/ethnic group, socioeconomic status, native language, or disability. DIF occurs when “equally able test takers differ in their probabilities answering a test item correctly as a function of group membership” (AERA et al., 2005, p. 51). DIF indicates systematic error, as opposed to real mean group differences (Camilli & Shepard, 1994). Items exhibiting significant DIF are removed, or are reviewed for the sources of bias so that they can be modified, retained, and tested further.
     The Delphi technique is an emergent systematic research method whereby expert panel members review item content through an iterative process (Yildirim & Büyüköztürk, 2018). Experts independently evaluate each item for potential sources of DIF, researchers group their responses, and experts then independently complete a survey to rate their level of agreement with the anonymously grouped responses. This process continues until saturation and consensus are reached among the experts, as established through some criterion (e.g., median agreement rating, interquartile range, and percent agreement). The technique allows researchers to “identify, learn, and share the ideas of experts by searching for agreement among experts” (Yildirim & Büyüköztürk, 2018, p. 451). Prior research has illustrated this technique applied after DIF is detected, but not before administering items in the field.
     The current research is a methodological illustration of the Delphi technique applied in the item-construction phase of assessment development, as part of a five-year study to develop and test new problem-solving measures (PSM; Bostic et al., 2015, 2017) for U.S. grades 6-8 in a computer-adaptive testing environment. As part of an iterative design-science-based methodology (Middleton et al., 2008), we illustrate the integration of the Delphi technique into the item-writing process. Results from two three-person panels, each reviewing a set of 45 PSM items, are used to illustrate the technique. Advantages and limitations identified through a survey of participating experts and researchers are outlined to advance the method.
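    The abstract above describes DIF screening only in general terms. As a concrete illustration, here is a minimal sketch of one widely used DIF statistic, the Mantel-Haenszel D-DIF on the ETS delta scale, for dichotomously scored items matched on total score. This is not the study's own analysis; the function names (mh_d_dif, ets_flag) and the flagging thresholds are illustrative assumptions.

    ```python
    # Minimal Mantel-Haenszel DIF sketch (illustrative; not the study's analysis).
    # Assumes 0/1-scored items and total test score as the matching variable.
    import numpy as np

    def mh_d_dif(item: np.ndarray, total: np.ndarray, focal: np.ndarray) -> float:
        """ETS delta-scale Mantel-Haenszel D-DIF for one dichotomous item.

        item  : 0/1 responses to the studied item
        total : matching variable (e.g., total test score) defining strata
        focal : boolean mask, True for focal-group examinees
        """
        num = den = 0.0
        for s in np.unique(total):
            stratum = total == s
            ref, foc = stratum & ~focal, stratum & focal
            a = np.sum(item[ref] == 1)   # reference group, correct
            b = np.sum(item[ref] == 0)   # reference group, incorrect
            c = np.sum(item[foc] == 1)   # focal group, correct
            d = np.sum(item[foc] == 0)   # focal group, incorrect
            n = a + b + c + d
            if n > 0:
                num += a * d / n
                den += b * c / n
        if num == 0 or den == 0:
            return float("nan")          # data too sparse to estimate
        alpha_mh = num / den             # common odds ratio across strata
        return -2.35 * float(np.log(alpha_mh))

    def ets_flag(d: float) -> str:
        """Conventional ETS classification: A = negligible, B = moderate, C = large DIF."""
        return "A" if abs(d) < 1.0 else ("B" if abs(d) < 1.5 else "C")
    ```

    Under this common convention, items flagged "C" would be the ones sent back for expert review or removal, mirroring the review-or-remove step described in the abstract.

    Similarly, the Delphi stopping rule the abstract mentions (median agreement rating, interquartile range, and percent agreement) could be operationalized along the following lines; the thresholds and the consensus_reached name are illustrative assumptions, not criteria reported by the authors.

    ```python
    # Illustrative Delphi consensus check; thresholds are assumptions, not the
    # study's criteria. ratings is an (experts x items) matrix of 1-5 agreement.
    import numpy as np

    def consensus_reached(ratings, min_median=4.0, max_iqr=1.0, min_agree=0.75):
        medians = np.median(ratings, axis=0)
        q1, q3 = np.percentile(ratings, [25, 75], axis=0)
        pct_agree = np.mean(ratings >= 4, axis=0)  # share rating 4 ("agree") or 5
        return (medians >= min_median) & ((q3 - q1) <= max_iqr) & (pct_agree >= min_agree)

    # Example: a three-person panel rating 45 items, as in the study design
    rng = np.random.default_rng(0)
    ratings = rng.integers(3, 6, size=(3, 45))     # simulated ratings in 3-5
    print(consensus_reached(ratings).sum(), "of 45 items reach consensus")
    ```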