Problem solving is central to mathematics learning (NCTM, 2014), so assessments are needed that appropriately measure students’ problem-solving performance. More importantly, assessments must be grounded in robust validity evidence that justifies their interpretations and outcomes (AERA et al., 2014); measures supported by such evidence are warranted for use by practitioners and scholars. The purpose of this presentation is to convey validity evidence for a new measure titled the Problem-Solving Measure for grade four (PSM4). The research question is: What validity evidence supports PSM4 administration? The PSM4 is one assessment within the previously published PSM series designed for elementary and middle grades students. Problems are grounded in Schoenfeld’s (2011) framework and rely upon Verschaffel et al.’s (1999) perspective that word problems should be open, complex, and realistic. The mathematics in the problems is tied to USA grade-level content and practice standards (CCSSI, 2010).
Engaging hearts and minds in assessment research
Assessment continues to be an important conversation point within Science, Technology, Engineering, and Mathematics (STEM) education scholarship and practice (Krupa et al., 2019; National Research Council, 2001). There are guidelines for developing and evaluating assessments (e.g., AERA et al., 2014; Carney et al., 2022; Lavery et al., 2019; Wilson & Wilmot, 2019). There are also the Standards for Educational and Psychological Testing (Standards; AERA et al., 2014), which discuss relevant frameworks and information about using assessment results and interpretations. Quantitative assessments are used as part of daily STEM instruction, STEM research, and STEM evaluation; therefore, having robust assessments is necessary (National Research Council, 2001). An aim of this editorial is to give readers a few relevant ideas about modern assessment research, some guidance for the use of quantitative assessments, and a framing of validation and assessment research as equity-forward work.
- PAR ID: 10519463
- Publisher / Repository: Wiley
- Journal Name: School Science and Mathematics
- Volume: 123
- Issue: 6
- ISSN: 1949-8594
- Page Range / eLocation ID: 217-219
- Subject(s) / Keyword(s): math, equity, validity, assessment, validation, quantitative assessment
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The Standards for Educational and Psychological Testing were developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA et al., 2014). The Standards specify that assessment developers establish five types of validity evidence: test content, response processes, internal structure, relationship to other variables, and consequential/bias. Relevant to this proposal is consequential validity evidence that identifies the potential negative impact of testing or bias. Standard 3.1 of the Standards (2014) on fairness in testing states that “those responsible for test development, revision, and administration should design all steps of the testing process to promote valid score interpretations for intended score uses for the widest possible range of individuals and relevant sub-groups in the intended populations” (p. 63). Three types of bias include construct, method, and item bias (Boer et al., 2018). Testing for differential item functioning (DIF) is a standard analysis adopted to detect item bias against a subgroup (Boer et al., 2018). Example subgroups include gender, race/ethnic group, socioeconomic status, native language, and disability. DIF is when “equally able test takers differ in their probabilities answering a test item correctly as a function of group membership” (AERA et al., 2005, p. 51). DIF indicates systematic error, as distinct from real mean group differences (Camilli & Shepard, 1994). Items exhibiting significant DIF are removed, or are reviewed for sources of bias to determine modifications that allow an item to be retained and tested further. The Delphi technique is an emergent systematic research method whereby expert panel members review item content through an iterative process (Yildirim & Büyüköztürk, 2018). Experts independently evaluate each item for potential sources leading to DIF, researchers group their responses, and experts then independently complete a survey to rate their level of agreement with the anonymously grouped responses. This process continues until saturation and consensus are reached among experts, as established through some criterion (e.g., median agreement rating, interquartile range, and percent agreement). The technique allows researchers to “identify, learn, and share the ideas of experts by searching for agreement among experts” (Yildirim & Büyüköztürk, 2018, p. 451). Research has illustrated this technique applied after DIF is detected, but not before administering items in the field. The current research is a methodological illustration of the Delphi technique applied in the item construction phase of assessment development, as part of a five-year study to develop and test new problem-solving measures (PSM; Bostic et al., 2015, 2017) for U.S.A. grades 6-8 in a computer-adaptive testing environment. As part of an iterative design-science-based methodology (Middleton et al., 2008), we illustrate the integration of the Delphi technique into the item writing process. Results from two three-person panels, each reviewing a set of 45 PSM items, are utilized to illustrate the technique. Advantages and limitations identified through a survey of participating experts and researchers are outlined to advance the method.
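  The abstract above does not name a specific DIF statistic, so as an illustration only, here is a minimal Python sketch of one standard DIF screen for dichotomous items, the Mantel-Haenszel procedure. The function name, the rest-score matching variable, and the flagging threshold below are assumptions for the sketch, not details from the study.

  ```python
  import numpy as np

  def mantel_haenszel_dif(responses, item, group):
      """Mantel-Haenszel DIF screen for one dichotomous item.

      responses : (n_examinees, n_items) matrix of 0/1 item scores
      group     : per-examinee indicator, 0 = reference, 1 = focal
      Returns the MH common odds ratio and the ETS delta statistic.
      """
      responses = np.asarray(responses)
      group = np.asarray(group)
      # Match examinees on the "rest score" (total excluding the studied item).
      strata = responses.sum(axis=1) - responses[:, item]
      num, den = 0.0, 0.0
      for s in np.unique(strata):
          ref = (strata == s) & (group == 0)
          foc = (strata == s) & (group == 1)
          a = responses[ref, item].sum()   # reference group, correct
          b = ref.sum() - a                # reference group, incorrect
          c = responses[foc, item].sum()   # focal group, correct
          d = foc.sum() - c                # focal group, incorrect
          t = ref.sum() + foc.sum()
          if t > 0:
              num += a * d / t
              den += b * c / t
      if den == 0:
          return np.nan, np.nan            # no information to estimate DIF
      odds_ratio = num / den
      delta = -2.35 * np.log(odds_ratio)   # ETS delta scale
      return odds_ratio, delta
  ```

  Under one common ETS convention, items with |delta| of roughly 1.5 or more would be flagged as showing large DIF; in the workflow described above, such items would go to the expert panel for review rather than being discarded automatically.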
- Instrument development should adhere to the Standards (AERA et al., 2014). “Content oriented evidence of validation is at the heart of the [validation] process” (AERA et al., 2014, p. 15) and is one of the five sources of validity evidence. The research question for this study is: What is the evidence related to test content for the three instruments called the PSM3, PSM4, and PSM5? The study’s purpose is to describe content validity evidence related to new problem-solving measures currently under development. We have previously published validity evidence for problem-solving measures (PSM6, PSM7, and PSM8) that address middle grades math standards (see Bostic & Sondergeld, 2015; Bostic, Sondergeld, Folger, & Kruse, 2017).
- An instrumental case study (Stake, 1995) explored the messages STEM postdoctoral scholar women receive about balancing an academic career with a family. Concerningly, women with children are less likely than men with children, or than women and men without children, to be offered tenure-track positions or to be promoted (Bird & Rhoton, 2021; Cech & Blair-Loy, 2019; Gregor et al., 2021; Williams & Ceci, 2012; Ysseldyk et al., 2019). This reality suggests that motherhood stands in opposition to professional legitimacy in academia (Hill et al., 2014; Thébaud & Taylor, 2021). Furthermore, postdoctoral scholar mothers are more likely than their peers to cite children as their primary reason for not entering the faculty job market (NPA ADVANCE, 2011). Interviews were conducted with 22 demographically diverse STEM postdoctoral scholar women to explore how messages about balancing career and family are considered. Using inductive and deductive methods (Silverman, 1993; Stake, 1995), interview transcripts were analyzed through the ideal worker conceptual framework (Kossek et al., 2021). Two themes arose: (1) STEM postdoctoral women receive messages suggesting they must sacrifice family pursuits for an academic career, and (2) positive modeling and support for balancing career and family are vital for retaining STEM postdoctoral women in the professoriate pathway. These findings illustrate a systemic conflict for STEM postdoctoral scholar women: they describe a perceived necessity to sacrifice family desires, yet positive modeling and support for balancing career and family send messages suggesting it is possible to plan for both. This research is sponsored by the National Science Foundation (NSF) Alliance for Graduate Education and the Professoriate (AGEP; award #1821008).
- Problem solving is a typical type of assessment in engineering dynamics tests. To solve a problem, students need to set up equations and find a numerical answer. Depending on its difficulty and complexity, a quantitative problem can take anywhere from ten to thirty minutes to solve. Due to the time constraint of in-class testing, a typical test may contain only a limited number of problems, covering an insufficient range of problem types. This can potentially reduce validity and reliability, two crucial factors that contribute to assessment results. A test with high validity should cover proper content and should distinguish high-performing students from low-performing students and every student in between. A reliable test should have a sufficient number of items to provide consistent information about students’ mastery of the materials. In this work-in-progress study, we investigate to what extent a newly developed assessment is valid and reliable. Symbolic problem solving in this study refers to solving problems by setting up a system of equations without finding numeric solutions. Such problems usually take much less time, so this efficient approach allows a diverse range of problem types to fit within a single test, whose validity and reliability we then evaluate. We will follow the Standards for Educational and Psychological Testing, referred to as the Standards, developed jointly by three professional organizations: the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). We will use the Standards to evaluate the content validity and internal consistency of a collection of symbolic problems. Examples on rectilinear kinematics and angular motion will be provided to illustrate how symbolic problem solving is used in both homework and assessments. Numerous studies in the literature have shown that symbolic questions impose greater challenges because of students’ algebraic difficulties; thus, we will share strategies for preparing students to approach such problems.
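  The internal-consistency check mentioned in this last abstract lends itself to a short sketch. Below is one way to compute Cronbach’s alpha, a standard internal-consistency coefficient, in Python; the abstract does not state which coefficient the authors use, so this choice and the function name are assumptions for illustration.

  ```python
  import numpy as np

  def cronbach_alpha(scores):
      """Cronbach's alpha for an (n_examinees, n_items) score matrix.

      Higher alpha indicates more internally consistent items. Adding
      more items of comparable quality generally raises alpha, which
      is one motivation for fitting more (shorter) symbolic problems
      into a single test.
      """
      scores = np.asarray(scores, dtype=float)
      k = scores.shape[1]                          # number of items
      item_vars = scores.var(axis=0, ddof=1)       # per-item variances
      total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
      return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
  ```

  A common, though debated, rule of thumb treats alpha of about 0.7 or higher as acceptable for research use; since alpha tends to grow with the number of comparable items, the symbolic format’s room for more problems per test directly supports the reliability aim described above.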