Title: Defining Test‐Score Interpretation, Use, and Claims: Delphi Study for the Validity Argument
Abstract: Validity is a fundamental consideration in test development and test evaluation. The purpose of this study is to define and reify three key aspects of validity and validation, namely test‐score interpretation, test‐score use, and the claims supporting interpretation and use. This study employed a Delphi methodology to explore how experts in validity and validation conceptualize test‐score interpretation, use, and claims. Definitions were developed through multiple iterations of data collection and analysis. By clarifying the language used when conducting validation, validation may become more accessible to a broader audience, including but not limited to test developers, test users, and test consumers.
Bostic, J.; Krupa, E.; Folger, T.; Bentley, B.; Stokes, D.
(Psychology of Mathematics Education North America)
A. Lischka & E. Dyer (Eds.)
Validity and validation are central to conducting high-quality quantitative mathematics education scholarship. This presentation aims to support scholars engaged in quantitative research by providing information about the degree to which validity evidence related to instrument use or interpretation was found in mathematics education scholarship. Findings have the potential to steer future quantitatively focused scholarship and support equity aims.
Bostic, J.
(Proceedings for the 40th Annual Meeting of the North American Chapter of the International Group for the Psychology of Mathematics Education)
Instrument development should adhere to the Standards (AERA et al., 2014). “Content oriented evidence of validation is at the heart of the [validation] process” (AERA et al., 2014, p. 15) and is one of the five sources of validity evidence. The research question for this study is: What is the evidence related to test content for the three instruments called the PSM3, PSM4, and PSM5? The study’s purpose is to describe content validity evidence related to new problem-solving measures currently under development. We have previously published validity evidence for problem-solving measures (PSM6, PSM7, and PSM8) that address middle grades math standards (see Bostic & Sondergeld, 2015; Bostic, Sondergeld, Folger, & Kruse, 2017).
Merzdorf, Hillary E.; Jaison, Donna; Weaver, Morgan B.; Linsey, Julie; Hammond, Tracy; Douglas, Kerrie A.
(2022 IEEE Frontiers in Education Conference (FIE))
This Research Work-In-Progress reports the implementation of an Object Assembly Test for sketching skills in an undergraduate mechanical engineering graphics course. Sketching is essential for generating and refining ideas, and for communication among team members. Design thinking is supported through sketching as a means of translating between internal and external representations, and creating shared representations of collaborative thinking. While many spatial tests exist in engineering education, these tests have not directly used sketching or tested sketching skill. The Object Assembly Test is used to evaluate sketching skills on 3-dimensional mental imagery and mental rotation tasks in 1- and 2-point perspective. We describe revisions to the Object Assembly Test skills and grading rubric since its pilot test, and implement the test in an undergraduate mechanical engineering course for further validation. We summarize inter-rater reliability for each sketching exercise and for each grading metric for a sample of sketches, with discussion of score use and interpretation.
Sondergeld, T.
(Proceedings of the 43rd Meeting of the International Group for the Psychology of Mathematics Education)
Multiple forms of validity evidence should be reviewed to produce assessments with valid and reliable results (AERA, APA, NCME, 2014). Most mathematics validation studies do not, however, investigate beyond content and internal structure (Bostic, Krupa, Carney, & Shih, in press). The purpose of this study is to examine the less commonly reviewed validity evidence of "relationships to other variables" (RTOV) using mathematics problem-solving assessments (PSM3-5) as an example. RTOV explores how test scores may be related to other variables. When RTOV has been examined in mathematics validation studies, it was at the overall test level (see Bostic, Sondergeld, Folger, & Kruse, 2017 for an example). As such, the research question guiding our study is: What information is present when examining RTOV at both the overall test and individual item-levels?
Charpignon, Marie-Laure; Carrel, Adrien; Jiang, Yihang; Kwaga, Teddy; Cantada, Beatriz; Hyslop, Terry; Cox, Christopher E.; Haines, Krista; Koomson, Valencia; Dumas, Guillaume; et al.
(PLOS Digital Health)
Marcelo, Alvin (Ed.)
Background: In light of recent retrospective studies revealing evidence of disparities in access to medical technology and of bias in measurements, this narrative review assesses digital determinants of health (DDoH) in both technologies and medical formulae that demonstrate either evidence of bias or suboptimal performance, identifies potential mechanisms behind such bias, and proposes potential methods or avenues that can guide future efforts to address these disparities. Approach: Mechanisms are broadly grouped into physical and biological biases (e.g., pulse oximetry, non-contact infrared thermometry [NCIT]), interaction of human factors and cultural practices (e.g., electroencephalography [EEG]), and interpretation bias (e.g., pulmonary function tests [PFT], optical coherence tomography [OCT], and Humphrey visual field [HVF] testing). This review's scope specifically excludes technologies incorporating artificial intelligence and machine learning. For each technology, we identify both clinical and research recommendations. Conclusions: Many of the DDoH mechanisms encountered in medical technologies and formulae result in lower accuracy or lower validity when applied to patients outside the initial scope of development or validation. Our clinical recommendations caution clinical users against completely trusting result validity and suggest correlating with other measurement modalities robust to the DDoH mechanism (e.g., arterial blood gas for pulse oximetry, core temperatures for NCIT). Our research recommendations suggest not only increasing diversity in development and validation, but also awareness of the modalities of diversity required (e.g., skin pigmentation for pulse oximetry, but skin pigmentation and sex/hormonal variation for NCIT). By increasing diversity that better reflects patients in all scenarios of use, we can mitigate DDoH mechanisms and increase trust and validity in clinical practice and research.
Folger, Timothy D., Bostic, Jonathan, and Krupa, Erin E. "Defining Test‐Score Interpretation, Use, and Claims: Delphi Study for the Validity Argument." Educational Measurement: Issues and Practice 42(3). https://doi.org/10.1111/emip.12569