Search results for: all records where Creators/Authors contains "Koskey, K."


  1. In the United States, national and state standardized assessments have become a metric for measuring student learning and high-quality learning environments. Because the COVID-19 pandemic introduced a multitude of learning modalities (e.g., hybrid, socially distanced face-to-face instruction, virtual environments), it is critical to examine how this learning disruption influenced elementary mathematics performance. This study tested for differences in fourth-grade standardized mathematics performance before and during COVID-19 in a case study of a rural Ohio school district, using the Measure of Academic Progress (MAP) mathematics test. A two-way ANOVA showed that fourth-grade MAP mathematics scores were statistically similar for the 2019 pre-COVID cohort (n = 31) and the 2020 COVID-19 cohort (n = 82), and by gender group, between Fall 2019 and Fall 2020. Implications for rural students' academic performance in virtual learning environments are discussed.
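
A minimal sketch of the two-way ANOVA named in the abstract above, run on synthetic data. The column names, score scale, and group means are illustrative assumptions, not the study's data; only the cohort sizes (n = 31 and n = 82) come from the abstract.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(42)

# Simulated fourth-grade MAP mathematics scores for the two cohorts.
# Means and SDs are invented; cohort sizes match the abstract (n = 31, n = 82).
df = pd.DataFrame({
    "map_score": np.concatenate([
        rng.normal(205, 12, size=31),   # 2019 pre-COVID cohort
        rng.normal(203, 12, size=82),   # 2020 COVID-19 cohort
    ]),
    "cohort": ["2019"] * 31 + ["2020"] * 82,
    "gender": rng.choice(["female", "male"], size=113),
})

# Two-way ANOVA: main effects of cohort and gender plus their interaction.
model = ols("map_score ~ C(cohort) * C(gender)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
# Non-significant terms would mirror the "statistically similar" result reported.
```
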
  2. The Delphi method has been adapted to inform item refinements in educational and psychological assessment development. An explanatory sequential mixed methods design using Delphi is a common approach to gain experts' insight into why items might have exhibited differential item functioning (DIF) for a subgroup, indicating potential item bias. Use of the Delphi method before quantitative field testing to screen for potential sources of item bias is lacking in the literature. An exploratory sequential design is illustrated as an additional approach, using a Delphi technique in Phase I and Rasch DIF analyses in Phase II. We introduce the 2 × 2 Concordance Integration Typology as a systematic way to examine agreement and disagreement across the qualitative and quantitative findings using a concordance joint display table. A worked example from the development of the Problem-Solving Measures Grades 6–8 Computer Adaptive Tests supported using an exploratory sequential design to inform item refinement. The 2 × 2 Concordance Integration Typology (a) crystallized instances where additional refinements were potentially needed and (b) provided a means of evaluating the distribution of bias across the set of items as a whole. Implications are discussed for advancing data integration techniques and using mixed methods to improve instrument development.
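
The abstract does not spell out how the 2 × 2 Concordance Integration Typology is computed, so the sketch below is one plausible reading: items are cross-classified by whether the Delphi panel flagged them in Phase I and whether Rasch DIF analyses flagged them in Phase II, yielding two concordant and two discordant cells. All item IDs and flags are invented for illustration.

```python
from collections import defaultdict

qual_flagged = {"item03", "item07", "item12"}    # Phase I: Delphi panel anticipated bias (invented)
quant_flagged = {"item07", "item12", "item21"}   # Phase II: statistically significant DIF (invented)
all_items = {f"item{i:02d}" for i in range(1, 25)}

cells = defaultdict(list)
for item in sorted(all_items):
    q = item in qual_flagged
    n = item in quant_flagged
    if q and n:
        cells["concordant: flagged by both"].append(item)
    elif not q and not n:
        cells["concordant: flagged by neither"].append(item)
    elif q:
        cells["discordant: qualitative only"].append(item)
    else:
        cells["discordant: quantitative only"].append(item)

# A concordance joint display table: one row per cell of the 2 x 2 typology.
for cell, items in cells.items():
    print(f"{cell:35s} ({len(items):2d}): {', '.join(items)}")
```

Discordant cells are the interesting ones: they mark items where the qualitative and quantitative evidence disagree, the instances the abstract says the typology crystallized as potentially needing further refinement.
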
  3. The most appropriate method of scoring an assessment depends on multiple factors, including the intended use of results, the assessment's purpose, and time constraints. Both the dichotomous and partial credit models have their advantages, yet direct comparisons of assessment outcomes from each method are not typical with constructed-response items. The present study compared the impact of both scoring methods on the internal structure and consequential validity of a middle-grades problem-solving assessment, the Problem-Solving Measure for Grade Six (PSM6). After responses were scored both ways, Rasch dichotomous and partial credit analyses indicated similarly strong psychometric findings across models. Student outcome measures on the PSM6, scored both dichotomously and with partial credit, demonstrated a strong, positive, significant correlation. Similar demographic patterns were noted regardless of scoring method. Both scoring methods produced similar results, suggesting that either would be appropriate to use with the PSM6.
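
A minimal sketch of comparing the two scoring methods on simulated constructed-response data. The 0–2 rubric and the rule that only full credit counts as correct under dichotomous scoring are assumptions; the actual PSM6 analyses used Rasch dichotomous and partial credit models rather than the raw totals correlated here.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Simulated constructed-response scores: 200 students x 15 items, rated 0, 1, or 2.
partial = rng.integers(0, 3, size=(200, 15))
dichotomous = (partial == 2).astype(int)  # assumed rule: only full credit counts as correct

# Person-level outcome measures under each scoring method.
pc_totals = partial.sum(axis=1)
di_totals = dichotomous.sum(axis=1)

r, p = pearsonr(pc_totals, di_totals)
print(f"correlation between scoring methods: r = {r:.2f} (p = {p:.3g})")
```
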
  4. Lischka, A.; Dyer, E.; Jones, R.; Lovett, J.; Strayer, J.; Drown, S. (Eds.)
     Using a test for a purpose it was not intended for can promote misleading results and interpretations, potentially leading to negative consequences from testing (AERA et al., 2014). For example, a mathematics test designed for use with grade 7 students is likely inappropriate for use with grade 3 students. There may be cases when a test can be used with a population related to the intended one; however, the validity evidence and claims must be examined. We explored the use of student measures with preservice teachers (PSTs) in a teacher-education context. The present study intends to spark a discussion about using some student measures with teachers. The Problem-Solving Measures (PSMs) were developed for use with grades 3–8 students. They measure students' problem-solving performance within the context of the Common Core State Standards for Mathematics (CCSSI, 2010; see Bostic & Sondergeld, 2015; Bostic et al., 2017; Bostic et al., 2021). After their construction, the developers wondered: if students were expected to engage successfully with the PSMs, then might future grades 3–8 teachers also demonstrate proficiency?
  5. The Standards for Educational and Psychological Testing were developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA et al., 2014). The Standards specify that assessment developers establish five types of validity evidence: test content, response processes, internal structure, relationships to other variables, and consequences of testing/bias. Relevant to this proposal is consequential validity evidence that identifies the potential negative impact of testing or bias. Standard 3.1 of The Standards (2014) on fairness in testing states that "those responsible for test development, revision, and administration should design all steps of the testing process to promote valid score interpretations for intended score uses for the widest possible range of individuals and relevant sub-groups in the intended populations" (p. 63). Three types of bias include construct, method, and item bias (Boer et al., 2018). Testing for differential item functioning (DIF) is a standard analysis adopted to detect item bias against a subgroup (Boer et al., 2018). Example subgroups include gender, race/ethnicity, socioeconomic status, native language, and disability. DIF occurs when "equally able test takers differ in their probabilities answering a test item correctly as a function of group membership" (AERA et al., 2005, p. 51). DIF indicates systematic error, as opposed to real mean group differences (Camilli & Shepard, 1994). Items exhibiting significant DIF are removed or reviewed for sources of bias to determine modifications that allow an item to be retained and tested further.
     The Delphi technique is an emergent systematic research method whereby expert panel members review item content through an iterative process (Yildirim & Büyüköztürk, 2018). Experts independently evaluate each item for potential sources leading to DIF, researchers group their responses, and experts then independently complete a survey to rate their level of agreement with the anonymously grouped responses. This process continues until saturation and consensus are reached among experts, as established through some criterion (e.g., median agreement rating, item quartile range, and percent agreement). The technique allows researchers to "identify, learn, and share the ideas of experts by searching for agreement among experts" (Yildirim & Büyüköztürk, 2018, p. 451). Research has illustrated this technique applied after DIF is detected, but not before administering items in the field.
     The current research is a methodological illustration of the Delphi technique applied in the item construction phase of assessment development, as part of a five-year study to develop and test new problem-solving measures (PSM; Bostic et al., 2015, 2017) for U.S. grades 6–8 in a computer-adaptive testing environment. As part of an iterative design-science-based methodology (Middleton et al., 2008), we illustrate the integration of the Delphi technique into the item-writing process. Results from two three-person panels, each reviewing a set of 45 PSM items, are used to illustrate the technique. Advantages and limitations identified through a survey of participating experts and researchers are outlined to advance the method.
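
A minimal sketch of the consensus criteria the abstract names (median agreement rating, item quartile range, and percent agreement). The 1–5 rating scale, the thresholds, and the ratings themselves are illustrative assumptions; the three raters per item match the two three-person panels described above.

```python
import numpy as np

# Rows: items under review; columns: agreement ratings from a three-person panel.
# Ratings on an assumed 1-5 Likert scale; all values invented for illustration.
ratings = np.array([
    [5, 4, 5],
    [3, 2, 4],
    [4, 4, 5],
])

AGREE = 4  # assumed threshold: ratings of 4 or 5 count as agreement

for i, item in enumerate(ratings, start=1):
    median = np.median(item)
    q1, q3 = np.percentile(item, [25, 75])
    pct_agree = np.mean(item >= AGREE) * 100
    # Assumed consensus rule: high median, tight quartile range, broad agreement.
    consensus = median >= AGREE and (q3 - q1) <= 1 and pct_agree >= 75
    print(f"item {i}: median={median:.1f}, IQR={q3 - q1:.1f}, "
          f"agreement={pct_agree:.0f}% -> {'consensus' if consensus else 'another round'}")
```

Items failing the rule would return to the panel for another anonymous round, consistent with the iterative process the abstract describes.
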