Title: Applying prerequisite structure inference to adaptive testing
Modeling student knowledge is important for assessment design, adaptive testing, curriculum design, and pedagogical intervention. The assessment design community has primarily focused on continuous latent-skill models with strong conditional independence assumptions among knowledge items, while the prerequisite discovery community has developed many models that aim to exploit the interdependence of discrete knowledge items. This paper attempts to bridge the gap by asking, "When does modeling assessment item interdependence improve predictive accuracy?" A novel adaptive testing evaluation framework is introduced that is amenable to techniques from both communities, and an efficient algorithm, Directed Item-Dependence And Confidence Thresholds (DIDACT), is introduced and compared with an Item-Response-Theory based model on several real and synthetic datasets. Experiments suggest that assessments with closely related questions benefit significantly from modeling item interdependence.
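For context on the comparison described above, the sketch below shows the standard two-parameter logistic (2PL) IRT response function and the conditional-independence assumption it encodes. This is generic background rather than the paper's DIDACT or baseline implementation; all function names and parameter values are illustrative.

```python
import numpy as np

def irt_2pl(theta, a, b):
    """Two-parameter logistic IRT response function: probability that a
    student with latent ability theta answers an item with discrimination a
    and difficulty b correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def response_likelihood(theta, items, responses):
    """Likelihood of a 0/1 response pattern under the usual IRT assumption
    that items are conditionally independent given theta, so the joint
    probability factorizes over items."""
    p = np.array([irt_2pl(theta, a, b) for a, b in items])
    r = np.array(responses, dtype=float)
    return float(np.prod(p ** r * (1.0 - p) ** (1.0 - r)))

# Example: two items that a prerequisite-structure model such as DIDACT
# could instead treat as dependent (e.g., item 2 conditioned on item 1).
print(response_likelihood(0.5, [(1.2, -0.3), (0.9, 0.8)], [1, 0]))
```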
Award ID(s): 1836948
NSF-PAR ID: 10331353
Journal Name: Learning Analytics & Knowledge Conference
Page Range / eLocation ID: 422 to 427
Sponsoring Org: National Science Foundation
More Like this
  1. Science teacher knowledge for effective teaching consists of multiple knowledge bases, one of which includes science content knowledge and pedagogical knowledge. With the inclusion of science and engineering practices into the national science education standards in the US, teachers’ content knowledge goes beyond subject matter knowledge and into the realm of how scientists use practices for scientific inquiry. This study compares two approaches to constructing and validating two different versions of a survey that aims to measure the construct of teachers’ knowledge of models and modeling in science teaching. In the first version, a 24-item Likert scale survey containing content and pedagogical knowledge items was found to lack the ability to distinguish different knowledge levels for respondents, and validation through factor analysis indicated content and pedagogical knowledge items could not be separated. Findings from the validation results of the first survey influenced revisions to the second version of the survey, a 25-item multiple-choice instrument. The second survey employed a competence model framework for models and modeling for item specifications, and results from exploratory factor analysis revealed this approach to assessing the construct to be more appropriate. Recommendations for teacher assessment of science practices using competence models and points to consider in survey design, including norm-referenced or criterion-referenced tests, are discussed. 
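The validation question in the entry above is whether content and pedagogical knowledge items load on separable factors. The sketch below shows a minimal exploratory factor analysis using scikit-learn as a hedged illustration only; the simulated response matrix, the two-factor choice, and all variable names are assumptions, not the authors' analysis.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical data: 200 respondents x 24 Likert-type items (1-5),
# standing in for the combined content + pedagogical knowledge survey.
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(200, 24)).astype(float)

# Fit a two-factor model to ask whether content and pedagogical
# knowledge items separate onto distinct factors.
fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(responses)

# Items loading similarly on both factors would suggest the two
# constructs are not empirically separable, as the study reports.
loadings = fa.components_.T  # shape: (24 items, 2 factors)
print(np.round(loadings, 2))
```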
  2. The Delphi method has been adapted to inform item refinements in educational and psychological assessment development. An explanatory sequential mixed methods design using Delphi is a common approach to gain experts' insight into why items might have exhibited differential item functioning (DIF) for a sub-group, indicating potential item bias. Use of Delphi before quantitative field testing to screen for potential sources leading to item bias is lacking in the literature. An exploratory sequential design is illustrated as an additional approach using a Delphi technique in Phase I and Rasch DIF analyses in Phase II. We introduce the 2 × 2 Concordance Integration Typology as a systematic way to examine agreement and disagreement across the qualitative and quantitative findings using a concordance joint display table. A worked example from the development of the Problem-Solving Measures Grades 6–8 Computer Adaptive Tests supported using an exploratory sequential design to inform item refinement. The 2 × 2 Concordance Integration Typology (a) crystallized instances where additional refinements were potentially needed and (b) provided for evaluating the distribution of bias across the set of items as a whole. Implications are discussed for advancing data integration techniques and using mixed methods to improve instrument development. 
  3. Integrating engineering into the K‐12 science curriculum continues to be a focus in national reform efforts in science education. Although there is increasing interest in research on and practice of integrating engineering in K‐12 science education, to date only a few studies have focused on the development of an assessment tool to measure students' understanding of engineering design. Most of the existing measures focus only on knowledge and understanding of engineering design concepts using multiple‐choice items, with the exception of the mixed‐format Engineering Concept Assessment (ECA). Advanced measurement models also lack application in the testing of such mixed‐format assessments in science education. This study applied many‐faceted Rasch measurement to the modified ECA for eighth grade (ECA/M8), using a newly constructed rubric applied by five judges to the responses of 497 eighth‐grade students who had completed an integrated learning unit on the engineering design process. The results supported the fit of the items and rubric rating scales to the Rasch specifications. Recommendations are made for item wording, further reliability and validity testing of the ECA/M8, and use of the ECA/M8 in science education and research.
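For orientation, the many-facet Rasch model referenced in the entry above is commonly written in Linacre's formulation as shown below; the facet symbols are generic placeholders, not the specific parameterization used in the ECA/M8 analysis.

```latex
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \lambda_j - \tau_k
```

Here \theta_n is the ability of student n, \delta_i the difficulty of item i, \lambda_j the severity of judge j, and \tau_k the threshold for moving from rating category k-1 to k.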
  4. Research-based assessment instruments (RBAIs) are essential tools to measure aspects of student learning and improve pedagogical practice. RBAIs are designed to measure constructs related to a well-defined learning goal. However, relatively few RBAIs exist that are suitable for the specific learning goals of upper-division physics lab courses. One such learning goal is modeling, the process of constructing, testing, and refining models of physical and measurement systems. Here, we describe the creation of one component of an RBAI to measure proficiency with modeling. The RBAI is called the Modeling Assessment for Physics Laboratory Experiments (MAPLE). For use with large numbers of students, MAPLE must be scalable, which includes not requiring impractical amounts of labor to analyze its data, as is often the case with large free-response assessments. We therefore use the coupled multiple response (CMR) format, whose data can be analyzed by a computer, to create items for measuring student reasoning in this component of MAPLE. We describe the process we used to create a set of CMR items for MAPLE, provide an example of this process for an item, and lay out an argument for the construct validity of the resulting items based on our process.
  5. The Standards for educational and psychological assessment were developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA et al., 2014). The Standards specify that assessment developers establish five types of validity evidence: test content, response processes, internal structure, relationship to other variables, and consequential/bias. Relevant to this proposal is consequential validity evidence that identifies the potential negative impact of testing or bias. Standard 3.1 of The Standards (2014) on fairness in testing states that "those responsible for test development, revision, and administration should design all steps of the testing process to promote valid score interpretations for intended score uses for the widest possible range of individuals and relevant sub-groups in the intended populations" (p. 63). Three types of bias include construct, method, and item bias (Boer et al., 2018). Testing for differential item functioning (DIF) is a standard analysis adopted to detect item bias against a subgroup (Boer et al., 2018). Example subgroups include gender, race/ethnic group, socioeconomic status, native language, and disability. DIF occurs when "equally able test takers differ in their probabilities answering a test item correctly as a function of group membership" (AERA et al., 2005, p. 51). DIF indicates systematic error, as compared to real mean group differences (Camilli & Shepard, 1994). Items exhibiting significant DIF are removed, or reviewed for sources of bias to determine modifications that allow an item to be retained and tested further.
    The Delphi technique is an emergent systematic research method whereby expert panel members review item content through an iterative process (Yildirim & Büyüköztürk, 2018). Experts independently evaluate each item for potential sources leading to DIF, researchers group their responses, and experts then independently complete a survey to rate their level of agreement with the anonymously grouped responses. This process continues until saturation and consensus are reached among experts, as established through some criterion (e.g., median agreement rating, item quartile range, and percent agreement). The technique allows researchers to "identify, learn, and share the ideas of experts by searching for agreement among experts" (Yildirim & Büyüköztürk, 2018, p. 451). Research has illustrated this technique applied after DIF is detected, but not before administering items in the field.
    The current research is a methodological illustration of the Delphi technique applied in the item construction phase of assessment development, as part of a five-year study to develop and test new problem-solving measures (PSM; Bostic et al., 2015, 2017) for U.S. grades 6–8 in a computer adaptive testing environment. As part of an iterative design-science-based methodology (Middleton et al., 2008), we illustrate the integration of the Delphi technique into the item writing process. Results from two three-person panels, each reviewing a set of 45 PSM items, are utilized to illustrate the technique. Advantages and limitations identified through a survey by participating experts and researchers are outlined to advance the method.
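As an illustrative aside only (the study above uses Rasch-based DIF analyses rather than this statistic), a common post-hoc screen for DIF is the Mantel-Haenszel common odds ratio computed across total-score strata. The sketch below uses made-up counts and hypothetical function names.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item across score strata.
    Each stratum is a tuple (A, B, C, D):
      A = reference group correct,  B = reference group incorrect,
      C = focal group correct,      D = focal group incorrect.
    Values near 1.0 suggest little evidence of DIF for the item."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts for one item in three total-score strata (low, mid, high).
strata = [(30, 20, 25, 25), (45, 15, 40, 20), (55, 5, 50, 10)]
print(round(mantel_haenszel_odds_ratio(strata), 2))  # ~1.62
```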