Title: Applying prerequisite structure inference to adaptive testing
Modeling student knowledge is important for assessment design, adaptive testing, curriculum design, and pedagogical intervention. The assessment design community has primarily focused on continuous latent-skill models with strong conditional independence assumptions among knowledge items, while the prerequisite discovery community has developed many models that aim to exploit the interdependence of discrete knowledge items. This paper attempts to bridge the gap by asking, "When does modeling assessment item interdependence improve predictive accuracy?" A novel adaptive testing evaluation framework is introduced that is amenable to techniques from both communities, and an efficient algorithm, Directed Item-Dependence And Confidence Thresholds (DIDACT), is introduced and compared with an Item-Response-Theory based model on several real and synthetic datasets. Experiments suggest that assessments with closely related questions benefit significantly from modeling item interdependence.
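The Item-Response-Theory baseline mentioned in the abstract assumes responses are conditionally independent given a single latent ability, which is exactly the assumption that modeling item interdependence relaxes. A minimal sketch of the standard two-parameter logistic (2PL) IRT form — illustrative only, not the paper's DIDACT algorithm:

```python
import math

def irt_2pl(theta, a, b):
    """Probability of a correct response under the 2PL IRT model:
    P = 1 / (1 + exp(-a * (theta - b))), where theta is latent ability,
    a is item discrimination, and b is item difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pattern_likelihood(theta, items, responses):
    """Under standard IRT, responses are conditionally independent given
    theta, so the likelihood of a response pattern factorizes per item."""
    like = 1.0
    for (a, b), x in zip(items, responses):
        p = irt_2pl(theta, a, b)
        like *= p if x == 1 else (1.0 - p)
    return like
```

The factorized likelihood is what breaks down when items share prerequisite structure; that failure mode motivates the interdependence modeling the paper evaluates.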
Award ID(s):
1836948
PAR ID:
10331353
Date Published:
Journal Name:
Learning Analytics & Knowledge Conference
Page Range / eLocation ID:
422 to 427
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Physics instructors and education researchers use research-based assessments (RBAs) to evaluate students' preparation for physics courses. This preparation can cover a wide range of constructs including mathematics and physics content. Using separate mathematics and physics RBAs consumes course time. We are developing a new RBA for introductory mechanics as an online test using both computerized adaptive testing and cognitive diagnostic models. This design allows the adaptive RBA to assess mathematics and physics content knowledge within a single assessment. In this article, we used an evidence-centered design framework to inform the extent to which our models of skills students develop in physics courses fit the data from three mathematics RBAs. Our dataset came from the LASSO platform and includes 3,491 responses from the Calculus Concept Assessment, Calculus Concept Inventory, and Pre-calculus Concept Assessment. Our model included five skills: apply vectors, conceptual relationships, algebra, visualizations, and calculus. The "deterministic inputs, noisy 'and' gate" (DINA) analyses demonstrated a good fit for the five skills. The classification accuracies for the skills were satisfactory. Including items from the three mathematics RBAs in the item bank for the adaptive RBA will provide a flexible assessment of these skills across mathematics and physics content areas that can adapt to instructors' needs.
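The DINA model named in the abstract has a simple closed form: a student answers an item correctly with probability 1 − slip when their skill profile covers every skill the item's Q-matrix row requires, and with the guessing probability otherwise. A minimal sketch, with illustrative skill names and parameter values rather than fitted ones from the study:

```python
def dina_prob_correct(alpha, q_row, slip, guess):
    """DINA ("deterministic inputs, noisy 'and' gate"): eta = 1 iff the
    student's mastery profile `alpha` covers every skill required by the
    item's Q-matrix row `q_row` (both 0/1 vectors). P(correct) is
    (1 - slip) when eta = 1, and `guess` when eta = 0."""
    eta = all(a >= q for a, q in zip(alpha, q_row))
    return (1.0 - slip) if eta else guess

# Hypothetical five-skill profile in the order the abstract lists:
# [vectors, conceptual, algebra, visualizations, calculus]
alpha = [1, 1, 0, 0, 0]          # masters vectors and conceptual only
q_row = [1, 0, 1, 0, 0]          # item requires vectors AND algebra
p = dina_prob_correct(alpha, q_row, slip=0.1, guess=0.2)  # missing algebra -> guess
```

Because one unmastered required skill drops the response probability to the guessing level, the "noisy and" gate is what lets the model classify mastery of each skill separately.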
  2. Science teacher knowledge for effective teaching consists of multiple knowledge bases, one of which includes science content knowledge and pedagogical knowledge. With the inclusion of science and engineering practices into the national science education standards in the US, teachers’ content knowledge goes beyond subject matter knowledge and into the realm of how scientists use practices for scientific inquiry. This study compares two approaches to constructing and validating two different versions of a survey that aims to measure the construct of teachers’ knowledge of models and modeling in science teaching. In the first version, a 24-item Likert scale survey containing content and pedagogical knowledge items was found to lack the ability to distinguish different knowledge levels for respondents, and validation through factor analysis indicated content and pedagogical knowledge items could not be separated. Findings from the validation results of the first survey influenced revisions to the second version of the survey, a 25-item multiple-choice instrument. The second survey employed a competence model framework for models and modeling for item specifications, and results from exploratory factor analysis revealed this approach to assessing the construct to be more appropriate. Recommendations for teacher assessment of science practices using competence models and points to consider in survey design, including norm-referenced or criterion-referenced tests, are discussed. 
  3. The Standards for educational and psychological assessment were developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA et al., 2014). The Standards specify that assessment developers establish five types of validity evidence: test content, response processes, internal structure, relationship to other variables, and consequential/bias. Relevant to this proposal is consequential validity evidence that identifies the potential negative impact of testing or bias. Standard 3.1 of The Standards (2014) on fairness in testing states that "those responsible for test development, revision, and administration should design all steps of the testing process to promote valid score interpretations for intended score uses for the widest possible range of individuals and relevant sub-groups in the intended populations" (p. 63). Three types of bias include construct, method, and item bias (Boer et al., 2018). Testing for differential item functioning (DIF) is a standard analysis adopted to detect item bias against a subgroup (Boer et al., 2018). Example subgroups include gender, race/ethnic group, socioeconomic status, native language, or disability. DIF is when "equally able test takers differ in their probabilities answering a test item correctly as a function of group membership" (AERA et al., 2005, p. 51). DIF indicates systematic error as compared to real mean group differences (Camilli & Shepard, 1994). Items exhibiting significant DIF are removed or reviewed for sources leading to bias to determine modifications to retain and further test an item. The Delphi technique is an emergent systematic research method whereby expert panel members review item content through an iterative process (Yildirim & Büyüköztürk, 2018).
Experts independently evaluate each item for potential sources leading to DIF, researchers group their responses, and experts then independently complete a survey to rate their level of agreement with the anonymously grouped responses. This process continues until saturation and consensus are reached among experts as established through some criterion (e.g., median agreement rating, item quartile range, and percent agreement). The technique allows researchers to “identify, learn, and share the ideas of experts by searching for agreement among experts” (Yildirim & Büyüköztürk, 2018, p. 451). Research has illustrated this technique applied after DIF is detected, but not before administering items in the field. The current research is a methodological illustration of the Delphi technique applied in the item construction phase of assessment development as part of a five-year study to develop and test new problem-solving measures (PSM; Bostic et al., 2015, 2017) for U.S.A. grades 6-8 in a computer adaptive testing environment. As part of an iterative design-science-based methodology (Middleton et al., 2008), we illustrate the integration of the Delphi technique into the item writing process. Results from two three-person panels each reviewing a set of 45 PSM items are utilized to illustrate the technique. Advantages and limitations identified through a survey by participating experts and researchers are outlined to advance the method. 
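The post-hoc DIF screening this abstract contrasts with its Delphi approach is commonly operationalized with the Mantel-Haenszel common odds ratio across matched ability strata. A hedged sketch of that standard statistic — the function name and stratum layout are illustrative, not taken from the study:

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across ability strata.
    Each stratum is a 2x2 table as a tuple:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    A value near 1.0 suggests no DIF; values far from 1.0 flag an item
    favoring the reference (>1) or focal (<1) group at matched ability."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    return num / den
```

In practice examinees are stratified by total score before tabulating, so the comparison is between equally able test takers, matching the definition of DIF quoted above.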
  4. Research-based assessment instruments (RBAIs) are essential tools to measure aspects of student learning and improve pedagogical practice. RBAIs are designed to measure constructs related to a well-defined learning goal. However, relatively few RBAIs exist that are suitable for the specific learning goals of upper-division physics lab courses. One such learning goal is modeling, the process of constructing, testing, and refining models of physical and measurement systems. Here, we describe the creation of one component of an RBAI to measure proficiency with modeling. The RBAI is called the Modeling Assessment for Physics Laboratory Experiments (MAPLE). For use with large numbers of students, MAPLE must be scalable, which includes not requiring impractical amounts of labor to analyze its data, as is often the case with large free-response assessments. We, therefore, use the coupled multiple response (CMR) format, from which data can be analyzed by a computer, to create items for measuring student reasoning in this component of MAPLE. We describe the process we used to create a set of CMR items for MAPLE, provide an example of this process for an item, and lay out an argument for construct validity of the resulting items based on our process.
  5. Recent years have seen a movement within the research-based assessment development community towards item formats that go beyond simple multiple-choice formats. Some have moved towards free-response questions, particularly at the upper-division level; however, free-response items have the constraint that they must be scored by hand. To avoid this limitation, some assessment developers have moved toward formats that maintain the closed-response format, while still providing more nuanced insight into student reasoning. One such format is known as coupled, multiple response (CMR). This format pairs multiple-choice and multiple-response formats to allow students to both commit to an answer in addition to selecting options that correspond with their reasoning. In addition to being machine-scorable, this format allows for more nuanced scoring than simple right or wrong. However, such nuanced scoring presents a potential challenge with respect to utilizing certain testing theories to construct validity arguments for the assessment. In particular, Item Response Theory (IRT) models often assume dichotomously scored items. While polytomous IRT models do exist, each brings with it certain constraints and limitations. Here, we will explore multiple IRT models and scoring schema using data from an existing CMR test, with the goal of providing guidance and insight for possible methods for simultaneously leveraging the affordances of both the CMR format and IRT models in the context of constructing validity arguments for research-based assessments. 
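For the polytomous IRT models this abstract contrasts with dichotomous ones, a common choice is Samejima's graded response model, where cumulative probabilities of reaching each score category are differenced to get category probabilities. A minimal sketch — illustrative of the model family, not the developers' chosen scoring scheme:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_category_probs(theta, a, thresholds):
    """Graded response model: P(X >= k) = logistic(a * (theta - b_k)) for
    ordered thresholds b_1 < b_2 < ... ; the probability of scoring exactly
    in category k is the difference of adjacent cumulative probabilities.
    Thresholds must be increasing or category probabilities go negative."""
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]
```

With K thresholds the model yields K + 1 score categories, which is what lets a CMR item award partial credit for partially correct reasoning instead of collapsing to right/wrong.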