Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests that score comparisons across gender groups may be complicated by the type of test items used. This paper presents evidence that among nationally representative samples of 15-year-olds in the United States participating in the 2009, 2012, and 2015 PISA math and reading tests, there are consistent item-format-by-gender differences: on average, male students answer multiple-choice items correctly relatively more often, and female students answer constructed-response items correctly relatively more often. These patterns were consistent across 34 additional participating PISA jurisdictions, although the format differences varied in size and were, on average, larger in reading than in math. The average magnitude of the format differences is not large enough to be flagged in routine differential item functioning analyses intended to detect test bias, but it is large enough to raise questions about the validity of inferences based on comparisons of scores across gender groups. Researchers and other test users should account for test item format, particularly when comparing scores across gender groups.
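The format-by-gender contrast described above can be sketched numerically. The snippet below is a hypothetical illustration on simulated data, not the PISA analysis itself; the variable names, sample size, and effect sizes are all invented for the sketch.

```python
import numpy as np

# Simulated item-level data: each row is one student-item response.
rng = np.random.default_rng(42)
n = 20_000
gender = rng.integers(0, 2, n)   # 0 = male, 1 = female
fmt = rng.integers(0, 2, n)      # 0 = multiple-choice (MC), 1 = constructed-response (CR)

# Build in a small format-by-gender interaction: males get a boost on MC
# items, females on CR items (sizes are illustrative only).
p = 0.60 + 0.04 * ((gender == 0) & (fmt == 0)) + 0.04 * ((gender == 1) & (fmt == 1))
correct = rng.random(n) < p

def cell_mean(g, f):
    """Proportion correct in one gender-by-format cell."""
    mask = (gender == g) & (fmt == f)
    return correct[mask].mean()

# Interaction contrast: male advantage on MC minus male advantage on CR.
# A positive value mirrors the pattern reported in the abstract.
interaction = (cell_mean(0, 0) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(1, 1))
print(round(interaction, 3))
```

With a contrast of this form, per-item differences too small for routine DIF flags can still accumulate into a systematic format effect at the score level.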
Integrating a Statistical Topic Model and a Diagnostic Classification Model for Analyzing Items in a Mixed Format Assessment
Selected-response items and constructed-response (CR) items are often found in the same test. Conventional psychometric models for these two types of items typically focus on scores for the correctness of responses. Recent research suggests, however, that more information may be available from CR items than scores for correctness alone. In this study, we describe an approach in which a statistical topic model and a diagnostic classification model (DCM) were applied together to a mixed-item-format formative test of English Language Arts. The DCM was used to estimate students' mastery status on reading skills; these mastery statuses were then included in the topic model as covariates to predict students' use of each latent topic in their written answers to a CR item. This approach enabled investigation of the effects of mastery status of reading skills on writing patterns. Results indicated that one of the skills, Integration of Knowledge and Ideas, helped detect and explain students' writing patterns with respect to their use of individual topics.
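The pipeline described here (DCM mastery estimates feeding a topic model as covariates) can be mimicked with a much simpler stand-in: simulate per-student topic proportions whose distribution depends on a mastery indicator, then check whether mastery predicts use of a topic. Everything below is simulated and hypothetical; it is a minimal sketch of the covariate idea, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
# Stand-in for the DCM output: mastery (1) or non-mastery (0) of a reading skill.
mastery = rng.integers(0, 2, n)

# Per-student proportions over 3 latent topics, drawn from a Dirichlet whose
# parameters depend on mastery: masters lean toward topic 0.
alpha = np.where(mastery[:, None] == 1, [4.0, 2.0, 2.0], [2.0, 2.0, 2.0])
theta = np.vstack([rng.dirichlet(a) for a in alpha])

# Covariate effect: difference in mean use of topic 0 between masters and non-masters.
gap = theta[mastery == 1, 0].mean() - theta[mastery == 0, 0].mean()
print(round(gap, 3))
```

In the full model the mastery covariate shifts the topic-proportion prior directly; the group-mean gap above is just the observable signature of that shift.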
- Award ID(s): 1813760
- PAR ID: 10288220
- Date Published:
- Journal Name: Frontiers in Psychology
- Volume: 11
- ISSN: 1664-1078
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Abstract Many large-scale assessments are designed to yield two or more scores for an individual by administering multiple sections measuring different but related skills. Multidimensional tests, or more specifically simple structured tests such as these, rely on multiple multiple-choice and/or constructed-response sections of items to generate multiple scores. In the current article, we propose an extension of the hierarchical rater model (HRM) to be applied to simple structured tests with constructed-response items. In addition to modeling the appropriate trait structure, the multidimensional HRM (M-HRM) presented here also accounts for rater severity bias and rater variability or inconsistency. We introduce the model formulation, test parameter recovery with a focus on latent traits, and compare the M-HRM to other scoring approaches (unidimensional HRMs and a traditional multidimensional item response theory model) using simulated and empirical data. Results show more precise scores under the M-HRM, with a major improvement in scores when incorporating rater effects versus ignoring them in the traditional multidimensional item response theory model.
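The rater-severity idea the M-HRM accounts for can be shown in miniature: if each rater shifts scores by a fixed bias, that bias can be estimated from deviations around each student's mean rating and removed. This is a toy mean-adjustment sketch on simulated continuous scores, not the hierarchical rater model itself; all values are invented.

```python
import numpy as np

rng = np.random.default_rng(7)
n_students, n_raters = 500, 4
true_score = rng.normal(0.0, 1.0, n_students)
severity = np.array([0.5, -0.5, 0.2, -0.2])   # per-rater bias (positive = harsh)

# Every rater scores every student; harsh raters pull scores down.
observed = (true_score[:, None] - severity[None, :]
            + rng.normal(0.0, 0.3, (n_students, n_raters)))

# Estimate each rater's severity from mean deviation around the per-student
# mean (identifiable here because the severities average to zero).
est_severity = -(observed - observed.mean(axis=1, keepdims=True)).mean(axis=0)
adjusted = observed + est_severity[None, :]   # severity-corrected scores
print(np.round(est_severity, 2))
```

Ignoring rater effects folds the severity term into the score itself, which is the bias the M-HRM removes by modeling raters explicitly.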
- 
            We conducted two studies to investigate the extent to which brief, spaced, mastery practice on skills relevant to introductory physics affects student performance. The first study investigated the effect of practice of "specific" physics skills, each relevant to only one or a few items on the course exam. This study employed a quasi-experimental design with 766 students assigned to "intervention" or "control" conditions by lecture section sharing common exams. Results of the first study indicate significant improvement in performance for only some of the exam items relevant to the specific skills practiced. We also observed between-section performance differences on other exam items not relevant to training, which may be due to specific prior quiz items from individual instructors. The second study investigated the effect of practice on the "general" skill of algebra relevant to introductory physics, a skill relevant to most of the exam items. This study employed a similar quasi-experimental design with 363 students assigned to treatment or control conditions, and we also administered a reliable pre- and post-test assessment of the algebra skills that was iteratively developed for this project. Results from the second study indicate that 75% of students had high accuracy on the algebra pretest. Students in the control condition who scored low on the pretest gained about 0.7 standard deviations on the post-test, presumably from engagement with the course alone, and students in the algebra practice condition had statistically similar gains, indicating no observed effect of algebra practice on algebra pre- to post-test gains.
In contrast, we find some potential evidence that the algebra practice improved final exam performance for students with high pretest scores and did not benefit students with low pretest scores, although this result is inconclusive: the point estimate of the effect size was 0.24 for high-pretest-scoring students, but the 95% confidence interval [ , 0.48] slightly overlapped with zero. Further, we find a statistically significant positive effect of algebra practice on exam items that have higher algebraic complexity and no effect for items with low complexity. One possible explanation for the added benefit of algebra practice for high-scoring students is that fluency in algebra skills may have improved. Overall, our observations provide some evidence that spaced, mastery practice is beneficial for exam performance for specific and general skills, and that students who are better prepared in algebra may especially benefit from mastery practice in relevant algebra skills in terms of improved final exam performance.
- 
            Societal Impact Statement
            The practice of writing science blogs benefits both the scientist and society alike by providing professional development opportunities and delivering information in a format that is accessible to large and diverse audiences. By designing a project that introduced upper-level undergraduate students to science blog writing with a focus on plant biology, we piqued students' interest in science writing and the content of a popular plant science blog website. If adopted more widely, this work could broaden the scope of science education and promote the development of effective science communication skills for the next generation of scientists.
            Summary
            Successful scientists must communicate their research to broad audiences, including distilling key scientific concepts for the general public. Students pursuing careers in Science, Technology, Engineering, and Mathematics (STEM) fields benefit from developing public communication skills early in their careers, but opportunities are limited in traditional biology curricula. We created the "Plant Science Blogging Project" for a Plant Biology undergraduate course at the University of Pittsburgh in Fall 2018 and 2019. Students wrote blog posts merging personal connections with plants with plant biology concepts for the popular science blogs Plant Love Stories and EvoBites. By weaving biology into their narratives, students learned how to share botanical knowledge with the general public. The project had positive impacts on student learning and public engagement. In post-assignment surveys, the majority of students reported that they enjoyed the assignment, felt it improved their understanding of plant biology, and that it piqued their interest in reading and writing science blogs in the future. Approximately one-third of the student-authored blogs were published, including two that rose to the top 10 most-read posts on Plant Love Stories. Some dominant themes in student blogs, including medicine and culture, differed from common story themes published on the web, indicating the potential for students to diversify science blog content. Overall, the Plant Science Blogging Project allows undergraduate students to engage with plant biology topics in a new way, sharpen their scientific communication skills in accordance with today's world of mass information sharing, and contribute to the spread of scientific knowledge for public benefit.
- 
            Lischka, A. E. (Ed.) Response Process Validity (RPV) reflects the degree to which items are interpreted as intended by item developers. In this study, teacher responses to constructed-response (CR) items designed to assess the pedagogical content knowledge (PCK) of middle school mathematics teachers were evaluated to determine which types of teacher responses signaled weak RPV. We analyzed 38 CR pilot items on proportional reasoning, with up to 13 middle school mathematics teachers responding per item. By coding teacher responses and using think-alouds, we found that teachers' responses deemed indicative of low item RPV often had one of the following characteristics: vague answers, unanticipated assumptions, a focus on unintended topics, and paraphrasing. To develop a diverse pool of items with strong RPV, we suggest it is helpful to be aware of these symptoms, use them to consider how to improve items, and then revise and retest items accordingly.
 An official website of the United States government