Visualization literacy is an essential skill for accurately interpreting data to inform critical decisions. Consequently, it is vital to understand the evolution of this ability and devise targeted interventions to enhance it, requiring concise and repeatable assessments of visualization literacy for individuals. However, current assessments, such as the Visualization Literacy Assessment Test (VLAT), are time-consuming due to their fixed, lengthy format. To address this limitation, we develop two streamlined computerized adaptive tests (CATs) for visualization literacy, A-VLAT and A-CALVI, which measure the same set of skills as their original versions in half the number of questions. Specifically, we (1) employ item response theory (IRT) and non-psychometric constraints to construct adaptive versions of the assessments, (2) finalize the configurations of adaptation through simulation, (3) refine the composition of test items of A-CALVI via a qualitative study, and (4) demonstrate the test-retest reliability (ICC: 0.98 and 0.98) and convergent validity (correlation: 0.81 and 0.66) of both CATs via four online studies. We discuss practical recommendations for using our CATs and opportunities for further customization to leverage the full potential of adaptive assessments. All supplemental materials are available at https://osf.io/a6258/.
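As a rough illustration of the item-selection loop a computerized adaptive test can run, the sketch below picks the next question by maximum Fisher information under a 2PL IRT model. This is a minimal sketch with made-up item parameters and function names, not the actual A-VLAT/A-CALVI configuration, which also applies non-psychometric constraints such as content balancing.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct answer."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability level theta."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

def next_item(theta_hat, item_bank, administered):
    """Pick the unadministered item with maximum information at the
    current ability estimate (content constraints omitted)."""
    best, best_info = None, -np.inf
    for idx, (a, b) in enumerate(item_bank):
        if idx in administered:
            continue
        info = fisher_information(theta_hat, a, b)
        if info > best_info:
            best, best_info = idx, info
    return best

# Hypothetical item bank: (discrimination a, difficulty b) per item.
bank = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7), (1.0, 1.2)]
print(next_item(theta_hat=0.3, item_bank=bank, administered={0}))  # -> 2
```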
CALVI: Critical Thinking Assessment for Literacy in Visualizations
Visualization misinformation is a prevalent problem, and combating it requires understanding people’s ability to read, interpret, and reason about erroneous or potentially misleading visualizations, an ability that lacks a reliable measurement: existing visualization literacy tests focus on well-formed visualizations. We systematically develop an assessment for this ability by: (1) developing a precise definition of misleaders (decisions made in the construction of visualizations that can lead to conclusions not supported by the data), (2) constructing initial test items using a design space of misleaders and chart types, (3) trying out the provisional test on 497 participants, and (4) analyzing the test tryout results and refining the items using Item Response Theory, qualitative analysis, a wrong-due-to-misleader score, and the content validity index. Our final bank of 45 items shows high reliability, and we provide item bank usage recommendations for future tests and different use cases. Related materials are available at: https://osf.io/pv67z/.
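For the content validity index mentioned in step (4), a minimal sketch of the item-level CVI computation is shown below; the rating scale, threshold, and function name are illustrative assumptions rather than details taken from the CALVI paper.

```python
def item_cvi(ratings, relevant_threshold=3):
    """Item-level content validity index: fraction of expert raters who
    scored the item at or above the relevance threshold (e.g., 3 or 4
    on a 4-point relevance scale)."""
    relevant = sum(1 for r in ratings if r >= relevant_threshold)
    return relevant / len(ratings)

# Hypothetical ratings from five experts on a 1-4 relevance scale.
print(item_cvi([4, 3, 4, 2, 4]))  # 0.8; low-CVI items are candidates for revision
```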
- PAR ID: 10504921
- Publisher / Repository: ACM
- Date Published:
- Journal Name: CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
- ISBN: 9781450394215
- Page Range / eLocation ID: 1 to 18
- Format(s): Medium: X
- Location: Hamburg, Germany
- Sponsoring Org: National Science Foundation
More Like this
- This study explored an alternative approach to assessing individuals’ word knowledge by gauging the ability to recognize subtle similarities and differences among associated terms. Informed by the theoretical and empirical work on relational reasoning, the Measure of Vocabulary Knowledge through Relational Reasoning (MVKR2) was developed and validated. Participants were 338 college students who completed the MVKR2, the Test of Relational Reasoning (TORR), and released items from the SAT Verbal and Math tests. The TORR and SAT tests were administered to examine the convergent and concurrent validities of the MVKR2. Findings from item confirmatory analyses and correlations demonstrated that the MVKR2 is a reliable and valid measure of vocabulary knowledge for college-age students. In addition, fluid relational reasoning ability was associated with the performance on this novel measure, but the association with vocabulary knowledge was stronger. When examined on the scale and item levels, the contribution of fluid relational reasoning varied across scales and items within each scale. This study offered an alternative way to examine vocabulary knowledge that has implications for future empirical research and instructional practice.
- We contribute an autoethnographic reflection on the complexity of defining and measuring visualization literacy (i.e., the ability to interpret and construct visualizations) to expose our tacit thoughts that often exist in-between polished works and remain unreported in individual research papers. Our work is inspired by the growing number of empirical studies in visualization research that rely on visualization literacy as a basis for developing effective data representations or educational interventions. Researchers have already made various efforts to assess this construct, yet it is often hard to pinpoint either what we want to measure or what we are effectively measuring. In this autoethnography, we gather insights from 14 internal interviews with researchers who are users or designers of visualization literacy tests. We aim to identify what makes visualization literacy assessment a "wicked" problem. We further reflect on the fluidity of visualization literacy and discuss how this property may lead to misalignment between what the construct is and how measurements of it are used or designed. We also examine potential threats to measurement validity from conceptual, operational, and methodological perspectives. Based on our experiences and reflections, we propose several calls to action aimed at tackling the wicked problem of visualization literacy measurement, such as by broadening test scopes and modalities, improving test ecological validity, making it easier to use tests, seeking interdisciplinary collaboration, and drawing from continued dialogue on visualization literacy to expect and be more comfortable with its fluidity.
- Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents evidence that among nationally representative samples of 15-year-olds in the United States participating in the 2009, 2012, and 2015 PISA math and reading tests, there are consistent item format by gender differences. On average, male students answer multiple-choice items correctly relatively more often and female students answer constructed-response items correctly relatively more often. These patterns were consistent across 34 additional participating PISA jurisdictions, although the size of the format differences varied and were larger on average in reading than math. The average magnitude of the format differences is not large enough to be flagged in routine differential item functioning analyses intended to detect test bias but is large enough to raise questions about the validity of inferences based on comparisons of scores across gender groups. Researchers and other test users should account for test item format, particularly when comparing scores across gender groups.
- In physics education research, instructors and researchers often use research-based assessments (RBAs) to assess students’ skills and knowledge. In this paper, we support the development of a mechanics cognitive diagnostic to test and implement effective and equitable pedagogies for physics instruction. Adaptive assessments using cognitive diagnostic models provide significant advantages over fixed-length RBAs commonly used in physics education research. As part of a broader project to develop a cognitive diagnostic assessment for introductory mechanics within an evidence-centered design framework, we identified and tested the student models of four skills that cross content areas in introductory physics: apply vectors, conceptual relationships, algebra, and visualizations. We developed the student models in three steps. First, we based the model on learning objectives from instructors. Second, we coded the items on RBAs using the student models. Finally, we tested and refined this coding using a common cognitive diagnostic model, the deterministic inputs, noisy “and” gate model. The data included 19 889 students who completed either the Force Concept Inventory, Force and Motion Conceptual Evaluation, or Energy and Momentum Conceptual Survey on the LASSO platform. The results indicated a good to adequate fit for the student models with high accuracies for classifying students with many of the skills. The items from these three RBAs do not cover all of the skills in enough detail; however, they will form a useful initial item bank for the development of the mechanics cognitive diagnostic.
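The last related abstract classifies skill mastery with the deterministic inputs, noisy "and" gate (DINA) model. Below is a minimal sketch of the generic DINA item response function with hypothetical slip and guess parameters; it illustrates the general model, not the fitted model from that study.

```python
import numpy as np

def dina_p_correct(alpha, q, slip, guess):
    """DINA item response function.

    alpha: binary skill-mastery vector for one student.
    q:     binary Q-matrix row (skills the item requires).
    eta is 1 only if the student masters every required skill; the
    probability of a correct response is then 1 - slip, otherwise guess.
    """
    eta = int(np.all(alpha[q == 1] == 1))
    return (1 - slip) ** eta * guess ** (1 - eta)

# Hypothetical example: the item requires skills 0 and 2.
alpha = np.array([1, 0, 1, 1])  # student masters skills 0, 2, and 3
q = np.array([1, 0, 1, 0])      # Q-matrix row for the item
print(dina_p_correct(alpha, q, slip=0.1, guess=0.2))  # -> 0.9
```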