Computerized assessments and interactive simulation tasks are increasingly popular and afford the collection of process data, i.e., an examinee’s sequence of actions (e.g., clickstreams, keystrokes) arising from interactions with each task. Action sequence data contain rich information on the problem-solving process but come in a nonstandard, variable-length discrete sequence format. Two methods that extract features directly from the raw action sequences, namely multidimensional scaling and sequence-to-sequence autoencoders, produce multidimensional numerical features that summarize the information in the original sequences. This study explores the utility of action sequence features for understanding how problem-solving behavior relates to cognitive proficiencies and demographic characteristics. This is illustrated empirically with the process data from the 2012 PIAAC problem solving in technology-rich environments (PSTRE) digital assessment. Regularized regression results showed that action sequence features are more predictive of examinees’ demographic and cognitive characteristics than final outcomes are. Partial least squares analysis further aided the identification of behavioral patterns systematically associated with demographic/cognitive characteristics.
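As a rough illustration of the multidimensional scaling route mentioned above, the sketch below turns variable-length action sequences into fixed-length numerical features. It assumes Levenshtein (edit) distance as the dissimilarity between sequences and classical (Torgerson) MDS as the embedding; both are stand-ins chosen for brevity, and the study's own dissimilarity measure and MDS variant may differ. The function names and the toy action labels are hypothetical.

```python
import numpy as np


def edit_distance(a, b):
    """Levenshtein distance between two action sequences (lists of action labels)."""
    m, n = len(a), len(b)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0] = np.arange(m + 1)
    d[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,          # delete an action
                          d[i, j - 1] + 1,          # insert an action
                          d[i - 1, j - 1] + cost)   # substitute an action
    return int(d[m, n])


def classical_mds(D, k):
    """Classical (Torgerson) MDS: embed an n x n dissimilarity matrix D in k dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    lam = np.clip(eigvals[idx], 0.0, None)     # drop negative parts (non-Euclidean D)
    return eigvecs[:, idx] * np.sqrt(lam)


def sequence_features(seqs, k=2):
    """One k-dimensional feature vector per action sequence."""
    n = len(seqs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = edit_distance(seqs[i], seqs[j])
    return classical_mds(D, k)


if __name__ == "__main__":
    # Toy action sequences with hypothetical action labels
    seqs = [["start", "click_A", "submit"],
            ["start", "click_A", "click_B", "submit"],
            ["start", "submit"]]
    print(sequence_features(seqs, k=2))
```

Each row of the returned matrix is a feature vector for one examinee and could then enter a regularized regression as a predictor. Classical MDS is used here only because it has a closed-form eigendecomposition solution; stress-minimizing MDS variants would need an iterative optimizer.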
Combining Clickstream Analyses and Graph-Modeled Data Clustering for Identifying Common Response Processes
Abstract Complex interactive test items are becoming more widely used in assessments. Because they are computer-administered, assessments with interactive items can log time-stamped action sequences. These sequences are a rich source of information that may help investigate how examinees approach an item and arrive at their response. A substantial body of research leverages action sequence data to investigate examinees’ behavior. However, the associated timing data have mainly been considered at the item level, if at all. Considering action-level timing data in addition to action sequences has considerable potential to support a more fine-grained assessment of examinees’ behavior. We provide an approach that jointly considers action sequences and action-level times for identifying common response processes. In doing so, we integrate tools from clickstream analyses and graph-modeled data clustering with psychometrics. In our approach, we (a) provide similarity measures based on both the actions and the associated action-level timing data and (b) subsequently employ cluster edge deletion to identify homogeneous, interpretable, well-separated groups of action patterns, each describing a common response process. Guidelines on how to apply the approach are provided. The approach and its utility are illustrated on a complex problem-solving item from PIAAC 2012.
- Award ID(s): 1633353
- PAR ID: 10323453
- Date Published:
- Journal Name: Psychometrika
- Volume: 86
- Issue: 1
- ISSN: 0033-3123
- Page Range / eLocation ID: 190 to 214
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
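Step (b) of the approach in the abstract above can be illustrated in miniature. Cluster edge deletion takes a graph whose vertices are action patterns and whose edges join sufficiently similar patterns, then deletes as few edges as possible so that what remains is a disjoint union of cliques; each clique is one cluster. The sketch below assumes a precomputed similarity matrix and a hypothetical threshold `tau` for building the graph (the study's action- and timing-based similarity measures are not reproduced), and it solves the problem by exhaustive search. Because cluster edge deletion is NP-hard, this brute-force search is only an illustrative substitute that is practical for a handful of vertices; it is not the solver used in the study.

```python
def threshold_graph(S, tau):
    """Undirected edges (i, j), i < j, between items whose similarity is at least tau."""
    n = len(S)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if S[i][j] >= tau}


def cluster_edge_deletion(n, edges):
    """Exact cluster edge deletion by exhaustive search; feasible only for small n.

    edges: set of (i, j) pairs with i < j, each undirected edge listed once.
    Returns (clusters, deleted): a partition of range(n) into cliques of the input
    graph that keeps as many edges as possible, and the number of deleted edges.
    """
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    best = {"kept": -1, "clusters": []}

    def search(v, clusters, kept):
        if v == n:
            if kept > best["kept"]:
                best["kept"] = kept
                best["clusters"] = [sorted(c) for c in clusters]
            return
        # Option 1: add v to an existing cluster, allowed only if v is adjacent
        # to every current member (so the cluster stays a clique).
        for c in list(clusters):
            if c <= adj[v]:
                c.add(v)
                search(v + 1, clusters, kept + len(c) - 1)  # edges from v to old members are kept
                c.remove(v)
        # Option 2: open a new singleton cluster for v.
        clusters.append({v})
        search(v + 1, clusters, kept)
        clusters.pop()

    search(0, [], 0)
    return best["clusters"], len(edges) - best["kept"]


if __name__ == "__main__":
    # Two triangles of mutually similar action patterns joined by one weaker link
    edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
    clusters, deleted = cluster_edge_deletion(6, edges)
    print(clusters, deleted)
```

For two triangles joined by a single edge, the search deletes that one edge and returns the two triangles as clusters, which mirrors the goal of homogeneous, well-separated groups of action patterns.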
More Like this
-
Abstract In computer‐based tests allowing revision and reviews, examinees' sequence of visits and answer changes to questions can be recorded. The variable‐length revision log data introduce new complexities to the collected data but, at the same time, provide additional information on examinees' test‐taking behavior, which can inform test development and instructions. In the current study, we used recently proposed statistical learning methods for sequence data to provide an exploratory analysis of item‐level revision and review log data. Based on the revision log data collected from computer‐based classroom assessments, common prototypes of revisit and review behavior were identified. The relationship between revision behavior and various item, test, and individual covariates was further explored under a Bayesian multivariate generalized linear mixed model.
-
Abstract The response process of problem‐solving items contains rich information about respondents' behaviours and cognitive processes in digital tasks, but extracting this information is a major challenge. The aim of the study is to use a data‐driven approach to explore the latent states and state transitions underlying the problem‐solving process, which reflect test‐takers' behavioural patterns, and to investigate how these states and state transitions are associated with test‐takers' performance. We employed a hidden Markov modelling (HMM) approach to identify test‐takers' hidden states during the problem‐solving process and compared the frequency of states and/or state transitions between performance groups. We conducted comparable studies on two problem‐solving items, focusing on the US sample collected in PIAAC 2012, and examined the correlation between the frequencies from the two items. Latent states and transitions between them underlying the problem‐solving process were identified and differed significantly across performance groups. The groups with correct responses on both items were more engaged in the tasks and more often used efficient tools to solve problems, whereas the group with incorrect responses was more likely to use shorter action sequences and to exhibit hesitant behaviours. Consistent behavioural patterns were identified across items. This study demonstrates the value of a data‐driven HMM approach for better understanding respondents' behavioural patterns and the cognitive transitions underlying observable action sequences in complex problem‐solving tasks.
-
Abstract Computer‐based interactive items have become prevalent in recent educational assessments. In such items, the entire human‐computer interactive process is recorded in a log file and is known as the response process. These data are noisy, diverse, and in a nonstandard format. Several feature extraction methods have been developed to overcome the difficulties in process data analysis. However, these methods often focus on the action sequence and ignore the time sequence in response processes. In this paper, we introduce a new feature extraction method that incorporates the information in both the action sequence and the response time sequence. The method is based on the concept of path signature from stochastic analysis. We apply the proposed method to both simulated data and real response process data from PIAAC. A prediction framework is used to show that taking time information into account provides a more comprehensive understanding of respondents' behaviors.
-
Most tabular data visualization techniques focus on overviews, yet many practical analysis tasks are concerned with investigating individual items of interest. At the same time, relating an item to the rest of a potentially large table is important. In this work, we present Taggle, a tabular visualization technique for exploring and presenting large and complex tables. Taggle takes an item-centric, spreadsheet-like approach, visualizing each row in the source data individually using visual encodings for the cells. At the same time, Taggle introduces data-driven aggregation of data subsets. The aggregation strategy is complemented by interaction methods tailored to answer specific analysis questions, such as sorting based on multiple columns and rich data selection and filtering capabilities. We demonstrate Taggle through a case study, conducted by a domain expert, on complex genomics data analysis for drug discovery.
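For the hidden Markov modelling study in the list above, one common way to assign hidden states to an observed action sequence, once model parameters have been estimated (for example with the Baum–Welch algorithm, not shown), is Viterbi decoding; state-to-state transitions along the decoded path can then be counted and compared between performance groups. The sketch below assumes discrete actions coded as integers and hypothetical, already-estimated parameters; the abstract does not say which estimation or decoding procedure the study used, and the function names are illustrative.

```python
import numpy as np


def viterbi(obs, log_pi, log_A, log_B):
    """Most likely hidden-state path for one observed action sequence.

    obs:    list of action codes 0..M-1
    log_pi: (K,) initial-state log-probabilities
    log_A:  (K, K) transition log-probabilities, log_A[i, j] = log P(j | i)
    log_B:  (K, M) emission log-probabilities, log_B[k, m] = log P(action m | state k)
    """
    log_pi, log_A, log_B = map(np.asarray, (log_pi, log_A, log_B))
    T, K = len(obs), len(log_pi)
    delta = np.empty((T, K))              # best log-prob of a path ending in state k at step t
    back = np.zeros((T, K), dtype=int)    # best predecessor state, for backtracking
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A        # scores[i, j]: from state i to state j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(K)] + log_B[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def transition_counts(path, K):
    """K x K counts of state-to-state transitions (including self-transitions) in a path."""
    C = np.zeros((K, K), dtype=int)
    for a, b in zip(path[:-1], path[1:]):
        C[a, b] += 1
    return C


if __name__ == "__main__":
    # Hypothetical 2-state model: state 0 mostly emits action 0, state 1 mostly action 1
    log_pi = np.log([0.5, 0.5])
    log_A = np.log([[0.8, 0.2], [0.2, 0.8]])
    log_B = np.log([[0.9, 0.1], [0.1, 0.9]])
    path = viterbi([0, 0, 1, 1], log_pi, log_A, log_B)
    print(path, transition_counts(path, 2).tolist())
```

Summing the count matrices within each performance group gives state and transition frequencies of the kind the abstract describes comparing; because longer sequences contribute more transitions, normalizing by sequence length may be worth considering.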
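For the path-signature feature extraction study in the list above, the core computation can be sketched for a piecewise-linear path truncated at level 2. The level-1 terms are the total increment of each coordinate, and the level-2 terms are iterated integrals, accumulated segment by segment with Chen's identity (the update rule is spelled out in the code comments). How the study maps actions and response times to a path, and which truncation level it uses, are not given in the abstract; the `response_path` embedding below (cumulative counts of each action type plus elapsed time as an extra coordinate) is a hypothetical choice.

```python
import numpy as np


def signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    path: sequence of N points in d dimensions, shape (N, d).
    Returns S1 with shape (d,) and S2 with shape (d, d).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    S1 = np.zeros(d)
    S2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment delta:
        #   S2 <- S2 + outer(S1, delta) + outer(delta, delta) / 2   (uses the old S1)
        #   S1 <- S1 + delta
        S2 += np.outer(S1, delta) + np.outer(delta, delta) / 2.0
        S1 += delta
    return S1, S2


def response_path(actions, times, vocab):
    """Hypothetical embedding of a response process as a path.

    Coordinates: cumulative count of each action type in `vocab`, plus elapsed
    time (seconds since item start) as the last coordinate. Starts at the origin.
    """
    index = {a: k for k, a in enumerate(vocab)}
    points = [np.zeros(len(vocab) + 1)]
    for action, t in zip(actions, times):
        p = points[-1].copy()
        p[index[action]] += 1.0   # one more occurrence of this action type
        p[-1] = t                 # elapsed time at which the action occurred
        points.append(p)
    return np.array(points)


if __name__ == "__main__":
    P = response_path(["A", "B", "A"], [1.0, 2.5, 4.0], vocab=["A", "B"])
    S1, S2 = signature_level2(P)
    features = np.concatenate([S1, S2.ravel()])  # fixed length: d + d * d
    print(features.shape)
```

Concatenating S1 with the flattened S2 gives a fixed-length feature vector per respondent that a downstream prediction model could use. The symmetric part of S2 is determined by S1, while its antisymmetric part (the Lévy area) depends on the order in which coordinates change, which is how ordering and timing information enter the features.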