skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on May 26, 2026

Title: A path signature perspective of process data feature extraction
Abstract Computer‐based interactive items have become prevalent in recent educational assessments. In such items, the entire human‐computer interactive process is recorded in a log file and is known as the response process. These data are noisy, diverse, and in a nonstandard format. Several feature extraction methods have been developed to overcome the difficulties in process data analysis. However, these methods often focus on the action sequence and ignore the time sequence in response processes. In this paper, we introduce a new feature extraction method that incorporates the information in both the action sequence and the response time sequence. The method is based on the concept of path signature from stochastic analysis. We apply the proposed method to both simulated data and real response process data from PIAAC. A prediction framework is used to show that taking time information into account provides a more comprehensive understanding of respondents' behaviors.  more » « less
Award ID(s):
2310664
PAR ID:
10598959
Author(s) / Creator(s):
; ;
Publisher / Repository:
Wiley
Date Published:
Journal Name:
British Journal of Mathematical and Statistical Psychology
ISSN:
0007-1102
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Computer simulations have become a popular tool for assessing complex skills such as problem‐solving. Log files of computer‐based items record the human–computer interactive processes for each respondent in full. The response processes are very diverse, noisy, and of non‐standard formats. Few generic methods have been developed to exploit the information contained in process data. In this paper we propose a method to extract latent variables from process data. The method utilizes a sequence‐to‐sequence autoencoder to compress response processes into standard numerical vectors. It does not require prior knowledge of the specific items and human–computer interaction patterns. The proposed method is applied to both simulated and real process data to demonstrate that the resulting latent variables extract useful information from the response processes. 
    more » « less
  2. Abstract Complex interactive test items are becoming more widely used in assessments. Being computer-administered, assessments using interactive items allow logging time-stamped action sequences. These sequences pose a rich source of information that may facilitate investigating how examinees approach an item and arrive at their given response. There is a rich body of research leveraging action sequence data for investigating examinees’ behavior. However, the associated timing data have been considered mainly on the item-level, if at all. Considering timing data on the action-level in addition to action sequences, however, has vast potential to support a more fine-grained assessment of examinees’ behavior. We provide an approach that jointly considers action sequences and action-level times for identifying common response processes. In doing so, we integrate tools from clickstream analyses and graph-modeled data clustering with psychometrics. In our approach, we (a) provide similarity measures that are based on both actions and the associated action-level timing data and (b) subsequently employ cluster edge deletion for identifying homogeneous, interpretable, well-separated groups of action patterns, each describing a common response process. Guidelines on how to apply the approach are provided. The approach and its utility are illustrated on a complex problem-solving item from PIAAC 2012. 
    more » « less
  3. Abstract The response process of problem‐solving items contains rich information about respondents' behaviours and cognitive process in the digital tasks, while the information extraction is a big challenge. The aim of the study is to use a data‐driven approach to explore the latent states and state transitions underlying problem‐solving process to reflect test‐takers' behavioural patterns, and to investigate how these states and state transitions could be associated with test‐takers' performance. We employed the Hidden Markov Modelling approach to identify test takers' hidden states during the problem‐solving process and compared the frequency of states and/or state transitions between different performance groups. We conducted comparable studies in two problem‐solving items with a focus on the US sample that was collected in PIAAC 2012, and examined the correlation between those frequencies from two items. Latent states and transitions between them underlying the problem‐solving process were identified and found significantly different by performance groups. The groups with correct responses in both items were found more engaged in tasks and more often to use efficient tools to solve problems, while the group with incorrect responses was found more likely to use shorter action sequences and exhibit hesitative behaviours. Consistent behavioural patterns were identified across items. This study demonstrates the value of data‐driven based HMM approach to better understand respondents' behavioural patterns and cognitive transmissions underneath the observable action sequences in complex problem‐solving tasks. 
    more » « less
  4. Computational modeling of the human sequential design process and successful prediction of future design decisions are fundamental to design knowledge extraction, transfer, and the development of artificial design agents. However, it is often difficult to obtain designer-related attributes (static data) in design practices, and the research based on combining static and dynamic data (design action sequences) in engineering design is still underexplored. This paper presents an approach that combines both static and dynamic data for human design decision prediction using two different methods. The first method directly combines the sequential design actions with static data in a recurrent neural network (RNN) model, while the second method integrates a feed-forward neural network that handles static data separately, yet in parallel with RNN. This study contributes to the field from three aspects: (a) we developed a method of utilizing designers’ cluster information as a surrogate static feature to combine with a design action sequence in order to tackle the challenge of obtaining designer-related attributes; (b) we devised a method that integrates the function–behavior–structure design process model with the one-hot vectorization in RNN to transform design action data to design process stages where the insights into design thinking can be drawn; (c) to the best of our knowledge, it is the first time that two methods of combining static and dynamic data in RNN are compared, which provides new knowledge about the utility of different combination methods in studying sequential design decisions. The approach is demonstrated in two case studies on solar energy system design. The results indicate that with appropriate kernel models, the RNN with both static and dynamic data outperforms traditional models that only rely on design action sequences, thereby better supporting design research where static features, such as human characteristics, often play an important role. 
    more » « less
  5. Computerized assessments and interactive simulation tasks are increasingly popular and afford the collection of process data, i.e., an examinee’s sequence of actions (e.g., clickstreams, keystrokes) that arises from interactions with each task. Action sequence data contain rich information on the problem-solving process but are in a nonstandard, variable-length discrete sequence format. Two methods that directly extract features from the raw action sequences, namely multidimensional scaling and sequence-to-sequence autoencoders, produce multidimensional numerical features that summarize original sequence information. This study explores the utility of action sequence features in understanding how problem-solving behavior relates to cognitive proficiencies and demographic characteristics. This is empirically illustrated with the process data from the 2012 PIAAC PSTRE digital assessment. Regularized regression results showed that action sequence features are more predictive of examinees’ demographic and cognitive characteristics compared to final outcomes. Partial least squares analysis further aided the identification of behavioral patterns systematically associated with demographic/cognitive characteristics. 
    more » « less