Title: An exploratory analysis of the latent structure of process data via action sequence autoencoders
Computer simulations have become a popular tool for assessing complex skills such as problem‐solving. Log files of computer‐based items record the human–computer interactive processes for each respondent in full. The response processes are very diverse, noisy, and of non‐standard formats. Few generic methods have been developed to exploit the information contained in process data. In this paper we propose a method to extract latent variables from process data. The method utilizes a sequence‐to‐sequence autoencoder to compress response processes into standard numerical vectors. It does not require prior knowledge of the specific items and human–computer interaction patterns. The proposed method is applied to both simulated and real process data to demonstrate that the resulting latent variables extract useful information from the response processes.
Award ID(s):
1633360 1826540
PAR ID:
10453296
Publisher / Repository:
Wiley-Blackwell
Journal Name:
British Journal of Mathematical and Statistical Psychology
Volume:
74
Issue:
1
ISSN:
0007-1102
Format(s):
Medium: X
Size(s):
p. 1-33
Sponsoring Org:
National Science Foundation
More Like this
  1. Computer‐based interactive items have become prevalent in recent educational assessments. In such items, the entire human‐computer interactive process is recorded in a log file and is known as the response process. These data are noisy, diverse, and in a nonstandard format. Several feature extraction methods have been developed to overcome the difficulties in process data analysis. However, these methods often focus on the action sequence and ignore the time sequence in response processes. In this paper, we introduce a new feature extraction method that incorporates the information in both the action sequence and the response time sequence. The method is based on the concept of path signature from stochastic analysis. We apply the proposed method to both simulated data and real response process data from PIAAC. A prediction framework is used to show that taking time information into account provides a more comprehensive understanding of respondents' behaviors.
  2. Domain adaptation is an important but challenging task. Most existing domain adaptation methods struggle to extract a domain-invariant representation from a feature space in which domain information and semantic information are entangled. Unlike previous efforts that work on the entangled feature space, we aim to extract the domain-invariant semantic information in the latent disentangled semantic representation (DSR) of the data. In DSR, we assume the data generation process is controlled by two independent sets of variables, i.e., the semantic latent variables and the domain latent variables. Under this assumption, we employ a variational auto-encoder to reconstruct the semantic latent variables and domain latent variables behind the data. We further devise a dual adversarial network to disentangle these two sets of reconstructed latent variables. The disentangled semantic latent variables are finally adapted across the domains. Experimental studies show that our model yields state-of-the-art performance on several domain adaptation benchmark datasets.
  3. Classic item response models assume that all items with the same difficulty have the same response probability among all respondents with the same ability. These assumptions, however, may very well be violated in practice, and it is not straightforward to assess whether these assumptions are violated, because neither the abilities of respondents nor the difficulties of items are observed. An example is an educational assessment where unobserved heterogeneity is present, arising from unobserved variables such as cultural background and upbringing of students, the quality of mentorship and other forms of emotional and professional support received by students, and other unobserved variables that may affect response probabilities. To address such violations of assumptions, we introduce a novel latent space model which assumes that both items and respondents are embedded in an unobserved metric space, with the probability of a correct response decreasing as a function of the distance between the respondent’s and the item’s position in the latent space. The resulting latent space approach provides an interaction map that represents interactions of respondents and items, and helps derive insightful diagnostic information on items as well as respondents. In practice, such interaction maps enable teachers to detect students from underrepresented groups who need more support than other students. We provide empirical evidence to demonstrate the usefulness of the proposed latent space approach, along with simulation results.
  4. Complex interactive test items are becoming more widely used in assessments. Being computer-administered, assessments using interactive items allow logging time-stamped action sequences. These sequences are a rich source of information that may facilitate investigating how examinees approach an item and arrive at their given response. There is a rich body of research leveraging action sequence data for investigating examinees’ behavior. However, the associated timing data have been considered mainly at the item level, if at all. Considering timing data at the action level in addition to action sequences has vast potential to support a more fine-grained assessment of examinees’ behavior. We provide an approach that jointly considers action sequences and action-level times for identifying common response processes. In doing so, we integrate tools from clickstream analyses and graph-modeled data clustering with psychometrics. In our approach, we (a) provide similarity measures that are based on both actions and the associated action-level timing data and (b) subsequently employ cluster edge deletion for identifying homogeneous, interpretable, well-separated groups of action patterns, each describing a common response process. Guidelines on how to apply the approach are provided. The approach and its utility are illustrated on a complex problem-solving item from PIAAC 2012.
  5. The response process of problem‐solving items contains rich information about respondents' behaviours and cognitive processes in digital tasks, but extracting this information is a major challenge. The aim of the study is to use a data‐driven approach to explore the latent states and state transitions underlying the problem‐solving process, which reflect test‐takers' behavioural patterns, and to investigate how these states and state transitions are associated with test‐takers' performance. We employed the Hidden Markov Modelling approach to identify test‐takers' hidden states during the problem‐solving process and compared the frequency of states and/or state transitions between different performance groups. We conducted comparable studies on two problem‐solving items, focusing on the US sample collected in PIAAC 2012, and examined the correlation between those frequencies across the two items. Latent states and the transitions between them underlying the problem‐solving process were identified and were found to differ significantly across performance groups. The groups with correct responses on both items were found to be more engaged in the tasks and to use efficient tools more often, while the group with incorrect responses was found more likely to use shorter action sequences and exhibit hesitant behaviours. Consistent behavioural patterns were identified across items. This study demonstrates the value of a data‐driven HMM approach for better understanding respondents' behavioural patterns and the cognitive transitions underlying the observable action sequences in complex problem‐solving tasks.