skip to main content


Title: Eye gaze patterns reveal how reasoning skills improve with experience
Abstract

Reasoning, our ability to solve novel problems, has been shown to improve as a result of learning experiences. However, the underlying mechanisms of change in this high-level cognitive ability are unclear. We hypothesized that possible mechanisms include improvements in the encoding, maintenance, and/or integration of relations among mental representations – i.e., relational thinking. Here, we developed several eye gaze metrics to pinpoint learning mechanisms that underpin improved reasoning performance. We collected behavioral and eyetracking data from young adults who participated in a Law School Admission Test preparation course involving word-based reasoning problems or reading comprehension. The Reasoning group improved more than the Comprehension group on a composite measure of four visuospatial reasoning assessments. Both groups improved similarly on an eyetracking paradigm involving transitive inference problems, exhibiting faster response times while maintaining high accuracy levels; nevertheless, the Reasoning group exhibited a larger change than the Comprehension group on an ocular metric of relational thinking. Across the full sample, individual differences in response time reductions were associated with increased efficiency of relational thinking. Accounting for changes in visual search and a more specific measure of relational integration improved the prediction accuracy of the model, but changes in these two processes alone did not adequately explain behavioral improvements. These findings provide evidence of transfer of learning across different kinds of reasoning problems after completing a brief but intensive course. More broadly, the high temporal precision and rich derivable parameters of eyetracking make it a powerful approach for probing learning mechanisms.

 
more » « less
NSF-PAR ID:
10154117
Author(s) / Creator(s):
;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
npj Science of Learning
Volume:
3
Issue:
1
ISSN:
2056-7936
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Objectively differentiating patient mental states based on electrical activity, as opposed to overt behavior, is a fundamental neuroscience problem with medical applications, such as identifying patients in locked-in state vs. coma. Electroencephalography (EEG), which detects millisecond-level changes in brain activity across a range of frequencies, allows for assessment of external stimulus processing by the brain in a non-invasive manner. We applied machine learning methods to 26-channel EEG data of 24 fluent Deaf signers watching videos of sign language sentences (comprehension condition), and the same videos reversed in time (non-comprehension condition), to objectively separate vision-based high-level cognition states. While spectrotemporal parameters of the stimuli were identical in comprehension vs. non-comprehension conditions, the neural responses of participants varied based on their ability to linguistically decode visual data. We aimed to determine which subset of parameters (specific scalp regions or frequency ranges) would be necessary and sufficient for high classification accuracy of comprehension state. Optical flow, characterizing distribution of velocities of objects in an image, was calculated for each pixel of stimulus videos using MATLAB Vision toolbox. Coherence between optical flow in the stimulus and EEG neural response (per video, per participant) was then computed using canonical component analysis with NoiseTools toolbox. Peak correlations were extracted for each frequency for each electrode, participant, and video. A set of standard ML algorithms were applied to the entire dataset (26 channels, frequencies from .2 Hz to 12.4 Hz, binned in 1 Hz increments), with consistent out-of-sample 100% accuracy for frequencies in .2-1 Hz range for all regions, and above 80% accuracy for frequencies < 4 Hz. Sparse Optimal Scoring (SOS) was then applied to the EEG data to reduce the dimensionality of the features and improve model interpretability. SOS with elastic-net penalty resulted in out-of-sample classification accuracy of 98.89%. The sparsity pattern in the model indicated that frequencies between 0.2–4 Hz were primarily used in the classification, suggesting that underlying data may be group sparse. Further, SOS with group lasso penalty was applied to regional subsets of electrodes (anterior, posterior, left, right). All trials achieved greater than 97% out-of-sample classification accuracy. The sparsity patterns from the trials using 1 Hz bins over individual regions consistently indicated frequencies between 0.2–1 Hz were primarily used in the classification, with anterior and left regions performing the best with 98.89% and 99.17% classification accuracy, respectively. While the sparsity pattern may not be the unique optimal model for a given trial, the high classification accuracy indicates that these models have accurately identified common neural responses to visual linguistic stimuli. Cortical tracking of spectro-temporal change in the visual signal of sign language appears to rely on lower frequencies proportional to the N400/P600 time-domain evoked response potentials, indicating that visual language comprehension is grounded in predictive processing mechanisms. 
    more » « less
  2. Analogy problems involving multiple ordered relations of the same type create mapping ambiguity, requiring some mechanism for relational integration to achieve mapping accuracy. We address the question of whether the integration of ordered relations depends on their logical form alone, or on semantic representations that differ across relation types. We developed a triplet mapping task that provides a basic paradigm to investigate analogical reasoning with simple relational structures. Experimental results showed that mapping performance differed across orderings based on category, linear order, and causal relations, providing evidence that each transitive relation has its own semantic representation. Hence, human analogical mapping of ordered relations does not depend solely on their formal property of transitivity. Instead, human ability to solve mapping problems by integrating relations relies on the semantics of relation representations. We also compared human performance to the performance of several vector-based computational models of analogy. These models performed above chance but fell short of human performance for some relations, highlighting the need for further model development. 
    more » « less
  3. The landscapes of many elementary, middle, and high school math classrooms have undergone major transformations over the last half-century, moving from drill-and-skill work to more conceptual reasoning and hands-on manipulative work. However, if you look at a college level calculus class you are likely to find the main difference is the professor now has a whiteboard marker in hand rather than a piece of chalk. It is possible that some student work may be done on the computer, but much of it contains the same type of repetitive skill building problems. This should seem strange given the advancements in technology that allow more freedom than ever to build connections between different representations of a concept. Several class activities have been developed using a combination of approaches, depending on the topic. Topics covered in the activities include Riemann Sums, Accumulation, Center of Mass, Volumes of Revolution (Discs, Washers, and Shells), and Volumes of Similar Cross-section. All activities use student note outlines that are either done in a whole group interactive-lecture approach, or in a group work inquiry-based approach. Some of the activities use interactive graphs designed on desmos.com and others use physical models that have been designed in OpenSCAD and 3D-printed for students to use in class. Tactile objects were developed because they should provide an advantage to students by enabling them to physically interact with the concepts being taught, deepening their involvement with the material, and providing more stimuli for the brain to encode the learning experience. Web-based activities were developed because the topics involved needed substantial changes in graphical representations (i.e. limits with Riemann Sums). Assessment techniques for each topic include online homework, exams, and online concept questions with an explanation response area. These concept questions are intended to measure students’ ability to use multiple representations in order to answer the question, and are not generally computational in nature. Students are also given surveys to rate the overall activities as well as finer grained survey questions to try and elicit student thoughts on certain aspects of the models, websites, and activity sheets. We will report on student responses to the activity surveys, looking for common themes in students’ thoughts toward specific attributes of the activities. We will also compare relevant exam question responses and online concept question results, including common themes present or absent in student reasoning. 
    more » « less
  4. Abstract

    We present a multiple‐choice test, the Montana State University Formal Reasoning Test (FORT), to assess college students' scientific reasoning ability. The test defines scientific reasoning to be equivalent to formal operational reasoning. It contains 20 questions divided evenly among five types of problems: control of variables, hypothesis testing, correlational reasoning, proportional reasoning, and probability. The test development process included the drafting and psychometric analysis of 23 instruments related to formal operational reasoning. These instruments were administered to almost 10,000 students enrolled in introductory science courses at American universities. Questions with high discrimination were identified and assembled into an instrument that was intended to measure the reasoning ability of students across the entire spectrum of abilities in college science courses. We present four types of validity evidence for the FORT. (a) The test has a one‐dimensional psychometric structure consistent with its design. (b) Test scores in an introductory biology course had an empirical reliability of 0.82. (c) Student interviews confirmed responses to the FORT were accurate indications of student thinking. (d) A regression analysis of student learning in an introductory biology course showed that scores on the FORT predicted how well students learned one of the most challenging concepts in biology, natural selection.

     
    more » « less
  5. Abstract Expert testimony varies in scientific quality and jurors have a difficult time evaluating evidence quality (McAuliff et al., 2009). In the current study, we apply Fuzzy Trace Theory principles, examining whether visual and gist aids help jurors calibrate to the strength of scientific evidence. Additionally we were interested in the role of jurors’ individual differences in scientific reasoning skills in their understanding of case evidence. Contrary to our preregistered hypotheses, there was no effect of evidence condition or gist aid on evidence understanding. However, individual differences between jurors’ numeracy skills predicted evidence understanding. Summary Poor-quality expert evidence is sometimes admitted into court (Smithburn, 2004). Jurors’ calibration to evidence strength varies widely and is not robustly understood. For instance, previous research has established jurors lack understanding of the role of control groups, confounds, and sample sizes in scientific research (McAuliff, Kovera, & Nunez, 2009; Mill, Gray, & Mandel, 1994). Still others have found that jurors can distinguish weak from strong evidence when the evidence is presented alone, yet not when simultaneously presented with case details (Smith, Bull, & Holliday, 2011). This research highlights the need to present evidence to jurors in a way they can understand. Fuzzy Trace Theory purports that people encode information in exact, verbatim representations and through “gist” representations, which represent summary of meaning (Reyna & Brainerd, 1995). It is possible that the presenting complex scientific evidence to people with verbatim content or appealing to the gist, or bottom-line meaning of the information may influence juror understanding of that evidence. Application of Fuzzy Trace Theory in the medical field has shown that gist representations are beneficial for helping laypeople better understand risk and benefits of medical treatment (Brust-Renck, Reyna, Wilhelms, & Lazar, 2016). Yet, little research has applied Fuzzy Trace Theory to information comprehension and application within the context of a jury (c.f. Reyna et. al., 2015). Additionally, it is likely that jurors’ individual characteristics, such as scientific reasoning abilities and cognitive tendencies, influence their ability to understand and apply complex scientific information (Coutinho, 2006). Methods The purpose of this study was to examine how jurors calibrate to the strength of scientific information, and whether individual difference variables and gist aids inspired by Fuzzy Trace Theory help jurors better understand complicated science of differing quality. We used a 2 (quality of scientific evidence: high vs. low) x 2 (decision aid to improve calibration - gist information vs. no gist information), between-subjects design. All hypotheses were preregistered on the Open Science Framework. Jury-eligible community participants (430 jurors across 90 juries; Mage = 37.58, SD = 16.17, 58% female, 56.93% White). Each jury was randomly assigned to one of the four possible conditions. Participants were asked to individually fill out measures related to their scientific reasoning skills prior to watching a mock jury trial. The trial was about an armed bank robbery and consisted of various pieces of testimony and evidence (e.g. an eyewitness testimony, police lineup identification, and a sweatshirt found with the stolen bank money). The key piece of evidence was mitochondrial DNA (mtDNA) evidence collected from hair on a sweatshirt (materials from Hans et al., 2011). Two experts presented opposing opinions about the scientific evidence related to the mtDNA match estimate for the defendant’s identification. The quality and content of this mtDNA evidence differed based on the two conditions. The high quality evidence condition used a larger database than the low quality evidence to compare to the mtDNA sample and could exclude a larger percentage of people. In the decision aid condition, experts in the gist information group presented gist aid inspired visuals and examples to help explain the proportion of people that could not be excluded as a match. Those in the no gist information group were not given any aid to help them understand the mtDNA evidence presented. After viewing the trial, participants filled out a questionnaire on how well they understood the mtDNA evidence and their overall judgments of the case (e.g. verdict, witness credibility, scientific evidence strength). They filled this questionnaire out again after a 45-minute deliberation. Measures We measured Attitudes Toward Science (ATS) with indices of scientific promise and scientific reservations (Hans et al., 2011; originally developed by National Science Board, 2004; 2006). We used Drummond and Fischhoff’s (2015) Scientific Reasoning Scale (SRS) to measure scientific reasoning skills. Weller et al.’s (2012) Numeracy Scale (WNS) measured proficiency in reasoning with quantitative information. The NFC-Short Form (Cacioppo et al., 1984) measured need for cognition. We developed a 20-item multiple-choice comprehension test for the mtDNA scientific information in the cases (modeled on Hans et al., 2011, and McAuliff et al., 2009). Participants were shown 20 statements related to DNA evidence and asked whether these statements were True or False. The test was then scored out of 20 points. Results For this project, we measured calibration to the scientific evidence in a few different ways. We are building a full model with these various operationalizations to be presented at APLS, but focus only on one of the calibration DVs (i.e., objective understanding of the mtDNA evidence) in the current proposal. We conducted a general linear model with total score on the mtDNA understanding measure as the DV and quality of scientific evidence condition, decision aid condition, and the four individual difference measures (i.e., NFC, ATS, WNS, and SRS) as predictors. Contrary to our main hypotheses, neither evidence quality nor decision aid condition affected juror understanding. However, the individual difference variables did: we found significant main effects for Scientific Reasoning Skills, F(1, 427) = 16.03, p <.001, np2 = .04, Weller Numeracy Scale, F(1, 427) = 15.19, p <.001, np2 = .03, and Need for Cognition, F(1, 427) = 16.80, p <.001, np2 = .04, such that those who scored higher on these measures displayed better understanding of the scientific evidence. In addition there was a significant interaction of evidence quality condition and scores on the Weller’s Numeracy Scale, F(1, 427) = 4.10, p = .04, np2 = .01. Further results will be discussed. Discussion These data suggest jurors are not sensitive to differences in the quality of scientific mtDNA evidence, and also that our attempt at helping sensitize them with Fuzzy Trace Theory-inspired aids did not improve calibration. Individual scientific reasoning abilities and general cognition styles were better predictors of understanding this scientific information. These results suggest a need for further exploration of approaches to help jurors differentiate between high and low quality evidence. Note: The 3rd author was supported by an AP-LS AP Award for her role in this research. Learning Objective: Participants will be able to describe how individual differences in scientific reasoning skills help jurors understand complex scientific evidence. 
    more » « less