Title: Knowledge Annotation for Intelligent Textbooks
With the increased popularity of electronic textbooks, there is growing interest in developing a new generation of “intelligent textbooks,” which can guide readers according to their learning goals and current knowledge. Intelligent textbooks extend regular textbooks by integrating machine-manipulable knowledge, and the most popular type of integrated knowledge is a list of relevant concepts mentioned in the textbook. With these concepts, multiple intelligent operations, such as content linking, content recommendation, or student modeling, can be performed. However, existing automatic keyphrase extraction methods, even supervised ones, cannot deliver sufficient accuracy to be practically useful in this task. Manual annotation by experts has been demonstrated to be a preferred approach for producing high-quality labeled data for training supervised models. However, most researchers in the education domain still treat concept annotation as an ad hoc activity rather than a carefully executed task, which can result in low-quality annotated data. Using the annotation of concepts for the Introduction to Information Retrieval textbook as a case study, this paper presents a knowledge engineering method to obtain reliable concept annotations. As demonstrated by the data we collected, the inter-annotator agreement increased gradually over the course of our procedure, and the concept annotations we produced led to better results in document linking and student modeling tasks. The contributions of our work include a validated knowledge engineering procedure, a codebook for technical concept annotation, and a set of concept annotations for the target textbook, which could be used as a gold standard in further intelligent textbook research.
Award ID(s):
1822752
PAR ID:
10367960
Journal Name:
Technology knowledge and learning
ISSN:
2211-1670
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
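The abstract above reports that inter-annotator agreement increased over the course of the annotation procedure, but it does not say which agreement measure was used. As a hedged illustration only, the sketch below computes Cohen's kappa, one common chance-corrected measure for two annotators. The function name `cohen_kappa` and the toy labels (1 = "phrase is a concept", 0 = "not a concept") are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Illustrative helper (hypothetical name); the paper's actual
    agreement measure is not stated in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[l] * counts_b[l] for l in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: 1 = candidate phrase marked as a concept, 0 = not a concept.
annotator_a = [1, 1, 0, 0]
annotator_b = [1, 0, 0, 0]
kappa = cohen_kappa(annotator_a, annotator_b)
```

Computing such a score after each annotation round is one way to track whether agreement is improving as the codebook is refined.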
More Like this
  1. We present a novel approach to intro-to-programming domain model discovery from textbooks using an over-generation and ranking strategy. We first extract candidate key phrases from each chapter in a Computer Science textbook focusing on intro-to-programming and then rank those concepts according to a number of metrics, such as the standard tf-idf weight used in information retrieval and metrics produced by other text ranking algorithms. Specifically, we conduct our work in the context of developing an intelligent tutoring system for source code comprehension, for which a specification of the key programming concepts is needed: the system monitors students' performance on those concepts and scaffolds their learning process until they show mastery of the concepts. Our experiments with programming concept instruction from Java textbooks indicate that statistical methods such as the KP Miner method are quite competitive compared to other, more sophisticated methods. Automated discovery of domain models will lead to more scalable Intelligent Tutoring Systems (ITSs) across topics and domains, which is a major challenge that needs to be addressed if ITSs are to be widely used by millions of learners across many domains.
  2. Frasson, C.; Mylonas, P.; Troussas, C. (Ed.)
    Domain modeling is an important task in designing, developing, and deploying intelligent tutoring systems and other adaptive instructional systems. We focus here on the more specific task of automatically extracting a domain model from textbooks. In particular, this paper explores using multiple textbook indexes to extract a domain model for computer programming. Our approach is based on the observation that different experts, i.e., authors of intro-to-programming textbooks in our case, break down a domain in slightly different ways, and identifying the commonalities and differences can be very revealing. To this end, we present automated approaches to extracting domain models from multiple textbooks and compare the resulting common domain model with a domain model created by experts. Specifically, we use approximate string-matching approaches to increase coverage of the resulting domain model and majority voting across different textbooks to discover common domain terms related to computer programming. Our results indicate that using approximate string matching gives more accurate domain models for computer programming with increased precision and recall. By automating our approach, we can significantly reduce the time and effort required to construct high-quality domain models, making it easy to develop and deploy tutoring systems. Furthermore, we obtain a common domain model that can serve as a benchmark or skeleton that can be used broadly and adapted to specific needs by others. 
  3. Sosnovsky, S.; Brusilovsky, P.; Baraniuk, R.; Lan, A. (Ed.)
    An intelligent textbook may be considered to be an interaction layer that lies between the text and the student, helping the student to master the content in the text. The Mobile Fact and Concept Training System (MoFaCTS) is an adaptive instructional system for simple content that has been developed into an interaction layer to mediate textbook instruction and so is being transformed into the Mobile Fact and Concept Textbook System (MoFaCTS). In this project, MoFaCTS is being completely retooled to accept texts from a textbook and to automatically create cloze sentence practice content to help the student learn the material in the text. Additional features in the prototype stage include automatically generated refutational feedback for incorrect cloze responses and a dialog system, which will trigger a short conversation by a tutor to correct conceptual misunderstandings. MoFaCTS administers this content via a web browser, providing the teacher with score reports and class management tools. Because the "optimal practice" module is interchangeable and the cloze content can come from any text, the system is highly configurable for different grade levels, populations, and academic subjects. To foster faster research progress, data export supports the DataShop transaction format, which allows quick analysis of data using the DataShop tools. 
  4. Knowledge Tracing (KT), which aims to model student knowledge level and predict their performance, is one of the most important applications of user modeling. Modern KT approaches model and maintain an up-to-date state of student knowledge over a set of course concepts according to students’ historical performance in attempting the problems. However, KT approaches were designed to model knowledge by observing relatively small problem-solving steps in Intelligent Tutoring Systems. While these approaches were applied successfully to model student knowledge by observing student solutions for simple problems, such as multiple-choice questions, they do not perform well for modeling students’ complex problem solving. Most importantly, current models assume that all problem attempts are equally valuable in quantifying current student knowledge. However, for complex problems that involve many concepts at the same time, this assumption is deficient. It results in inaccurate knowledge states and unnecessary fluctuations in estimated student knowledge, especially if students guess the correct answer to a problem whose concepts they have not all mastered, or slip in answering a problem whose concepts they have already mastered. In this paper, we argue that not all attempts are equally important in discovering students’ knowledge state, and some attempts can be summarized together to better represent student performance. We propose a novel student knowledge tracing approach, Granular RAnk based TEnsor factorization (GRATE), that dynamically selects student attempts that can be aggregated while predicting students’ performance in problems and discovering the concepts presented in them. Our experiments on three real-world datasets demonstrate the improved performance of GRATE, compared to the state-of-the-art baselines, in the task of student performance prediction. Our further analysis shows that attempt aggregation eliminates the unnecessary fluctuations from students’ discovered knowledge states and helps in discovering complex latent concepts in the problems.
  5. Interactive textbooks generate big data through student reading participation, including animations, question sets, and auto-graded homework. Animations are multi-step, dynamic visuals with text captions. By dividing new content into smaller chunks of information, student engagement is expected to be high, which aligns with tenets of cognitive load theory. Specifically, students’ clicks are recorded and measure usage, completion, and view time per step and for entire animations. Animation usage data from an interactive textbook for a chemical engineering course in Material and Energy Balances accounts for 60,000 animation views across 140+ unique animations. Data collected across five cohorts between 2016 and 2020 used various metrics to capture animation usage including watch and re-watch rates as well as the length of animation views. Variations in view rate and time were examined across content, parsed by book chapter, and five animation characterizations (Concept, Derivation, Figures and Plots, Physical World, and Spreadsheets). Important findings include: 1) Animation views were at or above 100% for all chapters and cohorts, 2) Median view time varies from 22 s (2-step) to 59 s (6-step) - a reasonable attention span for students and cognitive load, 3) Median view time for animations characterized as Derivation was the longest (40 s) compared to Physical World animations, which resulted in the shortest time (20 s). 
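Item 2 in the list above describes combining approximate string matching with majority voting across several textbook indexes to find common domain terms. The following is a minimal sketch of that general idea, not the authors' implementation: the similarity measure (Python's `difflib.SequenceMatcher` ratio), the 0.85 threshold, the greedy clustering, and the names `normalize`, `similar`, and `common_terms` are all assumptions chosen for illustration, and the sample index terms are invented.

```python
from difflib import SequenceMatcher


def normalize(term):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def similar(a, b, threshold=0.85):
    """Approximate match: similarity ratio at or above an assumed threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def common_terms(indexes, threshold=0.85):
    """Return terms that appear (approximately) in a majority of indexes.

    indexes: one list of index terms per textbook.
    Greedy clustering: each term joins the first cluster whose label it
    approximately matches; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"label": str, "books": set of book ids}
    for book_id, terms in enumerate(indexes):
        for term in terms:
            t = normalize(term)
            for cluster in clusters:
                if similar(t, cluster["label"], threshold):
                    cluster["books"].add(book_id)
                    break
            else:
                clusters.append({"label": t, "books": {book_id}})

    # Majority vote: keep clusters seen in more than half of the textbooks.
    majority = len(indexes) / 2
    return [c["label"] for c in clusters if len(c["books"]) > majority]


# Invented sample data: index terms from three hypothetical textbooks.
sample_indexes = [
    ["for loop", "array", "recursion"],
    ["for loops", "arrays", "inheritance"],
    ["For Loop", "pointer", "recursion"],
]
shared = common_terms(sample_indexes)
```

With these sample inputs, "for loops" and "For Loop" are grouped with "for loop", so the term survives the vote even though the three books spell it differently; exact matching alone would have split the votes.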