

Title: Domain Model Discovery from Textbooks for Computer Programming Intelligent Tutors
We present a novel approach to intro-to-programming domain model discovery from textbooks using an over-generation and ranking strategy. We first extract candidate key phrases from each chapter of a Computer Science textbook focused on intro-to-programming and then rank those candidate concepts according to a number of metrics, such as the standard tf-idf weight used in information retrieval and scores produced by other text-ranking algorithms. Specifically, we conduct our work in the context of developing an intelligent tutoring system for source code comprehension, for which a specification of the key programming concepts is needed: the system monitors students' performance on those concepts and scaffolds their learning process until they show mastery of the concepts. Our experiments with programming concept instruction from Java textbooks indicate that statistical methods such as KP-Miner are quite competitive compared to other, more sophisticated methods. Automated discovery of domain models will lead to more scalable Intelligent Tutoring Systems (ITSs) across topics and domains, which is a major challenge that needs to be addressed if ITSs are to be widely used by millions of learners across many domains.
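As a rough illustration of the ranking step described above, the sketch below scores pre-extracted candidate phrases against textbook chapters using tf-idf; the toy chapters, the candidate list, and the use of scikit-learn are assumptions for illustration rather than the authors' actual pipeline.

    # Illustrative sketch: rank candidate key phrases within a chapter by tf-idf.
    # Candidates are assumed to have been over-generated beforehand (e.g., noun
    # phrases); scikit-learn's TfidfVectorizer stands in for the real pipeline.
    from sklearn.feature_extraction.text import TfidfVectorizer

    chapters = [
        "A variable stores a value. Each variable has a type such as int or double.",
        "A for loop repeats a block of statements while a loop condition holds.",
    ]
    candidates = ["variable", "type", "for loop", "loop condition", "block of statements"]

    # Treat each chapter as a document and restrict the vocabulary to the candidates.
    vectorizer = TfidfVectorizer(vocabulary=candidates, ngram_range=(1, 3), lowercase=True)
    tfidf = vectorizer.fit_transform(chapters)

    # Rank the candidates for one chapter by their tf-idf weight.
    chapter_idx = 1
    scores = tfidf[chapter_idx].toarray().ravel()
    for phrase, score in sorted(zip(vectorizer.get_feature_names_out(), scores),
                                key=lambda p: p[1], reverse=True):
        print(f"{phrase}: {score:.3f}")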
Award ID(s):
1822816
NSF-PAR ID:
10290858
Author(s) / Creator(s):
Date Published:
Journal Name:
The International FLAIRS Conference Proceedings, 34.
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Frasson, C. ; Mylonas, P. ; Troussas, C. (Ed.)
    Domain modeling is an important task in designing, developing, and deploying intelligent tutoring systems and other adaptive instructional systems. We focus here on the more specific task of automatically extracting a domain model from textbooks. In particular, this paper explores using multiple textbook indexes to extract a domain model for computer programming. Our approach is based on the observation that different experts, i.e., authors of intro-to-programming textbooks in our case, break down a domain in slightly different ways, and identifying the commonalities and differences can be very revealing. To this end, we present automated approaches to extracting domain models from multiple textbooks and compare the resulting common domain model with a domain model created by experts. Specifically, we use approximate string matching to increase coverage of the resulting domain model and majority voting across different textbooks to discover common domain terms related to computer programming (a rough sketch of this matching-and-voting step appears after this list). Our results indicate that approximate string matching yields more accurate domain models for computer programming, with increased precision and recall. By automating our approach, we can significantly reduce the time and effort required to construct high-quality domain models, making it easier to develop and deploy tutoring systems. Furthermore, we obtain a common domain model that can serve as a benchmark or skeleton that can be used broadly and adapted to specific needs by others.
  2. Intelligent tutoring systems (ITSs) have consistently been shown to improve the educational outcomes of students when used alone or combined with traditional instruction. However, building an ITS is a time-consuming process that requires specialized knowledge of existing tools. Extant authoring methods, including the example-tracing method of the Cognitive Tutor Authoring Tools (CTAT) and SimStudent's Authoring by Tutoring, use programming-by-demonstration to allow authors to build ITSs more quickly than they could by hand-programming with model tracing. Yet these methods still suffer from long authoring times or difficulty creating complete models. In this study, we demonstrate that Simulated Learners built with the Apprentice Learner (AL) Framework can be combined with a novel interaction design that emphasizes model transparency, input flexibility, and problem-solving control to enable authors to achieve greater model completeness in less time than existing authoring methods.
  3. Crossley, Scott ; Popescu, Elvira (Ed.)
    We present here a novel instructional resource, called DeepCode, to support deep code comprehension and learning in intro-to-programming courses (CS1 and CS2). DeepCode is a set of instructional code examples, which we call a codeset, annotated by our team with comments (e.g., explaining the logical steps of the underlying problem being solved) and related instructional questions that can play the role of hints meant to help learners think about and articulate explanations of the code (a hypothetical annotated entry is sketched after this list). While DeepCode was designed primarily to serve our larger efforts of developing an intelligent tutoring system (ITS) that fosters the monitoring, assessment, and development of code comprehension skills for students learning to program, the codeset can be used for other purposes such as assessment, problem solving, and various other learning activities such as studying worked-out code examples with explanations and code visualizations. We present here the underlying principles, theories, and frameworks behind our design process and the annotation guidelines, and we summarize the resulting codeset of 98 annotated Java code examples, which include 7,157 lines of code (including comments), 260 logical steps, 260 logical step details, 408 statement-level comments, and 590 scaffolding questions.
  4. Antonija Mitrovic and Nigel Bosch (Ed.)
    Domain modeling is a central component in education technologies as it represents the target domain students are supposed to train on and eventually master. Automatically generating domain models can lead to substantial cost and scalability benefits. Automatically extracting key concepts or knowledge components from, for instance, textbooks can enable the development of automatic or semi-automatic processes for creating domain models. We explore in this work the use of transformer-based pre-trained models for the task of keyphrase extraction. Specifically, we investigate and evaluate four different variants of BERT, a pre-trained transformer-based architecture, that vary in terms of training data, training objective, or training strategy to extract knowledge components from textbooks for the domain of intro-to-programming. We report results obtained using the following BERT-based models: BERT, CodeBERT, SciBERT, and RoBERTa (a rough sketch of embedding-based keyphrase scoring appears after this list).
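As noted in item 1, here is a minimal sketch of combining approximate string matching with majority voting over textbook index terms; the toy index lists, the similarity threshold, and the use of Python's standard-library difflib are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch: merge index terms from several textbooks via fuzzy
    # matching, then keep terms that a majority of textbooks mention.
    from difflib import SequenceMatcher

    indexes = [
        ["variable", "for loop", "array", "method"],
        ["variables", "for loops", "arrays", "class"],
        ["variable", "loop, for", "array", "method"],
    ]

    def similar(a: str, b: str, threshold: float = 0.8) -> bool:
        """Approximate match: lowercase, ignore word order, compare edit similarity."""
        a_norm = " ".join(sorted(a.lower().replace(",", "").split()))
        b_norm = " ".join(sorted(b.lower().replace(",", "").split()))
        return SequenceMatcher(None, a_norm, b_norm).ratio() >= threshold

    # For each term in the first index, count how many textbooks contain an
    # approximate match, then apply majority voting across textbooks.
    votes = {term: sum(any(similar(term, t) for t in idx) for idx in indexes)
             for term in indexes[0]}
    common_domain_model = [t for t, v in votes.items() if v > len(indexes) / 2]
    print(common_domain_model)  # ['variable', 'for loop', 'array', 'method']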
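For item 3, a purely hypothetical sketch of what a single annotated codeset entry might contain; the field names and the example itself are illustrative assumptions and do not reflect the actual DeepCode annotation schema.

    # Hypothetical structure of one annotated codeset entry (field names are
    # illustrative only; they are not the actual DeepCode schema).
    example_entry = {
        "problem": "Sum the integers from 1 to n.",
        "code": [
            "int total = 0;                 // step 1: initialize the accumulator",
            "for (int i = 1; i <= n; i++) { // step 2: iterate from 1 to n",
            "    total += i;                // step 3: add each value to the total",
            "}",
        ],
        "logical_steps": [
            "Initialize the accumulator variable.",
            "Loop over the values 1 through n.",
            "Accumulate the running sum.",
        ],
        "scaffolding_questions": [
            "Why must total be initialized before the loop starts?",
            "What is the value of total after the loop finishes?",
        ],
    }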
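For item 4, a minimal sketch of embedding-based keyphrase scoring with a pre-trained BERT encoder (mean-pooled token embeddings compared by cosine similarity); the model name, passage, candidate phrases, and scoring scheme are illustrative assumptions rather than the authors' evaluation setup.

    # Illustrative sketch: score candidate key phrases against a textbook passage
    # with a pre-trained BERT model from Hugging Face transformers.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def embed(texts):
        """Mean-pool the last hidden states into one vector per input text."""
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state       # (batch, tokens, dim)
        mask = batch["attention_mask"].unsqueeze(-1)        # zero out padding tokens
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    passage = ("A while loop repeatedly executes its body as long as "
               "the loop condition evaluates to true.")
    candidates = ["while loop", "loop condition", "variable declaration", "return statement"]

    # Higher cosine similarity to the passage suggests a more relevant concept.
    scores = torch.nn.functional.cosine_similarity(embed([passage]), embed(candidates))
    for phrase, score in sorted(zip(candidates, scores.tolist()), key=lambda p: -p[1]):
        print(f"{phrase}: {score:.3f}")

In a setup like the one described in item 4, the candidate phrases would come from a prior keyphrase extraction step over the textbook, and the resulting rankings would be evaluated against expert-identified knowledge components.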