This content will become publicly available on April 11, 2026

Title: Neighborhood-Aware Negative Sampling for Student Knowledge and Behavior Modeling
Simple random negative sampling is commonly used to enhance decision-making in sequential models with numerous potential negative instances, such as recommender systems. However, it ignores the patterns discoverable in complex sequences that could identify the most informative negative samples. In this paper, we address this challenge by introducing a Neighborhood-Aware Negative Sampling (NANS) technique in the context of student knowledge modeling (KM) and behavior modeling (BM). In the education domain, KM quantifies student knowledge based on past performance, while BM focuses on behaviors such as student preferences among questions. With the vast number of problems to choose from and the intricate relationship between student knowledge and behavior, selecting proper negative samples is a notable challenge. NANS, together with our proposed multi-objective, multi-task sequential model NANS-KoBeM, frames the simultaneous modeling of student knowledge and question selection as a multi-task learning problem with dual objectives: predicting students' performance and their question selections.
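The full paper is under embargo until April 11, 2026, so the sketch below illustrates only the general idea behind neighborhood-aware negative sampling, not the authors' actual NANS algorithm: restrict negative candidates to questions that co-occur near the positive question in student sequences, which tend to be more informative than uniform random negatives. The co-occurrence window and all function names are illustrative assumptions.

```python
# Hypothetical sketch of neighborhood-aware negative sampling.
# NOT the authors' NANS algorithm (the paper is embargoed); only the idea of
# drawing negatives from a sequence-derived neighborhood of the positive item.
import random
from collections import defaultdict

def build_neighborhoods(sequences, window=2):
    """Map each question to questions that co-occur nearby in student sequences."""
    neighbors = defaultdict(set)
    for seq in sequences:
        for i, q in enumerate(seq):
            for j in range(max(0, i - window), min(len(seq), i + window + 1)):
                if j != i:
                    neighbors[q].add(seq[j])
    return neighbors

def sample_negative(positive, attempted, neighbors, all_questions):
    """Prefer an informative negative from the positive's neighborhood;
    fall back to uniform random sampling when the neighborhood is exhausted."""
    candidates = [q for q in neighbors.get(positive, ()) if q not in attempted]
    if candidates:
        return random.choice(candidates)
    pool = [q for q in all_questions if q not in attempted]
    return random.choice(pool)

# Toy usage: three students' question sequences.
sequences = [[1, 2, 3, 4], [2, 3, 5], [1, 4, 5, 6]]
neighbors = build_neighborhoods(sequences)
print(sample_negative(positive=3, attempted={1, 2, 3}, neighbors=neighbors,
                      all_questions=range(1, 7)))
```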
Award ID(s):
2047500
PAR ID:
10612060
Author(s) / Creator(s):
Publisher / Repository:
AAAI
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
39
Issue:
12
ISSN:
2159-5399
Page Range / eLocation ID:
13374 to 13382
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Paaßen, Benjamin; Demmans Epp, Carrie (Eds.)
    Knowledge Tracing (KT) focuses on quantifying student knowledge according to the student's past performance. While KT models focus on modeling student knowledge, they miss the behavioral aspect of learning, such as the types of learning materials that students choose to learn from. This is mainly because traditional KT models only consider assessed activities, like solving questions. Recently, there has been growing interest in multi-type KT, which considers both assessed and non-assessed activities (like video lectures). Since multi-type KT models include different learning material types, they present a new opportunity to investigate student behavior, such as the choice of learning material type, alongside student knowledge. We argue that student knowledge can affect student behavior, and that student interest in learning materials may affect student knowledge. In this paper, we model the relationship between students' knowledge states and their choice of learning activities. To this end, we propose Pareto-TAMKOT, which frames the simultaneous learning of student knowledge and behavior as a multi-task learning problem. It employs a transition-aware multi-activity KT method for two objectives: modeling student knowledge and modeling student behavior. Pareto-TAMKOT uses the Pareto Multi-task learning algorithm (Pareto MTL) to solve this multi-objective optimization problem. We evaluate Pareto-TAMKOT on one real-world dataset, demonstrating the benefit of approaching student knowledge and behavior modeling as a multi-task learning problem.
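The abstract does not spell out the optimization details, so the sketch below shows only the simpler two-task building block that Pareto-style multi-task methods rest on: the closed-form MGDA gradient weighting that yields a common descent direction for both objectives. This is not the full Pareto MTL algorithm (which additionally uses preference vectors), and the model, data, and names are toy assumptions.

```python
# Two-objective gradient weighting (MGDA closed form for two tasks), a
# simpler relative of Pareto MTL; all names and data here are illustrative.
import torch

def two_task_weights(g1, g2):
    """Closed-form alpha minimizing ||alpha*g1 + (1-alpha)*g2||^2, clipped to [0, 1]."""
    diff = g2 - g1
    denom = diff.dot(diff).clamp_min(1e-12)
    return (diff.dot(g2) / denom).clamp(0.0, 1.0)

model = torch.nn.Linear(8, 2)        # shared parameters for both tasks
x = torch.randn(16, 8)
y_knowledge = torch.randn(16)        # toy knowledge-modeling target
y_behavior = torch.randn(16)         # toy behavior-modeling target

out = model(x)
loss_k = torch.nn.functional.mse_loss(out[:, 0], y_knowledge)
loss_b = torch.nn.functional.mse_loss(out[:, 1], y_behavior)

# Flattened per-task gradients of the shared parameters.
g1 = torch.cat([g.flatten() for g in
                torch.autograd.grad(loss_k, model.parameters(), retain_graph=True)])
g2 = torch.cat([g.flatten() for g in
                torch.autograd.grad(loss_b, model.parameters(), retain_graph=True)])

alpha = two_task_weights(g1, g2)
(alpha * loss_k + (1 - alpha) * loss_b).backward()  # Pareto-stationary descent step
```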
  2. Interactive learning environments facilitate learning by providing hints to fill gaps in the understanding of a concept. Studies suggest that hints are not used optimally by learners: they are either used unnecessarily or not used at all. It has been shown that learning outcomes can be improved by providing hints when needed. An effective hint-taking prediction model can be used by a learning environment to make adaptive decisions on whether to withhold or provide hints. Past work on student behavior modeling has focused extensively on the task of modeling a learner's state of knowledge over time, referred to as knowledge tracing. Other aspects of a learner's behavior, such as the tendency to use hints, have garnered limited attention. Past knowledge tracing models either ignore the questions where a hint was taken or label hints taken as incorrect responses. We propose a multi-task memory-augmented deep learning model to jointly predict the hint-taking and the knowledge tracing tasks. The model incorporates the effect of both past responses and hints taken on both tasks. We apply the model to two datasets: the ASSISTments 2009-10 skill builder dataset and the Junyi Academy Math Practicing Log. The results show that deep learning models efficiently leverage the sequential information present in a learner's responses. The proposed model significantly outperforms past work on hint prediction by at least 12 percentage points. Moreover, we demonstrate that jointly modeling the two tasks improves performance consistently across tasks and datasets, albeit by a small amount.
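The paper's model is memory-augmented; as a rough stand-in, the sketch below shows the general shape of such a joint model: one shared sequence encoder with separate prediction heads for correctness (knowledge tracing) and hint-taking. The plain LSTM encoder, dimensions, and names are assumptions for illustration, not the paper's architecture.

```python
# Assumed sketch of a multi-task hint-taking + knowledge-tracing model.
# The paper uses a memory-augmented network; this LSTM is a stand-in encoder.
import torch
import torch.nn as nn

class JointKTHintModel(nn.Module):
    def __init__(self, num_questions, dim=64):
        super().__init__()
        # Each interaction (question, correct?, hint taken?) maps to one embedding.
        self.embed = nn.Embedding(num_questions * 4, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.correct_head = nn.Linear(dim, num_questions)  # knowledge tracing task
        self.hint_head = nn.Linear(dim, num_questions)     # hint-taking task

    def forward(self, questions, correct, hint):
        # Fold the two binary signals into the interaction index.
        idx = questions * 4 + correct * 2 + hint
        h, _ = self.encoder(self.embed(idx))
        return torch.sigmoid(self.correct_head(h)), torch.sigmoid(self.hint_head(h))

# Toy usage: a batch of 8 students with 20 interactions each.
model = JointKTHintModel(num_questions=100)
q = torch.randint(0, 100, (8, 20))
c = torch.randint(0, 2, (8, 20))
hnt = torch.randint(0, 2, (8, 20))
p_correct, p_hint = model(q, c, hnt)  # per-step predictions for every question
```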
  3. Contrastive learning learns input representations by pushing similar data together and pulling dissimilar data apart, along with data augmentation and pretext task construction. It enhances large-model learning through its ability to use large amounts of unlabeled data, and it has been successfully applied to large language models, pre-trained image models, and multimodal models. In addition, contrastive learning learns a representation by modeling the explainable structure of the latent space, which has broad application in scientific discovery and interpretable Artificial Intelligence (AI). The primary focus of this thesis is to explore contrastive learning from a data construction perspective in real-world problems, to fill the gap between the principle of contrastive learning and its application. Challenges such as sampling bias and data quality largely affect the representations learned by contrastive learning. This thesis analyzes the data construction challenges and limitations in 1) negative sampling for knowledge graph embedding (KGE), 2) high-quality preference data labeling for Large Language Model (LLM) alignment, 3) data augmentation in non-linear dynamic system modeling, and 4) data properties of messenger RNA (mRNA) sequences. To solve challenge 1), a hardness- and structure-based objective function is proposed that accounts for sampling bias in hard negative sampling. For challenge 2), the similarity of response embeddings is used to evaluate the quality of preference pairs, mitigating human labeling error on ambiguous response pairs. Challenge 3) is solved by systematically considering the physical system together with contrastive learning: a data augmentation strategy that partitions the full sequence is used to learn the transition matrix in the latent linear space. Challenge 4) is common in the biological domain due to the high cost of lab experiments; a pre-trained model benefits supervised learning on limited data by learning general features from domain knowledge. A contrastive-learning-based teacher-student framework is proposed for mRNA sequence learning by contrasting the unmasked sequence with a hard-masked sequence. With careful data construction or data sampling, contrastive learning can be boosted to solve real-world tasks. For KGE, the novel contrastive loss function learns the boundary between negative and positive samples to improve link prediction in the knowledge graph; for LLM alignment, at the same labeling cost, selecting dissimilar responses improves vanilla direct preference optimization (DPO) alignment; the data augmentation with contrastive loss plays a crucial role in learning a more accurate dynamic system, as explained by the learned continuous eigenfunctions; and by combining the teacher-student framework with the hard-masking strategy, the pre-trained model achieves state-of-the-art results after fine-tuning on limited downstream task data. Overall, this thesis provides a broad data-driven contrastive learning methodology to enhance representation learning in different domains. The methodology consists of an improved objective function in the face of data bias, better data selection to reduce labeling error, and proper data augmentation for a particular application domain, improving learning results compared to traditional methods.
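The thesis's exact objective function is not given in this abstract, so the sketch below shows a generic hardness-weighted InfoNCE loss in the same spirit: harder (more similar) negatives receive larger weight in the denominator. The weighting scheme, temperature, and all names are assumptions, not the thesis's loss.

```python
# Generic hardness-weighted InfoNCE sketch (assumed form, not the thesis's
# actual "hardness and structure-based" objective).
import torch
import torch.nn.functional as F

def hardness_weighted_infonce(anchor, positive, negatives, tau=0.1, beta=1.0):
    """anchor, positive: (d,); negatives: (k, d). beta > 0 up-weights hard negatives."""
    a = F.normalize(anchor, dim=0)
    pos_sim = a.dot(F.normalize(positive, dim=0)) / tau
    neg_sim = F.normalize(negatives, dim=1) @ a / tau           # (k,) similarities
    weights = torch.softmax(beta * neg_sim, dim=0).detach()     # emphasize hard negatives
    neg_term = (weights * neg_sim.exp()).sum() * negatives.shape[0]
    # Standard InfoNCE shape: -log( exp(pos) / (exp(pos) + weighted negatives) )
    return -(pos_sim - torch.log(pos_sim.exp() + neg_term))

loss = hardness_weighted_infonce(torch.randn(32), torch.randn(32), torch.randn(16, 32))
```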
  4. We study active learning methods for single index models of the form $$F({\bm x}) = f(\langle {\bm w}, {\bm x}\rangle)$$, where $$f:\mathbb{R} \to \mathbb{R}$$ and $${\bm x}, {\bm w} \in \mathbb{R}^d$$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning, such as surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., methods that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $$f$$ is known and Lipschitz, we show that $$\tilde{O}(d)$$ samples collected via statistical leverage score sampling are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent $${O}(d^{2})$$ bound of \cite{gajjar2023active}. Second, we show that $$\tilde{O}(d)$$ samples suffice even in the more difficult setting when $$f$$ is unknown. Our results leverage tools from high-dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
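Leverage score sampling itself is standard and, as the abstract notes, simple to implement. A minimal numpy sketch (the sampling primitive only, not the paper's agnostic-learning analysis) samples rows of the design matrix with probability proportional to their leverage scores and reweights them so the subsampled least-squares objective is unbiased.

```python
# Minimal statistical leverage score sampling for active linear regression.
import numpy as np

def leverage_scores(X):
    """Row leverage scores of X via a thin SVD: l_i = ||U_i||^2."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U**2, axis=1)

def leverage_sample(X, m, seed=0):
    """Sample m row indices with probability proportional to leverage score."""
    rng = np.random.default_rng(seed)
    scores = leverage_scores(X)
    p = scores / scores.sum()
    return rng.choice(X.shape[0], size=m, replace=True, p=p), p

X = np.random.randn(1000, 10)        # pool of unlabeled inputs
idx, p = leverage_sample(X, m=50)    # query labels only at these rows
# Reweight rows by 1/sqrt(m * p_i) so subsampled least squares is unbiased.
w = 1.0 / np.sqrt(50 * p[idx])
```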
  5.
    Knowledge Tracing (KT), which aims to model student knowledge levels and predict student performance, is one of the most important applications of user modeling. Modern KT approaches model and maintain an up-to-date state of student knowledge over a set of course concepts, according to students' historical performance in attempting problems. However, KT approaches were designed to model knowledge by observing relatively small problem-solving steps in Intelligent Tutoring Systems. While these approaches have been applied successfully to model student knowledge from solutions to simple problems, such as multiple-choice questions, they do not perform well for modeling complex problem solving. Most importantly, current models assume that all problem attempts are equally valuable in quantifying current student knowledge. For complex problems that involve many concepts at the same time, this assumption is deficient: it results in inaccurate knowledge states and unnecessary fluctuations in estimated student knowledge, especially when students guess the correct answer to a problem whose concepts they have not all mastered, or slip on a problem whose concepts they have already mastered. In this paper, we argue that not all attempts are equally important in discovering a student's knowledge state, and that some attempts can be summarized together to better represent student performance. We propose a novel student knowledge tracing approach, Granular RAnk based TEnsor factorization (GRATE), that dynamically selects student attempts that can be aggregated while predicting students' performance on problems and discovering the concepts presented in them. Our experiments on three real-world datasets demonstrate the improved performance of GRATE, compared to state-of-the-art baselines, on the task of student performance prediction. Our further analysis shows that attempt aggregation eliminates unnecessary fluctuations from students' discovered knowledge states and helps discover complex latent concepts in the problems.
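GRATE's rank-based tensor factorization is not specified in this abstract, so the toy sketch below illustrates only the attempt-aggregation intuition: consecutive attempts on the same problem are collapsed into a single averaged observation, smoothing out guess-and-slip fluctuations before any downstream modeling. The greedy run-length grouping here is an assumption; GRATE selects aggregations dynamically during factorization.

```python
# Toy illustration of attempt aggregation (not GRATE's dynamic selection):
# collapse consecutive attempts on the same problem into one observation.
from itertools import groupby

def aggregate_attempts(attempts):
    """attempts: time-ordered list of (problem_id, correct) for one student.
    Returns one (problem_id, mean_correctness, n_attempts) per consecutive run."""
    out = []
    for pid, run in groupby(attempts, key=lambda a: a[0]):
        run = list(run)
        out.append((pid, sum(c for _, c in run) / len(run), len(run)))
    return out

# Three noisy attempts on problem 7 become one smoothed observation.
history = [(7, 0), (7, 0), (7, 1), (3, 1), (7, 1)]
print(aggregate_attempts(history))
# -> [(7, 0.333..., 3), (3, 1.0, 1), (7, 1.0, 1)]
```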