skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: An Approach to Improve k-Anonymization Practices in Educational Data Mining
Educational data mining has allowed for large improvements in educational outcomes and understanding of educational processes. However, there remains a constant tension between educational data mining advances and protecting student privacy while using educational datasets. Publicly available datasets have facilitated numerous research projects while striving to preserve student privacy via strict anonymization protocols (e.g., k-anonymity); however, little is known about the relationship between anonymization and utility of educational datasets for downstream educational data mining tasks, nor how anonymization processes might be improved for such tasks. We provide a framework for strictly anonymizing educational datasets with a focus on improving downstream performance in common tasks such as student outcome prediction. We evaluate our anonymization framework on five diverse educational datasets with machine learning-based downstream task examples to demonstrate both the effect of anonymization and our means to improve it. Our method improves downstream machine learning accuracy versus baseline data anonymization by 30.59%, on average, by guiding the anonymization process toward strategies that anonymize the least important information while leaving the most valuable information intact.  more » « less
Award ID(s):
2000638
PAR ID:
10541656
Author(s) / Creator(s):
; ;
Publisher / Repository:
Zenodo
Date Published:
Journal Name:
Journal of Educational Data Mining
ISSN:
2157-2100
Subject(s) / Keyword(s):
student privacy data sharing machine learning
Format(s):
Medium: X
Right(s):
Creative Commons Attribution 4.0 International
Sponsoring Org:
National Science Foundation
More Like this
  1. Anonymization of event logs facilitates process mining while protecting sensitive information of process stakeholders. Existing techniques, however, focus on the privatization of the control-flow. Other process perspectives, such as roles, resources, and objects are neglected or subject to randomization, which breaks the dependencies between the perspectives. Hence, existing techniques are not suited for advanced process mining tasks, e.g., social network mining or predictive monitoring . To address this gap, we propose PMDG, a framework to ensure privacy for multi-perspective process mining through data generalization. It provides group-based privacy guarantees for an event log, while preserving the characteristic dependencies between the control-flow and further process perspectives. Unlike existing privatization techniques that rely on data suppression or noise insertion, PMDG adopts data generalization: a technique where the activities and attribute values referenced in events are generalized into more abstract ones, to obtain equivalence classes that are sufficiently large from a privacy point of view. We demonstrate empirically that PMDG outperforms state-of-the-art anonymization techniques, when mining handovers and predicting outcomes. 
    more » « less
  2. Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as learning outcome prediction and automatically delivering personalized intervention. However, analyzing process data is challenging since the specific format of process data varies a lot depending on different learning/testing scenarios. In this paper, we propose a framework for learning representations of educational process data that is applicable across many different learning scenarios. Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data and a fine-tuning step that further adjusts these representations on downstream prediction tasks. We apply our framework to the 2019 nation’s report card data mining competition dataset that consists of student problem-solving process data and detail the specific models we use in this scenario. We conduct both quantitative and qualitative experiments to show that our framework results in process data representations that are both predictive and informative.1 
    more » « less
  3. Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as learning outcome prediction and automatically delivering personalized intervention. In this paper, we propose a framework for learning representations of educational process data that is applicable across different learning scenarios. Our framework consists of a pre-training step that uses BERTtype objectives to learn representations from sequential process data and a fine-tuning step that further adjusts these representations on downstream prediction tasks. We apply our framework to the 2019 nation’s report card data mining competition dataset that consists of student problem-solving process data and detail the specific models we use in this scenario. We conduct both quantitative and qualitative experiments to show that our framework results in process data representations that are both predictive and informative. 
    more » « less
  4. Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as learning outcome prediction and automatically delivering personalized intervention. In this paper, we propose a framework for learning representations of educational process data that is applicable across different learning scenarios. Our framework consists of a pre-training step that uses BERTtype objectives to learn representations from sequential process data and a fine-tuning step that further adjusts these representations on downstream prediction tasks. We apply our framework to the 2019 nation’s report card data mining competition dataset that consists of student problem-solving process data and detail the specific models we use in this scenario. We conduct both quantitative and qualitative experiments to show that our framework results in process data representations that are both predictive and informative. 
    more » « less
  5. Abstract Machine learning (ML) models are used for in-situ monitoring in additive manufacturing (AM) for defect detection. However, sensitive information stored in ML models, such as part designs, is at risk of data leakage due to unauthorized access. To address this, differential privacy (DP) introduces noise into ML, outperforming cryptography, which is slow, and data anonymization, which does not guarantee privacy. While DP enhances privacy, it reduces the precision of defect detection. This paper proposes combining DP with Hyperdimensional Computing (HDC), a brain-inspired model that memorizes training sample information in a large hyperspace, to optimize real-time monitoring in AM while protecting privacy. Adding DP noise to the HDC model protects sensitive information without compromising defect detection accuracy. Our studies demonstrate the effectiveness of this approach in monitoring anomalies, such as overhangs, using high-speed melt pool data analysis. With a privacy budget set at 1, our model achieved an F-score of 94.30%, surpassing traditional models like ResNet50, DenseNet201, EfficientNet B2, and AlexNet, which have performance up to 66%. Thus, the intersection of DP and HDC promises accurate defect detection and protection of sensitive information in AM. The proposed method can also be extended to other AM processes, such as fused filament fabrication. 
    more » « less