Educational data mining has enabled large improvements in educational outcomes and in our understanding of educational processes. However, there remains a persistent tension between advancing educational data mining and protecting student privacy in educational datasets. Publicly available datasets have facilitated numerous research projects while striving to preserve student privacy via strict anonymization protocols (e.g., k-anonymity); however, little is known about the relationship between anonymization and the utility of educational datasets for downstream educational data mining tasks, or how anonymization processes might be improved for such tasks. We provide a framework for strictly anonymizing educational datasets with a focus on improving downstream performance in common tasks such as student outcome prediction. We evaluate our anonymization framework on five diverse educational datasets with machine learning-based downstream task examples to demonstrate both the effect of anonymization and our means of improving it. Our method improves downstream machine learning accuracy by 30.59% on average over baseline data anonymization, by guiding the anonymization process toward strategies that anonymize the least important information while leaving the most valuable information intact.
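A minimal sketch of the kind of utility-guided generalization this abstract describes, assuming a numeric tabular dataset: rank columns by task importance, then coarsen the least important ones. The function names and the binning step are illustrative, not the paper's actual framework.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rank_features_ascending(X: pd.DataFrame, y: pd.Series) -> list:
    """Order columns from least to most important for the downstream task."""
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    return [X.columns[i] for i in np.argsort(model.feature_importances_)]

def generalize_least_important(X: pd.DataFrame, y: pd.Series,
                               n_cols: int, bins: int = 4) -> pd.DataFrame:
    """Coarsen the n_cols least task-relevant numeric columns into broad
    bins, leaving the most predictive columns untouched."""
    X_anon = X.copy()
    for col in rank_features_ascending(X, y)[:n_cols]:
        X_anon[col] = pd.cut(X_anon[col], bins=bins, labels=False)
    return X_anon
```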
IDEAL: An Interactive De-Anonymization Learning System
In the era of digital communities, a massive volume of data is created daily from people's online activities. Such data is sometimes shared with third parties for commercial benefit, which has raised concerns about privacy disclosure. Privacy-preserving technologies have been developed to protect people's sensitive information in data publishing. However, due to the availability of data from other sources, e.g., blogging, it is still possible to de-anonymize users even from anonymized data sets. This paper presents the design and implementation of an Interactive De-Anonymization Learning system, IDEAL. The system helps students learn about de-anonymization through engaging hands-on activities, such as tuning different parameters to evaluate their impact on the accuracy of de-anonymization and observing the effect of data anonymization on de-anonymization. A pilot lab session to evaluate the system was conducted among thirty-five students at Prairie View A&M University, and the feedback was very positive.
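As a toy illustration of the linkage attacks such a lab explores, the sketch below matches anonymized records against an auxiliary dataset by similarity. The threshold mirrors the kind of parameter students tune in IDEAL; the system's actual interface and matching method may differ.

```python
import numpy as np

def link_records(anon: np.ndarray, aux: np.ndarray, threshold: float) -> list:
    """For each anonymized record, return the index of the best-matching
    auxiliary record, or None when no match is confident enough."""
    matches = []
    for record in anon:
        dists = np.linalg.norm(aux - record, axis=1)
        best = int(np.argmin(dists))
        similarity = 1.0 / (1.0 + dists[best])  # map distance into (0, 1]
        matches.append(best if similarity >= threshold else None)
    return matches
```

Raising the threshold trades fewer false matches for fewer de-anonymized records, which is exactly the accuracy trade-off the lab activities probe.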
- Award ID(s):
- 1712496
- PAR ID:
- 10438709
- Date Published:
- Journal Name:
- 2020 IEEE 44th Annual Computers, Software, and Applications Conference
- Page Range / eLocation ID:
- 449 to 454
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
As social media grows in popularity, so do people's concerns about privacy disclosure. Considering the large amount of time younger generations spend on online social networks, educational activities are needed to promote privacy protection. This paper presents an initial effort to explore a hands-on learning approach to educating students about social media privacy. Specifically, a labware was developed to teach students that their query behaviors in online social networks may disclose the private relationships of other users on the site. The labware aims to make students aware of the privacy issues on social media, help them understand the costs of privacy protection, stimulate their interest, and improve students' self-efficacy. This paper discusses the design and implementation of the labware, which was evaluated among a group of student volunteers through a lecture and hands-on activities. Student feedback was very positive and encouraging. (A toy illustration of this query-based leak appears after this list.)
-
Anonymization of event logs facilitates process mining while protecting sensitive information of process stakeholders. Existing techniques, however, focus on the privatization of the control-flow. Other process perspectives, such as roles, resources, and objects, are neglected or subject to randomization, which breaks the dependencies between the perspectives. Hence, existing techniques are not suited for advanced process mining tasks, e.g., social network mining or predictive monitoring. To address this gap, we propose PMDG, a framework to ensure privacy for multi-perspective process mining through data generalization. It provides group-based privacy guarantees for an event log while preserving the characteristic dependencies between the control-flow and further process perspectives. Unlike existing privatization techniques that rely on data suppression or noise insertion, PMDG adopts data generalization: a technique where the activities and attribute values referenced in events are generalized into more abstract ones, to obtain equivalence classes that are sufficiently large from a privacy point of view. We demonstrate empirically that PMDG outperforms state-of-the-art anonymization techniques when mining handovers and predicting outcomes. (A minimal generalization sketch appears after this list.)
-
There is increasing concern that computer vision devices invade the privacy of their users. We want camera systems and robots to recognize important events and assist human daily life by understanding their videos, but we also want to ensure that they do not intrude on people's privacy. In this paper, we propose a new principled approach for learning a video anonymizer. We use an adversarial training setting in which two competing systems fight: (1) a video anonymizer that modifies the original video to remove privacy-sensitive information (i.e., human faces) while still trying to maximize spatial action detection performance, and (2) a discriminator that tries to extract privacy-sensitive information from such anonymized videos. The end goal is for the video anonymizer to perform a pixel-level modification of video frames to anonymize each person's face while minimizing the effect on action detection performance. We experimentally confirm the benefit of our approach, particularly compared to conventional hand-crafted video/face anonymization methods such as masking, blurring, and noise adding. (A condensed sketch of this adversarial setup appears after this list.)
-
Research and practical development of data-anonymization techniques have proliferated in recent years. Yet, limited attention has been paid to examining the potentially disparate impact of privacy protection on underprivileged subpopulations. This study is one of the first attempts to examine the extent to which data anonymization can mask the gross statistical disparities between subpopulations in the data. We first describe two common mechanisms of data anonymization and two prevalent types of statistical evidence for disparity. Then, we develop a conceptual foundation and mathematical formalism demonstrating that the two data-anonymization mechanisms have distinctive impacts on the identifiability of disparity, which also varies based on its statistical operationalization. After validating our findings with empirical evidence, we discuss the business and policy implications, highlighting the need for firms and policy makers to balance the protection of privacy against the recognition and rectification of disparate impact. This paper was accepted by Chris Forman, information systems. (A worked toy example appears below.)
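For the social-media labware above, a toy example of the query-based leak it teaches: friend-list queries on two public accounts reveal a relationship a third user kept private. The network and the query function are purely illustrative, not the labware's implementation.

```python
# A tiny hard-coded "social network"; friends_of stands in for a public
# friend-list query against the site.
def friends_of(network: dict, user: str) -> set:
    return network[user]

network = {
    "alice": {"carol"},
    "bob": {"carol"},
    "carol": {"alice", "bob"},  # carol has set her own friend list to private
}

# An observer queries only alice and bob, never carol, yet still
# reconstructs both of carol's private relationships.
leaked = {u for u in ("alice", "bob") if "carol" in friends_of(network, u)}
print(f"carol is connected to: {leaked}")  # -> {'alice', 'bob'}
```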
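For PMDG, a minimal sketch of group-based generalization over one event-log attribute, assuming a fixed generalization hierarchy. The hierarchy, attribute name, and stopping rule are simplified stand-ins for the framework's actual procedure.

```python
from collections import Counter
import pandas as pd

# Illustrative generalization hierarchy for a "role" attribute.
ROLE_HIERARCHY = {"junior clerk": "clerk", "senior clerk": "clerk",
                  "clerk": "staff", "manager": "staff", "staff": "staff"}

def generalize_until_k(log: pd.DataFrame, attr: str, k: int) -> pd.DataFrame:
    """Replace values of `attr` with their hierarchy parents until every
    equivalence class contains at least k events (or no parent remains)."""
    log = log.copy()
    while True:
        counts = Counter(log[attr])
        rare = {v for v, c in counts.items() if c < k}
        done = not rare or all(ROLE_HIERARCHY.get(v, v) == v for v in rare)
        if done:
            return log
        log[attr] = log[attr].map(
            lambda v: ROLE_HIERARCHY.get(v, v) if v in rare else v)
```

Because only attribute values are abstracted (rather than suppressed or randomized), the order of events and their links to other perspectives survive, which is the property the abstract emphasizes.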
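For the video-anonymization paper, a condensed PyTorch sketch of the adversarial setup it describes. The model architectures, losses, and the action-loss hook are placeholders, not the authors' networks.

```python
import torch
import torch.nn as nn

# Placeholder networks: a pixel-level frame modifier and a privacy classifier.
anonymizer = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.Flatten(),
                              nn.LazyLinear(1))
opt_a = torch.optim.Adam(anonymizer.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(frames, privacy_labels, action_loss_fn):
    """One round of the two-player game on a batch of video frames."""
    # 1) Discriminator learns to recover private info from anonymized frames.
    opt_d.zero_grad()
    d_loss = bce(discriminator(anonymizer(frames).detach()), privacy_labels)
    d_loss.backward()
    opt_d.step()
    # 2) Anonymizer keeps action detection accurate while *fooling* the
    #    discriminator (note the minus sign on the privacy term).
    opt_a.zero_grad()
    anon = anonymizer(frames)
    a_loss = action_loss_fn(anon) - bce(discriminator(anon), privacy_labels)
    a_loss.backward()
    opt_a.step()
```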
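For the disparity study, a worked toy example of its core observation: generalizing a quasi-identifier can erase the statistical evidence of disparity between subpopulations. The data values are fabricated purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "group":    ["A"] * 4 + ["B"] * 4,
    "approved": [1, 1, 1, 0,  0, 0, 1, 0],  # 75% vs. 25% approval
})
print(df.groupby("group")["approved"].mean())  # disparity is visible

# Generalization merges both subpopulations into one equivalence class.
df["group"] = "A-or-B"
print(df.groupby("group")["approved"].mean())  # 50%: disparity is masked
```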