Educational data mining has allowed for large improvements in educational outcomes and understanding of educational processes. However, there remains a constant tension between educational data mining advances and protecting student privacy while using educational datasets. Publicly available datasets have facilitated numerous research projects while striving to preserve student privacy via strict anonymization protocols (e.g., k-anonymity); however, little is known about the relationship between anonymization and utility of educational datasets for downstream educational data mining tasks, nor how anonymization processes might be improved for such tasks. We provide a framework for strictly anonymizing educational datasets with a focus on improving downstream performance in common tasks such as student outcome prediction. We evaluate our anonymization framework on five diverse educational datasets with machine learning-based downstream task examples to demonstrate both the effect of anonymization and our means to improve it. Our method improves downstream machine learning accuracy versus baseline data anonymization by 30.59%, on average, by guiding the anonymization process toward strategies that anonymize the least important information while leaving the most valuable information intact.
This content will become publicly available on June 1, 2026
BlockR: An Areal Spatial Anonymization and Visualization Tool
Spatial anonymization is an important step in the research workflow for many researchers. In this paper we present BlockR, an open‐source tool in R for areal spatial anonymization and visualization. After using a shape‐clustering algorithm (SKATER), the underlying tool performs a series of affine transformations on the region of interest followed by “blockification” and border obfuscation processes to obscure the underlying shape. Importantly, BlockR anonymizes areal units while preserving contextually important spatial characteristics and administrative properties. Measures of disclosure risk are provided through a theoretical analysis of Hausdorff distances and the use of a neural network image classifier.
- Award ID(s): 2316857
- PAR ID: 10632419
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Transactions in GIS
- Volume: 29
- Issue: 4
- ISSN: 1361-1682
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Place‐based spatial accessibility quantifies the distribution of access to goods and services across space. The Two‐Step Floating Catchment Area (2SFCA) family of methods has become a default tool for spatial accessibility analysis, in part due to its intuitive approach and interpretability. This family of methods relies on calculating catchment areas around supply locations to estimate the area and population that may utilize them. However, these “catchment areas” are generally defined by origin‐destination matrices of travel time, which give point‐to‐point distances rather than polygons with actual area. As a result, population geographies (census tracts, blocks, etc.) are either fully included or fully excluded, with no room for partial inclusion. When the data are nongranular, which is often the case due to data privacy restrictions, this has the potential to cause significant errors in accessibility measurements. In this article, we propose Areal 2SFCA: a new approach that considers the area of overlap between travel‐time polygons and population geographies. We demonstrate the effectiveness of the Areal 2SFCA method using a case study that compares the Enhanced Two‐Step Floating Catchment Area (E2SFCA) and Areal E2SFCA for the state of Illinois in the USA using multiple population granularities.
-
Laser powder bed fusion (PBF‐LB) is an additive manufacturing (AM) technology for producing complex geometry parts. However, the high cost of post‐processing coarse as‐built surfaces drives the need to control surface roughness during fabrication. Prior studies have evaluated the relationship between process parameters and as‐built surface roughness, but they rely on forward models using trial‐and‐error, regression, and data‐driven methods based only on areal surface roughness parameters that neglect spatial surface characteristics. In contrast, this study introduces, for the first time, an inverse data‐centric framework that leverages machine learning algorithms and an experimental dataset of Inconel 718 as‐built surfaces to predict the PBF‐LB process parameters required to achieve a desired as‐built roughness. This inverse model shows a prediction accuracy of ≈80%, compared to 90% for the corresponding forward model. Additionally, it incorporates deterministic surface roughness parameters, which capture both height and spatial information, and significantly improves prediction accuracy compared to only using areal parameters. The inverse model provides a digital tool to process engineers that enables control of surface roughness by tailoring process parameters. Hence, it establishes a foundation for integrating surface roughness control into the digital thread of AM, thereby reducing the need for post‐processing and improving process efficiency.
-
Rapid numerical approximation method for integrated covariance functions over irregular data regions
In many practical applications, spatial data are often collected at areal levels (i.e., block data), and the inferences and predictions about the variable at points or blocks different from those at which it has been observed typically depend on integrals of the underlying continuous spatial process. In this paper, we describe a method based on Fourier transforms by which multiple integrals of covariance functions over irregular data regions may be numerically approximated with the same level of accuracy as traditional methods, but at a greatly reduced computational expense.
-
We demonstrate a procedure for the anonymization of infant subjects in videos such that salient behavioral information is retained. This method also creates a new identity that is consistent temporally across video frames. We present an overview of this anonymization process, which involves moving through the latent space of a generative model with an infant specific latent space traversal technique. We apply the technique on videos of infants, a historically difficult source of data, and make comparisons to other state-of-the-art anonymization systems. Metrics demonstrate an improved ability to retain emotional content of videos during the anonymization process, even during extreme emotions or poses, while maintaining a consistent identity throughout.
