Data are becoming increasingly important in science and society, and thus data literacy is a vital asset to students as they prepare for careers in and outside science, technology, engineering, and mathematics and go on to lead productive lives. In this paper, we discuss why the strongest learning experiences surrounding data literacy may arise when students are given opportunities to work with authentic data from scientific research. First, we explore the overlap between the fields of quantitative reasoning, data science, and data literacy, specifically focusing on how data literacy results from practicing quantitative reasoning and data science in the context of authentic data. Next, we identify and describe features that influence the complexity of authentic data sets (selection, curation, scope, size, and messiness) and implications for data-literacy instruction. Finally, we discuss areas for future research with the aim of identifying the impact that authentic data may have on student learning. These include defining desired learning outcomes surrounding data use in the classroom and identification of teaching best practices when using data in the classroom to develop students’ data-literacy abilities.
more »
« less
Ant Isotope ratios at the Kellogg Biological Station, Hickory Corners, MI (2018)
Dataset AbstractThis is data supporting the publication: Jackson A. Helms IV et.al 2020. “Bioenergy landscapes drive trophic shifts in generalist ants” in the Journal of Animal Ecology. These data document tissue stable isotope ratios of C and N in plants and invertebrates collected by hand or in pitfall traps during the 2018 growing season from the Biofuel Cropping System Experiment (BCSE) plots and the GLBRC Scale-Up Experiment.original data source http://lter.kbs.msu.edu/datasets/199
more »
« less
- Award ID(s):
- 1832042
- PAR ID:
- 10357169
- Publisher / Repository:
- Environmental Data Initiative
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Recent studies have shown that several government and business organizations experience huge data breaches. Data breaches increase in a daily basis. The main target for attackers is organization sensitive data which includes personal identifiable information (PII) such as social security number (SSN), date of birth (DOB) and credit card /debit card (CCDC). The other target is encryption/decryption keys or passwords to get access to the sensitive data. The cloud computing is emerging as a solution to store, transfer and process the data in a distributed location over the Internet. Big data and internet of things (IoT) increased the possibility of sensitive data exposure. Most methods used for the attack are hacking, unauthorized access, insider theft and false data injection on the move. Most of the attacks happen during three different states of data life cycle such as data-at-rest, data-in-use, and data-in-transit. Hence, protecting sensitive data at all states particularly when data is moving to cloud computing environment needs special attention. The main purpose of this research is to analyze risks caused by data breaches, personal and organizational weaknesses to protect sensitive data and privacy. The paper discusses methods such as data classification and data encryption at different states to protect personal and organizational sensitive data. The paper also presents mathematical analysis by leveraging the concept of birthday paradox to demonstrate the encryption key attack. The analysis result shows that the use of same keys to encrypt sensitive data at different data states make the sensitive data less secure than using different keys. Our results show that to improve the security of sensitive data and to reduce the data breaches, different keys should be used in different states of the data life cycle.more » « less
-
null (Ed.)It is common practice for data providers to include text descriptions for each column when publishing data sets in the form of data dictionaries. While these documents are useful in helping an end-user properly interpret the meaning of a column in a data set, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse data sets. In this paper, we present our Semantic Data Dictionary work in the context of our work with biomedical data; however, the approach can and has been used in a wide range of domains. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility. We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey data set, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large National Institutes of Health (NIH)-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project. We evaluate this work in comparison with traditional data dictionaries, mapping languages, and data integration tools.more » « less
-
Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern regarding the security of this data as well as the privacy of the participants. Privacy regulations, such as the European Union’s General Data Protection Regulation (GDPR), the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI law, have been created to address the issue of data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Synthetic data that retains the statistical properties of the original data but does not include actual individuals’ information provides a suitable alternative to sharing sensitive datasets. Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, there is a lack of compliance with documented data privacy policies in such privacy-preserving efforts. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We explore several available agricultural codes of conduct, extract knowledge related to the privacy constraints in data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework can evade potential threats, such as re-identification and linkage issues, and secure data based on applicable regulatory policy rules.more » « less
-
Data perturbation is a technique for generating synthetic data by adding ‘noise’ to raw data, which has an array of applications in science and engineering, primarily in data security and privacy. One challenge for data perturbation is that it usually produces synthetic data resulting in information loss at the expense of privacy protection. The information loss, in turn, renders the accuracy loss for any statistical or machine learning method based on the synthetic data, weakening downstream analysis and deteriorating in machine learning. In this article, we introduce and advocate a fundamental principle of data perturbation, which requires the preservation of the distribution of raw data. To achieve this, we propose a new scheme, named data flush, which ascertains the validity of the downstream analysis and maintains the predictive accuracy of a learning task. It perturbs data nonlinearly while accommodating the requirement of strict privacy protection, for instance, differential privacy. We highlight multiple facets of data flush through examples.more » « less
An official website of the United States government
