Title: Studying Interdisciplinary Thinking about Complex Real-World Data at DataFest
With the rise of computing power in the 21st century, it has become increasingly important to create opportunities for students to learn to work with large, authentic, complex (LAC) data across multiple disciplines. DataFest, a hackathon-style undergraduate event, creates a space for such inquiry because the challenge it presents is collaborative, data-driven, open-ended, and relevant to the real world. We present preliminary findings from research that explores how teams at DataFest leverage and integrate multidisciplinary tools and domain knowledge to engage productively with the data investigation process. Implications for statistics and data science education are discussed.
Award ID(s):
2216023
PAR ID:
10458486
Journal Name:
IASE 2023 Satellite Conference Proceedings: Fostering Learning in Statistics and Data Science
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Today, data is being actively generated by a variety of devices, services, and applications. Such data is important not only for the information that it contains, but also for its relationships to other data and to interested users. Most existing Big Data systems focus on passively answering queries from users, rather than actively collecting data, processing it, and serving it to users. To satisfy both passive and active requests at scale, application developers need either to heavily customize an existing passive Big Data system or to glue one together with systems like Streaming Engines and Pub-sub services. Either choice requires significant effort and incurs additional overhead. In this paper, we present the BAD (Big Active Data) system as an end-to-end, out-of-the-box solution for this challenge. It is designed to preserve the merits of passive Big Data systems and introduces new features for actively serving Big Data to users at scale. We show the design and implementation of the BAD system, demonstrate how BAD facilitates providing both passive and active data services, investigate the BAD system’s performance at scale, and illustrate the complexities that would result from instead providing BAD-like services with a “glued” system.
  2. The need for secure and efficient communication between connected devices continues to grow in healthcare systems within smart cities. Secure communication of healthcare data in Internet of Things (IoT) systems is critical to ensure patient privacy and data integrity. Problems with healthcare communication, like data breaches, integrity issues, scalability issues, and cyber threats, make it harder for people to trust doctors, cause costs to rise, stop people from using new technology, and put private data at risk. So, this paper presents a blockchain-based hybrid method for sending secure healthcare data that combines IoT systems with blockchain technology and high-tech encryption techniques like elliptic curve cryptography (ECC). The proposed method uses the public key of a smart contract to encrypt private data to protect its privacy. It also uses cryptographic hashing and digital signatures to make sure that the data is correct and real. The framework stores metadata (e.g., hashes and signatures) on-chain, and large data uses off-chain storage like IPFS to reduce costs and improve scalability. It also incorporates a mechanism to authenticate IoT devices and enable secure communication across heterogeneous networks. Moreover, this work bridges gaps in existing solutions by providing an end-to-end secure communication system for healthcare applications. It provides strong data security and efficient storage for a reliable and scalable way to handle healthcare data safely in IoT ecosystems. 
  3. Many analytic tools have been developed to discover knowledge from student data. However, the knowledge discovery process requires advanced analytical modelling skills, making it the province of data scientists. This impedes the ability of educational leaders, professors, and advisors to engage with the knowledge discovery process directly. As a result, it is challenging for analysis to take advantage of domain expertise, making its outcome often neither interesting nor useful. Usually the outcome produced from such analytic tools is static, preventing domain experts from exploring different hypotheses by changing data models or predictive models inside the tool. We have developed a framework for interactive and exploratory learning analytics which begins to address these challenges. We engaged in data exploration and hypotheses generation with our university domain experts by conducting two focus groups. We used the findings of these focus groups to validate our framework, arguing that it enables domain experts to explore the data, analysis and interpretation of student data to discover useful and interesting knowledge. 
  4. HTCondor is a major workload management system used in distributed high throughput computing (dHTC) environments, e.g., the Open Science Grid. One of the distinguishing features of HTCondor is the native support for data movement, allowing it to operate without a shared filesystem. Coupling data handling and compute scheduling is both convenient for users and allows for significant infrastructure flexibility but does introduce some limitations. The default HTCondor data transfer mechanism routes both the input and output data through the submission node, making it a potential bottleneck. In this document we show that by using a node equipped with a 100 Gbps network interface (NIC) HTCondor can serve data at up to 90 Gbps, which is sufficient for most current use cases, as it would saturate the border network links of most research universities at the time of writing. 
  5. TaxonWorks (http://taxonworks.org) is an integrated workbench for taxonomists and biodiversity scientists. It is designed to capture, organize, and enrich data, share and refine it with collaborators, and package it for analysis and publication. It is based on PostgreSQL (database) and the Ruby-on-Rails programming language and framework for developing web applications (https://github.com/SpeciesFileGroup/taxonworks). The TaxonWorks community is built around an open software ecosystem that facilitates participation at many levels. TaxonWorks is designed to serve both researchers who create and curate the data and technical users, such as programmers and informatics specialists, who act as data consumers. TaxonWorks provides researchers with robust, user-friendly interfaces based on well-thought-out, customized workflows for efficient and validated data entry. It provides technical users database access through an application programming interface (API) that serves data in JSON format. The data model includes coverage for nearly all classes of data recorded in modern taxonomic treatments and primary studies of biodiversity, including nomenclature, bibliography, specimens and collecting events, phylogenetic matrices, and species descriptions. The nomenclatural classes are based on the NOMEN ontology (https://github.com/SpeciesFileGroup/nomen).