Title: Authentic Undergraduate Research in Machine Learning with The Informatics Skunkworks: A Strategy for Scalable Apprenticeship Applied to Materials Informatics Research
Award ID(s):
2016981
PAR ID:
10552589
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
ASEE Conferences
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The scientific field of urban climatology has long investigated the two-way interactions between cities and their overlying atmosphere through in-situ observations and climate simulations at various scales. Novel research directions now emerge through recent advancements in sensing and communication technologies, algorithms, and data sources. Coupled with rapid growth in computing power, those advancements augment traditional urban climate methods and provide unprecedented insights into urban atmospheric states and dynamics. The emerging field introduced and discussed here as Urban Climate Informatics (UCI) takes a multidisciplinary approach to urban climate analyses by synthesizing two established domains: urban climate and climate informatics. UCI is a rapidly evolving field that takes advantage of four technological trends to answer contemporary climate challenges in cities: advances in sensors, improved digital infrastructure (e.g., cloud computing), novel data sources (e.g., crowdsourced or big data), and leading-edge analytical algorithms and platforms (e.g., machine learning, deep learning). This paper outlines the history and development of UCI, reviews recent technological and methodological advances, and highlights various applications that benefit from novel UCI methods and datasets.
  2. The Informatics Skunkworks program provides a new framework for engaging undergraduates in research experiences, with a focus on the interface of data science and materials science. The program seeks to provide authentic research, engaged personal learning, and professional development while also being efficient, accessible, and scalable. The program was initially developed at the University of Wisconsin-Madison, and participation continues to grow, with over 90 students from 4 institutions engaged in research or training activities during the Fall 2021 semester. The Skunkworks focuses on reducing barriers to engagement for mentors and students in undergraduate research by replacing bespoke and ad hoc approaches with efforts and infrastructure that are reusable and scalable, including simplified, standardized recruiting methods; online modular training resources; flexible, undergraduate-accessible software tools; long-term research projects with many similar but distinct components to engage large teams; and support from a learning community. For example, new students have the option to participate in a modular, self-paced, online onboarding curriculum that teaches the basic skills needed for most data science projects, thereby dramatically reducing the mentor time needed to engage students with limited background in machine learning research. Projects are authentic research challenges that strive to allow for large flexible teams, thereby scaling up their impact from the typical engagement of just one or two students and allowing for extensive peer teaching. Throughout the program, professional development activities are efficiently delivered through standardized materials to teach critical research skills like record keeping, establishing group expectations and dynamics, and networking. These skills are also reinforced at workshop events hosted during the semester, which are effectively delivered online and yield growing impact for modest effort as the community grows.
The program has been successfully implemented, as evidenced by the last two semesters' evaluation findings from interviews, focus groups, and pre-post surveys. The students reported a positive attitude towards the program. Students' perceptions of their machine learning knowledge and skills, as well as their self-confidence, improved after they became involved in the program. The instructors and mentors reported positive teaching and mentoring experiences and shared ideas for further improving the program. Building on its early successes, the team is continuing to implement improvements driven by evaluation data, with the goal of continuing to grow through new collaborations.
  3. Maintaining data quality is a fundamental requirement for successful long-term data management. Providing high-quality, reliable, and statistically sound data is a primary goal for clinical research informatics. In addition, effective data governance and management are essential to ensuring accurate data counts, reports, and validation. As a crucial step of the clinical research process, it is important to establish and maintain organization-wide standards for data quality management to ensure consistency across all systems designed primarily for cohort identification, allowing users to perform an enterprise-wide search on a clinical research data repository to determine the existence of a set of patients meeting certain inclusion or exclusion criteria. Some of the clinical research tools are referred to as de-identified data tools. Assessing and improving the quality of data used by clinical research informatics tools is both important and difficult. For an increasing number of users who rely on information as one of their most important assets, enforcing high data quality levels represents a strategic investment to preserve the value of the data. In clinical research informatics, better data quality translates into better research results and better patient care. However, achieving high-quality data standards is a major task because of the variety of ways that errors might be introduced in a system and the difficulty of correcting them systematically. Problems with data quality tend to fall into two categories. The first category is related to inconsistency among data resources, such as format, syntax, and semantic inconsistencies. The second category is related to poor ETL and data mapping processes. In this paper, we describe a real-life case study on assessing and improving data quality at a healthcare organization.
This paper compares the results obtained from two de-identified data systems, i2b2 and Epic Slicedicer, and discusses the data quality dimensions specific to the clinical research informatics context and the possible data quality issues between the de-identified systems. This work aims to propose steps and rules for maintaining data quality across different systems to help data managers, information systems teams, and informaticists at any health care organization monitor and sustain data quality as part of their business intelligence, data governance, and data democratization processes.
  4. Personal informatics (PI) has become an area of significant research over the past decade, maturing into a sub-field that seeks to support people from many backgrounds and life contexts in collecting and finding value in their personal data. PI research includes a focus on people with chronic conditions as a monolithic group, but currently fails to distinguish the needs of people with motor disabilities (MD). To understand how current PI literature addresses those needs, we conducted a mapping review on PI publications engaged with people with MD. We report results from 50 publications identified in the ACM DL, PubMed, JMIR, Scopus, and IEEE Xplore. Our analysis shows significant incompatibilities between the needs of individuals with MD and the ways that PI literature supports them. We also found that disability levels are reported inconsistently, that PI literature for MD excludes non-health-related data domains, and that there is insufficient focus on PI tools' accessibility and usability for some MD users. In contrast with Epstein et al.'s [36] recent PI review, behavior change and habit awareness were the most common motivations in these publications. Finally, many of the reviewed articles reported involvement by caregivers, trainers, healthcare providers, and researchers across the PI stages. In addition to these insights, we provide recommendations for designing PI technology through a user-centric lens that will broaden the scope of PI and include people regardless of their motor abilities.