

Search for: "data science"


  1. Data science consulting and collaboration units (DSUs) are core infrastructure for research at universities. Activities span data management, study design, data analysis, data visualization, predictive modelling, preparing reports, manuscript writing and advising on statistical methods and may include an experiential or teaching component. Partnerships are needed for a thriving DSU as an active part of the larger university network. Guidance for identifying, developing and managing successful partnerships for DSUs can be summarized in six rules: (1) align with institutional strategic plans, (2) cultivate partnerships that fit your mission, (3) ensure sustainability and prepare for growth, (4) define clear expectations in a partnership agreement, (5) communicate and (6) expect the unexpected. While these rules are not exhaustive, they are derived from experiences in a diverse set of DSUs, which vary by administrative home, mission, staffing and funding model. As examples in this paper illustrate, these rules can be adapted to different organizational models for DSUs. Clear expectations in partnership agreements are essential for high quality and consistent collaborations and address core activities, duration, staffing, cost and evaluation. A DSU is an organizational asset that should involve thoughtful investment if the institution is to gain real value.

     
  2. Abstract

    SkyPortal is an open-source software package designed to discover interesting transients efficiently, manage follow-up, perform characterization, and visualize the results. By enabling fast access to archival and catalog data, crossmatching heterogeneous data streams, and the triggering and monitoring of on-demand observations for further characterization, a SkyPortal-based platform has been operating at scale for >2 yr for the Zwicky Transient Facility Phase II community, with hundreds of users, containing tens of millions of time-domain sources, interacting with dozens of telescopes, and enabling community reporting. While SkyPortal emphasizes rich user experiences across common front-end workflows, recognizing that scientific inquiry is increasingly performed programmatically, SkyPortal also surfaces an extensive and well-documented application programming interface system. From back-end and front-end software to data science analysis tools and visualization frameworks, the SkyPortal design emphasizes the reuse and leveraging of best-in-class approaches, with a strong extensibility ethos. For instance, SkyPortal now leverages ChatGPT large language models to generate and surface source-level human-readable summaries automatically. With the imminent restart of the next generation of gravitational-wave detectors, SkyPortal now also includes dedicated multimessenger features addressing the requirements of rapid multimessenger follow-up: multitelescope management, team/group organizing interfaces, and crossmatching of multimessenger data streams with time-domain optical surveys, with interfaces sufficiently intuitive for newcomers to the field. This paper focuses on the detailed implementations, capabilities, and early science results that establish SkyPortal as a community software package ready to take on the data science challenges and opportunities presented by this next chapter in the multimessenger era.

     
  3. Abstract

    An explosion of data available in the life sciences has shifted the discipline toward genomics and quantitative data science research. Institutions of higher learning have been addressing this shift by modifying undergraduate curricula, resulting in an increasing number of bioinformatics courses and research opportunities for undergraduates. The goal of this study was to explore how a newly designed introductory bioinformatics seminar could leverage the combination of in‐class instruction and independent research to build the practical skill sets of undergraduate students beginning their careers in the life sciences. Participants were surveyed to assess learning perceptions toward the dual curriculum. Most students had a neutral or positive interest in these topics before the seminar and reported increased interest after the seminar. Students reported increased confidence in their bioinformatics proficiency and in their understanding of ethical principles for data/genomic science. By combining undergraduate research with directed bioinformatics skills, classroom seminars facilitated a connection between students' life sciences knowledge and emerging research tools in computational biology.

     
  4. As data grows exponentially across diverse fields, the ability to effectively leverage big data has become increasingly crucial. In the field of data science, however, minority groups, including African Americans, are significantly underrepresented. With the strategic role of minority-serving institutions to enhance diversity in the data science workforce and apply data science to health disparities, the National Institute on Minority Health and Health Disparities (NIMHD) provided funding in September 2021 to six Research Centers in Minority Institutions (RCMI) to improve their data science capacity and foster collaborations with data scientists. Meharry Medical College (MMC), a historically Black College/University (HBCU), was among the six awardees. This paper summarizes the NIMHD-funded efforts at MMC, which include offering mini-grants to collaborative research groups, surveys to understand the needs of the community to guide project implementation, and data science training to enhance the data analytics skills of the RCMI investigators, staff, medical residents, and graduate students. This study is innovative as it addresses the urgent need to enhance the data science capacity of the RCMI program at MMC, build a diverse data science workforce, and develop collaborations between the RCMI and MMC’s newly established School of Applied Computational Science. This paper presents the progress of this NIMHD-funded project, which clearly shows its positive impact on the local community.
  5. Meng, X-L (Ed.)
    A substantial fraction of students who complete their college education at a public university in the United States begin their journey at one of the 935 public 2-year colleges. While the number of 4-year colleges offering bachelor’s degrees in data science continues to increase, data science instruction at many 2-year colleges lags behind. A major impediment is the relative paucity of introductory data science courses that serve multiple student audiences and can easily transfer. In addition, the lack of predefined transfer pathways (or articulation agreements) for data science creates a growing disconnect that leaves students who want to study data science at a disadvantage. We describe opportunities and barriers to data science transfer pathways. Five points of curricular friction merit attention: 1) a first course in data science, 2) a second course in data science, 3) a course in scientific computing, data science workflow, and/or reproducible computing, 4) lab sciences, and 5) navigating communication, ethics, and application domain requirements in the context of general education and liberal arts course mappings. We catalog existing transfer pathways, efforts to align curricula across institutions, obstacles to overcome with minimally disruptive solutions, and approaches to foster these pathways. Improvements in these areas are critically important to ensure that a broad and diverse set of students are able to engage and succeed in undergraduate data science programs. 
  6. Abstract

    Pressing environmental research questions demand the integration of increasingly diverse and large‐scale ecological datasets as well as complex analytical methods, which require specialized tools and resources.

    Computational training for ecological and evolutionary sciences has become more abundant and accessible over the past decade, but tool development has outpaced the availability of specialized training. Most training for scripted analyses focuses on individual analysis steps in one script rather than creating a scripted pipeline, where modular functions comprise an ecosystem of interdependent steps. Although current computational training creates an excellent starting place, linear styles of scripting can risk becoming labor‐ and time‐intensive and less reproducible by often requiring manual execution. Pipelines, however, can be easily automated or tracked by software to increase efficiency and reduce potential errors. Ecology and evolution would benefit from techniques that reduce these risks by managing analytical pipelines in a modular, readily parallelizable format with clear documentation of dependencies.

    Workflow management software (WMS) can aid in the reproducibility, intelligibility and computational efficiency of complex pipelines. To date, WMS adoption in ecology and evolutionary research has been slow. We discuss the benefits and challenges of implementing WMS and illustrate its use through a case study with the targets R package to further highlight WMS benefits through workflow automation, dependency tracking and improved clarity for reviewers.

    Although WMS requires familiarity with function‐oriented programming and careful planning for more advanced applications and pipeline sharing, investment in training will enable access to the benefits of WMS and impart transferable computing skills that can facilitate ecological and evolutionary data science at large scales.

     
  7. Despite being disproportionately impacted by health disparities, Black, Hispanic, Indigenous, and other underrepresented populations account for a significant minority of graduates in biomedical data science-related disciplines. Given their commitment to educating underrepresented students and trainees, minority-serving institutions (MSIs) can play a significant role in enhancing diversity in the biomedical data science workforce. Little has been published about the reach, curricular breadth, and best practices for delivering these data science training programs. The purpose of this paper is to summarize six Research Centers in Minority Institutions (RCMIs) awarded funding from the National Institute on Minority Health and Health Disparities (NIMHD) to develop new data science training programs. A cross-sectional survey was conducted to better understand the demographics of learners served, curricular topics covered, methods of instruction and assessment, challenges, and recommendations by program directors. Programs demonstrated overall success in reach and curricular diversity, serving a broad range of students and faculty, while also covering a broad range of topics. The main challenges highlighted were a lack of resources and infrastructure and teaching learners with varying levels of experience and knowledge. Further investments in MSIs are needed to sustain training efforts and develop pathways for diversifying the biomedical data science workforce.
  8. While reviewing and discussing the potential of data science in oncology, we emphasize medical imaging and radiomics as the leading contextual frameworks to measure the impacts of Artificial Intelligence (AI) and Machine Learning (ML) developments. We envision some domains and research directions in which radiomics should become more significant in view of current barriers and limitations. 
  9. Hartshorne, Richard (Ed.)
    Data science and computational thinking (CT) skills are important STEM literacies necessary to make informed daily decisions. In elementary schools, particularly in rural areas, there is little instruction and limited research towards understanding and developing these literacies. Using a Research-Practice Partnership model (RPP; Coburn & Penuel, 2016), we conducted multimethod research investigating nine elementary teachers’ perceptions of data science and related curriculum design during professional development (PD). Connected Learning theory, enhanced with Universal Design for Learning, guided the ways we assisted teachers in designing the data science curriculum. Findings suggest teachers maintained high levels of interest in data science instruction and CT before and after the PD and increased their self-efficacy towards teaching data science. A thematic analysis revealed how a data science framework guided curriculum design and assisted teachers in defining, understanding, and co-creating the curriculum. During curriculum design, teachers shared the workload among partners, made collaborative design choices, integrated differentiation strategies, and felt confident about teaching data science. Identified challenges included locating data sets and the complexity of understanding data science and related software. This study addresses the research gap in data science education for elementary teachers and identifies successful strategies for data science PD and curricular design.