


Creators/Authors contains: "Horton, Nicholas"


  1. Meng, X-L (Ed.)
    Many data science students and practitioners are reluctant to adopt good coding practices as long as the code ‘works.’ However, code standards are an important part of modern data science practice, and they play an essential role in the development of data acumen. Good coding practices lead to more reliable code and save more time than they cost, making them important even for beginners. We believe that principled coding is vital for quality data science practice. To effectively instill these practices within academic programs, instructors and programs need to begin establishing these practices early, to reinforce them often, and to hold themselves to a higher standard while guiding students. We describe key aspects of good coding practices for data science, illustrating with examples in R and in Python, though similar standards are applicable to other software environments. Practical coding guidelines are organized into a top ten list. 
  2. Meng, X-L (Ed.)
    A substantial fraction of students who complete their college education at a public university in the United States begin their journey at one of the 935 public 2-year colleges. While the number of 4-year colleges offering bachelor’s degrees in data science continues to increase, data science instruction at many 2-year colleges lags behind. A major impediment is the relative paucity of introductory data science courses that serve multiple student audiences and can easily transfer. In addition, the lack of predefined transfer pathways (or articulation agreements) for data science creates a growing disconnect that leaves students who want to study data science at a disadvantage. We describe opportunities and barriers to data science transfer pathways. Five points of curricular friction merit attention: 1) a first course in data science, 2) a second course in data science, 3) a course in scientific computing, data science workflow, and/or reproducible computing, 4) lab sciences, and 5) navigating communication, ethics, and application domain requirements in the context of general education and liberal arts course mappings. We catalog existing transfer pathways, efforts to align curricula across institutions, obstacles to overcome with minimally disruptive solutions, and approaches to foster these pathways. Improvements in these areas are critically important to ensure that a broad and diverse set of students is able to engage and succeed in undergraduate data science programs.
  3. Abstract

    Text provides a compelling example of unstructured data that can be used to motivate and explore classification problems. Challenges arise regarding the representation of features of text and student linkage between text representations as character strings and identification of features that embed connections with underlying phenomena. In order to observe how students reason with text data in scenarios designed to elicit certain aspects of the domain, we employed a task‐based interview method using a structured protocol with six pairs of undergraduate students. Our goal was to shed light on students' understanding of text as data using a motivating task to classify headlines as “clickbait” or “news.” Three types of features (function, content, and form) surfaced, the majority from the first scenario. Our analysis of the interviews indicates that this sequence of activities engaged the participants in thinking at both the human‐perception level and the computer‐extraction level and conceptualizing connections between them.

     
  4. While coursework provides undergraduate data science students with some relevant analytic skills, many are not given the rich experiences with data and computing they need to be successful in the workplace. Additionally, students often have limited exposure to team-based data science and the principles and tools of collaboration that are encountered outside of school.

    In this paper, we describe the DSC-WAV program, an NSF-funded data science workforce development project in which teams of undergraduate sophomores and juniors work with a local non-profit organization on a data-focused problem. To help students develop a sense of agency and improve confidence in their technical and non-technical data science skills, the project promoted a team-based approach to data science, adopting several processes and tools intended to facilitate this collaboration.

    Evidence from the project evaluation, including participant survey and interview data, is presented to document the degree to which the project was successful in engaging students in team-based data science, and how the project changed the students' perceptions of their technical and non-technical skills. We also examine opportunities for improvement and offer insight to other data science educators who may want to implement a similar team-based approach to data science projects at their own institutions.

     
  6. This report summarizes the “Keeping Data Science Broad” series, including data science challenges, visions for the future, and community asks. The goal of the Keeping Data Science Broad series was to garner community input into pathways for keeping data science education broadly inclusive across sectors, institutions, and populations. Input was collected from a community input survey, three webinars (Data Science in the Traditional Context, Alternative Avenues for Development of Data Science Education Capacity, and Big Picture for a Big Data Science Education Network, available to view through the South Big Data Hub YouTube channel), and an interactive workshop (Negotiating the Digital and Data Divide). Through these venues, we explore the future of data science education and workforce at institutions of higher learning that are primarily teaching-focused. The workshop included representatives from sixty data science programs across the nation, either traditional or alternative, and from a range of institution types including community colleges, Historically Black Colleges and Universities (HBCUs), Hispanic-Serving Institutions (HSIs), other minority-led and minority-serving institutions, liberal arts colleges, tribal colleges, universities, and industry partners.