Title: How Computer Science and Statistics Instructors Approach Data Science Pedagogy Differently: Three Case Studies
Over the past decade, data science courses have been growing more popular across university campuses. These courses often involve a mix of programming and statistics and are taught by instructors from diverse backgrounds. In our experiences launching a data science program at a large public U.S. university over the past four years, we noticed one central tension within many such courses: instructors must finely balance how much computing versus statistics to teach in the limited available time. In this experience report, we provide a detailed firsthand reflection on how we have personally balanced these two major topic areas within several offerings of a large introductory data science course that we taught and wrote an accompanying textbook for; our course has served several thousand students over the past four years. We present three case studies from our experiences to illustrate how computer science and statistics instructors approach data science differently on topics ranging from algorithmic depth to modeling to data acquisition. We then draw connections to deeper tradeoffs in data science to help guide instructors who design interdisciplinary courses. We conclude by suggesting ways that instructors can incorporate both computer science and statistics perspectives to improve data science teaching.
Award ID(s):
1730628
PAR ID:
10399610
Journal Name:
SIGCSE 2022: Proceedings of the 53rd ACM Technical Symposium on Computer Science Education
Volume:
1
Page Range / eLocation ID:
29 to 35
Sponsoring Org:
National Science Foundation
More Like this
  1. Introductory data science courses are appearing at colleges, universities, and high schools around the country and the world. What topics do we cover in these courses, and how and why are these decisions made? How do we consider the background knowledge of our students and how they hope to use their skills after this course (professionally, in additional courses, or as engaged citizens)? In addition, the course is being taught by computer scientists, statisticians, business analysts, mathematicians, journalists, and others. Each of these disciplines approaches the topics differently. What upskilling is required of instructors to prepare them to integrate material from academic disciplines in which they were not trained into the course? How much, if any, cross-disciplinary collaboration and discussion occurs or should occur in designing this course? Participants in this birds-of-a-feather session will share their decision processes and choices about introductory data science courses that they teach or are designing. This includes choices made about the content as well as whether and how upskilling occurs. They will review and refine a list of current data science topics created based on national surveys of data science instructors as well as a review of curriculum guidelines. Close attention will be paid to differing language between data science instructors from different academic backgrounds. We welcome new and experienced data science instructors, as well as educators planning or interested in teaching such a course.
  2. With increasingly technology-driven workplaces and high data volumes, instructors across STEM+C disciplines are integrating more data science topics into their course learning objectives. However, instructors face significant challenges in integrating additional data science concepts into their already full course schedules. Streamlined instructional modules that are integrated with course content and cover relevant data science topics, such as data collection, uncertainty in data, visualization, and analysis using statistical and machine learning methods, can benefit instructors across multiple disciplines. As part of a cross-university research program, we designed a systematic structural approach, based on shared instructional and assessment principles, to construct modules that are tailored to meet the needs of multiple instructional disciplines, academic levels, and pedagogies. Adopting a research-practice partnership approach, we have collectively developed twelve modules, working closely with instructors and their teaching assistants for six undergraduate courses. We identified and coded primary data science concepts in the modules into five common themes: 1) data acquisition, 2) data quality issues, 3) data use and visualization, 4) advanced machine learning techniques, and 5) miscellaneous topics that may be unique to a particular discipline (e.g., how to analyze data streams collected by a special sensor). These themes were further subdivided to make it easier for instructors to contextualize the data science concepts in discipline-specific work. In this paper, we present as a case study the design and analysis of four of the modules, primarily so we can compare and contrast pairs of similar courses that were taught at different levels or at different universities. Preliminary analyses show the wide distribution of data science topics that are common among a number of environmental science and engineering courses. We identified commonalities and differences in the integration of data science instruction (through modules) into these courses. This analysis informs the development of a set of key considerations for integrating data science concepts into a variety of STEM+C courses.
  4. Asynchronous online courses are popular because they offer benefits to both students and instructors. Students benefit from the convenience, flexibility, affordability, freedom of geography, and access to information. Instructors and institutions benefit from a broad geographical reach, scalability, and the cost savings of not needing a physical classroom. A challenge with asynchronous online courses is providing students with engaging, collaborative, and interactive experiences. Here, we describe how an online poster symposium can be used as a unique educational experience and assessment tool in a large-enrollment (e.g., 500 students), asynchronous, natural science, general education (GE) course. The course, Introduction to Environmental Science (ENR2100), was delivered using distance education (DE) technology over a 15-week semester. In ENR2100, students learn a variety of topics including freshwater resources, surface water, aquifers, groundwater hydrology, ecohydrology, coastal and ocean circulation, drinking water, water purification, wastewater treatment, irrigation, urban and agricultural runoff, sediment and contaminant transport, water cycle, water policy, water pollution, and water quality. Here we present a long-term study that took place from 2017 to 2022 (before and after COVID-19) and involved 5,625 students over 8 semesters. Scaffolding was used to break up the poster project into smaller, more manageable assignments, which students completed throughout the semester. Instructions, examples, how-to videos, book chapters, and rubrics were used to accommodate students’ different levels of knowledge. Poster assignments were designed to teach students how to find and critically evaluate sources of information; recognize the changing nature of scientific knowledge, methods, models, and tools; understand the application of scientific data and technological developments; and evaluate the social and ethical implications of natural science discoveries. At the end of the semester, students participated in an asynchronous online poster symposium. Each student delivered a 5-min poster presentation using an online learning management system and completed peer reviews of their classmates’ posters using a rubric. This poster project met the learning objectives of our natural science, general education course and taught students important written, visual, and verbal communication skills. Students were surveyed to determine which parts of the course were most effective for instruction and learning. Students ranked poster assignments first, followed closely by lecture videos. Approximately 87% of students were confident that they could produce a scientific poster in the future, and 80% of students recommended virtual poster symposiums for online courses.
  5. Data science is one of the fastest growing fields, with unmet demand from employers. Many academic institutions have taken on the task of creating programs to meet both current and future needs and demands. Data science, as a field, integrates aspects of computer science, statistics, and subject matter expertise, which encourages cross-disciplinary conversations and collaboration. In this talk, we present results from a broad survey of instructors of introductory college-level data science courses for undergraduates. In addition, we explore the alignment of these findings with the recommendations of various professional organizations. We conducted a national survey on topics covered in introductory, college-level data science courses. With responses from computer scientists, statisticians, and instructors in allied fields, these results represent a wide array of instructors of data science. The survey identifies topics commonly covered, the amount of time spent on each, common and divergent definitions of data science, and course materials used. These results will be presented. We will then discuss the alignment of these results through a rigorous review and synthesis of recommendations from various professional organizations. These include the Association for Computing Machinery's Computing Competencies for Undergraduate Data Science Curricula[1], the National Academies of Sciences, Engineering, and Medicine’s Data Science for Undergraduates: Opportunities and Options[2], the Park City Math Institute's report Curriculum Guidelines for Undergraduate Programs in Data Science[3], and the American Statistical Association’s Two-Year College Data Science Summit Final Report[4] and Curriculum Guidelines for Undergraduate Programs in Statistical Science[5]. We will also explore alignment with ABET’s accreditation of data science[6].