Introductory data science courses are appearing at colleges, universities, and high schools around the country and the world. What topics do we cover in these courses, and how and why are these decisions made? How do we consider the background knowledge of our students and how they hope to use their skills after this course (whether professionally, in additional courses, or as engaged citizens)? In addition, the course is being taught by computer scientists, statisticians, business analysts, mathematicians, journalists, and others. Each of these disciplines approaches the topics differently. What upskilling do instructors need in order to integrate material from academic disciplines in which they were not trained? How much, if any, cross-disciplinary collaboration and discussion occurs or should occur in designing this course? Participants in this birds-of-a-feather session will share their decision processes and choices about introductory data science courses that they teach or are designing. This includes choices made about the content as well as whether and how upskilling occurs. They will review and refine a list of current data science topics compiled from national surveys of data science instructors and a review of curriculum guidelines. Close attention will be paid to differences in language among data science instructors from different academic backgrounds. We welcome new and experienced data science instructors, as well as educators planning or interested in teaching such a course.
How Computer Science and Statistics Instructors Approach Data Science Pedagogy Differently: Three Case Studies
Over the past decade, data science courses have been growing more popular across university campuses. These courses often involve a mix of programming and statistics and are taught by instructors from diverse backgrounds. In our experiences launching a data science program at a large public U.S. university over the past four years, we noticed one central tension within many such courses: instructors must finely balance how much computing versus statistics to teach in the limited available time. In this experience report, we provide a detailed firsthand reflection on how we have personally balanced these two major topic areas within several offerings of a large introductory data science course that we taught and wrote an accompanying textbook for; our course has served several thousand students over the past four years. We present three case studies from our experiences to illustrate how computer science and statistics instructors approach data science differently on topics ranging from algorithmic depth to modeling to data acquisition. We then draw connections to deeper tradeoffs in data science to help guide instructors who design interdisciplinary courses. We conclude by suggesting ways that instructors can incorporate both computer science and statistics perspectives to improve data science teaching.
- Award ID(s): 1730628
- PAR ID: 10399610
- Date Published:
- Journal Name: SIGCSE 2022: Proceedings of the 53rd ACM Technical Symposium on Computer Science Education
- Volume: 1
- Page Range / eLocation ID: 29 to 35
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
With increasingly technology-driven workplaces and high data volumes, instructors across STEM+C disciplines are integrating more data science topics into their course learning objectives. However, instructors face significant challenges in integrating additional data science concepts into their already full course schedules. Streamlined instructional modules that are integrated with course content and cover relevant data science topics, such as data collection, uncertainty in data, visualization, and analysis using statistical and machine learning methods, can benefit instructors across multiple disciplines. As part of a cross-university research program, we designed a systematic structural approach, based on shared instructional and assessment principles, to construct modules that are tailored to meet the needs of multiple instructional disciplines, academic levels, and pedagogies. Adopting a research-practice partnership approach, we have collectively developed twelve modules, working closely with instructors and their teaching assistants for six undergraduate courses. We identified and coded primary data science concepts in the modules into five common themes: 1) data acquisition, 2) data quality issues, 3) data use and visualization, 4) advanced machine learning techniques, and 5) miscellaneous topics that may be unique to a particular discipline (e.g., how to analyze data streams collected by a special sensor). These themes were further subdivided to make it easier for instructors to contextualize the data science concepts in discipline-specific work. In this paper, we present as a case study the design and analysis of four of the modules, primarily so we can compare and contrast pairs of similar courses that were taught at different levels or at different universities. Preliminary analyses show the wide distribution of data science topics that are common among a number of environmental science and engineering courses. We identified commonalities and differences in the integration of data science instruction (through modules) into these courses. This analysis informs the development of a set of key considerations for integrating data science concepts into a variety of STEM+C courses.
-
The rapid expansion of data science programs across a wide range of academic disciplines - including computer science, engineering, business, and other applied data domains - presents a challenge for standardizing curricula in line with established competencies. This paper critically examines whether university data science programs are aligned with the ACM Competencies for Undergraduate Data Science Curricula. Using a systematic review of 788 data science program offerings and 9,322 course titles, we assess levels of alignment with ACM's eleven competency areas. Additionally, we evaluate the inclusion of additional common skills course offerings, such as math/statistics, data analytics, and capstone courses. Our findings highlight significant variability in programs' adherence to the ACM competencies. This underscores the need for greater interdisciplinary collaboration towards integrating computing, statistics, and domain-specific coursework into the broad range of data science curricula, ensuring that data science graduates have a well-rounded, interdisciplinary skill set suited to the diverse applications of data science.
-
Asynchronous online courses are popular because they offer benefits to both students and instructors. Students benefit from the convenience, flexibility, affordability, freedom of geography, and access to information. Instructors and institutions benefit by having a broad geographical reach, scalability, and cost-savings of no physical classroom. A challenge with asynchronous online courses is providing students with engaging, collaborative and interactive experiences. Here, we describe how an online poster symposium can be used as a unique educational experience and assessment tool in a large-enrollment (e.g., 500 students), asynchronous, natural science, general education (GE) course. The course, Introduction to Environmental Science (ENR2100), was delivered using distance education (DE) technology over a 15-week semester. In ENR2100 students learn a variety of topics including freshwater resources, surface water, aquifers, groundwater hydrology, ecohydrology, coastal and ocean circulation, drinking water, water purification, wastewater treatment, irrigation, urban and agricultural runoff, sediment and contaminant transport, water cycle, water policy, water pollution, and water quality. Here we present a long-term study that took place from 2017 to 2022 (before and after COVID-19) and involved 5,625 students over 8 semesters. Scaffolding was used to break up the poster project into smaller, more manageable assignments, which students completed throughout the semester. Instructions, examples, how-to videos, book chapters and rubrics were used to accommodate students’ different levels of knowledge. Poster assignments were designed to teach students how to find and critically evaluate sources of information, recognize the changing nature of scientific knowledge, methods, models and tools, understand the application of scientific data and technological developments, and evaluate the social and ethical implications of natural science discoveries. At the end of the semester students participated in an asynchronous online poster symposium. Each student delivered a 5-min poster presentation using an online learning management system and completed peer reviews of their classmates’ posters using a rubric. This poster project met the learning objectives of our natural science, general education course and taught students important written, visual and verbal communication skills. Students were surveyed to determine which parts of the course were most effective for instruction and learning. Students ranked poster assignments first, followed closely by lecture videos. Approximately 87% of students were confident that they could produce a scientific poster in the future and 80% of students recommended virtual poster symposiums for online courses.

