Title: Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning
Abstract: Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, leading to a complex and challenging teaching process. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course for both undergraduate and graduate students to teach NLP. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help different teams in summarizing two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem to address learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them along with NLP pipelines in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback, relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching and hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
Award ID(s):
1619028 1319578 1638207 1822436
NSF-PAR ID:
10146448
Author(s) / Creator(s):
Date Published:
Journal Name:
Data and Information Management
Volume:
4
Issue:
1
ISSN:
2543-9251
Page Range / eLocation ID:
18 to 43
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Many university engineering programs require their students to complete a senior capstone experience to equip them with the knowledge and skills they need to succeed after graduation. Such capstone experiences typically integrate knowledge and skills learned cumulatively in the degree program, often engaging students in projects outside of the classroom. As part of an initiative to completely transform the civil engineering undergraduate program at Clemson University, a capstone-like course sequence is being incorporated into the curriculum during the sophomore year. Funded by a grant from the National Science Foundation’s Revolutionizing Engineering Departments (RED) program, this departmental transformation (referred to as the Arch initiative) is aiming to develop a culture of adaptation and a curriculum support for inclusive excellence and innovation to address the complex challenges faced by our society. Just as springers serve as the foundation stones of an arch, the new courses are called “Springers” because they serve as the foundations of the transformed curriculum. The goal of the Springer course sequence is to expose students to the “big picture” of civil engineering while developing student skills in professionalism, communication, and teamwork through real-world projects and hands-on activities. The expectation is that the Springer course sequence will allow faculty to better engage students at the beginning of their studies and help them understand how future courses contribute to the overall learning outcomes of a degree in civil engineering. The Springer course sequence is team-taught by faculty from both civil engineering and communication, and exposes students to all of the civil engineering subdisciplines. 
Through a project-based learning approach, Springer courses mimic a capstone in that students work on a practical application of civil engineering concepts throughout the semester in a way that challenges students to incorporate tools that they will build on and use during their junior and senior years. In the 2019 spring semester, a pilot of the first of the Springer courses (Springer 1; n=11) introduced students to three civil engineering subdisciplines: construction management, hydrology, and transportation. The remaining subdisciplines will be covered in a follow-on Springer 2 pilot. The project for Springer 1 involved designing a small parking lot for a church located adjacent to campus. Following initial instruction in civil engineering topics related to the project, students worked in teams to develop conceptual project designs. A design charrette allowed students to interact with different stakeholders to assess their conceptual designs and incorporate stakeholder input into their final designs. The purpose of this paper is to describe all aspects of the Springer 1 course, including course content, teaching methods, faculty resources, and the design and results of a Student Assessment of Learning Gains (SALG) survey to assess students’ learning outcomes. An overview of the Springer 2 course is also provided. The feedback from the SALG indicated positive attitudes towards course activities and content, and that students found interaction with project stakeholders during the design charrette especially beneficial. Challenges for full scale implementation of the Springer course sequence as a requirement in the transformed curriculum are also discussed.
  2. Abstract

    The era of ‘big data’ promises to provide new hydrologic insights, and open web‐based platforms are being developed and adopted by the hydrologic science community to harness these datasets and data services. This shift accompanies advances in hydrology education and the growth of web‐based hydrology learning modules, but their capacity to utilize emerging open platforms and data services to enhance student learning through data‐driven activities remains largely untapped. Given that generic equations may not easily translate into local or regional solutions, teaching students to explore how well models or equations work in particular settings or to answer specific problems using real data is essential. This article introduces an open web‐based module developed to advance data‐driven hydrologic process learning, targeting upper level undergraduate and early graduate students in hydrology and engineering. The module was developed and deployed on the HydroLearn open educational platform, which provides a formal pedagogical structure for developing effective problem‐based learning activities. We found that data‐driven learning activities utilizing collaborative open web platforms like CUAHSI HydroShare and JupyterHub to store and run computational notebooks allowed students to access and work with datasets for systems of personal interest and promoted critical evaluation of results and assumptions. Initial student feedback was generally positive, but also highlighted challenges including trouble‐shooting and future‐proofing difficulties and some resistance to programming and new software. Opportunities to further enhance hydrology learning include better articulating the benefits of coding and open web platforms upfront, incorporating additional user‐support tools, and focusing methods and questions on implementing and adapting notebooks to explore fundamental processes rather than tools and syntax. 
The profound shift in the field of hydrology toward big data, open data services and reproducible research practices requires hydrology instructors to rethink traditional content delivery and focus instruction on harnessing these datasets and practices in the preparation of future hydrologists and engineers.
  3. Engineering is fundamentally about design, yet many undergraduate programs offer limited opportunities for students to learn to design. This design case reports on a grant-funded effort to revolutionize how chemical engineering is taught. Prior to this effort, our chemical engineering program was like many, offering core courses primarily taught through lectures and problem sets. While some faculty referenced examples, students had few opportunities to construct and apply what they were learning. Spearheaded by a team that included the department chair, a learning scientist, a teaching-intensive faculty member, and faculty heavily engaged with the undergraduate program, we developed and implemented design challenges in core chemical engineering courses. We began by co-designing with students and faculty, initially focusing on the first two chemical engineering courses students take. We then developed templates and strategies that supported other faculty-student teams to expand the approach into more courses. Across seven years of data collection and iterative refinements, we developed a framework that offers guidance as we continue to support new faculty in threading design challenges through core content-focused courses. We share insights from our process that supported us in navigating through challenging questions and concerns.
  4.
    The first major goal of this project is to build a state-of-the-art information storage, retrieval, and analysis system that utilizes the latest technology and industry methods. This system is leveraged to accomplish another major goal, supporting modern search and browse capabilities for a large collection of tweets from the Twitter social media platform, web pages, and electronic theses and dissertations (ETDs). The backbone of the information system is a Docker container cluster running with Rancher and Kubernetes. Information retrieval and visualization is accomplished with containers in a pipelined fashion, whether in the cluster or on virtual machines, for Elasticsearch and Kibana, respectively. In addition to traditional searching and browsing, the system supports full-text and metadata searching. Search results include facets as a modern means of browsing among related documents. The system supports text analysis and machine learning to reveal new properties of collection data. These new properties assist in the generation of available facets. Recommendations are also presented with search results based on associations among documents and with logged user activity. The information system is co-designed by five teams of Virginia Tech graduate students, all members of the same computer science class, CS 5604. Although the project is an academic exercise, it is the practice of the teams to work and interact as though they are groups within a company developing a product. The teams on this project include three collection management groups -- Electronic Theses and Dissertations (ETD), Tweets (TWT), and Web-Pages (WP) -- as well as the Front-end (FE) group and the Integration (INT) group to help provide the overarching structure for the application. This submission focuses on the work of the Integration (INT) team, which creates and administers Docker containers for each team in addition to administering the cluster infrastructure. 
Each container is a customized application environment that is specific to the needs of the corresponding team. Each team will have several of these containers set up in a pipeline formation to allow scaling and extension of the current system. The INT team also contributes to a cross-team effort for exploring the use of Elasticsearch and its internally associated database. The INT team administers the integration of the Ceph data storage system into the CS Department Cloud and provides support for interactions between containers and the Ceph filesystem. During formative stages of development, the INT team also has a role in guiding team evaluations of prospective container components and workflows. The INT team is responsible for the overall project architecture and facilitating the tools and tutorials that assist the other teams in deploying containers in a development environment according to mutual specifications agreed upon with each team. The INT team maintains the status of the Kubernetes cluster, deploying new containers and pods as needed by the collection management teams as they expand their workflows. This team is responsible for utilizing a continuous integration process to update existing containers. During the development stage the INT team collaborates specifically with the collection management teams to create the pipeline for the ingestion and processing of new collection documents, crossing services between those teams as needed. The INT team develops a reasoner engine to construct workflows with an information goal as input, which are then programmatically authored, scheduled, and monitored using Apache Airflow. The INT team is responsible for the flow, management, and logging of system performance data and making any adjustments necessary based on the analysis of testing results.
The INT team has established a Gitlab repository for archival code related to the entire project and has provided the other groups with the documentation to deposit their code in the repository. This repository will be expanded using Gitlab CI in order to provide continuous integration and testing once it is available. Finally, the INT team will provide a production distribution that includes all embedded Docker containers and sub-embedded Git source code repositories. The INT team will archive this distribution on the Virginia Tech Docker Container Registry and deploy it on the Virginia Tech CS Cloud. The INT-2020 team owes a sincere debt of gratitude to the work of the INT-2019 team. This is a very large undertaking and the wrangling of all of the products and processes would not have been possible without their guidance in both direct and written form. We have relied heavily on the foundation they and their predecessors have provided for us. We continue their work with systematic improvements, but also want to acknowledge their efforts Ibid. Without them, our progress to date would not have been possible. 
  5. Engineering students develop competencies in fundamental engineering courses (FECs) that are critical for success later in advanced courses and engineering practice. Literature on the student learning experience, however, associates these courses with challenging educational environments (e.g., large class sizes) and low student success rates. Challenging educational environments are particularly prevalent in large, research-intensive institutions. To address concerns associated with FECs, it is important to understand prevailing educational environments in these courses and identify critical points where improvement and change are needed. The Academic Plan Model provides a systematic way to critically examine the factors that shape the educational environment. It includes paths for evaluation and adjustment, allowing educational environments to continuously improve. The Model may be applied to various levels in an institution (e.g., course, program, college), implying that a student’s entire undergraduate learning experience is the result of several enacted academic plans that are interacting with each other. Thus, understanding context-specific factors in a specific educational environment will yield valuable information affecting the undergraduate experience, including concerns related to attrition and persistence. In order to better understand why students are not succeeding in large foundational engineering courses, we developed a form to collect data on why students withdraw from certain courses. The form was included as a requirement during the withdrawal process. In this paper, we analyzed course withdrawal data from several academic departments in charge of teaching large foundational engineering courses, and institutional transcript data for the Spring 2018 semester. The withdrawal dataset includes the final grades that students expected to receive in the course and the factors that influenced their decision to withdraw.
Institutional transcript data includes demographic information (e.g., gender, major), admissions data (e.g., SAT scores, high school GPA), and institutional academic information (e.g., course grades, cumulative GPA). Results provide a better understanding of the main reasons students decide to withdraw from a course, including having unsatisfactory grades, not understanding the professor, and being overwhelmed with work. We also analyzed locus of control for the responses, finding that the majority of students withdrawing from courses consider that the problem is outside of their control and comes from an external source. We provide analysis by different departments and different specific courses. Implications for administrators, practitioners, and researchers are provided.