Title: Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning
Abstract: Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, leading to a complex and challenging teaching process. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course for both undergraduate and graduate students to teach NLP. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help different teams in summarizing two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem to address learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them along with NLP pipelines in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback, relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching and hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
Award ID(s):
1619028 1319578 1638207 1822436
PAR ID:
10146448
Journal Name:
Data and Information Management
Volume:
4
Issue:
1
ISSN:
2543-9251
Page Range / eLocation ID:
18 to 43
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. To pursue transdisciplinary education, bringing together different disciplinary perspectives is necessary. As two graduate researchers, in engineering technology and anthropology, on a National Science Foundation (NSF) Improving Undergraduate STEM Education research project, we want to embody and explore our role in the journey to pursue transdisciplinary education. Our familiarity with higher education as students, our different disciplinary backgrounds and lived experiences, and our training as an engineering technology educator and a social scientist contribute greatly to the advancement of understanding the project. Harnessing our combined expertise enables us to see collaborative co-teaching, group learning, and student engagement in new ways. Often transdisciplinary education research is approached from siloed disciplines or from a single perspective and not inclusive of graduate students' perspectives. We find ourselves working on a collaborative cross-college project between three different colleges, Business, Engineering Technology, and Liberal Arts, where faculty and students are co-teaching and co-learning in a series of design and innovation courses. A key element of this project is gathering and using stakeholder data from students, faculty, and administrators. Midway through our three-year project, the NSF project’s external reviewer highlighted the crucial value added of having graduate researchers looking at transforming higher education towards transdisciplinarity. With that in mind, we offer some guiding thoughts about collaborative research among graduate students and faculty from different academic disciplines. This includes tips on how we collaborated in coding, analysis, and data presentations. Using project examples, we will discuss how we used tools for collaboration such as NVivo Teams and Microsoft Teams; these platforms aided in contributing to the iterative research design of this project and research outputs. 
Our process was strengthened by active participation in project meetings with faculty, educational community events, and data review sessions to reach data consensus. We have noticed how transdisciplinarity can transform undergraduate learning and encourage cross-college faculty collaboration. We will reflect on the significance of collaboration at all levels of higher education. Furthermore, this experience has set us on the path to becoming transdisciplinary scholars ourselves. 
  2. There is a critical nationwide shortage of IT professionals as well as of scientists and engineers with high-performance computing (HPC) and big data related advanced computing skills. Simultaneously, the technology is growing in complexity and sophistication, which has led to the use of multi-disciplinary teams with members from a broad range of home domains everywhere in industry, government, and academia. Moreover, a lot of the vital team collaborations will take place virtually using a variety of software platforms now and in the future. We report here on experiences with preparing undergraduate and graduate students for these career opportunities in several contexts, from regular semester classes, an undergraduate summer research program, to an advanced graduate student CyberTraining program. All these programs are conducted fully online and leverage concepts of flipped classrooms, recorded lectures, team-based and active learning, regular oral presentations, and more to ensure student engagement and lasting learning.
  3. Engineering is fundamentally about design, yet many undergraduate programs offer limited opportunities for students to learn to design. This design case reports on a grant-funded effort to revolutionize how chemical engineering is taught. Prior to this effort, our chemical engineering program was like many, offering core courses primarily taught through lectures and problem sets. While some faculty referenced examples, students had few opportunities to construct and apply what they were learning. Spearheaded by a team that included the department chair, a learning scientist, a teaching-intensive faculty member, and faculty heavily engaged with the undergraduate program, we developed and implemented design challenges in core chemical engineering courses. We began by co-designing with students and faculty, initially focusing on the first two chemical engineering courses students take. We then developed templates and strategies that supported other faculty-student teams to expand the approach into more courses. Across seven years of data collection and iterative refinements, we developed a framework that offers guidance as we continue to support new faculty in threading design challenges through core content-focused courses. We share insights from our process that supported us in navigating through challenging questions and concerns. 
  4. Undergraduate research experiences (UREs) cultivate workforce skills, such as critical thinking, project management, and scientific communication. Many UREs in biophysical research have constraints related to limited resources, often resulting in smaller student cohorts, barriers for students entering a research environment, and fewer mentorship opportunities for graduate students. In response to those limitations, we have created a structured URE model that uses an asynchronous training style paired with direct-tiered mentoring delivered by peers, graduate students, and faculty. The adaptive undergraduate research training and experience (AURTE) framework was piloted as part of the Brown Experiential Learning program, a computational biophysics research lab. The program previously demonstrated substantial increases and improvements in the number of students served and skills developed. Here, we discuss the long-term effectiveness of the framework, impacts on graduate and undergraduate students, and efficacy in teaching research skills and computational-based biophysical methods. The longitudinal impact of our structured URE on student outcomes was analyzed by using student exit surveys, interviews, assessments, and 5 years of feedback from alumni. Results indicate high levels of student retention in research compared with university-wide metrics. Also, student feedback emphasizes how tiered mentoring enhanced research skill retention, while allowing graduate mentors to develop mentorship and workforce skills to expedite research. Responses from alumni affirm that workforce-ready skills (communicating science, data management, and scientific writing) acquired in the program persisted and were used in postgraduate careers. The framework reinforces the importance of establishing, iterating, and evaluating a structured URE framework to foster student success in biophysical research, while promoting mentorship skill training for graduate students. 
Future work will explore the adaptability of the framework in wet lab environments and probe the potential of AURTE in broader educational contexts. 
  5. While the NLP community has produced numerous summarization benchmarks, none provide the rich annotations required to simultaneously address many important problems related to control and reliability. We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports 8 interrelated tasks: (i) extractive summarization; (ii) abstractive summarization; (iii) topic-based summarization; (iv) compressing selected sentences into a one-line summary; (v) surfacing evidence for a summary sentence; (vi) predicting the factual accuracy of a summary sentence; (vii) identifying unsubstantiated spans in a summary sentence; (viii) correcting factual errors in summaries. We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models. For factuality-related tasks, we also evaluate existing heuristics to create training data and find that training on them results in worse performance than training on 20× less human-labeled data. Our articles draw from 6 domains, facilitating cross-domain analysis. On some tasks, the amount of training data matters more than the domain where it comes from, while for other tasks training specifically on data from the target domain, even if limited, is more beneficial. 