Abstract Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, which makes teaching it complex and challenging. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course to teach NLP to both undergraduate and graduate students. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help them summarize two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem for learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them, along with NLP pipelines, in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback, relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching, and we hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
The purpose of the Twitter Disaster Behavior project is to identify patterns in online behavior during natural disasters by analyzing Twitter data. The main goal is to better understand the needs of a community during and after a disaster, to aid in recovery. The datasets analyzed were collections of tweets about Hurricane Maria and recent earthquake events in Puerto Rico. All tweets pertaining to Hurricane Maria are from the timeframe of September 15 through October 14, 2017. Similarly, tweets pertaining to the Puerto Rico earthquake were collected from January 7 through February 6, 2020. These tweets were then analyzed for their content, number of retweets, and the geotag associated with the author of the tweet. We counted the occurrences of keywords in topics relating to preparation, response, impact, and recovery. These counts were then graphed using Python and Matplotlib. Additionally, using a Twitter crawler, we extracted a large dataset of tweets by users who used geotags. These geotags are used to examine location changes among the users before, during, and after each natural disaster. Finally, after performing these analyses, we developed easy-to-understand visuals and compiled these figures into a poster. Using these figures and graphs, we compared the two datasets in order to identify any significant differences in behavior and response. The main differences we noticed stemmed from two key factors: hurricanes can be predicted whereas earthquakes cannot, and a hurricane is usually an isolated event whereas an earthquake is followed by aftershocks. Thus, the Hurricane Maria dataset showed the highest tweet activity at the beginning of the event, while the Puerto Rico earthquake dataset showed peaks in tweet activity throughout the entire period, usually corresponding to aftershock occurrences. We studied these differences, as well as other important trends we identified.
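The keyword-counting step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword lists, the function names `count_topic_keywords` and `topic_totals`, and the tokenization rule are hypothetical, since the abstract does not give the project's actual lists or code.

```python
import re
from collections import Counter

# Hypothetical keyword lists for the four topics named in the abstract.
# The project's real lists are not given, so these are placeholders.
TOPIC_KEYWORDS = {
    "preparation": {"evacuate", "supplies", "shelter"},
    "response": {"rescue", "volunteer", "donate"},
    "impact": {"damage", "outage", "flooding"},
    "recovery": {"rebuild", "restore", "repair"},
}


def count_topic_keywords(tweets, topic_keywords=TOPIC_KEYWORDS):
    """Count exact keyword matches per topic across a list of tweet texts."""
    counts = {topic: Counter() for topic in topic_keywords}
    for text in tweets:
        # Assumed tokenization: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for topic, keywords in topic_keywords.items():
            counts[topic].update(t for t in tokens if t in keywords)
    return counts


def topic_totals(counts):
    """Collapse per-keyword counts into one total per topic."""
    return {topic: sum(c.values()) for topic, c in counts.items()}
```

The per-topic totals returned by `topic_totals` are plain numbers, so they could be passed to a Matplotlib bar chart (for example `plt.bar(list(totals), list(totals.values()))`). Matching here is exact, so an inflected form such as "volunteers" would not be counted unless stemming were added.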