Title: Conversational Metaphors in Use: Exploring the Contrast between Technical and Everyday Notions of Metaphor
Much computational work has been done on identifying and interpreting the meaning of metaphors, but little on understanding the motivation behind their use. To computationally model discourse and social positioning in metaphor, we need a corpus annotated with metaphors relevant to speaker intentions. This paper reports a corpus study as a first step toward computational work on the social and discourse functions of metaphor. We use Amazon Mechanical Turk (MTurk) to annotate data from three web discussion forums covering distinct domains. We then compare these annotations to those produced under our own annotation scheme, which distinguishes levels of metaphoricity with three labels: nonliteral, conventionalized, and literal. We hope this work raises questions about what remains to be done to understand how metaphors are used to achieve social goals in interaction.
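The abstract does not say how the MTurk annotations were compared against the expert scheme; as a purely illustrative sketch, a chance-corrected agreement check over the three labels might look like this (the aligned label lists and the choice of Cohen's kappa via scikit-learn are assumptions, not the paper's method):

```python
# Hypothetical comparison of crowd (MTurk) labels against expert labels using
# Cohen's kappa. Only the label names come from the abstract; the data and
# the metric choice are illustrative assumptions.
from sklearn.metrics import cohen_kappa_score

LABELS = ["nonliteral", "conventionalized", "literal"]

# One label per annotated word/phrase, aligned across the two sources.
mturk_labels  = ["nonliteral", "literal", "conventionalized", "literal"]
expert_labels = ["nonliteral", "conventionalized", "conventionalized", "literal"]

kappa = cohen_kappa_score(mturk_labels, expert_labels, labels=LABELS)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect, 0 = chance-level agreement
```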
Award ID(s):
1302522
NSF-PAR ID:
10080461
Journal Name:
Second Workshop on Metaphor in NLP
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The use of metaphor in cybersecurity discourse has become a topic of interest because of its ability to aid communication about abstract security concepts. In this paper, we borrow from existing metaphor identification algorithms and general theories to create a lightweight metaphor identification algorithm that uses only one external source of knowledge. The algorithm also introduces a real-time corpus builder for extracting collocates, that is, for identifying words that appear together more frequently than chance (sketched in the example below). We implement several variations of the introduced algorithm and empirically evaluate the output on the TroFi dataset, a de facto evaluation dataset in metaphor research. We find, first, that contrary to our expectation, adding word sense disambiguation to our metaphor identification algorithm decreases its performance. Second, we find that our lightweight algorithms perform comparably to their existing, more complex counterparts. Finally, we present the results of several case studies to illustrate the utility of the algorithm for future research on linguistic metaphor identification in texts related to cybersecurity and threats.

     
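    A minimal sketch of collocate scoring by pointwise mutual information (PMI), assuming a tokenized corpus; the window size, threshold, and function name are illustrative and not the paper's actual corpus builder:

```python
# Sketch: score word pairs that co-occur more often than chance using PMI.
# Corpus format, window, and threshold are assumptions for illustration.
import math
from collections import Counter

def collocates(sentences, window=4, min_count=2, pmi_threshold=2.0):
    word_counts, pair_counts, total = Counter(), Counter(), 0
    for tokens in sentences:
        word_counts.update(tokens)
        total += len(tokens)
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + window]:       # nearby words only
                pair_counts[tuple(sorted((w, v)))] += 1
    scored = {}
    for (w, v), n in pair_counts.items():
        if n < min_count:
            continue
        # PMI compares observed co-occurrence with the chance expectation.
        pmi = math.log2(n * total / (word_counts[w] * word_counts[v]))
        if pmi >= pmi_threshold:
            scored[(w, v)] = pmi
    return scored

corpus = [["threat", "actor", "moves", "laterally"],
          ["the", "threat", "actor", "escalates"]]
print(collocates(corpus, min_count=2, pmi_threshold=0.5))
# -> {('actor', 'threat'): 2.0}
```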
  2. Purpose: The goal of this study is to explore an immediate step in understanding the lived experiences of under-represented students through metaphor construction, and to collect more in-depth data through photograph-based interviews.
    Design/Methodology/Approach: This article introduces photo-elicitation-based narrative interviews as a qualitative methodology, used while interviewing fourteen undergraduate community college students, mostly from underrepresented groups (URGs). At the beginning of each interview, the authors probed the participants with eight photographs chosen by the research team to represent a diverse set of experiences in engineering. The authors conducted a thematic analysis of the interview data.
    Findings: The findings suggest that the inclusion of photo-elicitation often catalyzed engagement with representations, images, and metaphors, gave voice to stories that would otherwise have passed unnoticed, and produced more detailed descriptions that complemented the semi-structured narrative interviews.
    Research Limitations/Implications: This study advances the scholarship on photograph-driven interviews/photo-elicitation methodology for interviewing marginalized populations and offers a roadmap for what a multi-modal, arts-based analysis process might look like for in-depth interviews.
    Practical Implications: The use of photo-elicitation in our research enabled a deeper, more poignant exploration of the URG students' experience of navigating engineering. The participants were able to relate to the photographs and share their life narratives through them; hence, the use of photographs can be adopted in future research.
    Social Implications: Our research revealed that photo-elicitation interviews (PEI) have excellent potential to capture the marginalized narratives of URGs, which are not well explored in educational research, especially in higher education. In our research, PEI promoted more culturally inclusive approaches, positioning the participants as experts on their own narratives.
    Originality/Value: The study presented in this paper serves as an example of qualitative research that expands methodological boundaries and centers the role of power, marginalization, and creativity in research. It is a unique and important contribution to the photo-elicitation literature, offering a critical roadmap for researchers who are drawn to photo-elicitation/photograph-driven interviews as a method to explore their inquiry.
  3.
    Today’s classrooms are remarkably different from those of yesteryear. In place of individual students responding to the teacher from neat rows of desks, one more typically finds students working in groups on projects, with a teacher circulating among groups. AI applications in learning have been slow to catch up, with most available technologies focusing on personalizing or adapting instruction to learners as isolated individuals. Meanwhile, an established science of Computer Supported Collaborative Learning has come to prominence, with clear implications for how collaborative learning could best be supported. In this contribution, I will consider how intelligence augmentation could evolve to support collaborative learning, as well as three signature challenges of this work that could drive AI forward.
    In conceptualizing collaborative learning, Kirschner and Erkens (2013) provide a useful 3x3 framework in which there are three aspects of learning (cognitive, social, and motivational), three levels (community, group/team, and individual), and three kinds of pedagogical supports (discourse-oriented, representation-oriented, and process-oriented). As they engage in this multiply complex space, teachers and learners are both learning to collaborate and collaborating to learn. Further, questions of equity arise as we consider who is able to participate, and in which ways. Overall, this analysis helps us see the complexity of today’s classrooms and, within this complexity, the opportunities for augmentation or “assistance” to become important and even essential.
    An overarching design concept has emerged in the past 5 years in response to this complexity: the idea of intelligent augmentation for “orchestrating” classrooms (Dillenbourg et al., 2013). As a metaphor, orchestration can suggest the need for a coordinated performance among many agents who are each playing different roles or voicing different ideas. Practically speaking, orchestration suggests that “intelligence augmentation” could help many smaller things go well, and in doing so, could enable the overall intention of the learning experience to succeed. Those smaller things could include helping the teacher stay aware of students or groups who need attention, supporting formation of groups or transitions from one activity to the next, facilitating productive social interactions in groups, suggesting learning resources that would support teamwork, and more. A recent panel of AI experts identified orchestration as an overarching concept that is an important focus for near-term research and development in intelligence augmentation (Roschelle, Lester & Fusco, 2020).
    Tackling this challenging area of collaborative learning could also be beneficial for advancing AI technologies overall. Building AI agents that better understand the social context of human activities has broad importance, as does designing AI agents that can appropriately interact within teamwork. Collaborative learning has a trajectory over time, and designing AI systems that support teams not just with a short-term recommendation or suggestion but through long-term developmental processes is important.
    Further, classrooms that are engaged in collaborative learning could become very interesting hybrid environments, with multiple human and AI agents present at once, addressing the dual outcome goals of learning to collaborate and collaborating to learn; addressing a hybrid environment like this could lead to AI systems that more robustly support many types of realistic human activity. In conclusion, the opportunity to make a societal impact by attending to collaborative learning, the availability of a growing science of computer-supported collaborative learning, and the need to push new boundaries in AI together suggest collaborative learning as a challenge worth tackling in coming years.
  4. Abstract

    The abstract nature of energy encourages metaphorical language. In educational settings, teachers and students use conceptual metaphors subconsciously to express their ideas about what energy is or how it functions in particular scenarios. However, research on scientific analogies and metaphors has predominantly focused on explicit instructional analogies rather than implicit, everyday metaphor. In professional development for secondary science teachers, we sought to make explicit the embeddedness and ubiquity of conceptual metaphor in everyday language and in science, particularly in energy, to expand teachers’ understanding of their students’ ideas. In our micro-case study, we observed and video-recorded four secondary teachers discussing metaphor. We used interaction analysis methods, focusing on how both discursive and nonverbal interactions between people, objects, and the environment change over time, to analyze the collected data. We found evidence of teachers (1) learning about conceptual metaphor theory and (2) finding value in understanding conceptual metaphor in educational settings. In particular, teachers acknowledged that if they identify implicit metaphors in students’ science language, they will better understand students’ ideas about energy. We present possible mechanisms for teacher learning about and valuing of energy metaphor; we also suggest how to support teachers in noticing and valuing metaphors for energy instruction.

     
  5. Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Eds.)
    The Temple University Hospital Seizure Detection Corpus (TUSZ) [1] has been in distribution since April 2017. It is a subset of the TUH EEG Corpus (TUEG) [2] and the most frequently requested corpus from our 3,000+ subscribers. It was recently featured as the challenge task in the Neureka 2020 Epilepsy Challenge [3]. A summary of the development of the corpus is shown below in Table 1. The TUSZ Corpus is a fully annotated corpus, which means every seizure event that occurs within its files has been annotated. The data is selected from TUEG using a screening process that identifies files most likely to contain seizures [1]. Approximately 7% of the TUEG data contains a seizure event, so it is important that we triage TUEG for high-yield data. One hour of EEG data requires approximately one hour of human labor to annotate using the pipeline described below, so it is also important from a financial standpoint that we triage data accurately. A summary of the labels being used to annotate the data is shown in Table 2.
    Certain standards are put in place to optimize the annotation process without sacrificing consistency. Due to the nature of EEG recordings, some records start off with a segment of calibration. This portion of the EEG is instantly recognizable and transitions from what resembles lead artifact to a flat line on all the channels. For the sake of seizure annotation, the calibration is ignored, and no time is spent on it. During the identification of seizure events, a hard “3 second rule” is used to determine whether two events should be combined into a single larger event (a sketch of this rule appears at the end of this abstract). This greatly reduces the time it takes to annotate a file with multiple events occurring in succession. In addition to the required minimum 3 second gap between seizures, our standard dictates that no seizure shorter than 3 seconds be annotated. Although there is no universally accepted definition of how long a seizure must be, we find it difficult to discern with confidence between burst suppression and other morphologically similar patterns when the event is only a couple of seconds long. This is due to several reasons, the most notable being the lack of evolution, which is often crucial for the determination of a seizure.
    After the EEG files have been triaged, a team of annotators at NEDC is provided with the files to begin data annotation. An example of an annotation is shown in Figure 1, and a summary of the workflow for our annotation process is shown in Figure 2. Several passes are performed over the data to ensure the annotations are accurate: each file undergoes three passes to ensure that no seizures were missed or misidentified. The first pass of TUSZ involves identifying which files contain seizures and annotating them using our annotation tool. The time it takes to fully annotate a file can vary drastically depending on its specific characteristics; on average, however, a file containing multiple seizures takes 7 minutes to fully annotate. This includes the time it takes to read the patient report as well as traverse the entire file. Once an event has been identified, the start and stop times for the seizure are stored in our annotation tool. This is done on a channel-by-channel basis, resulting in an accurate representation of the seizure spreading across different parts of the brain. Files that do not contain any seizures take approximately 3 minutes to complete.
Even though no annotation is being made, each such file is still carefully examined to make sure that nothing was overlooked. In addition to scrolling through a file from start to finish, a file is often examined through different lenses: depending on the situation, low-pass filters are applied, and the amplitude of certain channels is increased. These techniques are never used in isolation and are meant to further increase our confidence that nothing was missed. Once each file in a given set has been looked at once, the annotators start the review process. The reviewer checks a file and comments on any changes that they recommend. This takes about 3 minutes per seizure-containing file, which is significantly less time than the first pass. After each file has been commented on, the third pass commences. This step takes about 5 minutes per seizure file and requires the reviewer to accept or reject the changes suggested in the second pass. Since tangible changes are made to the annotation using the annotation tool, this step takes a bit longer than the previous one. Assuming 18% of the files contain seizures, a set of 1,000 files takes roughly 127 work hours to annotate (one plausible breakdown is sketched below).
    Before an annotator contributes to the data interpretation pipeline, they are trained for several weeks on previously released datasets. A new annotator can thus be trained on data that resembles what they would see under normal circumstances. An additional benefit of training on released data is that it serves as a means of constantly checking our work: if a trainee stumbles across an event that was not previously annotated, it is promptly added, and the data release is updated. It takes about three months to train an annotator to the point where their annotations can be trusted. Even though we carefully screen potential annotators during the hiring process, only about 25% of the annotators we hire stay in this work for more than one year. To ensure that the annotators are consistent, the team periodically conducts an interrater agreement evaluation to confirm that there is consensus within the team. The annotation standards are discussed in Ochal et al. [4], and an extended discussion of interrater agreement can be found in Shah et al. [5].
    The most recent release of TUSZ, v1.5.2, represents our efforts to review the quality of the annotations for two upcoming challenges we hosted: an internal deep learning challenge at IBM [6] and the Neureka 2020 Epilepsy Challenge [3]. One of the biggest changes made to the annotations was the imposition of a stricter standard for determining the start and stop times of a seizure. Although evolution is still included in the annotations, start times were altered to begin when the spike-wave pattern becomes distinct, as opposed to merely when the signal starts to shift from background. This cuts down on background that was mislabeled as seizure. For seizure end times, all post-ictal slowing that had been included was removed. The v1.5.2 release did not include additional data files, with one exception: two EEG files that were corrupted in v1.5.1 were recovered and added in the latest release. The progression from v1.5.0 to v1.5.1 and later to v1.5.2 included the re-annotation of all of the EEG files in order to develop a dataset whose seizure identifications can be trusted. Starting with v1.4.0, we have also maintained a blind evaluation set that is withheld for use in competitions.
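    A worked breakdown consistent with the ~127-hour figure quoted above, assuming the second pass reviews every file while the third pass touches only seizure-containing files (this per-pass scope is our reading of the text, not an official accounting):

```python
# One plausible decomposition of ~127 work hours for a 1,000-file set,
# assuming 18% (180 files) contain seizures. Pass scopes are assumptions.
seizure_files, clean_files = 180, 820

first_pass  = seizure_files * 7 + clean_files * 3  # 1,260 + 2,460 = 3,720 min
second_pass = (seizure_files + clean_files) * 3    # review every file: 3,000 min
third_pass  = seizure_files * 5                    # accept/reject edits: 900 min

total_minutes = first_pass + second_pass + third_pass  # 7,620 min
print(total_minutes / 60)  # 127.0 hours
```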
    The annotation team is currently working on the next release of TUSZ, v1.6.0, which is expected in August 2020. It will include new data from 2016 through mid-2019: 2,296 files from 2016 as well as several thousand files representing the remaining data through mid-2019. In addition to files obtained with our standard triaging process, part of this release consists of EEG files that do not have associated patient reports. Since actual seizure events are in short supply, we are mining a large chunk of data for which we have EEG recordings but no reports. Some of this data contains interesting seizure events collected during long-term EEG sessions or from patients with a history of frequent seizures, and it is being mined to increase the number of files in the corpus that have at least one seizure event. We expect v1.6.0 to be released before IEEE SPMB 2020.
    The TUAR Corpus is an open-source database that is currently available for use by any registered member of our consortium. To register and receive access, please follow the instructions provided at this web page: https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml. The data is located here: https://www.isip.piconepress.com/projects/tuh_eeg/downloads/tuh_eeg_artifact/v2.0.0/.
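    As a concrete illustration of the “3 second rule” described above, here is a minimal sketch, assuming per-channel annotations as (start, stop) pairs in seconds; the function name and data representation are ours, not part of the NEDC tooling:

```python
# Minimal sketch of the "3 second rule": merge events separated by less than
# 3 s into one event, then drop events shorter than 3 s. The (start, stop)
# representation and function name are illustrative, not the NEDC pipeline.
MIN_GAP_S = 3.0    # required minimum gap between distinct seizures
MIN_EVENT_S = 3.0  # no seizure shorter than this is annotated

def apply_three_second_rule(events):
    """events: list of (start, stop) times in seconds for one channel."""
    merged = []
    for start, stop in sorted(events):
        if merged and start - merged[-1][1] < MIN_GAP_S:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))  # combine
        else:
            merged.append((start, stop))
    return [(s, e) for (s, e) in merged if e - s >= MIN_EVENT_S]

# Example: two bursts 2 s apart merge; an isolated 1.5 s blip is dropped.
print(apply_three_second_rule([(10.0, 14.0), (16.0, 20.0), (40.0, 41.5)]))
# -> [(10.0, 20.0)]
```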