Title: NarrativeTime: Dense Temporal Annotation on a Timeline
For the past decade, temporal annotation has been sparse: only a small portion of event pairs in a text was annotated. We present NarrativeTime, the first timeline-based annotation framework that achieves full coverage of all possible TLINKs. To compare with the previous SOTA in dense temporal annotation, we perform full re-annotation of the classic TimeBankDense corpus (American English), which shows comparable agreement with a significant increase in density. We contribute the TimeBankNT corpus (with each text fully annotated by two expert annotators), extensive annotation guidelines, open-source tools for annotation and conversion to TimeML format, and baseline results.
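To illustrate why placing events on a timeline yields full TLINK coverage, here is a minimal sketch. It assumes each event is reduced to a closed interval with numeric start and end positions; this is an illustrative simplification, not NarrativeTime's actual representation, and the relation labels and the names `relation` and `all_tlinks` are hypothetical.

```python
from itertools import combinations


def relation(a, b):
    """Return a coarse temporal relation between two (start, end) intervals."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "BEFORE"
    if b_end < a_start:
        return "AFTER"
    if a_start == b_start and a_end == b_end:
        return "SIMULTANEOUS"
    if b_start <= a_start and a_end <= b_end:
        return "IS_INCLUDED"
    if a_start <= b_start and b_end <= a_end:
        return "INCLUDES"
    return "OVERLAP"


def all_tlinks(timeline):
    """Derive one relation for every unordered pair of events on the timeline."""
    return {
        (x, y): relation(timeline[x], timeline[y])
        for x, y in combinations(sorted(timeline), 2)
    }


# Hypothetical timeline positions for three events.
timeline = {"e1": (0, 2), "e2": (3, 5), "e3": (1, 4)}
links = all_tlinks(timeline)
```

With n events positioned once, all n(n-1)/2 pairs receive a relation without any pair being annotated individually.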
Halterman, Andrew; Keith, Katherine; Sarwar, Sheikh; O’Connor, Brendan
(Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021)
Automated event extraction in social science applications often requires corpus-level evaluations: for example, aggregating text predictions across metadata and unbiased estimates of recall. We combine corpus-level evaluation requirements with a real-world, social science setting and introduce the IndiaPoliceEvents corpus—all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002. Our trained annotators read and label every document for mentions of police activity events, allowing for unbiased recall evaluations. In contrast to other datasets with structured event representations, we gather annotations by posing natural questions, and evaluate off-the-shelf models for three different tasks: sentence classification, document ranking, and temporal aggregation of target events. We present baseline results from zero-shot BERT-based models fine-tuned on natural language inference and passage retrieval tasks. Our novel corpus-level evaluations and annotation approach can guide creation of similar social-science-oriented resources in the future.
Hamid, Ahmed; Gagliano, Katherine; Rahman, Safwanur; Tulin, Nikita; Tchiong, Vincent; Obeid, Iyad; Picone, Joseph
(IEEE Signal Processing in Medicine and Biology Symposium (SPMB))
Obeid, Iyad; Selesnick, Ivan; Picone, Joseph
(Ed.)
The Neural Engineering Data Consortium has recently developed a new subset of its popular open source EEG corpus – TUH EEG (TUEG) [1]. The TUEG Corpus is the world’s largest open source corpus of EEG data and currently has over 3,300 subscribers. There are several valuable subsets of this data, including the TUH Seizure Detection Corpus (TUSZ) [2], which was featured in the Neureka 2020 Epilepsy Challenge [3]. In this poster, we present a new subset of the TUEG Corpus – the TU Artifact Corpus. This corpus contains 310 EEG files in which every artifact has been annotated. This data can be used to evaluate artifact reduction technology. Since TUEG is composed of actual clinical data, the set of artifacts appearing in the data is rich and challenging.
EEG artifacts are defined as waveforms that are not of cerebral origin and may be affected by numerous external and/or physiological factors. These extraneous signals are often mistaken for seizures due to their morphological similarity in amplitude and frequency [4]. Artifacts often lead to raised false alarm rates in machine learning systems, which poses a major challenge for machine learning research. Most state-of-the-art systems use some form of artifact reduction technology to suppress these events.
The corpus was annotated using a five-way classification that was developed to meet the needs of our constituents. Brief descriptions of each form of the artifact are provided in Ochal et al. [4]. The five basic tags are:
• Chewing (CHEW): An artifact resulting from the tensing and relaxing of the jaw muscles. Chewing is a subset of the muscle artifact class. Chewing has the same characteristic high frequency sharp waves with 0.5 sec baseline periods between bursts. This artifact is generally diffuse throughout the different regions of the brain. However, it might have a higher level of activity in one hemisphere. Classification of a muscle artifact as chewing often depends on whether the accompanying patient report mentions any chewing, since other muscle artifacts can appear superficially similar to chewing artifact.
• Electrode (ELEC): An electrode artifact encompasses various electrode related artifacts. Electrode pop is an artifact characterized by channels using the same electrode “spiking” with an electrographic phase reversal. Electrostatic is an artifact caused by movement or interference of electrodes and/or the presence of dissimilar metals. A lead artifact is caused by the movement of electrodes from the patient’s head and/or poor connection of electrodes. This results in disorganized and high amplitude slow waves.
• Eye Movement (EYEM): A spike-like waveform created during patient eye movement. This artifact is usually found on all of the frontal polar electrodes with occasional echoing on the frontal electrodes.
• Muscle (MUSC): A common artifact with high frequency, sharp waves corresponding to patient movement. These waveforms tend to have a frequency above 30 Hz with no specific pattern, often occurring because of agitation in the patient.
• Shiver (SHIV): A specific and sustained sharp wave artifact that occurs when a patient shivers, usually seen on all or most channels. Shivering is a relatively rare subset of the muscle artifact class.
Since these artifacts can overlap in time, a concatenated label format was implemented as a compromise between the limitations of our annotation tool and the complexity needed in an annotation data structure used to represent these overlapping events. We distribute an XML format that easily handles overlapping events. Our annotation tool [5], like most annotation tools of this type, is limited to displaying and manipulating a flat or linear annotation. Therefore, we encode overlapping events as a series of concatenated names using symbols such as:
• EYEM+CHEW: eye movement and chewing
• EYEM+SHIV: eye movement and shivering
• CHEW+SHIV: chewing and shivering
An example of an overlapping annotation is shown below in Figure 1.
This release is an update of TUAR v1.0.0, which was a partially annotated database. In v1.0.0, a similar five-way system was used as well as an additional “null” tag. The “null” tag covers anything that was not annotated, including instances of artifact. Only a limited number of artifacts were annotated in v1.0.0. In this updated version, every instance of an artifact is annotated; ultimately, this provides the user with confidence that any part of the record that is not annotated with one of the five classes does not contain an artifact. No new files, patients, or sessions were added in v2.0.0. However, the data was reannotated with these standards. The total number of files remains the same, but the number of artifact events increases significantly. Complete statistics will be provided on the corpus once annotation is complete and the data is released. This is expected to occur in early July – just after the IEEE SPMB submission deadline.
The TUAR Corpus is an open-source database that is currently available for use by any registered member of our consortium. To register and receive access, please follow the instructions provided at this web page: https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml. The data is located here: https://www.isip.piconepress.com/projects/tuh_eeg/downloads/tuh_eeg_artifact/v2.0.0/.
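The concatenated label scheme described above (for example, EYEM+CHEW) can be decoded back into its component tags with a few lines of code. The sketch below assumes labels are plain strings joined by "+", as written in the abstract; the on-disk annotation format of the released corpus is not described here, and the function names `split_label` and `count_tags` are hypothetical.

```python
from collections import Counter

# The five basic tags named in the abstract.
BASE_TAGS = {"CHEW", "ELEC", "EYEM", "MUSC", "SHIV"}


def split_label(label):
    """Split a concatenated label such as 'EYEM+CHEW' into its base tags."""
    parts = label.strip().upper().split("+")
    unknown = [p for p in parts if p not in BASE_TAGS]
    if unknown:
        raise ValueError(f"unknown tag(s) in label {label!r}: {unknown}")
    return frozenset(parts)


def count_tags(labels):
    """Count how often each base tag occurs, alone or inside an overlap."""
    counts = Counter()
    for label in labels:
        counts.update(split_label(label))
    return counts


# Hypothetical sequence of annotated events.
events = ["EYEM+CHEW", "CHEW", "CHEW+SHIV", "MUSC"]
tag_counts = count_tags(events)
```

Returning a set rather than a list makes EYEM+CHEW and CHEW+EYEM compare equal, which is convenient if the ordering inside a concatenated name carries no meaning.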
Dan, Soham; Kordjamshidi, Parisa; Bonn, Julia; Bhatia, Archna; Cai, Jon; Palmer, Martha; Roth, Dan
(Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020))
Spatial Reasoning from language is essential for natural language understanding. Supporting it requires a representation scheme that can capture spatial phenomena encountered in language as well as in images and videos. Existing spatial representations are not sufficient for describing spatial configurations used in complex tasks. This paper extends the capabilities of existing spatial representation languages and increases coverage of the semantic aspects that are needed to ground spatial meaning of natural language text in the world. Our spatial relation language is able to represent a large, comprehensive set of spatial concepts crucial for reasoning and is designed to support composition of static and dynamic spatial configurations. We integrate this language with the Abstract Meaning Representation (AMR) annotation schema and present a corpus annotated by this extended AMR. To exhibit the applicability of our representation scheme, we annotate text taken from diverse datasets and show how we extend the capabilities of existing spatial representation languages with fine-grained decomposition of semantics and blend it seamlessly with AMRs of sentences and discourse representations as a whole.
Martinez-Lucas, Luz; Abdelwahab, Mohammed; Busso, Carlos
(Interspeech 2020)
Human-computer interactions can be very effective, especially if computers can automatically recognize the emotional state of the user. A key barrier for effective speech emotion recognition systems is the lack of large corpora annotated with emotional labels that reflect the temporal complexity of expressive behaviors, especially during multiparty interactions. This paper introduces the MSP-Conversation corpus, which contains interactions annotated with time-continuous emotional traces for arousal (calm to active), valence (negative to positive), and dominance (weak to strong). Time-continuous annotations offer the flexibility to explore emotional displays at different temporal resolutions while leveraging contextual information. This is an ongoing effort, where the corpus currently contains more than 15 hours of speech annotated by at least five annotators. The data is sourced from the MSP-Podcast corpus, which contains speech data from online audio-sharing websites annotated with sentence-level emotional scores. This data collection scheme is an easy, affordable, and scalable approach to obtain natural data with diverse emotional content from multiple speakers. This study describes the key features of the corpus. It also compares the time-continuous evaluations from the MSP-Conversation corpus with the sentence-level annotations of the MSP-Podcast corpus for the speech segments that overlap between the two corpora.
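One simple way to put a time-continuous trace on the same footing as a sentence-level score is to average the trace over the sentence's time span. The sketch below assumes a trace given as parallel lists of timestamps (in seconds) and values; the abstract does not describe the procedure actually used in the study, so the function `segment_mean` and the sample data are purely illustrative.

```python
def segment_mean(times, values, start, end):
    """Average the trace samples whose timestamps fall within [start, end]."""
    selected = [v for t, v in zip(times, values) if start <= t <= end]
    if not selected:
        return None  # no samples fall inside the segment
    return sum(selected) / len(selected)


# Hypothetical arousal trace sampled every 0.5 s.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
arousal = [0.0, 0.2, 0.4, 0.6, 0.8]

# Hypothetical sentence boundaries from a sentence-level corpus.
sentence_score = segment_mean(times, arousal, start=0.5, end=1.5)
```

Averaging discards the within-sentence dynamics that time-continuous annotation is meant to capture, so it is only suitable for a coarse comparison between the two annotation styles.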
Abstract Background: Annotating scientific literature with ontology concepts is a critical task in biology and several other domains for knowledge discovery. Ontology-based annotations can power large-scale comparative analyses in a wide range of applications ranging from evolutionary phenotypes to rare human diseases to the study of protein functions. Computational methods that can tag scientific text with ontology terms have included lexical/syntactic methods, traditional machine learning, and most recently, deep learning. Results: Here, we present state-of-the-art deep learning architectures based on Gated Recurrent Units for annotating text with ontology concepts. We use the Colorado Richly Annotated Full Text Corpus (CRAFT) as a gold standard for training and testing. We explore a number of additional information sources including NCBI’s BioThesaurus and Unified Medical Language System (UMLS) to augment information from CRAFT for increasing prediction accuracy. Our best model results in a 0.84 F1 and semantic similarity. Conclusion: The results shown here underscore the impact of using deep learning architectures for automatically recognizing ontology concepts from literature. The augmentation of the models with biological information beyond that present in the gold standard corpus shows a distinct improvement in prediction accuracy.
Rogers, A, Karpinska, M, Gupta, A, Lialin, V, Smelkov, G, and Rumshisky, A. NarrativeTime: Dense Temporal Annotation on a Timeline. Retrieved from https://par.nsf.gov/biblio/10594875.
Rogers, A, Karpinska, M, Gupta, A, Lialin, V, Smelkov, G, & Rumshisky, A. NarrativeTime: Dense Temporal Annotation on a Timeline. Retrieved from https://par.nsf.gov/biblio/10594875.
Rogers, A, Karpinska, M, Gupta, A, Lialin, V, Smelkov, G, and Rumshisky, A.
"NarrativeTime: Dense Temporal Annotation on a Timeline". Country unknown/Code not available: ELRA and ICCL. https://par.nsf.gov/biblio/10594875.
@article{osti_10594875,
place = {Country unknown/Code not available},
title = {NarrativeTime: Dense Temporal Annotation on a Timeline},
url = {https://par.nsf.gov/biblio/10594875},
abstractNote = {For the past decade, temporal annotation has been sparse: only a small portion of event pairs in a text was annotated. We present NarrativeTime, the first timeline-based annotation framework that achieves full coverage of all possible TLINKs. To compare with the previous SOTA in dense temporal annotation, we perform full re-annotation of the classic TimeBankDense corpus (American English), which shows comparable agreement with a significant increase in density. We contribute the TimeBankNT corpus (with each text fully annotated by two expert annotators), extensive annotation guidelines, open-source tools for annotation and conversion to TimeML format, and baseline results.},
journal = {},
publisher = {ELRA and ICCL},
author = {Rogers, A and Karpinska, M and Gupta, A and Lialin, V and Smelkov, G and Rumshisky, A},
}