Title: PropBank goes Public: Incorporation into Wikidata
This paper presents the first integration of PropBank role information into Wikidata, in order to provide a novel resource for information extraction, one combining Wikidata's ontological metadata with PropBank's rich argument structure encoding for event classes. We discuss a technique for PropBank augmentation to existing eventive Wikidata items, as well as identification of gaps in Wikidata's coverage based on manual examination of over 11,300 PropBank rolesets. We propose five new Wikidata properties to integrate PropBank structure into Wikidata so that the annotated mappings can be added en masse. We then outline the methodology and challenges of this integration, including annotation with the combined resources.
Award ID(s):
2019805
PAR ID:
10586899
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Editor(s):
Henning, S; Stede, M
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Format(s):
Medium: X
Location:
St. Julians, Malta
Sponsoring Org:
National Science Foundation
More Like This
  1. Vivi Nastase; Ellie Pavlick; Mohammad Taher Pilehvar; Jose Camacho-Collados; Alessandro Raganato (Eds.)
    This paper describes the evolution of the PropBank approach to semantic role labeling over the last two decades. During this time the PropBank frame files have been expanded to include non-verbal predicates such as adjectives, prepositions and multi-word expressions. The number of domains, genres and languages that have been PropBanked has also expanded greatly, creating an opportunity for much more challenging and robust testing of the generalization capabilities of PropBank semantic role labeling systems. We also describe the substantial effort that has gone into ensuring the consistency and reliability of the various annotated datasets and resources, to better support the training and evaluation of such systems.
  2. Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools, and create methods for continuous integration of new information extracted from text. 
  3. Wikidata is a publicly available, crowdsourced knowledge base that contains interlinked concepts structured for use by intelligent systems. While Wikidata has experienced rapid growth, it is far from complete and faces challenges that prevent it from being used to its full potential. In this paper, we propose a novel method for improving Wikidata by engaging undergraduate students to contribute previously missing knowledge via concept mapping assignments. Rather than allow students to edit Wikidata directly, we describe a workflow in which knowledge is constructed by students and then reviewed by an expert. We present a case study in which we deployed a workflow in a large undergraduate course about sustainability, and find that it was able to contribute a substantial number of high quality statements that persisted in and contributed previously missing knowledge to Wikidata. This work provides a preliminary workflow for improving Wikidata based on classroom assignments, as well as recommendations for how future educational projects could continue to improve Wikidata or other public knowledge bases. 
  4. Structured data peer production (SDPP) platforms like Wikidata play an important role in knowledge production. Compared to traditional peer production platforms like Wikipedia, Wikidata data is more structured and intended to be used by machines, not (directly) by people; end-user interactions with Wikidata often happen through intermediary "invisible machines." Given this distinction, we wanted to understand Wikidata contributor motivations and how they are affected by usage invisibility caused by the machine intermediaries. Through an inductive thematic analysis of 15 interviews, we find that: (i) Wikidata editors take on two archetypes: Architects, who define the ontological infrastructure of Wikidata, and Masons, who build the database through data entry and editing; (ii) the structured nature of Wikidata reveals novel editor motivations, such as an innate drive for organizational work; (iii) most Wikidata editors have little understanding of how their contributions are used, which may demotivate some. We synthesize these insights to help guide the future design of SDPP platforms in supporting the engagement of different types of editors.
  5. In February 2021, Google Search added a new interface feature to support the evaluation of web domains, known as the “About this result” feature. A prominent part of this feature is a snippet of text pulled automatically from Wikipedia, if a Wikipedia page for the web domain exists. While conducting large-scale audits of Google Search, we discovered that fewer than 40% of web domains shown in Google Search results have a Wikipedia page. We then retrieved their Wikidata entries and examined the extent to which they incorporate features related to W3C credibility signals. The lack of information for many signals points to avenues for expanding Wikidata coverage.