Title: Open-Schema Event Profiling for Massive News Corpora
With the rapid growth of online information services, a vast volume of news data has become available. To help people quickly digest this flood of information, we define a new problem, schema-based news event profiling: profiling events reported in open-domain news corpora with a set of slots and slot-value pairs for each event, where the set of slots forms the schema of an event type. Such profiling not only provides readers with concise views of events, but also facilitates applications such as information retrieval, knowledge graph construction, and question answering. The task is, however, quite challenging. The first challenge is to discover events and event types, because both are initially unknown. The second is the lack of pre-defined event-type schemas. Lastly, even with the schemas extracted, generating event profiles from them remains essential yet demanding. To address these challenges, we propose a fully automatic, unsupervised, three-step framework to obtain event profiles. First, we develop a Bayesian non-parametric model to detect events and event types by exploiting the slot expressions of the entities mentioned in news articles. Second, we propose an unsupervised embedding model for schema induction that encodes the following insight: an entity may serve as the value of multiple slots in an event, but when it appears in multiple sentences alongside the same set of other entities in the event, its slots in those sentences tend to be similar. Finally, we build event profiles by extracting slot values for each event based on the slots’ expression patterns. To the best of our knowledge, this is the first work on schema-based profiling for news events. Experimental results on a large news corpus demonstrate the superior performance of our method against state-of-the-art baselines on event detection, schema induction, and event profiling.
Award ID(s):
1741317 1704532 1618481
PAR ID:
10079167
Journal Name:
Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018
Volume:
2018
Issue:
1
Page Range / eLocation ID:
587 to 596
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Event schemas are a form of world knowledge about the typical progression of events. Recent methods for event schema induction use information extraction systems to construct a large number of event graph instances from documents, and then learn to generalize the schema from such instances. In contrast, we propose to treat event schemas as a form of commonsense knowledge that can be derived from large language models (LLMs). This new paradigm greatly simplifies the schema induction process and allows us to handle both hierarchical relations and temporal relations between events in a straightforward way. Since event schemas have complex graph structures, we design an incremental prompting and verification method INCPROMPT to break down the construction of a complex event graph into three stages: event skeleton construction, event expansion, and event-event relation verification. Compared to directly using LLMs to generate a linearized graph, INCPROMPT can generate large and complex schemas with 7.2% F1 improvement in temporal relations and 31.0% F1 improvement in hierarchical relations. In addition, compared to the previous state-of-the-art closed-domain schema induction model, human assessors were able to cover ∼10% more events when translating the schemas into coherent stories and rated our schemas 1.3 points higher (on a 5-point scale) in terms of readability. 
  2. We propose a means of augmenting FrameNet parsers with a formal logic parser to obtain rich semantic representations of events. These schematic representations of the frame events, which we call Episodic Logic (EL) schemas, abstract constants to variables, preserving their types and relationships to other individuals in the same text. Due to the temporal semantics of the chosen logical formalism, all identified schemas in a text are also assigned temporally bound "episodes" and related to one another in time. The semantic role information from the FrameNet frames is also incorporated into the schema's type constraints. We describe an implementation of this method using a neural FrameNet parser, and discuss the approach's possible applications to question answering and open-domain event schema learning. 
  3. An algebraic model uses a set of algebraic equations to describe a situation. Constructing such models is a fundamental skill, but many students still lack the skill, even after taking several algebra courses in high school and college. For underachieving college students, we developed a tutoring system that taught students to decompose the to-be-modelled situation into schema applications, where a schema represents a simple relationship such as distance-rate-time or part-whole. However, when a model consists of multiple schema applications, it needs some connection among them, usually represented by letting the same variable appear in the slots of two or more schemas. Students in our studies seemed to have more trouble identifying connections among schemas than identifying the schema applications themselves. This paper describes a newly designed tutoring system that emphasizes such connections. An evaluation was conducted using a regression discontinuity design. It produced a marginally reliable positive effect of moderate size (d = 0.4). 
  4. Ad-hoc data models like JSON make it easy to evolve schemas and to multiplex different data-types into a single stream. This flexibility makes JSON great for generating data, but also makes it much harder to query, ingest into a database, and index. In this paper, we explore the first step of JSON data loading: schema design. Specifically, we consider the challenge of designing schemas for existing JSON datasets as an interactive problem. We present SchemaDrill, a roll-up/drill-down style interface for exploring collections of JSON records. SchemaDrill helps users to visualize the collection, identify relevant fragments, and map it down into one or more flat, relational schemas. We describe and evaluate two key components of SchemaDrill: (1) A summary schema representation that significantly reduces the complexity of JSON schemas without a meaningful reduction in information content, and (2) A collection of schema visualizations that help users to qualitatively survey variability amongst different schemas in the collection. 
  5. Campos, Ricardo; Jorge, Alípio Mário; Jatowt, Adam; Bhatia, Sumit; Finlayson, Mark (Ed.)
    A crucial step in the construction of any event story or news report is to identify the entities involved in the story; such entities can come from a larger background knowledge graph or from a text corpus with entity links. Along with recognizing which entities are relevant to the story, it is also important to select entities that are relevant to all aspects of the story. In this work, we model and study different types of links between the entities with the goal of identifying which link type is most useful for the entity retrieval task. Our approach demonstrates the e