- Journal Name: Knowledge and Information Systems
- Sponsoring Org: National Science Foundation
More Like this
We present role matching, a novel, fine-grained integrity constraint on temporal fact data, i.e., (subject, predicate, object, timestamp)-quadruples. A role is a combination of subject and predicate and can be associated with different objects as the real world evolves and the data changes over time. A role matching states that the associated objects of two or more roles should always match across time. Once discovered, role matchings can serve as integrity constraints to improve data quality, for instance, of structured data in Wikipedia. If violated, role matchings can alert data owners or editors and thus allow them to correct the error. Finding all role matchings is challenging due both to the inherent quadratic complexity of the matching problem and the need to identify true matches based on the possibly short history of the facts observed so far. To address the first challenge, we introduce several blocking methods for both clean and dirty input data. For the second challenge, the matching stage, we show how the entity resolution method Ditto can be adapted to achieve satisfactory performance for the role matching task. We evaluate our method on datasets from Wikipedia infoboxes, showing that our blocking approaches can achieve 95% recall while maintaining a reduction ratio of more than 99.99%, even in the presence of dirty data. In the matching stage, we achieve a macro F1-score of 89% on our datasets, using automatically generated labels.
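The blocking idea in the role matching abstract above can be illustrated with a minimal sketch. It assumes clean data and uses exact-history blocking, which is only one simple strategy and not necessarily one of the paper's methods. The toy facts and the function names `build_histories`, `block_by_history`, and `reduction_ratio` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (subject, predicate, object, timestamp) quadruples.
FACTS = [
    ("Berlin", "mayor", "A", 1), ("Berlin", "mayor", "B", 2),
    ("Mayor_of_Berlin", "incumbent", "A", 1),
    ("Mayor_of_Berlin", "incumbent", "B", 2),
    ("Paris", "mayor", "C", 1), ("Paris", "mayor", "C", 2),
]


def build_histories(facts):
    """Group facts by role (subject, predicate) into timestamp -> object maps."""
    histories = defaultdict(dict)
    for subject, predicate, obj, timestamp in facts:
        histories[(subject, predicate)][timestamp] = obj
    return dict(histories)


def block_by_history(histories):
    """Put roles with identical histories into one block (clean-data assumption)
    and return the candidate role pairs that share a block."""
    blocks = defaultdict(list)
    for role, history in histories.items():
        blocks[tuple(sorted(history.items()))].append(role)
    pairs = set()
    for roles in blocks.values():
        pairs.update(combinations(sorted(roles), 2))
    return pairs


def reduction_ratio(n_roles, n_candidates):
    """Fraction of all possible role pairs that blocking avoids comparing."""
    total_pairs = n_roles * (n_roles - 1) // 2
    return 1 - n_candidates / total_pairs if total_pairs else 0.0


histories = build_histories(FACTS)
candidates = block_by_history(histories)
print(candidates)
print(reduction_ratio(len(histories), len(candidates)))
```

Exact-history blocking would miss true matches as soon as one value is wrong or missing, which is presumably why the abstract distinguishes blocking for clean input from blocking for dirty input.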
With the rapid growth of online information services, a huge volume of news data has become available. To help people quickly digest this flood of information, we define a new problem, schema-based news event profiling: profiling events reported in open-domain news corpora with a set of slots and slot-value pairs for each event, where the set of slots forms the schema of an event type. Such profiling not only provides readers with concise views of events, but also facilitates various applications such as information retrieval, knowledge graph construction, and question answering. It is, however, a challenging task. The first challenge is to identify events and event types, because both are initially unknown. The second difficulty is the lack of pre-defined event-type schemas. Lastly, even with the schemas extracted, generating event profiles from them remains essential yet demanding. To address these challenges, we propose a fully automatic, unsupervised, three-step framework to obtain event profiles. First, we develop a Bayesian non-parametric model to detect events and event types by exploiting the slot expressions of the entities mentioned in news articles. Second, we propose an unsupervised embedding model for schema induction that encodes the insight: an entity may serve as the values of multiple slots in an event, but if it appears in more sentences along with the same set of more entities in the event, its slots in these sentences tend to be similar. Finally, we build event profiles by extracting slot values for each event based on the slots’ expression patterns. To the best of our knowledge, this is the first work on schema-based profiling for news events. Experimental results on a large news corpus demonstrate the superior performance of our method against the state-of-the-art baselines on event detection, schema induction, and event profiling.
A key challenge for artificial intelligence in the legal field is to determine from the text of a party’s litigation brief whether, and why, it will succeed or fail. This paper shows a proof-of-concept test case from the United States: predicting outcomes of post-grant inter partes review (IPR) proceedings for invalidating patents. The objectives are to compare decision-tree and deep learning methods, validate interpretability methods, and demonstrate outcome prediction based on party briefs. Specifically, this study compares and validates two distinct approaches: (1) representing documents with term frequency-inverse document frequency (TF-IDF), training XGBoost gradient-boosted decision-tree models, and using SHAP for interpretation; and (2) deep learning of document text in context, using convolutional neural networks (CNN) with attention, and comparing LIME and attention visualization for interpretability. The methods are validated on the task of automatically determining case outcomes from unstructured written decision opinions, and then used to predict trial institution or denial based on the patent owner’s preliminary response brief. The results show how an interpretable deep learning architecture classifies successful and unsuccessful response briefs on temporally separated training and test sets. More accurate prediction remains challenging, likely due to the fact-specific, technical nature of patent cases and changes in applicable law and jurisprudence over time.
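The TF-IDF document representation named in the abstract above can be sketched as follows. This is a minimal illustration of one common TF-IDF variant (length-normalized term frequency times log(N/df)), not the authors' pipeline; the function name `tfidf` and the whitespace tokenization are assumptions, and the XGBoost and SHAP steps are omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    Tokenization is naive lowercase whitespace splitting (an assumption).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical two-document corpus for illustration.
vectors = tfidf(["claim invalid prior art", "claim valid"])
print(vectors[1])
```

A term that occurs in every document, such as "claim" here, gets weight zero under this variant, so the sparse vectors emphasize terms that distinguish one brief from another.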
Recognizing entity synonyms from text has become a crucial task in many entity-leveraging applications. However, discovering entity synonyms from domain-specific text corpora (e.g., news articles, scientific papers) is rather challenging. Current systems take an entity name string as input to find other names that are synonymous, ignoring the fact that a name string can often refer to multiple entities (e.g., “apple” could refer to both Apple Inc and the fruit apple). Moreover, most existing methods require training data manually created by domain experts to construct supervised learning systems. In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus. The manually curated synonyms for each entity stored in a knowledge base not only form a set of name strings that disambiguate one another's meaning, but can also serve as “distant” supervision to help determine important features for the task. We propose a novel framework, called DPE, to integrate two kinds of mutually complementing signals for synonym discovery, i.e., distributional features based on corpus-level statistics and textual patterns based on local contexts. In particular, DPE jointly optimizes the two kinds of signals in conjunction with distant supervision, so that they can mutually enhance each other in the training stage. At the inference stage, both signals are used to discover synonyms for the given entities. Experimental results demonstrate the effectiveness of the proposed framework.
Are children from “Eastern” cultures less emotionally expressive and reactive than children from “Western” cultures? To answer this, we used a multi‐level and multi‐contextual approach to understand variations in emotion displays and cortisol reactivity among preschoolers living in China and the United States. One hundred two preschoolers from China (N = 58; 55% males) and the United States (N = 44; 48% males) completed three (i.e., control, interpersonal‐related, and achievement‐related) emotion‐challenging paradigms over 3 days. Behavioral emotion expressions were coded, and salivary cortisol was sampled 30 minutes before and across 90 minutes post‐task. Without considering context, Chinese preschoolers displayed lower levels of positive and negative emotion expressions relative to their United States counterparts. However, Chinese preschoolers displayed similar levels of expressions as their United States counterparts during an achievement‐related challenge that is more salient to their sociocultural emphases and showed higher negative emotion expressions in this challenge, relative to other contexts. Moreover, only the achievement‐related challenge elicited increased cortisol levels among Chinese preschoolers, and this was correlated with higher levels of negative expressions. For US preschoolers, no cortisol increase was observed in any challenging paradigms, nor was cortisol associated with emotional expressions. Findings counter prior notions that East Asian children are generally less emotionally expressive. Instead, an achievement‐related challenge elicited higher emotion expression and cortisol reactivity among Chinese preschoolers, suggesting that children's emotion expression and biological reactivity may be most responsive to contexts salient to their socio‐cultural environments. We discuss the importance of considering cultural contexts when studying emotion regulation.
RESEARCH HIGHLIGHTS
Chinese preschoolers displayed lower overall positive and negative expressions relative to their US counterparts without considering situational contexts.
Chinese preschoolers displayed similar levels of emotion expressions as their US counterparts during an achievement‐related challenge salient to their social‐cultural environment.
Chinese preschoolers are particularly responsive to achievement‐related challenges, relative to other emotion‐challenging situations that are less culturally salient.
No cortisol increase was observed in any of the emotion‐challenging paradigms among US preschoolers.
Children's emotion expression and biological reactivity may be most responsive to challenges relevant to their socio‐cultural environments.