<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Sentence Retrieval for Entity List Extraction with a Seed, Context, and Topic</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>09/23/2019</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10175985</idno>
					<idno type="doi">10.1145/3341981.3344250</idno>
					<title level='j'>Proceedings of the 2019 International Conference on the Theory of Information Retrieval (ICTIR 2019)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Sheikh Muhammad Sarwar</author><author>John Foley</author><author>Liu Yang</author><author>James Allan</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[We present a variation of the corpus-based entity set expansion and entity list completion task. A user-specified query and a sentence containing one seed entity are the input to the task. The output is a list of sentences that contain other instances of the entity class indicated by the input. We construct a semantic query expansion model that leverages topical context around the seed entity and scores sentences. The proposed model finds 46\% of the target entity class by retrieving 20 sentences on average. It achieves 16\% improvement over BM25 model in terms of recall@20.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Consider a user searching for a list of civilians killed by the New York Police Department, who issues that query to a search engine. She lands on a web page where she finds the sentence: &#322;On Feb. 4, 1999, four NYPD officers in the Bronx fired 41 shots at a 22-year-old immigrant from Guinea named Amadou Diallo. &#382; <ref type="foot">1</ref> .</p><p>Can she use the information she now has &#347; a query and an example sentence with an instance of her goal identified &#347; to find other mentions of killings? Knowledge bases such as Wikipedia rarely contain articles about non-popular entities such as &#322;Amadou Diallo", so we cannot adopt entity retrieval based approaches that search through knowledge base articles on entities organized by entity categories <ref type="bibr">[21]</ref>. Document co-occurrence based retrieval models would find a large number of unrelated entities co-occurring with New York Police Department in different contexts and thus result in low precision <ref type="bibr">[2]</ref>. Corpus-based set expansion methods can be useful, but they require more than one seed to infer the context and type of the desired entity class and do not take a query as input <ref type="bibr">[20]</ref>.</p><p>An information extraction approach to this problem is to construct a weak supervised training dataset and estimate a statistical NLP model (e.g., feature-rich logistic regression, CNN, CRF) <ref type="bibr">[11]</ref>. Such a dataset is usually constructed by automatically labeling sentences with relevant instances from a knowledge base or a historical list. In the case of our example, the absence of a manually curated historical database of NYPD police killing would make this process infeasible.</p><p>Active Learning (AL) based approaches could be used to gather informative sentences using a human-in-the-loop setting <ref type="bibr">[7]</ref>. However, we have just one training sentence and a statistical model estimated using that would produce ineffective queries for users. We need to retrieve at least a handful of sentences containing target entity instances to support an AL approach <ref type="bibr">[6]</ref>. If we can retrieve such sentences and annotate them, the model would have some positive instances and might then be capable of learning actively. We focus on sentences because a sentence offers more context for interpreting entity relevance <ref type="bibr">[7]</ref>. Moreover, retrieval of these sentences is also useful for other downstream applications like summarization, entity comprehension and retrieving entities in context <ref type="bibr">[13,</ref><ref type="bibr">18,</ref><ref type="bibr">24]</ref>.</p><p>In this work, we take an IR approach, retrieving sentences containing entities from a user desired list. Our contributions are: <ref type="bibr">(1)</ref> we show that distant supervision with an entity annotated in the input training sentence can be effective if a query expansion technique is used; (2) we describe a semantic relevance model for query expansion and show that it is able to exploit entity co-occurrence; and, <ref type="bibr">(3)</ref> we perform experimental evaluation with TREC List QA datasets and show that on average the top 20 sentences found by our approach contain nearly half of the target entities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">RELATED WORK</head><p>Related work comes from Corpus-based Set Expansion (CSE) , List Completion (LC), and Entity Ranking (ER). Each task is similar to ours but uses more seeds, external resources or lacks an example sentence for context.</p><p>Seed Selection for Active Learning. Active learning (AL) is a setting where a classifier determines which unlabeled data would contribute most to its learning process, and asks a human judge to label those data points <ref type="bibr">[19]</ref>. Dligach and Palmer addressed the problem of seed selection for AL <ref type="bibr">[6]</ref>. They considered a scenario where rare class examples constitute 5% of the data. They stated that 10 randomly selected seeds for AL would have 60% chances of selecting none of the rare class examples. They proposed Language Model (LM) based sampling for selecting seed words for the Word Sense Disambiguation task. Our work focuses on sentences selection for Information Extraction and hypothesizes that a semantic retrieval based approach is more suitable.</p><p>List QA. List questions are a special class of questions, where a system has to come up with a set of entities in response to a question <ref type="bibr">[4,</ref><ref type="bibr">22]</ref>. Systems built for solving this task often leverage some form of well-trained entity recognizer to collect the most promising answer passages <ref type="bibr">[12,</ref><ref type="bibr">16]</ref>. When the target type of a question is known in advance (e.g., who suggests person), identifying passages with that type of entity helps narrow down the set of candidate results. However, these approaches do not generalize to arbitrary types.</p><p>Set Expansion and List Completion. In the List or Set Completion task a small set of seed entities is given, with the aim of retrieving a complete set of target entities. Our work does not assume the existence of a knowledge-base, which makes it differ dramatically from set expansion.</p><p>Entity Ranking Systems. Our proposed task is also closely related to the entity ranking task in INEX 2009 <ref type="bibr">[5]</ref> and TREC 2010 <ref type="bibr">[1]</ref>. Approaches taken by participants in these tasks included the use of co-occurrence models, NER based type filtering, and context modeling <ref type="bibr">[8]</ref>. Co-occurrence modeling was further applied based on context. The relation between a pair of entities was measured by their co-occurrence in documents (context-independent) and by computing similarity of the term vectors extracted from those documents (context-dependent) <ref type="bibr">[3]</ref>. Some solutions augmented entity representation with entity synonyms and used search engines with that representation. Most of the participants of this task adopted external resources and were heavily dependent on Wikipedia for performing entity type filtering <ref type="bibr">[8]</ref>. We study retrieving sentences with target entities by focusing only on textual content around entities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">TASK DEFINITION</head><p>We are given a collection of sentences, S = {S 1 , . . . , S n } drawn from a collection of documents, D q , identified by a list query, q. We also have a training sentence S q with a set of entities E q from S q identified. In our experiments, S q &#8712; S, but that is not required. E q will normally have one entity in it, but we allow the possibility that a sentence lists more than one name of the target type. Broadly speaking, our objective is to define a function F (S i , S q ) that can score any sentence S i &#8712; S based on the likelihood it contains a previously unseen entity of the same type characterized by E q . Here, the type of E q is implicitly indicated by the contextual information presented in the training sentence.</p><p>This task requires a relevant sentence to contain relevant as well as novel entities. Our evaluation set consists of a set of target entities, E t belonging to the same type as E q . Our scoring function F (&#8226;) scores each S i &#8712; S to produce a ranked list of sentences. We define E k as the union of E q and the subset of E t that has been seen in sentences at ranks 1 to k, with E 0 = E q . We count a sentence at rank k as relevant only if E k E k -1 &#347; that is, if a new target entity occurs. Relevance thus incorporates novelty in this evaluation. Our goal is to show as many distinct entities as possible in the top k retrieved sentences, because more relevant entities with their context would give us more useful training data for AL.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">PROPOSED APPROACH</head><p>We consider both topical and functional similarities to rank candidate sentences. To capture topical similarity, we use a sentence embedding-based approach to rank candidate sentences from S by the likelihood that their content matches the training sentence, S q . For the embedding-based retrieval step, we compute the similarity of the sentence embedding of S q to embeddings of all sentences from S. We find that despite the expansion implicit in embeddings, there is value in further expansion of S q using explicit query expansion methods. Furthermore, we achieve functional similarity by looking at the NER tag of the entities in a candidate sentence, which we refer to as type refinement approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Sentence Embedding (SE) based Retrieval</head><p>In order to induce a semantic representation of our short sentences, we lean on the the distributed representation of words learned by the skip-gram model of word2vec <ref type="bibr">[15]</ref>. The sentence embedding of any sentence S i is computed as the average of word embedding vectors for each token t &#8712; S i . Wieting et al. <ref type="bibr">[23]</ref> showed that a simple averaging over the embedding of the words in a sentence provides an effective representation for that sentence in two supervised NLP tasks: sentence similarity and sentence entailment. We do not have sentence similarity labels to train a model, and so we adopt this simple method of averaging.</p><p>We compute the similarity score for S i given query S q using cosine similarity as shown in Equation <ref type="formula">1</ref>, with V being a function that returns the SE vector for its argument.</p><p>Score S E (S q , S i ) = cos[V (S q ), V (S i )] = &#948; (S i , S q )</p><p>(1)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Query Expansion</head><p>To create a broader and more generalized representation of the training sentence S q , we build on the well-known IR technique of query expansion (QE). For expanding S q in this problem we select a set of k &lt; n sentences from D q , Q e = {S 1 , S 2 , ..., S k } &#8834; S and combine Q e with S q to create an improved representation, S + q . We hypothesize that we can obtain a better representation of user intent with query expansion using the entity set E q from S q . As a result we construct</p><p>Thus we enrich our query representation by drawing in context of the query entity set E q . This approach is referred to as distant supervision and it is widely adopted by the NLP community <ref type="bibr">[11]</ref>. However, distant supervision is generally applied when there is much more than one example. As it is difficult to learn a statistical model using distant supervision with a single training example, we construct a query model using the distantly supervised data. We take two different approaches to query model construction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1">Query Expansion with Sentences (SQE).. After ob-</head><p>taining Q e , we compute the expanded representation S + q of S q using the following equation:</p><p>Informally, S + q is the weighted average of the embeddings of the sentences in Q e . The weight &#945; i of sentence S i &#8712; Q e is calculated as Score S E (S q , S i ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>4.2.2</head><p>Query Expansion with Terms (TQE).. We compute an embedding based query language model for expanding the sentence query S q . Given a set of expansion sentences Q e , we compute the expanded query model &#952; q using:</p><p>For a sentence S i , P(t |S i ) can be estimated as 1 l en(S i ) , and &#948; (S i , S q ) can be estimated as the cosine similarity of the embedding of S i and S q . After obtaining P(t |&#952; q ), we normalize the distribution, and finally compute the expanded representation S + q of S q using:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">NER-based Refinement for Novelty</head><p>Our unit of retrieval is a sentence, and we want that sentence to contain novel entities of the target entity type. We use a standard NER tool to find the broad entity type from the query (e.g., person) and discard any candidate sentences that does not contain at least one unseen instance of the same broad entity type. As an example, assume that our target entity type is presidents of United States, and the broad entity type is person. Now, if a sentence at position k in the ranked list does not contain at least one new person instance, we assume that this sentence will thus not have any new president instance and we discard it to refine the ranked list. This approach has a chance to hurt some queries when the NER tagger is wrong (e.g., the type of the target entity &#322;Chicago&#382; is movie, but the broad entity type identified by the NER tagger is location), but it improves overall performance in this study. They excluded list questions that seek common entities like countries, cities, etc. An example question from the refined dataset is list all the graduates of DePaw University. For each list question the dataset provides sentences from the top 1000 documents retrieved against the question. The document ranking is provided by TREC track organizers using BM25 ranking algorithm. For reproducibility purposes, we use the same sentence dataset, even though using a more modern retrieval algorithm may result in better overall results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">EXPERIMENTS 5.1 Experimental Setup</head><p>The dataset is particularly suitable for us considering the motivating example we provided in introduction -a user queries a search engine at first with her list information need. We call that set D q and draw our candidate set of sentences, S, from there, resulting in a sentence corpus containing an average of 25,229 sentences per query. If we randomly choose one sentence from the corpus, there is 0.7% probability of seeing a sentence that contains a target entity. Thus, only a small subset of these sentences are entity bearing ones. We found this small subset by matching all the sentences against the answer keys of the list question provided by TREC. Thus, for the list question asking about the graduates of DePaw University, we formed a large sentence corpus in which only a handful of sentences contain names of the graduates. We judged and selected our query sentence S q from those (120 sentences for 120 list queries) and evaluated the effectiveness of our approaches in retrieving the remaining sentences.</p><p>All sentences in the dataset are annotated with the Stanford Named Entity Recognizer <ref type="bibr">[9]</ref>. Each sentence term receives a label from the set {OTHER, ORGANIZATION, NUMBER, DOLLAR, LO-CATION, TIME, MISC, PERSON, ORDINAL, DURATION, PERCENT, MONEY}. Among the 120 query sentences 10, 17, 1, 51, 5 and 36 contain instances of LOCATION, ORGANIZATION, NUMBER, PER-SON, MISC, and OTHER entities, respectively. We note that 34% of the entities identified are of OTHER type, which suggests that this dataset is challenging and perhaps interesting to the NER community as well. Moreover, the task gets harder with the novelty requirement, as a sentence would not be relevant in the ranked list if it does not contain a new entity of the target entity class.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>5.1.2</head><p>Training Word Embeddings. We train word embeddings using the Stanford-NLP lemmatized form of the AQUAINT text corpus. Using the word2vec tool by Mikolov et al. <ref type="bibr">[15]</ref>, we derived 200-dimensional skipgram embeddings with a context window of 5, no negative samples, and the recommended sampling threshold of 10 -5 . The raw text of the AQUAINT corpus is 1.6 GB and it contains 517M words. These skipgram models were recommended as the most robust by Levy et al. <ref type="bibr">[14]</ref>. For reproducibility, we release our word embeddings alongside our data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Experimental Results Analysis</head><p>The upper portion of Table <ref type="table">1</ref> shows a comparison of our baselines for the proposed task: SE, BM25 and Conditional Random Field (CRF) model. We found that semantic search works better than keyword based search, and classic information extraction approach, CRF fails with one training sentence and a tagged entity. SE brings 90% of the entities in the top 1000 sentence, whereas BM25 finds 82% (not reported in table). High entity recall in the top 1000 sentences is necessary for our type-refinement approach. As a neural sentence retrieval baseline, we tried to train a siamese recurrent neural network <ref type="bibr">[17]</ref> on Stanford SNLI sentence similarity corpus and applied it on our dataset for scoring candidate sentences. It performed poorly because of domain adaptation issues. We do not report those results for that reason.</p><p>The middle portion of the table shows the performance of sentence based query expansion explained in Section 4.2. Using the weight setW = {&#945; 0 , . . . , &#945; n } in Equation <ref type="formula">2</ref>, we put uniform weights on the ranked query entity-bearing sentences to compute a weighted query representation. We also used the semantic similarity scores between query sentence and expansion sentences to obtain W . Both  <ref type="table">1</ref> focuses on term based QE (TQE) approaches explained in Section 4.2.2. We used the cosine similarity score from SE as document priors for computing term weights for TQE. The first term based method set uniform weights to all top k retrieved terms and it performs significantly worse compared to the weighted version. It shows that our term weight estimation is effective. Overall, the term and sentence based methods perform comparably in terms of precision and recall. The term based method achieves significant increase in MAP at cutoff 1000.</p><p>Interestingly, when we consider P(t |S) = 1 and &#948; (S, S q ) = 1 we obtain results very close to the weighted version (Section 4.2.2). That indicates that term count in expansion sentences is the most important component of the expanded query. To understand why this happens note that for each of the expansion sentences the query entity terms are common, so they get a high probability in the final query model. After looking at the top-ranked search results we found that a majority of the sentences contain the query entity itself. It seems that the model is thus considering query entitybearing sentences from across D q and selecting the sentences with novel co-occurring entities at the top ranks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">CONCLUSION</head><p>We proposed retrieval approaches using a single training sentence to retrieve more training data. In general, this approach can be taken for any task that suffers from data sparsity. Our approach could retrieve almost half of the target entities by considering only 20 sentences. If these sentences are annotated by a user, it might then be possible to build a model capable of doing active learning. In future work, we expect to combine IR and statistical models for active learning. A limitation of our model is its struggles to retrieve non-co-occurring entities in the top ranks. We aim to improve this model to further increase recall, perhaps by de-emphasizing the original selected target.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>From https://www.huffingtonpost.com/2014/07/18/killed-by-the-nypd-black-men_ n_5600045.html</p></note>
		</body>
		</text>
</TEI>
