Abstract. Many geoportals such as ArcGIS Online are established with the goal of improving geospatial data reusability and achieving intelligent knowledge discovery. However, according to previous research, most existing geoportals adopt Lucene-based techniques for their core search functionality, and these techniques have a limited ability to capture the user’s search intention. To better understand a user’s search intention, query expansion can be used to enrich the user’s query by adding semantically similar terms. In the context of geoportals and geographic information retrieval, we advocate semantically enriching a user’s query from both geospatial and thematic perspectives. On the geospatial side, we propose to enrich a query by using both place partonomy and distance decay. On the thematic side, concept expansion and embedding-based document similarity are used to infer the implicit information hidden in a user’s query. This semantic query expansion framework is implemented as a semantically enriched search engine using ArcGIS Online as a case study. A benchmark dataset is constructed to evaluate the proposed framework. Our evaluation results show that the proposed semantic query expansion framework is very effective in capturing a user’s search intention and significantly outperforms a well-established baseline, Lucene’s practical scoring function, with an increase of more than 3.0 in DCG@K (K = 3, 5, 10).
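For readers unfamiliar with the metric, the following is a minimal Python sketch of DCG@K. It assumes the common formulation with a log2 rank discount; the abstract does not say which DCG variant was used, and the function name `dcg_at_k` and the relevance grades in the usage note are illustrative only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results.

    Assumes the common form: sum of rel_i / log2(i + 1) for ranks i = 1..k.
    `relevances` lists graded relevance judgments in ranked order.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )
```

As a usage example, `dcg_at_k([3, 2, 3, 0, 1, 2], 3)` sums 3/log2(2), 2/log2(3), and 3/log2(4) for the three top-ranked results, so relevant items placed higher contribute more to the score.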
A Query Understanding Framework for Earth Data Discovery
One longstanding complication with Earth data discovery involves understanding a user’s search intent from the input query. Most geospatial data portals use keyword-based matching to search data. Little attention has been paid to the spatial and temporal information in a query or to understanding the query with an ontology. No research in the geospatial domain has investigated user queries in a systematic way. Here, we propose a query understanding framework and apply it to fill this gap by better interpreting a user’s search intent for Earth data search engines, using knowledge mined from metadata and user query logs. The proposed query understanding tool contains four components: spatial and temporal parsing; concept recognition; Named Entity Recognition (NER); and semantic query expansion. Spatial and temporal parsing detects the spatial bounding box and temporal range in a query. Concept recognition isolates clauses from free text and provides the search engine with phrases instead of a list of words. Named entity recognition detects entities in the query and directs the search engine to search for them. The semantic query expansion module expands the original query by adding synonyms and acronyms of its phrases, which were discovered from Web usage data and metadata. The four modules interact to parse a user’s query from multiple perspectives, with the goal of understanding the user’s search intent for data. As a proof of concept, the framework is applied to oceanographic data discovery. It is demonstrated that the proposed framework accurately captures a user’s intent.
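The parsing and expansion steps described above can be pictured with a small Python sketch. This is not the authors' implementation: the names `GAZETTEER`, `SYNONYMS`, and `understand_query`, the bounding-box coordinates, and the synonym entries are all hypothetical stand-ins for knowledge the paper mines from metadata and query logs, and the NER component is omitted for brevity.

```python
import re

# Hypothetical lookup tables (illustrative values only).
# Bounding boxes are (west, south, east, north) in degrees.
GAZETTEER = {"gulf of mexico": (-98.0, 18.0, -80.0, 31.0)}
SYNONYMS = {"sea surface temperature": ["sst"]}

# Matches an explicit year range such as "2010 to 2015" or "2010-2015".
YEAR_RANGE = re.compile(r"\b((?:19|20)\d{2})\s*(?:-|to)\s*((?:19|20)\d{2})\b")


def understand_query(query):
    """Split a free-text query into temporal, spatial, and thematic parts."""
    text = query.lower()
    result = {"bbox": None, "time_range": None, "phrases": [], "expansions": {}}

    # Temporal parsing: extract a year range and remove it from the text.
    match = YEAR_RANGE.search(text)
    if match:
        result["time_range"] = (int(match.group(1)), int(match.group(2)))
        text = text.replace(match.group(0), " ")

    # Spatial parsing: map the first known place name to its bounding box.
    for place, bbox in GAZETTEER.items():
        if place in text:
            result["bbox"] = bbox
            text = text.replace(place, " ")
            break

    # Concept recognition and expansion: keep known multi-word phrases
    # intact and attach their synonyms or acronyms.
    for phrase, alternatives in SYNONYMS.items():
        if phrase in text:
            result["phrases"].append(phrase)
            result["expansions"][phrase] = list(alternatives)

    return result
```

For the query "Sea surface temperature Gulf of Mexico 2010 to 2015", this sketch returns the year range, the illustrative bounding box, and the phrase "sea surface temperature" with its acronym, so that a search engine could filter by space and time and match on the phrase rather than on separate words.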
- Award ID(s):
- 1841520
- PAR ID:
- 10398254
- Date Published:
- Journal Name:
- Applied Sciences
- Volume:
- 10
- Issue:
- 3
- ISSN:
- 2076-3417
- Page Range / eLocation ID:
- 1127
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Inferring the set name of semantically grouped entities is useful in many tasks related to natural language processing and information retrieval. Previous studies mainly draw names from knowledge bases to ensure high quality, but that limits the candidate scope. We propose an unsupervised framework, AutoName, that exploits large-scale text corpora to name a set of query entities. Specifically, it first extracts hypernym phrases as candidate names from query-related documents via probing a pre-trained language model. A hierarchical density-based clustering is then applied to form potential concepts for these candidate names. Finally, AutoName ranks candidates and picks the top one as the set name based on constituents of the phrase and the semantic similarity of their concepts. We also contribute a new benchmark dataset for this task, consisting of 130 entity sets with name labels. Experimental results show that AutoName generates coherent and meaningful set names and significantly outperforms all compared methods. Further analyses show that AutoName is able to offer explanations for extracted names using the sentences most relevant to the corresponding concept.
-
We introduce the concept of semantic fast-forwarding of video streams for efficient labeling of training data for activity recognition. We show that this concept can be realized by combining deep learning within individual frames, with spatial and temporal entity-relationship reasoning about detected objects. We describe a prototype that implements this concept, and present preliminary experimental results on its feasibility and value.
-
Using entity aspect links, we improve upon the current state-of-the-art in entity retrieval. Entity retrieval is the task of retrieving relevant entities for search queries, such as "Antibiotic Use In Livestock". Entity aspect linking is a new technique to refine the semantic information of entity links. For example, while passages relevant to the query above may mention the entity "USA", there are many aspects of the USA of which only few, such as "USA/Agriculture", are relevant for this query. By using entity aspect links that indicate which aspect of an entity is being referred to in the context of the query, we obtain more specific relevance indicators for entities. We show that our approach improves upon all baseline methods, including the current state-of-the-art using a standard entity retrieval test collection. With this work, we release a large collection of entity-aspect-links for a large TREC corpus.
-
Humans subconsciously engage in geospatial reasoning when reading articles. We recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models can mimic this cognitive process using linguistic context, they do not utilize valuable geospatial information in large, widely available geographical databases, e.g., OpenStreetMap. This paper introduces GeoLM, a geospatially grounded language model that enhances the understanding of geo-entities in natural language. GeoLM leverages geo-entity mentions as anchors to connect linguistic information in text corpora with geospatial information extracted from geographical databases. GeoLM connects the two types of context through contrastive learning and masked language modeling. It also incorporates a spatial coordinate embedding mechanism to encode distance and direction relations to capture geospatial context. In the experiment, we demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity typing, which bridge the gap between natural language processing and geospatial sciences. The code is publicly available at https://github.com/knowledge-computing/geolm.

