Search for: All records

Award ID contains: 1213013


  1. Faceted interfaces are omnipresent on the web to support data exploration and filtering. A facet is a triple: a domain (e.g., Book), a property (e.g., author, language), and a set of property values (e.g., Austen, Beauvoir, Coelho, Dostoevsky, Eco, Kerouac, Suskind, ..., French, English, German, Italian, Portuguese, Russian, ...). Given a property (e.g., language), selecting one or more of its values (English and Italian) returns the domain entities (of type Book) that match the given values (the books written in English or Italian). To implement faceted interfaces in a way that scales to very large datasets, facet extraction must be automated. Prior work associates a facet domain with a set of homogeneous values, but does not annotate the facet property. In this paper, we annotate the facet property with a predicate from a reference Knowledge Base (KB) so as to maximize the semantic similarity between the property and the predicate. We define semantic similarity in terms of three new metrics: specificity, coverage, and frequency. Our experimental evaluation uses the DBpedia and YAGO KBs and shows that, for the facet annotation problem, we obtain better results than a state-of-the-art approach for the annotation of web tables, as modified to annotate a set of values. (A minimal faceted-filtering sketch appears after this result list.)
  2. The digitization of legacy infrastructure is an important component of smart cities. While most cities worldwide possess digital maps of their transportation infrastructure, few have accurate digital information on their electric, natural gas, telecom, water, wastewater, and district heating and cooling systems. Digitizing data on legacy infrastructure systems comes with several challenges, such as missing data, data conversion issues, data inconsistency, differences in data format, spatio-temporal resolution, structure, semantics, and syntax, and the difficulty of providing controlled access to the datasets. We therefore introduce GUIDES, a new data conversion and management framework for urban infrastructure systems, which comprises big data analytics, efficient data management techniques, semantic web technologies, methods to ensure information security, and tools that aid visual analytics. The proposed framework facilitates: (i) mapping of urban infrastructure systems; (ii) integration of heterogeneous geospatial data; (iii) secure storage, analysis, and querying of data while preserving its semantics; (iv) qualitative and quantitative analysis over several spatio-temporal resolutions; and (v) visualization of static (e.g., land use) and dynamic (e.g., road traffic) information.
  3. Cities are actively creating open data portals to enable predictive analytics of urban data. However, the large number of observable patterns that can be extracted by techniques such as Association Rule Mining (ARM) makes sifting through the patterns tedious and time-consuming. In this paper, we explore the use of domain ontologies to: (i) filter and prune rules that are specific variations of a more general concept in the ontology, and (ii) replace specific rules with a single "general" rule, with the intent to reduce the overall number of rules while keeping the semantics of the larger generated set. We show how the combination of several methods significantly reduces the number of rules, effectively allowing city administrators to use open data to understand patterns, use those patterns for decision-making, and better direct limited government resources. (A rule-generalization sketch appears after this result list.)
  4. This paper describes TRIPLEX-ST, a novel information extraction system for collecting spatio-temporal information from textual resources. TRIPLEX-ST is based on a distantly supervised approach that leverages rich linguistic annotations together with information in existing knowledge bases. In particular, we leverage triples associated with temporal and/or spatial contexts, e.g., as available from the YAGO knowledge base, to infer templates that capture new facts from previously unseen sentences. (A template-induction sketch appears after this result list.)
  5. AgreementMakerLight (AML) is an automated ontology matching system based primarily on element-level matching and on the use of external resources as background knowledge. This paper describes its configuration for the OAEI 2016 competition and discusses its results. For this OAEI edition, we tackled instance matching for the first time, thus expanding the coverage of AML to all types of ontology matching tasks. We also explored OBO logical definitions to match ontologies for the first time in the OAEI. AML was the top-performing system in five tracks (including the Instance and instance-based Process Model tracks) and one of the top-performing systems in three others (including the novel Disease and Phenotype track, in which it was one of three prize recipients). (An element-level matching sketch appears after this result list.)
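The sketch below relates to result 1 (faceted interfaces). It is a minimal Python illustration of a facet modeled as a (domain, property, values) triple and of disjunctive value selection over a toy set of books; the class name, sample data, and selection logic are assumptions made for illustration, not the paper's implementation.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Facet:
        """A facet as described in the abstract: domain, property, property values."""
        domain: str          # e.g., "Book"
        property: str        # e.g., "language"
        values: frozenset    # e.g., frozenset({"English", "Italian", ...})

    # Toy entity set: each book records values for the facet properties.
    books = [
        {"title": "The Name of the Rose", "author": "Eco", "language": "Italian"},
        {"title": "On the Road", "author": "Kerouac", "language": "English"},
        {"title": "The Idiot", "author": "Dostoevsky", "language": "Russian"},
    ]

    def select(entities, facet, chosen_values):
        """Return the domain entities whose value for the facet's property
        matches any of the selected values (disjunctive semantics)."""
        assert chosen_values <= facet.values  # selected values must belong to the facet
        return [e for e in entities if e.get(facet.property) in chosen_values]

    language_facet = Facet(domain="Book", property="language",
                           values=frozenset({"English", "Italian", "Russian"}))

    # Selecting English and Italian returns the books written in either language.
    print(select(books, language_facet, {"English", "Italian"}))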
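The sketch below relates to result 3 (association rules and domain ontologies). It illustrates, over assumed toy data, how rules whose antecedents are specific variations of a more general ontology concept can be replaced by a single general rule; the ontology, rule format, and generalization criterion are illustrative assumptions rather than the paper's method.

    from collections import defaultdict

    # Toy ontology: child concept -> parent concept.
    parent = {
        "pothole complaint": "road complaint",
        "signage complaint": "road complaint",
        "water leak": "utility complaint",
    }

    # Rules as (antecedent, consequent) pairs mined from open city data.
    rules = [
        (("pothole complaint",), "dispatch road crew"),
        (("signage complaint",), "dispatch road crew"),
        (("water leak",), "dispatch utility crew"),
    ]

    def generalize(rules, parent):
        """Group rules whose antecedents share an ontology parent and the same
        consequent, and replace each such group by one 'general' rule."""
        groups = defaultdict(list)
        for antecedent, consequent in rules:
            key = tuple(parent.get(a, a) for a in antecedent), consequent
            groups[key].append((antecedent, consequent))
        generalized = []
        for (gen_antecedent, consequent), members in groups.items():
            if len(members) > 1:     # several specific variations: keep one general rule
                generalized.append((gen_antecedent, consequent))
            else:                    # a lone specific rule is kept as-is
                generalized.extend(members)
        return generalized

    print(generalize(rules, parent))
    # -> [(('road complaint',), 'dispatch road crew'),
    #     (('water leak',), 'dispatch utility crew')]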
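The sketch below relates to result 4 (TRIPLEX-ST). It illustrates, in a heavily simplified way, the distant-supervision idea of aligning a KB triple that carries a temporal context with a sentence mentioning its subject and object, keeping the resulting pattern as a template, and matching that template against a previously unseen sentence; the seed triple, sentences, and template format are illustrative assumptions, not the system's actual templates.

    import re

    # A YAGO-style triple with a temporal context (illustrative values).
    seed = {"subject": "Barack Obama", "predicate": "wasBornIn",
            "object": "Honolulu", "time": "1961"}

    training_sentence = "Barack Obama was born in 1961 in Honolulu."

    def induce_template(triple, sentence):
        """Replace the subject, object, and temporal expression with slot
        markers and keep the resulting pattern as a template for the predicate."""
        if not (triple["subject"] in sentence and triple["object"] in sentence):
            return None
        pattern = (sentence.replace(triple["subject"], "<SUBJ>")
                           .replace(triple["object"], "<OBJ>")
                           .replace(triple["time"], "<TIME>"))
        return triple["predicate"], pattern

    def apply_template(predicate, pattern, new_sentence):
        """Match the template against an unseen sentence to capture a new fact
        together with its temporal context (naive string matching)."""
        parts = re.split(r"(<SUBJ>|<OBJ>|<TIME>)", pattern)
        regex = "".join("(.+?)" if p.startswith("<") else re.escape(p) for p in parts)
        m = re.fullmatch(regex, new_sentence)
        if not m:
            return None
        slots = [p for p in parts if p.startswith("<")]
        bound = dict(zip(slots, m.groups()))
        return (bound["<SUBJ>"], predicate, bound["<OBJ>"], bound["<TIME>"])

    predicate, pattern = induce_template(seed, training_sentence)
    print(apply_template(predicate, pattern, "Frank Sinatra was born in 1915 in Hoboken."))
    # -> ('Frank Sinatra', 'wasBornIn', 'Hoboken', '1915')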
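The sketch below relates to result 5 (AML). It is a minimal illustration of element-level ontology matching via lexical similarity of class labels (token Jaccard); the label sets, threshold, and similarity measure are assumptions made for illustration and do not reflect AML's actual matchers or use of background knowledge.

    def token_jaccard(label_a, label_b):
        """Jaccard similarity over the lowercase token sets of two labels."""
        a, b = set(label_a.lower().split()), set(label_b.lower().split())
        return len(a & b) / len(a | b)

    def match(classes_a, classes_b, threshold=0.5):
        """Return (class_a, class_b, score) candidate mappings: for each class in
        the first ontology, keep its best-scoring partner above the threshold."""
        mappings = []
        for ca, label_a in classes_a.items():
            best = max(((cb, token_jaccard(label_a, lb)) for cb, lb in classes_b.items()),
                       key=lambda x: x[1])
            if best[1] >= threshold:
                mappings.append((ca, best[0], round(best[1], 2)))
        return mappings

    # Toy class labels from two ontologies (identifiers are hypothetical).
    onto_a = {"A:0001": "heart muscle", "A:0002": "lung"}
    onto_b = {"B:1001": "cardiac muscle tissue", "B:1002": "lung", "B:1003": "kidney"}

    print(match(onto_a, onto_b))
    # -> [('A:0002', 'B:1002', 1.0)]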