Title: Selective Sampling for Sensor Type Classification in Buildings
A key barrier to applying any smart technology to a building is the need to locate and connect to the necessary resources among thousands of sensing and control points, i.e., the metadata mapping problem. Existing solutions depend on exhaustive manual annotation of sensor metadata, a laborious, costly, and poorly scalable process. To reduce the required manual effort, this paper presents a multi-oracle selective sampling framework for metadata mapping that leverages noisy labels from information sources of unknown reliability, such as existing buildings, which we refer to as weak oracles. The framework is interactive: a small set of sensor instances is progressively selected and labeled so that the framework learns both how to aggregate the noisy labels and how to predict sensor types. Designing the framework raises two key challenges: weak oracle reliability estimation and instance selection for querying. To address the first challenge, we develop a clustering-based approach to weak oracle reliability estimation that capitalizes on the observation that weak oracles perform differently on different groups of instances. For the second challenge, we propose a disagreement-based query selection strategy that combines the potential effect of a labeled instance on both reducing classifier uncertainty and improving the quality of label aggregation. We evaluate our solution on a large collection of real-world building sensor data from 5 buildings with more than 11,000 sensors of 18 different types. The experimental results validate the effectiveness of our solution, which outperforms a set of state-of-the-art baselines.
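The two components named in this abstract, per-cluster weak oracle reliability and disagreement-based query selection, can be illustrated with a small sketch. It is not the paper's algorithm. It assumes the instance clusters are already computed (by any clustering method), estimates each weak oracle's reliability in each cluster as a Laplace-smoothed accuracy on the instances queried so far, aggregates noisy labels by a log-odds weighted vote, and ranks candidate queries with a hypothetical additive score (classifier uncertainty + oracle disagreement + classifier/oracle mismatch). All function names, the `alpha` smoothing parameter, and the toy data are illustrative assumptions.

```python
import math
from collections import defaultdict


def estimate_reliability(clusters, oracle_labels, true_labels, alpha=1.0):
    """Laplace-smoothed accuracy of each weak oracle within each cluster.

    clusters: instance -> cluster id; oracle_labels: oracle -> {instance: label};
    true_labels: instance -> label for the queried instances. alpha must be > 0
    so every reliability stays strictly between 0 and 1.
    """
    hits, totals = defaultdict(float), defaultdict(float)
    for inst, y in true_labels.items():
        c = clusters[inst]
        for oracle, labels in oracle_labels.items():
            if inst in labels:
                totals[(oracle, c)] += 1
                hits[(oracle, c)] += float(labels[inst] == y)
    return {
        (oracle, c): (hits[(oracle, c)] + alpha) / (totals[(oracle, c)] + 2 * alpha)
        for oracle in oracle_labels
        for c in set(clusters.values())
    }


def aggregate_label(inst, clusters, oracle_labels, rel):
    """Weighted vote: each oracle's label scores log(r / (1 - r)) for its cluster reliability r."""
    scores = defaultdict(float)
    for oracle, labels in oracle_labels.items():
        if inst in labels:
            r = rel[(oracle, clusters[inst])]
            scores[labels[inst]] += math.log(r / (1.0 - r))
    return max(scores, key=scores.get) if scores else None


def select_query(unlabeled, clusters, oracle_labels, rel, classifier_probs):
    """Return the unlabeled instance with the highest (hypothetical) disagreement score."""
    def score(inst):
        probs = classifier_probs[inst]
        uncertainty = 1.0 - max(probs.values())
        votes = [labels[inst] for labels in oracle_labels.values() if inst in labels]
        agg = aggregate_label(inst, clusters, oracle_labels, rel)
        oracle_disagreement = 1.0 - votes.count(agg) / len(votes) if votes else 1.0
        model_vs_oracles = float(agg != max(probs, key=probs.get))
        return uncertainty + oracle_disagreement + model_vs_oracles
    return max(unlabeled, key=score)


# Toy example: two weak oracles (existing buildings), each reliable in a different cluster.
clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
oracle_labels = {
    "bldgA": {"a": "temp", "b": "co2", "c": "temp", "d": "temp", "e": "temp", "f": "temp"},
    "bldgB": {"a": "co2", "b": "temp", "c": "co2", "d": "flow", "e": "co2", "f": "flow"},
}
true_labels = {"a": "temp", "b": "co2", "d": "flow", "e": "co2"}
classifier_probs = {"c": {"temp": 0.9, "co2": 0.1}, "f": {"temp": 0.6, "flow": 0.4}}

rel = estimate_reliability(clusters, oracle_labels, true_labels)
```

In the toy data, `bldgA` is right on every queried instance in cluster 0 and wrong in cluster 1, and `bldgB` is the reverse, so the aggregated label for `"c"` follows `bldgA` while the one for `"f"` follows `bldgB`. Instance `"f"` is then selected for querying because the classifier is less confident there and its prediction disagrees with the aggregated label.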
Award ID(s):
1718216 1940291
NSF-PAR ID:
10177155
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2020 19th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN)
Page Range / eLocation ID:
241 to 252
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Precise and informative label information is fundamental for distinguishing the underlying data distributions and for adequately training supervised and semi-supervised learning models. Obtaining a large amount of labeled data, however, demands substantial manual effort. This burden can be mitigated by using active learning to acquire labels only for the most informative data instances. Yet labels received from humans are not always reliable and pose the risk of introducing noisy class labels, which degrade rather than improve a model's efficacy. In this paper, we address the problem of annotating sensor data instances of various Activities of Daily Living (ADLs) in a smart home context. We exploit the interactions between users and annotators in terms of relationships spanning spatial and temporal space, which also account for an activity. We propose a novel annotator selection model, SocialAnnotator, which exploits these interactions and ranks the annotators by their level of correspondence. We also introduce a novel approach to measure this correspondence distance using the spatial and temporal information of interactions, the type of relationships, and activities. We validate the proposed SocialAnnotator framework in smart environments, achieving ≈ 84% statistical confidence in data annotation.
  2. Recent advances in weakly supervised learning enable training high-quality text classifiers from only a few user-provided seed words. Existing methods mainly use text data alone to generate pseudo-labels, even though metadata (e.g., author and timestamp) is widely available across various domains. Metadata contains strong label indicators, but these have long been overlooked, mainly due to two challenges: (1) metadata is multi-typed, requiring systematic modeling of different types and their combinations; (2) metadata is noisy, since some metadata entities (e.g., authors, venues) are more compelling label indicators than others. In this paper, we propose a novel framework, META, which goes beyond the existing paradigm and leverages metadata as an additional source of weak supervision. Specifically, we organize the text data and metadata together into a text-rich network and adopt network motifs to capture appropriate combinations of metadata. Based on the seed words, we rank and filter motif instances to distill highly label-indicative ones as “seed motifs”, which provide additional weak supervision. In a bootstrapping manner, we train the classifier and expand the seed words and seed motifs iteratively. Extensive experiments and case studies on real-world datasets demonstrate superior performance and significant advantages of leveraging metadata as weak supervision.
  3. Sensor metadata tagging, akin to the named entity recognition task, provides key contextual information (e.g., measurement type and location) about sensors for running smart building applications. Unfortunately, sensor metadata in different buildings often follows distinct naming conventions. Therefore, learning a tagger currently requires extensive annotations on a per-building basis. In this work, we propose a novel framework, SeNsER, which learns a sensor metadata tagger for a new building based on its raw metadata and an existing fully annotated building. It leverages the commonality between different buildings: at the character level, it employs bidirectional neural language models to capture the shared underlying patterns between the two buildings and thus regularize the feature learning process; at the word level, it uses as features the k-mers that occur in the fully annotated building. During inference, we further incorporate information obtained from sources such as Wikipedia as prior knowledge. As a result, SeNsER shows promising results in extensive experiments on multiple real-world buildings.
  4. Recent advances in the automation of metadata normalization and the introduction of a unified schema, Brick, alleviate the metadata normalization challenge for deploying portable applications across buildings. Yet the lack of compatibility between existing metadata normalization methods precludes comparing and combining them. While generic machine learning (ML) frameworks, such as MLJAR and OpenML, provide versatile interfaces for standard ML problems, they cannot easily accommodate metadata normalization tasks for buildings due to the heterogeneity in inference scope, type of input data required, evaluation metric, and the building-specific human-in-the-loop learning procedure. We propose Plaster, an open and modular framework that incorporates existing advances in building metadata normalization. It provides unified programming interfaces for various types of learning methods for metadata normalization and defines standardized data models for building metadata and timeseries data. Thus, it enables the integration of different methods via a workflow, benchmarking of different methods via unified interfaces, and rapid prototyping of new algorithms. With Plaster, we 1) show three examples of workflow integration that deliver better performance than individual algorithms, 2) benchmark and analyze five algorithms over five common buildings, and 3) exemplify the process of developing a new algorithm involving time series features. We believe Plaster will facilitate the development of new algorithms and expedite the adoption of standard metadata schemas such as Brick, in order to enable seamless smart building applications in the future.
  5. Modern buildings are instrumented with thousands of sensing and control points. The ability to automatically extract the physical context of each point, e.g., its type, location, and relationships with other points, is the key to enabling building analytics at scale. However, this process is costly, as it usually requires domain expertise with a deep understanding of the building system and its point naming scheme. In this study, we aim to reduce the human effort required for mapping sensors to their context, i.e., metadata mapping. We formulate the problem as a sequential labeling process and use a conditional random field (CRF) to exploit the regular and dependent structures observed in the metadata. We develop a suite of active learning strategies that adaptively select the most informative subsequences in point names for human labeling, which significantly reduces the input required from domain experts. We evaluated our approach on three different buildings and observed encouraging metadata mapping performance from the proposed solution.
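Item 3 uses k-mers that occur in a fully annotated building as word-level features for tagging a new building. As a hedged illustration only, the sketch below assumes character k-mers of lowercased point names and keeps only those that also occur in the annotated building's point names. The function names, the choice of k, and the example point names are hypothetical, and SeNsER's actual feature definition may differ.

```python
from collections import Counter


def char_kmers(name, k):
    """All overlapping character k-mers of a lowercased point name."""
    name = name.lower()
    return [name[i:i + k] for i in range(len(name) - k + 1)]


def build_vocab(annotated_names, k_values=(2, 3)):
    """Collect every k-mer that occurs in the fully annotated building's point names."""
    vocab = set()
    for name in annotated_names:
        for k in k_values:
            vocab.update(char_kmers(name, k))
    return vocab


def kmer_features(name, vocab, k_values=(2, 3)):
    """Count the k-mers of a new-building point name that also occur in the vocabulary."""
    counts = Counter()
    for k in k_values:
        for kmer in char_kmers(name, k):
            if kmer in vocab:
                counts[kmer] += 1
    return counts


# Hypothetical point names: one from the annotated building, one from the new building.
vocab = build_vocab(["AHU1_SAT"], k_values=(2,))
features = kmer_features("AHU2-SAT", vocab, k_values=(2,))
```

Here the shared fragments `ah`, `hu`, `sa`, and `at` survive as features, while building-specific fragments such as the `2-` separator are dropped, which is the kind of cross-building commonality the abstract describes.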
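Item 5 selects the most informative subsequences of point names for human labeling. A minimal sketch of one common active learning criterion follows, assuming per-token marginal label probabilities are already available from a trained CRF (the CRF itself is not shown). Each token is scored by least confidence (1 minus its highest marginal probability), and the fixed-width window with the highest mean uncertainty is returned. The window width, tag names, and function names are illustrative assumptions, not the specific strategies evaluated in that work.

```python
def token_uncertainty(marginals):
    """Least-confidence score per token: 1 - max marginal label probability."""
    return [1.0 - max(dist.values()) for dist in marginals]


def most_informative_span(marginals, width):
    """Return (start, end) of the token window with the highest mean uncertainty."""
    if not marginals or width < 1:
        raise ValueError("need at least one token and width >= 1")
    scores = token_uncertainty(marginals)
    width = min(width, len(scores))
    best_start, best_score = 0, float("-inf")
    for start in range(len(scores) - width + 1):
        score = sum(scores[start:start + width]) / width
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_start + width


# Hypothetical CRF marginals for the four tokens of one point name.
marginals = [
    {"B-equip": 0.95, "O": 0.05},
    {"I-equip": 0.5, "O": 0.5},
    {"B-point": 0.6, "O": 0.4},
    {"O": 0.99, "B-point": 0.01},
]
span = most_informative_span(marginals, width=2)
```

The selected span covers the second and third tokens, where the model is least certain, so a domain expert would be asked to label only that part of the point name rather than the whole string.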