

Title: M-DB: A Continuous Data Processing and Monitoring Framework for IoT Applications
IoT devices influence many different spheres of society and are predicted to have a huge impact on our future. Extracting real-time insights from diverse sensor data and dealing with the underlying uncertainty of sensor data are two main challenges of the IoT ecosystem. In this paper, we propose a data processing architecture, M-DB, to effectively integrate and continuously monitor uncertain and diverse IoT data. M-DB consists of three components: (1) model-based operators (MBOs), data management abstractions that let IoT application developers integrate data from diverse sensors and that support event-detection and statistical aggregation operators; (2) M-Stream, a dataflow pipeline that combines model-based operators to perform computations reflecting the uncertainty of the underlying data; and (3) M-Store, a storage layer that separates the computation of application logic from physical sensor data management, to effectively deal with missing or delayed sensor data. M-DB is designed and implemented over Apache Storm and Apache Kafka, two open-source distributed event processing systems. The application examples throughout the paper and our evaluation results demonstrate that M-DB provides a real-time data-processing architecture that can cater to the diverse needs of IoT applications.
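No code accompanies this record, but the composition described above, model-based operators combined into an M-Stream-style pipeline over uncertain readings, can be illustrated with a minimal Python sketch. The names below (Reading, impute, detect_above) and the confidence heuristic are assumptions of this illustration, not M-DB's actual API; the real system is built on Apache Storm and Apache Kafka.

    # Minimal, illustrative sketch (not M-DB's actual API): a model-based
    # operator that fills in missing sensor readings, composed with an
    # event-detection operator into a small stream pipeline.
    from dataclasses import dataclass
    from typing import Iterable, Iterator, Optional

    @dataclass
    class Reading:
        ts: float                 # timestamp in seconds
        value: Optional[float]    # None models a missing or delayed sample
        stddev: float = 0.0       # uncertainty attached to the sample

    def impute(readings: Iterable[Reading], extra_stddev: float = 1.0) -> Iterator[Reading]:
        """Model-based operator: carry the last observed value forward for
        missing samples, widening its uncertainty so downstream operators
        can reflect it."""
        last = None
        for r in readings:
            if r.value is None and last is not None:
                yield Reading(r.ts, last.value, last.stddev + extra_stddev)
            else:
                if r.value is not None:
                    last = r
                yield r

    def detect_above(readings: Iterable[Reading], threshold: float) -> Iterator[tuple]:
        """Event-detection operator: emit (ts, confidence) when a reading is
        likely above the threshold; the confidence heuristic shrinks as the
        reading's uncertainty grows."""
        for r in readings:
            if r.value is None:
                continue
            margin = r.value - threshold
            confidence = min(1.0, max(0.0, 0.5 + margin / (2 * (r.stddev + 1e-9))))
            if confidence > 0.5:
                yield (r.ts, confidence)

    # An M-Stream-style composition over a toy feed with one missing sample.
    feed = [Reading(0, 20.0, 0.5), Reading(1, None), Reading(2, 23.0, 0.5)]
    for event in detect_above(impute(feed), threshold=22.0):
        print(event)   # -> (2, 1.0)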
Award ID(s):
1815733
NSF-PAR ID:
10113698
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE International Conference on Internet of Things
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Large-scale multiuser scientific facilities, such as geographically distributed observatories, remote instruments, and experimental platforms, represent some of the largest national investments and can enable dramatic advances across many areas of science. Recent examples of such advances include the detection of gravitational waves and the imaging of a black hole's event horizon. However, as the number of such facilities and their users grow, along with the complexity, diversity, and volumes of their data products, finding and accessing relevant data is becoming increasingly challenging, limiting the potential impact of facilities. These challenges are further amplified as scientists and application workflows increasingly try to integrate facilities' data from diverse domains. In this paper, we leverage concepts underlying recommender systems, which are extremely effective in e-commerce, to address these data-discovery and data-access challenges for large-scale distributed scientific facilities. We first analyze data from facilities and identify and model user-query patterns in terms of facility location and spatial localities, domain-specific data models, and user associations. We then use this analysis to generate a knowledge graph and develop the collaborative knowledge-aware graph attention network (CKAT) recommendation model, which leverages graph neural networks (GNNs) to explicitly encode the collaborative signals through propagation and combine them with knowledge associations. Moreover, we integrate a knowledge-aware neural attention mechanism to enable the CKAT to pay more attention to key information while reducing irrelevant noise, thereby increasing the accuracy of the recommendations. We apply the proposed model to two real-world facility datasets and empirically demonstrate that the CKAT can effectively facilitate data discovery, significantly outperforming several compelling state-of-the-art baseline models. (A generic sketch of the attention-weighted aggregation idea appears after this list.)
  2. Virtual Reality (VR)-based Learning Environments (VRLEs) are gaining popularity due to the wide availability of cloud and its edge (a.k.a. fog) technologies and high-speed networks. Thus, there is a need to investigate Internet-of-Things (IoT)-based application design concepts within social VRLEs to offer scalable, cost-efficient services that adapt to dynamic cloud/fog system conditions. In this paper, we investigate the cost-performance trade-offs for an IoT-based application that integrates large-scale sensor data from Social VRLEs and coordinates the real-time data processing and visualization across cloud/fog platforms. To facilitate dynamic performance adaptation of the IoT-based application with increased user scale, we present a set of cost-aware adaptive control rules. The implementation of the rules is based on an analytical queuing model that determines the performance states of the IoT-based application, given the current workload and the allocated cloud/fog resources. Using the IoT-based application in an exemplar VRLE use case, we evaluate the cost-performance trade-offs with three system architectures, i.e., cloud-only, edge-only, and edge-cloud architectures. Experiment results illustrate the best/worst practices in the cost-performance trade-offs for a range of simulated IoT scenarios involving monitoring user emotional data collected using brain sensors. Our results also detail the impact of the system architecture selection, and the benefits of enabling feedback about student emotions to instructors during Social VR learning sessions. Lastly, we show the benefits of integrating our model-based feedback control in maximizing IoT-based application performance while keeping the associated costs at a minimum level. (A conceptual sketch of such a queuing-model sizing check appears after this list.)
  3. Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often, the lack of a clear definition of data structures and operators in the field has led to implementations that do not work well together. The HPTMT architecture that we proposed recently identifies a set of data structures, operators, and an execution model for creating rich data applications that link all aspects of data engineering and data science together efficiently. This paper elaborates and illustrates this architecture using an end-to-end application with deep learning and data engineering parts working together. Our analysis shows that the proposed system architecture is better suited for high-performance computing environments than current big data processing systems. Furthermore, our proposed system emphasizes the importance of efficient, compact data structures such as the Apache Arrow tabular data representation, which is designed for high performance. Thus, the system integration we propose scales a sequential computation to a distributed computation while retaining optimal performance and a highly usable application programming interface. (A small Apache Arrow example appears after this list.)
  4. While cloud computing is the current standard for outsourcing computation, it can be prohibitively expensive for cities and infrastructure operators to deploy services. At the same time, there are underutilized computing resources within cities and local edge-computing deployments. Using these slack resources may enable significantly lower pricing than comparable cloud computing; such resources would incur minimal marginal expenditure since their deployment and operation are mostly sunk costs. However, there are challenges associated with using these resources. First, they are not effectively aggregated or provisioned. Second, there is a lack of trust between customers and suppliers of computing resources, given that they are distinct stakeholders and behave according to their own interests. Third, delays in processing inputs may diminish the value of the applications. To resolve these challenges, we introduce an architecture combining a distributed trusted computing mechanism, such as a blockchain, with an efficient messaging system like Apache Pulsar. Using this architecture, we design a decentralized computation market where customers and suppliers make offers to deploy and host applications. The proposed architecture can be realized using any trusted computing mechanism that supports smart contracts, and any messaging framework with the necessary features. This combination ensures that the market is robust without incurring the input processing delays that limit other blockchain-based solutions. We evaluate the market protocol using game-theoretic analysis to show that deviation from the protocol is discouraged. Finally, we assess the performance of a prototype implementation based on experiments with a streaming computer-vision application. (A generic offer-matching sketch appears after this list.)
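For the first related item above (CKAT), the central mechanism, scoring a node's neighbors and aggregating them with softmax attention weights, can be shown generically. The NumPy sketch below is one propagation step of plain attention-weighted aggregation, not the authors' CKAT model; the embedding sizes and data are made up.

    # Generic attention-weighted neighbor aggregation (one propagation step),
    # illustrating the mechanism behind knowledge-aware graph attention.
    import numpy as np

    def attention_aggregate(node: np.ndarray, neighbors: np.ndarray) -> np.ndarray:
        """Score each neighbor against the target node, softmax the scores,
        and return the attention-weighted sum of neighbor embeddings."""
        scores = neighbors @ node                    # shape: (num_neighbors,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ neighbors                   # shape: (embedding_dim,)

    # Toy example: a user embedding attending over three dataset embeddings.
    rng = np.random.default_rng(0)
    user = rng.normal(size=8)
    datasets = rng.normal(size=(3, 8))
    print(attention_aggregate(user, datasets))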
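The second related item bases its cost-aware control rules on an analytical queuing model whose details are not given in the abstract. Below is a minimal sketch of the kind of sizing check such a rule could perform, assuming an M/M/c (Erlang C) model; the arrival rate, service rate, and latency target are illustrative numbers, not values from the paper.

    # Illustrative M/M/c sizing check: grow the cloud/fog allocation until the
    # modelled mean response time meets a latency target.
    import math

    def erlang_c(servers: int, offered_load: float) -> float:
        """Probability that an arriving job must wait in an M/M/c queue,
        where offered_load = arrival_rate / service_rate."""
        if offered_load >= servers:
            return 1.0
        s = sum(offered_load**k / math.factorial(k) for k in range(servers))
        tail = offered_load**servers / (math.factorial(servers) * (1 - offered_load / servers))
        return tail / (s + tail)

    def mean_response_time(arrival_rate: float, service_rate: float, servers: int) -> float:
        offered_load = arrival_rate / service_rate
        wait = erlang_c(servers, offered_load) / (servers * service_rate - arrival_rate)
        return wait + 1.0 / service_rate

    def servers_needed(arrival_rate: float, service_rate: float,
                       latency_target: float, max_servers: int = 64) -> int:
        for c in range(1, max_servers + 1):
            if arrival_rate < c * service_rate and \
               mean_response_time(arrival_rate, service_rate, c) <= latency_target:
                return c
        return max_servers

    # e.g., 40 requests/s, 5 requests/s per worker, 300 ms response-time target
    print(servers_needed(40.0, 5.0, 0.3))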
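The third related item emphasizes compact columnar structures such as the Apache Arrow tabular representation. Here is a tiny pyarrow example of the kind of data-engineering operators involved (filter and grouped aggregation), assuming pyarrow is installed; this shows only the underlying data representation, not the HPTMT system itself.

    # A small Apache Arrow table plus two data-engineering operators
    # (filter and grouped aggregation) via the pyarrow compute API.
    import pyarrow as pa
    import pyarrow.compute as pc

    table = pa.table({
        "sensor": ["a", "a", "b", "b"],
        "value":  [1.0, 2.0, 5.0, 7.0],
    })

    filtered = table.filter(pc.greater(table["value"], 1.5))
    means = table.group_by("sensor").aggregate([("value", "mean")])

    print(filtered.to_pydict())
    print(means.to_pydict())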
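The fourth related item describes a decentralized computation market in which customers and suppliers exchange offers, without spelling out a matching rule in the abstract. The sketch below is one generic possibility, a greedy double-auction-style match in plain Python; the parties, prices, and midpoint settlement are assumptions of this illustration, not the paper's protocol.

    # Generic offer matching for a compute market: pair the cheapest supplier
    # asks with the highest customer bids and settle at the midpoint price.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Offer:
        party: str
        price: float   # e.g., price per core-hour

    def match(asks: List[Offer], bids: List[Offer]) -> List[Tuple[str, str, float]]:
        asks = sorted(asks, key=lambda o: o.price)
        bids = sorted(bids, key=lambda o: o.price, reverse=True)
        deals = []
        for ask, bid in zip(asks, bids):
            if bid.price >= ask.price:               # the bid covers the ask
                deals.append((bid.party, ask.party, (bid.price + ask.price) / 2))
        return deals

    print(match(
        asks=[Offer("edge-node-1", 0.02), Offer("edge-node-2", 0.05)],
        bids=[Offer("city-app-A", 0.04), Offer("city-app-B", 0.01)],
    ))   # one deal: city-app-A buys from edge-node-1 at roughly 0.03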