Title: Building an end-to-end BAD application
Traditional Big Data infrastructures are passive in nature, answering user requests to process and return data. In many applications, however, users not only need to analyze data but also to subscribe to and actively receive data of interest. Their interest may include the incoming data's content as well as its relationships to other data. Moreover, data delivered to subscribers may need to be enriched with additional relevant and actionable information. To address this Big Active Data (BAD) challenge, we have advocated building scalable BAD systems that continuously and reliably capture Big Data while enabling timely and automatic delivery of relevant, possibly enriched information to a large pool of subscribers. In this demo we showcase how to build an end-to-end active application using a BAD system and a standard email broker for data delivery. This includes enabling users to register their interests with the BAD system, ingesting and monitoring data, and producing customized results and delivering them to the appropriate subscribers. Through this example we demonstrate that, with a BAD approach, even complex active data applications can be created easily and scaled to many users, considerably limiting the effort required of application developers.
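To make the end-to-end flow concrete, the sketch below shows how an application might register a subscriber's interest with a BAD deployment over HTTP. It assumes the cluster exposes AsterixDB's HTTP query service; the dataverse, channel, and broker names are hypothetical, and the BAD-style statements are only illustrative of the declarative flavor, not the demo's exact DDL.

```python
# A minimal sketch of registering a subscriber with a BAD deployment.
# Assumptions: the cluster exposes AsterixDB's HTTP query service on port 19002,
# and the dataverse, channel, and broker names below are hypothetical placeholders.
import requests

QUERY_SERVICE = "http://localhost:19002/query/service"

def run_statement(statement: str) -> dict:
    """Send a SQL++/BAD statement to the query service and return its JSON reply."""
    resp = requests.post(QUERY_SERVICE, data={"statement": statement})
    resp.raise_for_status()
    return resp.json()

# 1. Register a broker endpoint that will receive channel results
#    (here, an HTTP endpoint fronting an email gateway).
run_statement("""
    USE emergencyNotifications;
    CREATE BROKER emailBroker AT "http://broker.example.org/notify";
""")

# 2. Subscribe a user to a previously created channel with their parameter of
#    interest; results the channel produces are pushed to the broker, which can
#    then email the subscriber.
run_statement("""
    USE emergencyNotifications;
    SUBSCRIBE TO emergenciesNearUser("user123") ON emailBroker;
""")
```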
Award ID(s):
1924694 1925610 1831615 1838222 1901379
PAR ID:
10293598
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems
Page Range / eLocation ID:
184 to 187
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.

    Today, data is being actively generated by a variety of devices, services, and applications. Such data is important not only for the information that it contains, but also for its relationships to other data and to interested users. Most existing Big Data systems focus on passively answering queries from users, rather than actively collecting data, processing it, and serving it to users. To satisfy both passive and active requests at scale, application developers need either to heavily customize an existing passive Big Data system or to glue one together with systems like Streaming Engines and Pub-sub services. Either choice requires significant effort and incurs additional overhead. In this paper, we present the BAD (Big Active Data) system as an end-to-end, out-of-the-box solution for this challenge. It is designed to preserve the merits of passive Big Data systems and introduces new features for actively serving Big Data to users at scale. We show the design and implementation of the BAD system, demonstrate how BAD facilitates providing both passive and active data services, investigate the BAD system’s performance at scale, and illustrate the complexities that would result from instead providing BAD-like services with a “glued” system.
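For contrast, a rough sketch of the "glued" alternative follows: a hand-written service that consumes a stream, checks each record against stored subscriptions, and emails matches. The topic name, broker address, SMTP host, and subscription predicate are all placeholders; the point is the custom plumbing (and its scaling and fault tolerance) that a developer would otherwise own.

```python
# A rough sketch of the "glued" alternative: a hand-written service that consumes
# a stream, matches records against subscriptions, and emails the results.
# The topic name, broker address, and SMTP host are hypothetical placeholders.
import json
import smtplib
from email.message import EmailMessage

from kafka import KafkaConsumer  # kafka-python

# Subscriptions kept in application code: subscriber -> match predicate.
subscriptions = {
    "alice@example.org": lambda rec: rec.get("severity") == "high",
}

consumer = KafkaConsumer(
    "emergency-reports",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

with smtplib.SMTP("localhost") as smtp:
    for message in consumer:
        record = message.value
        for address, matches in subscriptions.items():
            if matches(record):                  # per-subscriber filtering, by hand
                note = EmailMessage()
                note["From"] = "alerts@example.org"
                note["To"] = address
                note["Subject"] = "New report of interest"
                note.set_content(json.dumps(record, indent=2))
                smtp.send_message(note)          # active delivery, also by hand
```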

     
  2.
    In many Big Data applications today, information needs to be actively shared between systems managed by different organizations. To share Big Data at scale, developers would have to create dedicated server programs and glue together multiple Big Data systems for scalability. Developing and managing such glued data sharing services requires a significant amount of work from developers. In our prior work, we developed a Big Active Data (BAD) system for enabling Big Data subscriptions and analytics with millions of subscribers. Building on that system, we introduce a new mechanism for sharing Big Data at scale declaratively, so that developers can easily create and provide data sharing services using declarative statements while benefiting from an underlying scalable infrastructure. We show our implementation on top of the BAD system, explain the data sharing data flow among multiple systems, and present a prototype system with experimental results.
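Purely as an illustration of the declarative flavor described above, the snippet below posts a hypothetical sharing statement to a BAD cluster's query service. The statement syntax, names, and endpoint are assumptions made for the sketch, not the paper's actual DDL.

```python
# Illustrative only: what a declarative data-sharing request might look like when
# sent to a BAD cluster's query service. The statement syntax and names are
# placeholders for the sketch, not the actual sharing DDL.
import requests

SHARE_STATEMENT = """
    USE emergencyNotifications;
    CREATE SHARE reportsForPartner ON EmergencyReports WITH BROKER partnerBroker;
"""

resp = requests.post(
    "http://localhost:19002/query/service",
    data={"statement": SHARE_STATEMENT},
)
resp.raise_for_status()
print(resp.json().get("status"))
```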
  3. Across many domains, end-users need to compose computational elements into novel configurations to perform their day-to-day tasks. End-user composition is a common programming activity that such end-users perform to accomplish this task. While there have been many studies on end-user programming, we still need a better understanding of the activities involved in end-user composition and of the environments that support them. In this paper we report a qualitative study of four popular composition environments belonging to diverse application domains: the Taverna workflow environment for life sciences, Loni Pipeline for brain imaging, SimMan3G for medical simulations, and Kepler for scientific simulations. We interview end-users of these environments to explore their experiences while performing common composition tasks. We use the “Content Analysis” technique to analyze these interviews and to identify the barriers to end-user composition in these domains. Furthermore, our findings show that there are some unique differences between the requirements of naive end-users and those of expert programmers. We believe that these findings are useful not only for improving the quality of end-user composition environments, but also for developing better end-user composition frameworks.
  4. Since the mid-2000s, the Argo oceanographic observational network has provided near-real-time four-dimensional data for the global ocean for the first time in history. Internet (i.e., the “web”) applications that handle the more than two million Argo profiles of ocean temperature, salinity, and pressure are an active area of development. This paper introduces a new and efficient interactive Argo data visualization and delivery web application named Argovis that is built on a classic three-tier design consisting of a front end, back end, and database. Together these components allow users to navigate 4D data on a world map of Argo floats, with the option to select a custom region, depth range, and time period. Argovis’s back end sends data to users in a simple format, and the front end quickly renders web-quality figures. More advanced applications query Argovis from other programming environments, such as Python, R, and MATLAB. Our Argovis architecture allows expert data users to build their own functionality for specific applications, such as the creation of spatially gridded data for a given time and advanced time–frequency analysis for a space–time selection. Argovis is aimed at both scientists and the public, with tutorials and examples available on the website describing how to use the Argovis data delivery system, for example, how to plot profiles in a region over time or how to monitor profile metadata.
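As an example of querying Argovis from another programming environment, the Python snippet below requests profiles for a region, depth range, and time window over HTTP. The endpoint path, parameter names, and response fields are assumptions based on the selection options described above; the Argovis documentation is the authoritative reference.

```python
# A hedged sketch of pulling Argo profiles from Argovis into Python.
# The endpoint path, parameter names, and response fields are assumptions,
# not a verified API specification.
import json
import requests

BASE_URL = "https://argovis.colorado.edu/selection/profiles"

params = {
    # A small polygon in the North Atlantic, as (lon, lat) pairs.
    "shape": json.dumps([[[-40, 30], [-40, 35], [-35, 35], [-35, 30], [-40, 30]]]),
    "startDate": "2020-01-01",
    "endDate": "2020-01-15",
    "presRange": "[0,500]",  # pressure (depth) range in dbar
}

resp = requests.get(BASE_URL, params=params)
resp.raise_for_status()
profiles = resp.json()

print(f"Retrieved {len(profiles)} profiles")
for p in profiles[:3]:
    # Field names are assumptions; real responses may differ.
    print(p.get("platform_number"), p.get("date"))
```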
  5. Name-based pub/sub allows for efficient and timely delivery of information to interested subscribers. A challenge is assigning the right name to each piece of content so that it reaches the most relevant recipients. An example scenario is the dissemination of social media posts to first responders during disasters. We present FLARE, a framework using federated active learning assisted by naming. FLARE integrates machine learning and name-based pub/sub for the accurate and timely delivery of textual information. In this demo, we show FLARE’s operation.
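To illustrate the name-based pub/sub idea FLARE builds on, the toy sketch below assigns each post a name with a placeholder classifier and forwards the post to every subscriber registered under that name. The names and keyword rules are stand-ins; FLARE's actual classifiers are trained with federated active learning.

```python
# Toy illustration of name-based pub/sub: a classifier assigns each post a name,
# and the post is delivered to every subscriber registered under that name.
# The keyword rules below are a stand-in for FLARE's learned classifiers.
from collections import defaultdict

subscribers = defaultdict(list)  # name -> subscriber ids
subscribers["/disaster/flood"].append("fire-dept-12")
subscribers["/disaster/medical"].append("ems-unit-3")

def assign_name(post: str) -> str:
    """Placeholder for the learned model that names incoming posts."""
    text = post.lower()
    if "flood" in text or "water" in text:
        return "/disaster/flood"
    if "injured" in text:
        return "/disaster/medical"
    return "/disaster/other"

def publish(post: str) -> None:
    name = assign_name(post)
    for sub in subscribers.get(name, []):
        print(f"deliver to {sub} under {name}: {post}")

publish("Street flooded near 5th and Main, cars stuck in water")
```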