Title: Poster: Circa: Re-imagining Network Telemetry from an Approximation-First Perspective
Telemetry systems are widely used to collect data from distributed endpoints, analyze that data jointly to gain valuable insights, and store it for historical analytics. These systems consist of four stages (Figure 1): collection, transmission, analysis, and storage. Collectors at the endpoints gather various types of data, which are then transmitted to a central server for analysis. The data is used for multiple downstream tasks, such as dashboard monitoring and anomaly detection. Finally, the data is kept in long-term storage to aid retrospective analytics and debugging.
Award ID(s):
2106214
PAR ID:
10542170
Author(s) / Creator(s):
;
Publisher / Repository:
ACM SIGCOMM Posters
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Remote sensing of plant traits and their environment facilitates non‐invasive, high‐throughput monitoring of the plant's physiological characteristics. However, the voluminous observational data generated by such autonomous sensor networks overwhelms scientific users when they have to analyze it. To provide a scalable and effective analysis environment, there is a need for storage and analytics that support high‐throughput data ingestion while preserving spatiotemporal and sensor‐specific characteristics. The framework should also enable modelers and scientists to run their analytics while coping with the fast and continuously evolving nature of the dataset. In this paper, we present Radix+, a high‐throughput distributed data storage system that supports scalable georeferencing and interactive query‐based spatiotemporal analytics with trackable data integrity. We include empirical evaluations performed on a commodity machine cluster with up to 1 TB of data. Our benchmarks demonstrate subsecond latency for the majority of our evaluated queries and an improvement in data ingestion rate over systems such as Geomesa.
  2. Large-scale applications typically spend a large fraction of their execution time performing I/O to a parallel storage system. However, with rapid progress in the compute and storage stacks of large-scale systems, it is critical to investigate and update our understanding of the I/O behavior of large-scale applications. Toward that end, in this work, we monitor, collect, and analyze a year's worth of storage system data from a large-scale production parallel storage system. We perform temporal, spatial, and correlative analysis of the system and uncover surprising patterns that defy existing assumptions and have important implications for future systems.
  3. Since its emergence, the cloud manufacturing concept has been transforming the manufacturing and remanufacturing industry into a big data and service-oriented environment. The aggressive push toward data collection in cloud-based and cyber-physical systems provides both challenges and opportunities for predictive analytics. One of the key applications of predictive analytics in such domains is predictive quality management that aims to fully exploit the potentials provided by the enormous data collected via cloud-based systems. As a case study, a data set of hard disk drives’ Self-Monitoring, Analysis and Reporting Technology (SMART) attributes from a cloud-storage service provider has been analyzed to derive some insights about the challenges and opportunities of using product lifecycle data. An analysis of time-to-failure monitoring of hard disk drives in real-time has been carried out and the corresponding challenges have been discussed. 
  4. There has been a proliferation of mobile apps in the Medical, as well as Health&Fitness categories. These apps have a wide audience, from medical providers, to patients, to end users who want to track their fitness goals. The low barrier to entry on mobile app stores raises questions about the diligence and competence of the developers who publish these apps, especially regarding the practices they use for user data collection, processing, and storage. To help understand the nature of data that is collected, and how it is processed, as well as where it is sent, we developed a tool named PIT (Personal Information Tracker) and made it available as open source. We used PIT to perform a multi-faceted study on 2832 Android apps: 2211 Medical apps and 621 Health&Fitness apps. We first define Personal Information (PI) as 17 different groups of sensitive information, e.g., user’s identity, address and financial information, medical history or anthropometric data. PIT first extracts the elements in the app’s User Interface (UI) where this information is collected. The collected information could be processed by the app’s own code or third-party code; our approach disambiguates between the two. Next, PIT tracks, via static analysis, where the information is “leaked”, i.e., it escapes the scope of the app, either locally on the phone or remotely via the network. Then, we conduct a link analysis that examines the URLs an app connects with, to understand the origin and destination of data that apps collect and process. We found that most apps leak 1–5 PI items (email, credit card, phone number, address, name, being the most frequent). Leak destinations include the network (25%), local databases (37%), logs (23%), and files or I/O (15%). While Medical apps have more leaks overall, as they collect data on medical history, surprisingly, Health&Fitness apps also collect, and leak, medical data. 
     We also found that leaks due to third-party code (e.g., code for ads, analytics, or user engagement) are much more numerous (2x–12x) than leaks due to the app’s own code. Finally, our link analysis shows that most apps access 20–80 URLs (typically third-party URLs and Cloud APIs), though some apps could access more than 1,000 URLs.
  5. USENIX (Ed.)
     We present Cloudscape, a dataset of nearly 400 cloud architectures deployed on AWS. We perform an in-depth analysis of the usage of storage services in cloud systems. Our findings include: S3 is the most prevalent storage service (68%), while file system services are rare (4%); heterogeneity is common in the storage layer; storage services primarily interface with Lambda and EC2, while also serving as the foundation for more specialized ML and analytics services. Our findings provide a concrete understanding of how storage services are deployed in real-world cloud architectures, and our analysis of the popularity of different services grounds existing research.