skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Transparent DIFC: Harnessing Innate Application Event Logging for Fine-Grained Decentralized Information Flow Control
Information flow control is a canonical approach to access control in systems, allowing administrators to assure confidentiality and integrity through restricting the flow of data. Decentralized Information Flow Control (DIFC) harnesses application-layer semantics to allow more precise and accurate mediation of data. Unfortunately, past approaches to DIFC have depended on dedicated instrumentation efforts or developer buy-in. Thus, while DIFC has existed for decades, it has seen little-to-no adoption in commodity systems; the requirement for complete redesign or retrofitting of programs has proven too high a barrier. In this work, we make the surprising observation that developers have already unwittingly performed the instrumentation efforts required for DIFC — application event logging, a software development best practice used for telemetry and debugging, often contains the information needed to identify application-layer event processes that DIFC mediates. We present T-difc, a kernel-layer reference monitor framework that leverages the insights of application event logs to perform precise decentralized flow control. T-difc identifies and extracts these application events as they are created by monitoring application I/O to log files, then references an administrator-specified security policy to assign data labels and mediate the flow of data through the system. To our knowledge, T-difc is the first approach to DIFC that does not require developer support or custom instrumentation. In a survey of 15 popular open source applications, we demonstrate that T-difc works seamlessly on a variety of popular open source programs while imposing negligible runtime overhead on realistic policies and workloads. Thus, T-difc demonstrates a transparent and non-invasive path forward for the dissemination of decentralized information flow controls.  more » « less
Award ID(s):
1750024 2055127
PAR ID:
10346425
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Transparent DIFC: Harnessing Innate Application Event Logging for Fine-Grained Decentralized Information Flow Control
Page Range / eLocation ID:
487 to 501
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Debugging big data analytics often requires a root cause analysis to pinpoint the precise culprit records in an input dataset responsible for incorrect or anomalous output. Existing debugging or data provenance approaches do not track fine-grained control and data flows in user-defined application code; thus, the returned culprit data is often too large for manual inspection and expensive post-mortem analysis is required. We design FlowDebug to identify a highly precise set of input records based on two key insights. First, FlowDebug precisely tracks control and data flow within user-defined functions to propagate taints at a fine-grained level by inserting custom data abstractions through automated source to source transformation. Second, it introduces a novel notion of influence-based provenance for many-to-one dependencies to prioritize which input records are more responsible than others by analyzing the semantics of a user-defined function used for aggregation. By design, our approach does not require any modification to the framework's runtime and can be applied to existing applications easily. FlowDebug significantly improves the precision of debugging results by up to 99.9 percentage points and avoids repetitive re-runs required for post-mortem analysis by a factor of 33 while incurring an instrumentation overhead of 0.4X - 6.1X on vanilla Spark. 
    more » « less
  2. The diffusion of information about open-source projects is a key factor influencing the adoption of projects and the allocation of developer efforts. Developers learn about new projects, and evaluate their quality and importance by accessing the related information. Social media is an important channel for information diffusion about open-source projects, with previous research suggesting the existence of a social media ecosystem that consists of multiple platforms and collectively supports information diffusion in open source. With different features supporting information diffusion, the same piece of information likely reaches different developer communities on different platforms, which attracts the attention and contribution of different developers and thus influences the success of open-source projects. Despite its importance, few works looked at the identity of the developer community that projectrelated information reaches on social media platforms and its associated impact on the discussed project. In this work, we track social media discussions on open-source projects on three different platforms: Twitter, HackerNews, and Reddit. We first describe the dynamics of project-related information diffusion across platforms, and we analyze the association between the number of posts on each platform, and the number of developers attracted to the discussed project from different communities. We find that posts about open-source projects first appear on Twitter and HackerNews, then move more towards Reddit. The number of project-related posts on Twitter mostly associate with the attracted developers from communities that are close to the project’s main contributor, while posts on other platforms associate more with the attention from remote communities. 
    more » « less
  3. Cryptocurrencies are a significant development in recent years, featuring in global news, the financial sector, and academic research. They also hold a significant presence in open source development, comprising some of the most popular repositories on GitHub. Their openly developed software artifacts thus present a unique and exclusive avenue to quantitatively observe human activity, effort, and software growth for cryptocurrencies. Our data set marks the first concentrated effort toward high-fidelity panel data of cryptocurrency development for a wide range of metrics. The data set is foremost a quantitative measure of developer activity for budding open source cryptocurrency development. We collect metrics like daily commits, contributors, lines of code changes, stars, forks, and subscribers. We also include financial data for each cryptocurrency: the daily price and market capitalization. The data set includes data for 236 cryptocurrencies for 380 days (roughly January 2018 to January 2019). We discuss particularly interesting research opportunities for this combination of data, and release new tooling to enable continuing data collection for future research opportunities as development and application of cryptocurrencies mature. 
    more » « less
  4. Indirect calls, while facilitating dynamic execution characteristics in C and C++ programs, impose challenges on precise construction of the control-flow graphs (CFG). This hinders effective program analyses for bug detection (e.g., fuzzing) and program protection (e.g., control-flow integrity). Solutions using data-tracking and type-based analysis are proposed for identifying indirect call targets, but are either time-consuming or imprecise for obtaining the analysis results. Multi-layer type analysis (MLTA), as the state-of-the-art approach, upgrades type-based analysis by leveraging multi-layer type hierarchy, but their solution to dealing with the information flow between multi-layer types introduces false positives. In this paper, we propose strong multi-layer type analysis (SMLTA) and implement the prototype, DEEPTYPE, to further refine indirect call targets. It adopts a robust solution to record and retrieve type information, avoiding information loss and enhancing accuracy. We evaluate DEEPTYPE on Linux kernel, 5 web servers, and 14 user applications. Compared to TypeDive, the prototype of MLTA, DEEPTYPE is able to narrow down the scope of indirect call targets by 43.11% on average across most benchmarks and reduce runtime overhead by 5.45% to 72.95%, which demonstrates the effectiveness, efficiency and applicability of SMLTA. 
    more » « less
  5. null (Ed.)
    Third-party security analytics allow companies to outsource threat monitoring tasks to teams of experts and avoid the costs of in-house security operations centers. By analyzing telemetry data from many clients these services are able to offer enhanced insights, identifying global trends and spotting threats before they reach most customers. Unfortunately, the aggregation that drives these insights simultaneously risks exposing sensitive client data if it is not properly sanitized and tracked. In this work, we present SCIFFS, an automated information flow monitoring framework for preventing sensitive data exposure in third-party security analytics platforms. SCIFFS performs decentralized information flow control over customer data it in a serverless setting, leveraging the innate polyinstantiated nature of serverless functions to assure precise and lightweight tracking of data flows. Evaluating SCIFFS against a proof-of-concept security analytics framework on the widely-used OpenFaaS platform, we demonstrate that our solution supports common analyst workflows data ingestion, custom dashboards, threat hunting) while imposing just 3.87% runtime overhead on event ingestion and the overhead on aggregation queries grows linearly with the number of records in the database (e.g., 18.75% for 50,000 records and 104.27% for 500,000 records) as compared to an insecure baseline. Thus, SCIFFS not only establishes a privacy-respecting model for third-party security analytics, but also highlights the opportunities for security-sensitive applications in the serverless computing model. 
    more » « less