skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: HyperGI: Automated Detection and Repair of Information Flow Leakage
Maintaining confidential information control in software is a persistent security problem where failure means secrets can be revealed via program behaviors. Information flow control techniques traditionally have been based on static or symbolic analyses — limited in scalability and specialized to particular languages. When programs do leak secrets there are no approaches to automatically repair them unless the leak causes a functional test to fail. We present our vision for HyperGI, a genetic improvement framework that detects, localizes and repairs information leakage. Key elements of HyperGI include (1) the use of two orthogonal test suites, (2) a dynamic leak detection approach which estimates and localizes potential leaks, and (3) a repair component that produces a candidate patch using genetic improvement. We demonstrate the successful use of HyperGI on several programs with no failing functional test cases. We manually examine the resulting patches and identify trade-offs and future directions for fully realizing our vision.  more » « less
Award ID(s):
1901543 1909688
PAR ID:
10310886
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
IEEEACM International Conference on Automated Software Engineering
ISSN:
2643-1572
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fault localization is essential to software maintenance tasks such as testing and automated program repair. Many fault localization techniques have been developed, the most common of which are spectrum-based. Most techniques have been designed for traditional programming paradigms that map passing and failing test cases to lines or branches of code, hence specialized programming paradigms which utilize different code abstractions may fail to localize well. In this paper, we study fault localization in the context of a class of programs, molecular programs. Recent research has designed automated testing and repair frameworks for these programs but has ignored the importance of fault localization. As we demonstrate, using existing spectrum-based approaches may not provide much information. Instead we propose a novel approach, Traceback, that leverages temporal trace data. In an empirical study on a set of 89 faulty program variants, we demonstrate that Traceback provides between a 32-90\% improvement in localization over reaction-based mapping, a direct translation of spectrum-based localization. We see little difference in parameter tuning of Traceback when all tests, or only code-based (invariant) tests are used, however the best depth and weight parameters vary when using specification based tests, which can be either functional or metamorphic. Overall, invariant-based tests provide the best localization results (either alone or in combination with others), followed by metamorphic and then functional tests. 
    more » « less
  2. null (Ed.)
    We introduce Blade, a new approach to automatically and efficiently eliminate speculative leaks from cryptographic code. Blade is built on the insight that to stop leaks via speculative execution, it suffices to cut the dataflow from expressions that speculatively introduce secrets ( sources ) to those that leak them through the cache ( sinks ), rather than prohibit speculation altogether. We formalize this insight in a static type system that (1) types each expression as either transient , i.e., possibly containing speculative secrets or as being stable , and (2) prohibits speculative leaks by requiring that all sink expressions are stable. Blade relies on a new abstract primitive, protect , to halt speculation at fine granularity. We formalize and implement protect using existing architectural mechanisms, and show how Blade’s type system can automatically synthesize a minimal number of protect s to provably eliminate speculative leaks. We implement Blade in the Cranelift WebAssembly compiler and evaluate our approach by repairing several verified, yet vulnerable WebAssembly implementations of cryptographic primitives. We find that Blade can fix existing programs that leak via speculation automatically , without user intervention, and efficiently even when using fences to implement protect . 
    more » « less
  3. The Internet of Things (IoT) is a vast collection of interconnected sensors, devices, and services that share data and information over the Internet with the objective of leveraging multiple information sources to optimize related systems. The technologies associated with the IoT have significantly improved the quality of many existing applications by reducing costs, improving functionality, increasing access to resources, and enhancing automation. The adoption of IoT by industries has led to the next industrial revolution: Industry 4.0. The rise of the Industrial IoT (IIoT) promises to enhance factory management, process optimization, worker safety, and more. However, the rollout of the IIoT is not without significant issues, and many of these act as major barriers that prevent fully achieving the vision of Industry 4.0. One major area of concern is the security and privacy of the massive datasets that are captured and stored, which may leak information about intellectual property, trade secrets, and other competitive knowledge. As a way forward toward solving security and privacy concerns, we aim in this paper to identify common input-output (I/O) design patterns that exist in applications of the IIoT. These design patterns enable constructing an abstract model representation of data flow semantics used by such applications, and therefore better understand how to secure the information related to IIoT operations. In this paper, we describe communication protocols and identify common I/O design patterns for IIoT applications with an emphasis on data flow in edge devices, which, in the industrial control system (ICS) setting, are most often involved in process control or monitoring. 
    more » « less
  4. Cristea, Alexandra I.; Troussas, Christos (Ed.)
    Supporting novice programming learners at scale has become a necessity. Such a support generally consists of delivering automated feedback on what and why learners did incorrectly. Existing approaches cast the problem as automatically repairing learners’ incorrect programs; specifically, data-driven approaches assume there exists a correct program provided by other learner that can be extrapolated to repair an incorrect program. Unfortunately, their repair potential, i.e., their capability of providing feedback, is hindered by how they compare programs. In this paper, we propose a flexible program alignment based on program dependence graphs, which we enrich with semantic information extracted from the programs, i.e., operations and calls. Having a correct and an incorrect graphs, we exploit approximate graph alignment to find correspondences at the statement level between them. Each correspondence has a similarity attached to it that reflects the matching affinity between two statements based on topology (control and data flow information) and semantics (operations and calls). Repair suggestions are discovered based on this similarity. We evaluate our flexible approach with respect to rigid schemes over correct and incorrect programs belonging to nine real-world introductory programming assignments. We show that our flexible program alignment is feasible in practice, achieves better performance than rigid program comparisons, and is more resilient when limiting the number of available correct programs. 
    more » « less
  5. Throughout 2021, GitGuardian’s monitoring of public GitHub repositories revealed a two-fold increase in the number of secrets (database credentials, API keys, and other credentials) exposed compared to 2020, accumulating more than six million secrets. To our knowledge, the challenges developers face to avoid checked-in secrets are not yet characterized. The goal of our paper is to aid researchers and tool developers in understanding and prioritizing opportunities for future research and tool automation for mitigating checked-in secrets through an empirical investigation of challenges and solutions related to checked-in secrets. We extract 779 questions related to checkedin secrets on Stack Exchange and apply qualitative analysis to determine the challenges and the solutions posed by others for each of the challenges. We identify 27 challenges and 13 solutions. The four most common challenges, in ranked order, are: (i) store/version of secrets during deployment; (ii) store/version of secrets in source code; (iii) ignore/hide of secrets in source code; and (iv) sanitize VCS history. The three most common solutions, in ranked order, are: (i) move secrets out of source code/version control and use template config file; (ii) secret management in deployment; and (iii) use local environment variables. Our findings indicate that the same solution has been mentioned to mitigate multiple challenges. However, our findings also identify an increasing trend in questions lacking accepted solutions substantiating the need for future research and tool automation on managing secrets. 
    more » « less