Title: RESIN: A Holistic Service for Dealing with Memory Leaks in Production Cloud Infrastructure
Memory leaks are a notorious issue. Despite extensive efforts, addressing memory leaks in large production cloud systems remains challenging. Existing solutions incur high overhead and/or suffer from high inaccuracy. This paper presents RESIN, a solution designed to holistically address memory leaks in production cloud infrastructure. RESIN takes a divide-and-conquer approach to tackle the challenges. It first performs low-overhead detection with a robust bucketization-based pivot scheme to identify suspicious leaking entities. It then takes live heap snapshots at appropriate time points in carefully sampled leak entities. RESIN analyzes the collected snapshots for leak diagnosis. Finally, RESIN automatically mitigates detected leaks. RESIN has been running in production in Microsoft Azure for 3 years. It reports on average 24 leak tickets each month with high accuracy and low overhead, and provides effective diagnosis reports. Its results translate into a 41× reduction of VM reboots caused by low memory.
Award ID(s):
1942794
PAR ID:
10343367
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Date Published:
Journal Name:
16th USENIX Symposium on Operating Systems Design and Implementation
Page Range / eLocation ID:
109-125
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Resource leaks are a common and elusive source of bugs that can result in crashes and security vulnerabilities. The most effective technique to identify such leaks during development is static analysis. However, empirical studies show that in addition to leak warnings, developers often need help in the form of automated fix suggestions to correctly repair such leaks. The only existing tool that can suggest resource-leak fixes is the general-purpose tool Footpatch. Footpatch, however, performs poorly at this task; it generates fixes for only 6% of the leaks, out of which only 27% are correct. In this paper, we introduce RLFixer, a specialized repair tool that generates high-quality fixes for resource leaks identified by any resource-leak detector. A major challenge for RLFixer is that the most general version of the resource-leak repair problem is at least as hard as compile-time object deallocation, a well-known hard problem for compilers. RLFixer tackles this issue by separating the resource leaks that are infeasible for a compile-time tool to fix from those that are feasible to fix. RLFixer achieves this separation by using a new data-flow analysis of resource objects to classify how they escape the context of their methods. The same analysis also enables RLFixer to generate correct repairs for the feasible-to-fix leaks. RLFixer is demand-driven and hence only analyzes statements relevant to the leak, thereby keeping overhead low. We evaluated RLFixer by applying it to warnings generated by five popular Java resource-leak detectors. We show that, on average, RLFixer generates repairs for 66% of their warnings, out of which 95% are correct. It has an average repair time of 14 seconds.
  2. Memory leaks in web applications are pervasive and difficult to debug. Leaks degrade responsiveness by increasing garbage collection costs and can even lead to browser tab crashes. Previous leak detection approaches designed for conventional applications are ineffective in the browser environment. Tracking down leaks currently requires intensive manual effort by web developers, which is often unsuccessful. This paper introduces BLEAK (Browser Leak debugger), the first system for automatically debugging memory leaks in web applications. BLEAK's algorithms leverage the observation that in modern web applications, users often repeatedly return to the same (approximate) visual state (e.g., the inbox view in Gmail). Sustained growth between round trips is a strong indicator of a memory leak. To use BLEAK, a developer writes a short script (17-73 LOC on our benchmarks) to drive a web application in round trips to the same visual state. BLEAK then automatically generates a list of leaks found along with their root causes, ranked by return on investment. Guided by BLEAK, we identify and fix over 50 memory leaks in popular libraries and apps including Airbnb, AngularJS, Google Analytics, Google Maps SDK, and jQuery. BLEAK's median precision is 100%; fixing the leaks it identifies reduces heap growth by an average of 94%, saving from 0.5MB to 8MB per round trip.
  3. Applications in the cloud are vulnerable to several attack scenarios. In one possibility, an untrusted cloud operator can examine addresses on the memory bus and use this information leak to violate privacy guarantees, even if data is encrypted. The Oblivious RAM (ORAM) construct was introduced to eliminate such information leak and these frameworks have seen many innovations in recent years. In spite of these innovations, the overhead associated with ORAM is very significant. This paper takes a step forward in reducing ORAM memory bandwidth overheads. We make the case that, similar to a cache hierarchy, a lightweight ORAM that fronts the full-fledged ORAM provides a boost in efficiency. The lightweight ORAM has a smaller capacity and smaller depth, and it can relax some of the many constraints imposed on the full-fledged ORAM. This yields a 2-level hierarchy with a relaxed ORAM and a full ORAM. The relaxed ORAM adopts design parameters that are optimized for efficiency and not capacity. We introduce a novel metadata management technique to further reduce the bandwidth for relaxed ORAM access. Relaxed ORAM accesses preserve the indistinguishability property and are equipped with an integrity verification system. Finally, to eliminate information leakage through LLC and relaxed ORAM hit rates, we introduce a deterministic memory scheduling policy. On a suite of memory-intensive applications, we show that the best Relaxed Hierarchical ORAM (ρ) model yields a performance improvement of 50%, relative to a Freecursive ORAM baseline. 
  4. DNA strand displacement (DSD) emerged as a prominent reaction motif for engineering nucleic acid-based computational devices with programmable behaviours. However, strand displacement circuits are susceptible to background noise, known as leaks, which disrupt their intended function. The ill effects of leaks are particularly severe in circuits with complex dynamics, as leaks in them amplify nonlinearly, resulting in rapid circuit degradation. Shadow cancellation is a dynamic leak-elimination strategy originally proposed to control the leak growth in such circuits. However, the kinetic restrictions of the method incur a significant design overhead, making it less accessible. In this work, we use domain-level DSD simulations to examine the method’s capabilities, the inner workings of its components and, most importantly, its robustness to the practical deviations in its design requirements. First, we show that the method could stabilize the dynamics of several catalytic and autocatalytic dynamical systems heavily affected by leaks. Then, through several probing experiments, we show that its design restrictions could be significantly relaxed without impacting the circuit function by simply adjusting the circuit parameters. Finally, we discuss several ideas to tackle the practical challenges in applying the method to arbitrary DSD circuits, paving the way for future experimental work.
  5. Pulmonary air leak is the most common complication of lung surgery, with air leaks that persist longer than 5 days representing a major source of post-surgery morbidity. Clinical management of air leaks is challenging due to limited methods to precisely locate and assess leaks. Here, we present a sound-guided methodology that enables rapid quantitative assessment and precise localization of air leaks by analyzing the distinct sounds generated as the air escapes through defective lung tissue. Air leaks often present after lung surgery due to loss of tissue integrity at or near a staple line. Accordingly, we investigated air leak sounds from a focal pleural defect in a rat model and from a staple line failure in a clinically relevant swine model to demonstrate the high sensitivity and translational potential of this approach. In rat and swine models of free-flowing air leak under positive pressure ventilation with intrapleural microphone 1 cm from the lung surface, we identified that: (a) pulmonary air leaks generate sounds that contain distinct harmonic series, (b) acoustic characteristics of air leak sounds can be used to classify leak severity, and (c) precise location of the air leak can be determined with high resolution (within 1 cm) by mapping the sound loudness level across the lung surface. Our findings suggest that sound-guided assessment and localization of pulmonary air leaks could serve as a diagnostic tool to inform air leak detection and treatment strategies during video-assisted thoracoscopic surgery (VATS) or thoracotomy procedures. 