

Title: Towards a Model Checking Framework for a New Collector Framework
Garbage collectors provide memory safety, an important step toward program correctness. However, correctness of the collector itself can be challenging to establish, given both the style in which such systems are written and the weakly-ordered memory accesses of modern hardware. One way to maximize the benefit of verification effort is to use a framework in which that effort can be focused on the correctness of small, modular critical components from which various collectors may be composed. Full proof of correctness is likely impractical, so we propose to gain a degree of confidence in collector correctness by applying model checking to critical kernels within a garbage collection framework. We further envisage a model framework, paralleling the framework nature of the collector, in the hope that it will be easy to create new models for new collectors. We describe here a prototype model structure, and present results of model checking both stop-the-world and snapshot-at-the-beginning concurrent marking. We found useful regularities of model structure, and that models could be checked within feasible time and space budgets on capable servers. This suggests that collectors built in a modular style might be model checked, and further that it may be worthwhile to develop a model checking framework with a domain-specific language from which to generate those models.
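As a concrete illustration of the kind of small critical kernel such a framework would model check, the Java sketch below shows a tri-color marking loop together with a Yuasa-style snapshot-at-the-beginning write barrier. It is a minimal sketch of the general technique only; the class and method names are ours and are not drawn from the paper's framework.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal tri-color marking sketch with a Yuasa-style snapshot-at-the-beginning
// (SATB) write barrier. Illustrative only; names are not from the paper's framework.
final class SatbMarkingSketch {
    enum Color { WHITE, GREY, BLACK }

    static final class Node {
        Color color = Color.WHITE;
        Node left, right;                 // outgoing references
    }

    private final Deque<Node> markStack = new ArrayDeque<>();

    // Shade a white node grey and remember it for later scanning.
    private void shade(Node n) {
        if (n != null && n.color == Color.WHITE) {
            n.color = Color.GREY;
            markStack.push(n);
        }
    }

    // SATB write barrier: log (shade) the overwritten referent before the store,
    // so every object reachable at the snapshot is eventually marked.
    void writeLeft(Node src, Node newRef) {
        shade(src.left);
        src.left = newRef;
    }

    // Marking loop: drain grey nodes, shading their children, then blacken them.
    void mark(Node... roots) {
        for (Node r : roots) shade(r);
        while (!markStack.isEmpty()) {
            Node n = markStack.pop();
            shade(n.left);
            shade(n.right);
            n.color = Color.BLACK;
        }
    }
}
```

A model checker would explore interleavings of the barrier and the marking loop, under the target memory model, to confirm that every object reachable at the snapshot is eventually marked.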
Award ID(s):
1909731
PAR ID:
10431636
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
MPLR '22: Proceedings of the 19th International Conference on Managed Programming Languages and Runtimes
Page Range / eLocation ID:
128 to 139
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Popular big data frameworks, ranging from Hadoop MapReduce to Spark, rely on garbage-collected languages, such as Java and Scala. Big data applications are especially sensitive to the effectiveness of garbage collection (i.e., GC), because they usually process a large volume of data objects that lead to heavy GC overhead. Lacking in-depth understanding of GC performance has impeded performance improvement in big data applications. In this paper, we conduct the first comprehensive evaluation on three popular garbage collectors, i.e., Parallel, CMS, and G1, using four representative Spark applications. By thoroughly investigating the correlation between these big data applications’ memory usage patterns and the collectors’ GC patterns, we obtain many findings about GC inefficiencies. We further propose empirical guidelines for application developers, and insightful optimization strategies for designing big-data-friendly garbage collectors. 
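For readers who want to reproduce this kind of measurement informally, the collectors compared here are selected with standard JVM flags (e.g. -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC, -XX:+UseG1GC), and the JDK reports per-collector pause counts and cumulative times through java.lang.management. The snippet below is a generic sketch using that standard API; it is not the instrumentation used in the paper.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Print per-collector collection counts and cumulative pause time for this JVM.
// Generic JDK API usage; not the measurement harness used in the paper.
public final class GcReport {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-25s collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```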
  2. This article introduces a novel system for deriving upper bounds on the heap-space requirements of functional programs with garbage collection. The space cost model is based on a perfect garbage collector that immediately deallocates memory cells when they become unreachable. Heap-space bounds are derived using type-based automatic amortized resource analysis (AARA), a template-based technique that efficiently reduces bound inference to linear programming. The first technical contribution of the work is a new operational cost semantics that models a perfect garbage collector. The second technical contribution is an extension of AARA to take into account automatic deallocation. A key observation is that deallocation of a perfect collector can be modeled with destructive pattern matching if data structures are used in a linear way. However, the analysis uses destructive pattern matching to accurately model deallocation even if data is shared. The soundness of the extended AARA with respect to the new cost semantics is proven in two parts via an intermediate linear cost semantics. The analysis and the cost semantics have been implemented as an extension to Resource Aware ML (RaML). An experimental evaluation shows that the system is able to derive tight symbolic heap-space bounds for common algorithms. Often the bounds are asymptotic improvements over bounds that RaML derives without taking into account garbage collection. 
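The key observation about destructive pattern matching under linear use can be pictured informally in Java (RaML itself is an ML dialect, so this is only an analogy, and the names are ours): each list cell is consumed exactly once and its link is severed at the point where a destructive match would free it.

```java
// Informal Java analogy for destructive pattern matching under linear use:
// each cell is consumed exactly once and its link is cut where a destructive
// match would free it, so a perfect collector could reclaim the cell immediately.
final class LinearSum {
    static final class Cell {
        final int head;
        Cell tail;
        Cell(int head, Cell tail) { this.head = head; this.tail = tail; }
    }

    static int sum(Cell xs) {
        int total = 0;
        while (xs != null) {
            Cell rest = xs.tail;
            xs.tail = null;        // the "destructive match": the cell is dead after use
            total += xs.head;
            xs = rest;
        }
        return total;
    }
}
```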
  3. Many of the everyday decisions a user makes rely on the suggestions of online recommendation systems. These systems amass implicit (e.g., location, purchase history, browsing history) and explicit (e.g., reviews, ratings) feedback from multiple users, produce a general consensus, and provide suggestions based on that consensus. However, due to privacy concerns, users are uncomfortable with implicit data collection, thus requiring recommendation systems to be overly dependent on explicit feedback. Unfortunately, users do not frequently provide explicit feedback. This hampers the ability of recommendation systems to provide high-quality suggestions. We introduce Heimdall, the first privacy-respecting implicit preference collection framework that enables recommendation systems to extract user preferences from their activities in a privacy-respecting manner. The key insight is to enable recommendation systems to run a collector on a user's device and precisely control the information a collector transmits to the recommendation system backend. Heimdall introduces immutable blobs as a mechanism to guarantee this property. We implemented Heimdall on the Android platform and wrote three example collectors to enhance recommendation systems with implicit feedback. Our performance results suggest that the overhead of immutable blobs is minimal, and a user study of 166 participants indicates that privacy concerns are significantly less when collectors record only specific information, a property that Heimdall enables.
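The idea behind immutable blobs can be sketched as an interface contract: the collector's only channel to the backend is a fixed-schema, read-only record. All of the types and fields below (ImmutableBlob, Collector, categoryId, visitCount) are hypothetical illustrations and are not Heimdall's actual API.

```java
// Hypothetical sketch of the idea behind immutable blobs: a collector's only
// output is a fixed-schema, read-only record, never raw activity data.
// None of these names are taken from Heimdall's actual interfaces.
final class ImmutableBlob {
    private final String categoryId;   // coarse interest category (hypothetical field)
    private final int visitCount;      // aggregate count only; no timestamps or locations

    ImmutableBlob(String categoryId, int visitCount) {
        this.categoryId = categoryId;
        this.visitCount = visitCount;
    }

    String categoryId() { return categoryId; }
    int visitCount()    { return visitCount; }
}

interface Collector {
    // The only channel from the on-device collector back to the backend.
    ImmutableBlob summarize();
}
```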
  4. Model checking has often been used for verifying Cyber-Physical Systems (CPS). A major challenge is how to capture a model that represents the actual behavior of the software. Model extraction can introduce errors that affect the accuracy of the analysis, including loss of precision, inconsistency, non-conformance, and over- and under-approximation. In this paper, we formalize and prove the correctness of extracting a model from a subset of the MicroPython programming language with respect to a trace-based semantics. The extracted models capture the order of method calls and can be model checked using Shelley. We formalize the extraction process from an intermediate representation of MicroPython code and prove that the behavior of our intermediate representation is a regular language. Our formalization and theoretical results are fully mechanized using the Coq proof assistant.
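The observation that valid call orderings form a regular language can be illustrated with an ordinary regular expression over a recorded method-call trace. The open/read/write/close protocol below is an invented example, and this is not how Shelley itself performs checking.

```java
import java.util.List;
import java.util.regex.Pattern;

// Toy illustration: if valid method-call orders form a regular language,
// a recorded trace can be checked against a regular expression.
// The open/read/write/close protocol is an invented example.
final class TraceCheck {
    private static final Pattern PROTOCOL =
            Pattern.compile("open(;(read|write))*;close");

    static boolean conforms(List<String> calls) {
        return PROTOCOL.matcher(String.join(";", calls)).matches();
    }

    public static void main(String[] args) {
        System.out.println(conforms(List.of("open", "read", "write", "close"))); // true
        System.out.println(conforms(List.of("read", "close")));                  // false
    }
}
```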
  5. Memory safety invariants extracted from a program can help defend against and detect both software and hardware memory violations. For instance, by allowing only specific instructions to access certain memory locations, a system can detect out-of-bound accesses or illegal pointer dereferences that lead to correctness and security issues. In this paper, we propose a set of CPU abstractions for specifying and checking program invariants, providing a defense mechanism against both software and hardware memory violations at runtime. The mechanism ensures that the invariants are satisfied on every memory access. We present a fast invariant address translation and retrieval scheme using a specialized cache, which stores and checks invariants related to global, stack, and heap objects. The invariant checks can be performed synchronously or asynchronously: synchronous checking protects highly security-critical programs, while the rest are protected by asynchronous checking. A fast exception mechanism is proposed to report violations as soon as possible, closing the window for transient attacks. Our evaluation shows that the mechanism can detect both software and hardware, spatial and temporal, memory violations, incurring 53% overhead when checking synchronously and 15% when checking asynchronously.
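The shape of the checks described here can be sketched as a software analogue: a table mapping memory regions to the code sites allowed to touch them, consulted on every access. The sketch below is only that analogue, with names of our own choosing; it is not the proposed hardware mechanism or its actual interface.

```java
import java.util.HashMap;
import java.util.Map;

// Software analogue of per-region access invariants: each region records which
// code site is permitted to touch it, and every access is checked against the
// table. Illustrative only; not the hardware mechanism or interface from the paper.
final class AccessInvariants {
    private final Map<Long, Long> allowedSite = new HashMap<>();  // region id -> permitted site id

    void register(long regionId, long siteId) {
        allowedSite.put(regionId, siteId);
    }

    // Conceptually invoked on every load/store; shown here as a synchronous check.
    void check(long regionId, long siteId) {
        Long permitted = allowedSite.get(regionId);
        if (permitted == null || permitted != siteId) {
            throw new IllegalStateException(
                    "invariant violation: site " + siteId + " accessed region " + regionId);
        }
    }
}
```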