Search for: All records

Award ID contains: 2107010

Note: When you click a Digital Object Identifier (DOI) link, you will be taken to an external site maintained by the publisher. Some full-text articles may not be freely available during the publisher's embargo period.

Some links on this page may take you to non-federal websites, whose policies may differ from this site's.

  1. Calling context is crucial for improving the precision of program analyses in many use cases (clients), such as profiling, debugging, optimization, and security checking. The calling context is often encoded as a numerical value. We have observed that many clients benefit from a value that is not only deterministic but also globally distinguishable across runs, which simplifies bookkeeping and guarantees complete uniqueness. However, existing work guarantees only determinism, not global distinguishability, so clients must build auxiliary helpers, at considerable overhead, to distinguish encoded values among all calling contexts. In this paper, we propose Deterministic Distinguishable Calling Context Encoding, which enables both properties natively. The key idea is to leverage the static call graph and encode each calling context as the running call-path count, so that a context-to-value mapping is established statically and can be readily used by clients. Our experiments with two client tools show that our scheme has overhead comparable to two state-of-the-art encoding schemes, PCCE and PCC, and further avoids the expensive overhead of collision detection, saving up to 2.1× and 50% for Splash-3 and SPEC CPU 2017, respectively.
    Free, publicly-accessible full text available February 25, 2026
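To make the idea concrete, the sketch below numbers call paths in a small static call graph in the Ball-Larus style, so that every calling context (call path from the root) maps to a distinct integer in a dense range. The call graph, function names, and helper functions are all invented for illustration; this is a sketch of the general technique, not the paper's implementation.

```python
# Hedged sketch: Ball-Larus-style path numbering over a static call graph
# (a DAG). Each call edge gets an increment so that the sum of increments
# along a path uniquely identifies that path.

call_graph = {
    "main": ["a", "b"],  # main calls a and b (illustrative graph)
    "a": ["c"],
    "b": ["c"],
    "c": [],             # leaf: makes no calls
}

def path_counts(graph):
    """Number of distinct call paths starting at each function."""
    memo = {}
    def count(f):
        if f not in memo:
            callees = graph[f]
            memo[f] = sum(count(c) for c in callees) if callees else 1
        return memo[f]
    for f in graph:
        count(f)
    return memo

def edge_increments(graph, counts):
    """Assign each call edge an increment so path sums are unique."""
    inc = {}
    for f, callees in graph.items():
        running = 0
        for c in callees:
            inc[(f, c)] = running
            running += counts[c]
    return inc

counts = path_counts(call_graph)
inc = edge_increments(call_graph, counts)

def encode(path):
    """Encode a call path (list of function names) as its context ID."""
    return sum(inc[edge] for edge in zip(path, path[1:]))

# The two distinct contexts reaching c get distinct, deterministic IDs:
# encode(["main", "a", "c"]) == 0, encode(["main", "b", "c"]) == 1
```

Because the numbering is derived purely from the static graph, the same context always receives the same ID in every run, and no two contexts share an ID, which is the determinism-plus-distinguishability property the abstract describes.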
  2. Apache Spark is arguably the most prominent Big Data processing framework tackling the scalability challenge of a wide variety of modern workloads. A key to its success is caching critical data in memory, thereby eliminating wasteful recomputation of intermediate results. While critical to performance, caching is not automated: developers must manage it manually via APIs, a process that is error-prone and labor-intensive, yet may still yield sub-optimal performance due to execution complexities. Existing optimizations rely on expensive profiling steps and/or application-specific cost models to enable a postmortem analysis and a manual modification of existing applications. This paper presents CACHEIT, built to take the guesswork away from users while running applications as-is. CACHEIT analyzes the program's workflow, extracting features such as dependencies and access patterns and using them as an oracle to detect high-value data candidates and guide caching decisions at run time. CACHEIT liberates users from low-level memory management, allowing them to focus on business logic instead. It is application-agnostic and requires no profiling or cost model. A thorough evaluation with a broad range of Spark applications on real-world datasets shows that CACHEIT maintains satisfactory performance, incurring only marginal slowdown compared to manually well-tuned counterparts.
    Free, publicly-accessible full text available December 15, 2025
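One way to picture the kind of dependency analysis described above: a dataset that is read by more than one downstream operator would be recomputed for every consumer unless cached, so fan-out in the lineage graph marks a caching candidate. The toy lineage graph and dataset names below are invented for illustration and are not CACHEIT's actual algorithm.

```python
from collections import Counter

# Hypothetical lineage: each dataset mapped to the parent datasets it reads.
lineage = {
    "filtered": ["raw"],
    "joined":   ["filtered", "lookup"],
    "agg1":     ["joined"],
    "agg2":     ["joined"],
}

# Count how many downstream operators read each dataset; anything read
# more than once is worth keeping in memory.
reuse = Counter(parent for parents in lineage.values() for parent in parents)
cache_candidates = sorted(d for d, n in reuse.items() if n > 1)
# cache_candidates == ["joined"]: it feeds both agg1 and agg2
```

A real system would weigh such reuse counts against object size and memory pressure at run time, but the fan-out signal alone already captures why "joined" is the high-value candidate here.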
  3. Thanks to recent advances in high-bandwidth, low-latency interconnects, running a data-intensive application with a remote memory pool is now feasible. When developing a data-intensive application, a managed language such as Java is often the developer's choice because of runtime conveniences such as automatic memory management. However, memory management costs increase significantly in far memory due to remote memory accesses. Our insight is that data hotness (i.e., the access frequency of objects) is the key to reducing memory management costs and improving efficiency in far memory. In this paper, we present ongoing work on Polar, an enhanced runtime system that is hotness-aware and optimized for far memory. In Polar, the garbage collector is augmented to identify cold (infrequently accessed) objects and relocate them to remote memory pools. By placing objects according to their access frequency, Polar minimizes the number of remote accesses, ensures low access latency for the application, and thus improves overall performance.
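A toy sketch of the hotness-based placement described above: objects whose access frequency falls below a threshold are assigned to a simulated remote pool, while hot objects stay local. The access counts and threshold are invented for illustration, not taken from Polar.

```python
# Hypothetical per-object access counts collected by the runtime.
access_counts = {"obj_a": 120, "obj_b": 3, "obj_c": 45, "obj_d": 0}
HOT_THRESHOLD = 10  # illustrative cutoff, not Polar's actual policy

# Hot objects remain in local DRAM; cold objects are relocated to the
# remote memory pool, so frequent accesses never cross the interconnect.
local_pool = {o for o, n in access_counts.items() if n >= HOT_THRESHOLD}
remote_pool = set(access_counts) - local_pool
# local_pool == {"obj_a", "obj_c"}; remote_pool == {"obj_b", "obj_d"}
```

In a real collector this decision would happen during object relocation, with counts gathered via read barriers or sampling, but the partition above captures the core placement policy.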