skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Running Language Interpreters Inside SGX: A Lightweight,Legacy-Compatible Script Code Hardening Approach
Recent advances in trusted execution environments, specifically with Intel's introduction of SGX on consumer processors, have provided unprecedented opportunities to create secure applications with a small TCB. While a large number of SGX solutions have been proposed, nearly all of them focus on protecting native code applications, leaving scripting languages unprotected. To fill this gap, this paper presents SCRIPTSHIELD, a framework capable of running legacy script code while simultaneously providing confidentiality and integrity for scripting code and data. In contrast to the existing schemes that either require tedious and time-consuming re-development or result in a large TCB by importing an entire library OS or container, SCRIPTSHIELD keeps the TCB small and provides backwards compatibility (i.e., no changes needed to the scripting code itself). The core idea is to customize the script interpreter to run inside an SGX enclave and pass scripts to it. We have implemented SCRIPTSHIELD and tested with three popular scripting languages: Lua, JavaScript, and Squirrel. Our experimental results show that SCRIPTSHIELD does not cause noticeable overhead. The source code of SCRIPTSHIELD has been made publicly available as an open source project.  more » « less
Award ID(s):
1750809 1718084
PAR ID:
10134889
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security
Page Range / eLocation ID:
114 to 121
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. With close to native performance, Linux containers are becoming the de facto platform for cloud computing. While various solutions have been proposed to secure applications and containers in the cloud environment by leveraging Intel SGX, most cloud operators do not yet offer SGX as a service. This is likely due to a number of security, scalability, and usability concerns coming from both cloud providers and users. Cloud operators worry about the security guarantees of unofficial SDKs, limited support for remote attestation within containers, limited physical memory for the Enclave Page Cache (EPC) making it difficult to support hundreds of enclaves, and potential DoS attacks against EPC by malicious users. Meanwhile, end users need to worry about careful program partitioning to reduce the TCB and adapting legacy applications to use SGX. We note that most of these concerns are the result of an incomplete infrastructure, from the OS to the application layer. We address these concerns with lxcsgx, which allows SGX applications to run inside containers while also: enabling SGX remote attestation for containerized applications, enforcing EPC memory usage control on a per-container basis, providing a general software TPM using SGX to augment legacy applications, and supporting partitioning with a GCC plugin. We then retrofit Nginx/OpenSSL and Memcached using the software TPM and SGX partitioning to defend against known and potential attacks. Thanks to the small EPC footprint of each enclave, we are able to run up to 100 containerized Memcached instances without EPC swapping. Our evaluation shows the overhead introduced by lxcsgx is less than 6.9% for simple SGX applications, 9.5% for Nginx/OpenSSL, and 20.9% for containerized Memcached. 
    more » « less
  2. null (Ed.)
    Summary The need for efficient tools and applications for analyzing genomic diversity is essential for any genetics research program. One such tool, TASSEL (Trait Analysis by aSSociation, Evolution and Linkage), provides many core methods for genomic analyses. Despite its efficiency, TASSEL has limited means to use scripting languages for reproducible research and interacting with other analytical tools. Here we present an R package rTASSEL, a front-end to connect to a variety of highly used TASSEL methods and analytical tools. The goal of this package is to create a unified scripting workflow that exploits the analytical prowess of TASSEL in conjunction with R’s popular data handling and parsing capabilities without ever having the user to switch between these two environments. By implementing this workflow, we can achieve performances ranging from approximately 2 to 20 times faster than other widely used R packages for various functionalities. Availability and implementation rTASSEL is implemented in R using core TASSEL methods written in Java. The source code for rTASSEL can be found at https://bitbucket.org/bucklerlab/rtassel/src/master/. The source code for TASSEL can be found at https://bitbucket.org/tasseladmin/tassel-5-source/src/master/. 
    more » « less
  3. The wide accessibility of social media has provided linguistically under-represented communities with an extraordinary opportunity to create content in their native languages. This, however, comes with certain challenges in script normalization, particularly where the speakers of a language in a bilingual community rely on another script or orthography to write their native language. This paper addresses the problem of script normalization for several such languages that are mainly written in a Perso-Arabic script. Using synthetic data with various levels of noise and a transformer-based model, we demonstrate that the problem can be effectively remediated. We conduct a small-scale evaluation of real data as well. Our experiments indicate that script normalization is also beneficial to improve the performance of downstream tasks such as machine translation and language identification. 
    more » « less
  4. Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on high-resource programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available (e.g., OCaml, Racket, and several others). This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, called MultiPL-T, generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. MultiPL-T translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize unit tests for commented code from a high-resource source language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate the code from the high-resource source language to a target low-resource language. This gives us a corpus of candidate training data in the target language, but many of these translations are wrong. 3) We use a lightweight compiler to compile the test cases generated in (1) from the source language to the target language, which allows us to filter our obviously wrong translations. The result is a training corpus in the target low-resource language where all items have been validated with test cases. We apply this approach to generate tens of thousands of new, validated training items for five low-resource languages: Julia, Lua, OCaml, R, and Racket, using Python as the source high-resource language. Furthermore, we use an open Code LLM (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. Using datasets generated with MultiPL-T, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket that outperform other fine-tunes of these base models on the natural language to code task. We also present Racket fine-tunes for two very recent models, DeepSeek Coder and StarCoder2, to show that MultiPL-T continues to outperform other fine-tuning approaches for low-resource languages. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer. 
    more » « less
  5. Confidential computing aims to secure the code and data in use by providing a Trusted Execution Environment (TEE) for applications using hardware features such as Intel SGX.Timing and cache side-channel attacks, however, are often outside the scope of the threat model, although once exploited they are able to break all the default security guarantees enforced by hardware. Unfortunately, tools detecting potential side-channel vulnerabilities within applications are limited and usually ignore the strong attack model and the unique programming model imposed by Intel SGX. This paper proposes a precise side-channel analysis tool, ENCIDER, detecting both timing and cache side-channel vulnerabilities within SGX applications via inferring potential timing observation points and incorporating the SGX programming model into analysis. ENCIDER uses dynamic symbolic execution to decompose the side-channel requirement based on the bounded non-interference property and implements byte-level information flow tracking via API modeling. We have applied ENCIDER to 4 real-world SGX applications, 2 SGX crypto libraries, and 3 widely-used crypto libraries, and found 29 timing side channels and 73 code and data cache side channels. We have reported our findings to the corresponding parties, e.g., Intel and ARM, who have confirmed most of the vulnerabilities detected. 
    more » « less