This paper answers a long-standing open question in tile-assembly theory, namely that it is possible to strictly assemble discrete self-similar fractals (DSSFs) in the abstract Tile-Assembly Model (aTAM). We prove this in 2 separate ways, each taking advantage of a novel set of tools. One of our constructions shows that specializing the notion of a quine, a program which prints its own output, to the language of tile-assembly naturally induces a fractal structure. The other construction introduces self-describing circuits as a means to abstractly represent the information flow through a tile-assembly construction and shows that such circuits may be constructed for a relative of the Sierpinski carpet, and indeed many other DSSFs, through a process of fixed-point iteration. This later result, or more specifically the machinery used in its construction, further enable us to provide a polynomial time procedure for deciding whether any given subset of ℤ2 will generate an aTAM producible DSSF. To this end, we also introduce the Tree Pump Theorem, a result analogous to the important Window Movie Lemma, but with requirements on the set of productions rather than on the self-assembling system itself.
more »
« less
This content will become publicly available on October 9, 2026
Gauguin, Descartes, Bayes: A Diurnal Golem’s Brain
A "quine" is a deterministic program that prints itself. In this essay, I will show you a "gauguine": a probabilistic program that infers itself. A gauguine is repeatedly asked to guess its own source code. Initially, its chances of guessing correctly are of course minuscule. But as the gauguine observes more and more of its own previous guesses, it detects patterns of behavior and gains information about its inner workings. This information allows it to bootstrap self-knowledge, and ultimately discover its own source code. We will discuss how—and why—we might write a gauguine, and what we stand to learn by constructing one.
more »
« less
- Award ID(s):
- 2328543
- PAR ID:
- 10647184
- Publisher / Repository:
- ACM
- Date Published:
- Page Range / eLocation ID:
- 204 to 212
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Feature-rich software programs typically provide many configuration options for users to enable and disable features, or tune feature behaviors. Given the values of configuration options, certain code blocks in a program will become redundant code and never be used. However, the redundant code is still present in the program and thus unnecessarily increases a program's attack surface by allowing attackers to use it as return-oriented programming (ROP) gadgets. Existing code debloating techniques have several limitations: not targeting this type of redundant code, requiring access to program source code or user-provided test inputs. In this paper, we propose a practical code debloating approach, called BinDebloat, to address these limitations. BinDebloat identifies and removes redundant code caused by configuration option values. It does not require user-provided test inputs, or support from program developers, and is designed to work on closed-source programs. It uses static program analysis to identify code blocks that are control-dependent on configuration option values. Given a set of configuration option values, it automatically determines which of such code blocks become redundant and uses static binary rewriting to neutralize these code blocks so that they are removed from the attack surface. We evaluated BinDebloat on closed-source Windows programs and the results show that BinDebloat can effectively reduce a program's attack surface.more » « less
-
Code summarization is the task of creating short, natural language descriptions of source code. It is an important part of code comprehension and a powerful method of documentation. Previous work has made progress in identifying where programmers focus in code as they write their own summaries (i.e., Writing). However, there is currently a gap in studying programmers’ attention as they read code with pre-written summaries (i.e., Reading). As a result, it is currently unknown how these two forms of code comprehension compare: Reading and Writing. Also, there is a limited understanding of programmer attention with respect to program semantics. We address these shortcomings with a human eye-tracking study (n= 27) comparing Reading and Writing. We examined programmers’ attention with respect to fine-grained program semantics, including their attention sequences (i.e., scan paths). We find distinctions in programmer attention across the comprehension tasks, similarities in reading patterns between them, and differences mediated by demographic factors. This can help guide code comprehension in both computer science education and automated code summarization. Furthermore, we mapped programmers’ gaze data onto the Abstract Syntax Tree to explore another representation of human attention. We find that visual behavior on this structure is not always consistent with that on source code.more » « less
-
The increasing complexity of modern programs motivates software engineers to often rely on the support of third-party libraries. Although this practice allows application developers to achieve a compelling time-to-market, it often makes the final product bloated with conspicuous chunks of unused code. Other than making a program unnecessarily large, this dormant code could be leveraged by willful attackers to harm users. As a consequence, several techniques have been recently proposed to perform program debloating and remove (or secure) dead code from applications. However, state-of-the-art approaches are either based on unsound strategies, thus producing unreliable results, or pose too strict assumptions on the program itself. In this work, we propose a novel abstract domain, called Signedness-Agnostic Strided Interval, which we use as the cornerstone to design a novel and sound static technique, based on abstract interpretation, to reliably perform program debloating. Throughout the paper, we detail the specifics of our approach and show its effectiveness and usefulness by implementing it in a tool, called BinTrimmer, to perform static program debloating on binaries. Our evaluation shows that BinTrimmer can remove up to 65.6% of a library’s code and that our domain is, on average, 98% more precise than the related work.more » « less
-
There has been a proliferation of mobile apps in the Medical, as well as Health&Fitness categories. These apps have a wide audience, from medical providers, to patients, to end users who want to track their fitness goals. The low barrier to entry on mobile app stores raises questions about the diligence and competence of the developers who publish these apps, especially regarding the practices they use for user data collection, processing, and storage. To help understand the nature of data that is collected, and how it is processed, as well as where it is sent, we developed a tool named PIT (Personal Information Tracker) and made it available as open source. We used PIT to perform a multi-faceted study on 2832 Android apps: 2211 Medical apps and 621 Health&Fitness apps. We first define Personal Information (PI) as 17 different groups of sensitive information, e.g., user’s identity, address and financial information, medical history or anthropometric data. PIT first extracts the elements in the app’s User Interface (UI) where this information is collected. The collected information could be processed by the app’s own code or third-party code; our approach disambiguates between the two. Next, PIT tracks, via static analysis, where the information is “leaked”, i.e., it escapes the scope of the app, either locally on the phone or remotely via the network. Then, we conduct a link analysis that examines the URLs an app connects with, to understand the origin and destination of data that apps collect and process. We found that most apps leak 1–5 PI items (email, credit card, phone number, address, name, being the most frequent). Leak destinations include the network (25%), local databases (37%), logs (23%), and files or I/O (15%). While Medical apps have more leaks overall, as they collect data on medical history, surprisingly, Health&Fitness apps also collect, and leak, medical data. We also found that leaks that are due to third-party code (e.g., code for ads, analytics, or user engagement) are much more numerous (2x–12x) than leaks due to app’s own code. Finally, our link analysis shows that most apps access 20–80 URLs (typically third-party URLs and Cloud APIs) though some apps could access more than 1,000 URLs.more » « less
An official website of the United States government
