Due to the ever-increasing complexity of in- come tax laws in the United States, the num- ber of US taxpayers filing their taxes using tax preparation software (henceforth, tax soft- ware) continues to increase. According to the U.S. Internal Revenue Service (IRS), in FY22, nearly 50% of taxpayers filed their individual income taxes using tax software. Given the legal consequences of incorrectly filing taxes for the taxpayer, ensuring the correctness of tax software is of paramount importance. Meta- morphic testing has emerged as a leading solu- tion to test and debug legal-critical tax software due to the absence of correctness requirements and trustworthy datasets. The key idea behind metamorphic testing is to express the proper- ties of a system in terms of the relationship between one input and its slightly metamor- phosed twinned input. Extracting metamor- phic properties from IRS tax publications is a tedious and time-consuming process. As a response, this paper formulates the task of gen- erating metamorphic specifications as a transla- tion task between properties extracted from tax documents—expressed in natural language—to a contrastive first-order logic form. We per- form a systematic analysis on the potential and limitations of in-context learning with Large Language Models (LLMs) for this task, and outline a research agenda towards automating the generation of metamorphic specifications for tax preparation software.
more »
« less
Technical Challenges in Maintaining Tax Prep Software with Large Language Models
As the US tax law evolves to adapt to ever-changing politico-economic realities, tax preparation software plays a significant role in helping taxpayers navigate these complexities. The dynamic nature of tax regulations poses a significant challenge to accurately and timely maintaining tax software artifacts. The state-of-the-art in maintaining tax prep software is time-consuming and error-prone as it involves manual code analysis combined with an expert interpretation of tax law amendments. We posit that the rigor and formality of tax amendment language, as expressed in IRS publications, makes it amenable to automatic translation to executable specifications (code). Our research efforts focus on identifying, understanding, and tackling technical challenges in leveraging Large Language Models (LLMs), such as ChatGPT and Llama, to faithfully extract code differentials from IRS publications and automatically integrate them with the prior version of the code to automate tax prep software maintenance.
more »
« less
- Award ID(s):
- 2317207
- PAR ID:
- 10547716
- Editor(s):
- McClelland, Robert; Johnson, Barry
- Publisher / Repository:
- IRS-TPC
- Date Published:
- Format(s):
- Medium: X
- Location:
- Washington, D.C.
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Static Analysis (SA) in Cybersecurity is a practice aimed at detecting vulnerabilities within the source code of a program. Modern SA applications, though highly sophisticated, lack programming language agnostic generalization, instead requiring codebase specific implementations for each programming language. The manner in which SA is implemented today, though functional, requires significant man hours to develop and maintain, higher costs due to custom applications for each language, and creates inconsistencies in implementation from SA-tool to SA-tool. One promising source of programming language generalization occurs within the compilers used to compile code for programming languages like C, C++, and Java. During the compilation process, source code of varying languages moves through several validation passes before being converted into a grammatically consistent Intermediate Representation (IR). The grammatical consistencies provided by IRs allow the same program derived from different programming languages to be represented uniformly and thus analyzed for vulnerabilities. By using IRs of compiled programming languages as the codebase of SA practices, multiple programming languages can be encompassed by a single SA tool. To begin understanding the possibilities the combination of SA and IRs may reveal, this research presents the following outcomes: 1) a systematic literature search, 2) a literature review, and 3) the classification of existing work pertaining to SA practices using IRs. The results of the study indicate that generalized Static Analysis using IRs is already a common practice in all compilers, but that the extended use of IRs in Cybersecurity SA practices aimed at finding vulnerabilities in source code remains underdeveloped.more » « less
-
TypeScript is a widely used optionally-typed language where developers can adopt “pay as you go” typing: they can add types as desired, and benefit from static typing. The “type annotation tax” or manual effort required to annotate new or existing TypeScript can be reduced by a variety of automatic methods. Probabilistic machine-learning (ML) approaches work quite well. ML approaches use different inductive biases, ranging from simple token sequences to complex graphical neural network (GNN) models capturing syntax and semantic relations. More sophisticated inductive biases are hand-engineered to exploit the formal nature of software. Rather than deploying fancy inductive biases for code, can we just use “big data” to learn natural patterns relevant to typing? We find evidence suggesting that this is the case. We present TypeBert, demonstrating that even with simple token-sequence inductive bias used in BERT-style models and enough data, type-annotation performance of the most sophisticated models can be surpassed.more » « less
-
Free and/or open-source software (or F/OSS) projects now play a major and dominant role in society, constituting critical digital infrastructure relied upon by companies, academics, non-profits, activists, and more. As F/OSS has become larger and more established, we investigate the labor of maintaining and sustaining those projects at various scales. We report findings from an interview-based study with contributors and maintainers working in a wide range of F/OSS projects. Maintainers of F/OSS projects do not just maintain software code in a more traditional software engineering understanding of the term: fixing bugs, patching security vulnerabilities, and updating dependencies. F/OSS maintainers also perform complex and often-invisible interpersonal and organizational work to keep their projects operating as active communities of users and contributors. We particularly focus on how this labor of maintaining and sustaining changes as projects and their software grow and scale across many dimensions. In understanding F/OSS to be as much about maintaining a communal project as it is maintaining software code, we discuss broadly applicable considerations for peer production communities and other socio-technical systems more broadly.more » « less
An official website of the United States government

