Title: On the Potential and Limitations of Few-Shot In-Context Learning to Generate Metamorphic Specifications for Tax Preparation Software
Due to the ever-increasing complexity of income tax laws in the United States, the number of US taxpayers filing their taxes using tax preparation software (henceforth, tax software) continues to increase. According to the U.S. Internal Revenue Service (IRS), in FY22, nearly 50% of taxpayers filed their individual income taxes using tax software. Given the legal consequences of incorrectly filing taxes for the taxpayer, ensuring the correctness of tax software is of paramount importance. Metamorphic testing has emerged as a leading solution to test and debug legal-critical tax software due to the absence of correctness requirements and trustworthy datasets. The key idea behind metamorphic testing is to express the properties of a system in terms of the relationship between one input and its slightly metamorphosed twinned input. Extracting metamorphic properties from IRS tax publications is a tedious and time-consuming process. In response, this paper formulates the task of generating metamorphic specifications as a translation task from properties extracted from tax documents (expressed in natural language) to a contrastive first-order logic form. We perform a systematic analysis of the potential and limitations of in-context learning with Large Language Models (LLMs) for this task, and outline a research agenda towards automating the generation of metamorphic specifications for tax preparation software.
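To make the idea of a metamorphic property concrete, the following is a minimal sketch, not taken from the paper. It assumes a hypothetical toy tax function (`toy_tax`), invented bracket values that are not real IRS figures, and a single illustrative property: if two returns differ only in that the twinned return has higher income, its tax should not be lower. In a first-order style this could be written roughly as: for all x, x', income(x') >= income(x) implies tax(x') >= tax(x). The check needs no known-correct tax amount, only the relationship between the two outputs.

```python
from typing import Callable

# Hypothetical progressive brackets as (upper bound, marginal rate).
# Illustrative values only; these are NOT real IRS figures.
BRACKETS = [(10_000, 0.10), (40_000, 0.20), (float("inf"), 0.30)]


def toy_tax(income: float) -> float:
    """Toy progressive tax: each slice of income is taxed at its bracket rate."""
    tax, lower = 0.0, 0.0
    for upper, rate in BRACKETS:
        if income > lower:
            tax += (min(income, upper) - lower) * rate
        lower = upper
    return tax


def buggy_tax(income: float) -> float:
    """Deliberately faulty variant: applies a flat 15% above 40,000,
    so tax drops when income crosses that threshold."""
    if income > 40_000:
        return income * 0.15
    return toy_tax(income)


def holds_monotonicity(tax_fn: Callable[[float], float],
                       income: float, delta: float) -> bool:
    """Metamorphic check: the twinned input (income + delta, delta >= 0)
    must not owe less tax than the original input."""
    return tax_fn(income + delta) >= tax_fn(income)


if __name__ == "__main__":
    for fn in (toy_tax, buggy_tax):
        ok = holds_monotonicity(fn, income=39_000, delta=2_000)
        print(f"{fn.__name__}: property holds = {ok}")
```

The faulty variant is flagged without any reference to what the correct tax should be, which is the appeal of metamorphic testing when no trustworthy oracle or dataset exists.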
Award ID(s):
2317207
PAR ID:
10547718
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Page Range / eLocation ID:
230 to 243
Format(s):
Medium: X
Location:
Singapore
Sponsoring Org:
National Science Foundation
More Like this
  1. McClelland, Robert; Johnson, Barry (Ed.)
    As US tax law evolves to adapt to ever-changing politico-economic realities, tax preparation software plays a significant role in helping taxpayers navigate these complexities. The dynamic nature of tax regulations makes it difficult to maintain tax software artifacts accurately and in a timely manner. The state of the art in maintaining tax prep software is time-consuming and error-prone, as it involves manual code analysis combined with an expert interpretation of tax law amendments. We posit that the rigor and formality of tax amendment language, as expressed in IRS publications, make it amenable to automatic translation to executable specifications (code). Our research efforts focus on identifying, understanding, and tackling technical challenges in leveraging Large Language Models (LLMs), such as ChatGPT and Llama, to faithfully extract code differentials from IRS publications and automatically integrate them with the prior version of the code to automate tax prep software maintenance.
  2. Ensuring the correctness of scientific software is challenging due to the need to represent and model complex phenomena in a discrete form. Many dynamic approaches for correctness have been developed for numerical overflow or imprecision, which may manifest as program crashes or hangs. Less effort has been spent on functional correctness, where one of the most widely proposed techniques is metamorphic testing. Metamorphic testing often requires deep domain expertise to design meaningful relations. In this vision paper we ask if we can utilize the process of abstraction and refinement, a traditionally formal approach, to guide the development of metamorphic relations. We have built an iterative approach we call Model Assisted Refinements. It starts with domain-agnostic relations and a set of input-output relations created via a dynamic analysis. We then use a model checker to identify missing input/output patterns and potential passing and failing relations. We augment our dynamic analysis and obtain domain expertise to verify and refine our relations. At the end we have a set of domain-specific metamorphic relations and test cases. We demonstrate our approach on a high-performance chemistry library. Within three refinements we discover several domain-specific relations and increase our behavioral coverage.
  3. Metamorphic testing is an advanced technique to test programs without a true test oracle such as machine learning applications. Because these programs have no general oracle to identify their correctness, traditional testing techniques such as unit testing may not be helpful for developers to detect potential bugs. This paper presents a novel system, KABU, which can dynamically infer properties of methods' states in programs that describe the characteristics of a method before and after transforming its input. These Metamorphic Properties (MPs) are pivotal to detecting potential bugs in programs without test oracles, but most previous work relies solely on human effort to identify them and only considers MPs between input parameters and output result (return value) of a program or method. This paper also proposes a testing concept, Metamorphic Differential Testing (MDT). By detecting different sets of MPs between different versions for the same method, KABU reports potential bugs for human review. We have performed a preliminary evaluation of KABU by comparing the MPs detected by humans with the MPs detected by KABU. Our preliminary results are promising: KABU can find more MPs than human developers, and MDT is effective at detecting function changes in methods. 
  4. Testing scientific software is a difficult task due to its inherent complexity and the lack of test oracles. In addition, these software systems are usually developed by end-user developers who are normally trained neither as professional software developers nor as testers. These factors often lead to inadequate testing. Metamorphic testing (MT) is a simple yet effective technique for testing such applications. Even though MT is a well-known technique in the software testing community, it is not widely used by scientific software developers. The objective of this article is to present MT as an effective technique for testing scientific software. To this end, we discuss why MT is an appropriate testing technique for scientists and engineers who are not primarily trained as software developers. In particular, we discuss how it can be used to conduct systematic and effective testing on programs that do not have test oracles, without requiring additional testing tools.
  5. The ability to speak and understand a host country’s primary language is strongly associated with measures of immigrant integration. We estimate the causal effects of English language training for adult immigrants on participants’ civic and economic outcomes using randomized enrollment lotteries from a public adult education program in Massachusetts. Participation doubles voter participation and increases annual earnings by $2,400 (56 percent). Increased tax revenue from earnings gains covers program costs over time, generating a 6 percent return for taxpayers. Ours is the first randomized evaluation of adult English language training as a standalone intervention in the United States. (JEL D72, H75, I21, I26, J15, J24, J31)