Title: Retrieving Data Constraint Implementations Using Fine-Grained Code Patterns
Business rules are an important part of the requirements of software systems that are meant to support an organization. These rules describe the operations, definitions, and constraints that apply to the organization. Within the software system, business rules are often translated into constraints on the values that are required or allowed for data, called data constraints. Business rules are subject to frequent changes, which in turn require changes to the corresponding data constraints in the software. The ability to efficiently and precisely identify where data constraints are implemented in the source code is essential for performing such necessary changes. In this paper, we introduce Lasso, the first technique that automatically retrieves the method and line of code where a given data constraint is enforced. Lasso is based on traceability link recovery approaches and leverages results from recent research that identified line-of-code-level implementation patterns for data constraints. We implement three versions of Lasso that can retrieve data constraint implementations when they are implemented with any one of 13 frequently occurring patterns. We evaluate the three versions on a set of 299 data constraints from 15 real-world Java systems, and find that they improve method-level link recovery by 30%, 70%, and 163%, in terms of true positives within the first 10 results, compared to their text-retrieval-based baseline. More importantly, the Lasso variants correctly identify the line of code implementing the constraint inside the methods for 68% of the 299 constraints.
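To illustrate the kind of text-retrieval baseline that Lasso improves upon, the following minimal sketch ranks candidate methods against a constraint description by TF-IDF cosine similarity. The function names, corpus format, and example data are illustrative assumptions, not Lasso's actual interface.

```python
# Minimal sketch of a text-retrieval TLR baseline: rank candidate methods
# by TF-IDF cosine similarity against a data-constraint description.
# Names and the corpus format are illustrative, not Lasso's actual API.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_methods(constraint_text, methods):
    """methods: list of (method_id, source_text) pairs."""
    ids, docs = zip(*methods)
    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]\w*")
    matrix = vectorizer.fit_transform(docs + (constraint_text,))
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sorted(zip(ids, scores), key=lambda p: p[1], reverse=True)
    return ranked[:10]  # top-10 candidates, mirroring the evaluation cutoff

# Example: a constraint about allowed discount values (hypothetical data)
methods = [
    ("Order.applyDiscount", "if discount < 0 or discount > 50: raise ValueError"),
    ("Order.total", "return sum(i.price for i in items)"),
]
print(rank_methods("discount must be between 0 and 50 percent", methods))
```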
Award ID(s):
1910976
PAR ID:
10348231
Author(s) / Creator(s):
Publisher / Repository:
Zenodo
Date Published:
Edition / Version:
1.1
Subject(s) / Keyword(s):
business rule code pattern data constraint traceability link recovery empirical study fine-grained traceability
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern software systems rely on mining insights from business-sensitive data stored in public clouds. A data breach usually incurs significant (monetary) loss for a commercial organization. Conceptually, cloud security heavily relies on Identity and Access Management (IAM) policies that IT admins need to properly configure and periodically update. Security negligence and human errors often lead to misconfigured IAM policies, which may open a backdoor for attackers. To address these challenges, we first develop a novel framework that encodes the generation of optimal IAM policies as a constraint programming (CP) problem. We identify reducing the dormant permissions of cloud users as an optimality criterion, which intuitively means minimizing unnecessary datastore access permissions. Second, to make IAM policies interpretable, we apply graph representation learning to users' historical access patterns to augment our CP model with similarity constraints: similar users should be grouped together and share common IAM policies. Third, we describe multiple attack models and show that our optimized IAM policies significantly reduce the impact of security attacks, using real data from 8 commercial organizations as well as synthetic instances.
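As a hedged illustration of this encoding idea, the sketch below formulates grant minimization with OR-Tools CP-SAT: every access observed in the logs must remain permitted, while dormant (never-exercised) grants are minimized. The variable names are assumptions, and the paper's similarity constraints are omitted.

```python
# A minimal sketch (not the paper's full model) of encoding IAM policy
# generation as constraint optimization with OR-Tools CP-SAT:
# keep every permission a user actually needs, minimize dormant grants.
from ortools.sat.python import cp_model

def optimize_grants(users, permissions, needed):
    """needed: set of (user, permission) pairs observed in access logs."""
    model = cp_model.CpModel()
    grant = {(u, p): model.NewBoolVar(f"grant_{u}_{p}")
             for u in users for p in permissions}
    for (u, p) in needed:            # required accesses must stay allowed
        model.Add(grant[(u, p)] == 1)
    # Dormant permissions are grants never exercised; minimize their count.
    model.Minimize(sum(v for k, v in grant.items() if k not in needed))
    solver = cp_model.CpSolver()
    solver.Solve(model)
    return {k for k, v in grant.items() if solver.Value(v) == 1}

print(optimize_grants(["alice", "bob"], ["read", "write"],
                      {("alice", "read"), ("bob", "read")}))
```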
  2. Because of the naturalness of software and the rapid evolution of Machine Learning (ML) techniques, repeated code change patterns (CPATs) occur frequently. They range from simple API migrations to changes involving several complex control structures such as for loops. While manually performing CPATs is tedious, the current state-of-the-art techniques for inferring transformation rules are not advanced enough to handle unseen variants of complex CPATs, resulting in low recall. In this paper we present a novel, automated workflow that mines CPATs, infers the transformation rules, and then transplants them automatically to new target sites. We designed, implemented, evaluated, and released this workflow in a tool, PYEVOLVE. At its core is a novel data-flow- and control-flow-aware transformation rule inference engine. Our technique advances the state of the art for transformation-by-example tools; without it, 70% of the code changes that PYEVOLVE transforms would not be possible to automate. Our thorough empirical evaluation of over 40,000 transformations shows 97% precision and 94% recall. Developers in well-known open-source projects accepted 90% of the CPATs generated by PYEVOLVE, confirming that its changes are useful.
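For intuition, here is a toy sketch (not PYEVOLVE's inference engine) that uses Python's ast module to locate one simple CPAT: a for loop whose body only appends to a list, a pattern developers commonly rewrite as a comprehension.

```python
# Toy matcher for one simple CPAT: a for-loop that does nothing but
# append to a list. Real CPAT mining is far more general; this only
# illustrates pattern detection on the AST.
import ast

def find_append_loops(source):
    sites = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.For)
                and len(node.body) == 1
                and isinstance(node.body[0], ast.Expr)
                and isinstance(node.body[0].value, ast.Call)
                and isinstance(node.body[0].value.func, ast.Attribute)
                and node.body[0].value.func.attr == "append"):
            sites.append(node.lineno)  # candidate transformation site
    return sites

code = "out = []\nfor x in xs:\n    out.append(f(x))\n"
print(find_append_loops(code))  # -> [2]
```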
  3. Temporal graphs represent graph evolution over time, and have been receiving considerable research attention. Work on expressing temporal graph patterns or discovering temporal motifs typically assumes relatively simple temporal constraints, such as journeys or, more generally, existential constraints, possibly with finite delays. In this paper we propose to use timed automata to express temporal constraints, leading to a general and powerful notion of temporal basic graph pattern (BGP). The new difficulty is the evaluation of the temporal constraint on a large set of matchings. An important benefit of timed automata is that they support an iterative state assignment, which can be useful for early detection of matches and pruning of non-matches. We introduce algorithms to retrieve all instances of a temporal BGP match in a graph, and present results of an extensive experimental evaluation, demonstrating interesting performance trade-offs. We show that an on-demand algorithm that processes total matchings incrementally over time is preferable when dealing with cyclic patterns on sparse graphs. On acyclic patterns or dense graphs, and when connectivity of partial matchings can be guaranteed, the best performance is achieved by maintaining partial matchings over time and allowing automaton evaluation to be fully incremental. The code and datasets used in our analysis are available at http://github.com/amirpouya/TABGP.
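To illustrate the iterative state assignment highlighted above, the sketch below evaluates a two-step timed automaton ("a" then "b" within 5 time units) incrementally over a stream of timestamped labels; the automaton and its encoding are illustrative assumptions, not the paper's algorithm.

```python
# A minimal sketch of incremental timed-automaton evaluation: feed
# timestamped labels one at a time and track (state, clock-start)
# configurations, so matches can be detected early.
class TimedAutomaton:
    def __init__(self, deadline=5):
        self.deadline = deadline
        self.configs = {("q0", None)}  # set of (state, clock start time)

    def step(self, label, t):
        nxt = set()
        for state, start in self.configs:
            nxt.add((state, start))              # staying put is allowed
            if state == "q0" and label == "a":
                nxt.add(("q1", t))               # start the clock on 'a'
            elif state == "q1" and label == "b" and t - start <= self.deadline:
                nxt.add(("accept", start))
        self.configs = nxt
        return any(s == "accept" for s, _ in self.configs)

ta = TimedAutomaton()
for label, t in [("a", 1), ("c", 3), ("b", 4)]:
    print(label, t, ta.step(label, t))  # accepts at ("b", 4): 4 - 1 <= 5
```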
  4. The objective of this study is to examine spatial patterns of disaster impacts and community recovery based on fluctuations in credit card transactions (CCTs). Such fluctuations could capture the collective effects of household impacts, disrupted access, and business closures, and thus provide an integrative measure for examining disaster impacts and community recovery. Existing studies depend mainly on survey and sociodemographic data to evaluate disaster impacts and recovery efforts, although such data has limitations, including the large effort required for collection and the delayed availability of results. Moreover, very few studies have concentrated on spatial patterns of disaster impacts and short-term community recovery, although such investigation can enhance situational awareness during disasters and support the identification of disparate spatial patterns of impact and recovery across the affected region. This study examines CCT data from Harris County (Texas, USA) during Hurricane Harvey in 2017 to explore spatial patterns of disaster impacts and recovery duration from the perspectives of community residents and businesses at the ZIP-code and county scales, respectively, and to further investigate their spatial disparities across ZIP codes. The results indicate that individuals in higher-income ZIP codes experienced more severe disaster impacts yet recovered more quickly than those in lower-income ZIP codes for most business sectors. Our findings not only enhance the understanding of spatial patterns and disparities in disaster impacts and recovery for better community resilience assessment, but could also help emergency managers, city planners, and public officials improve situational awareness and resource allocation.
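As a rough sketch of the recovery-duration idea, the following pandas snippet measures how long each ZIP code's daily transaction volume stays below a pre-disaster baseline; the column names and the 90% recovery threshold are assumptions, not the study's exact methodology.

```python
# Hedged sketch: recovery duration per ZIP code = days from landfall
# until daily card-transaction volume returns to 90% of its
# pre-disaster average. Column names are assumed, not the study's.
import pandas as pd

def recovery_days(df, landfall, threshold=0.9):
    """df: columns ['date', 'zip', 'amount'] with 'date' as Timestamps."""
    daily = df.groupby(["zip", "date"])["amount"].sum().reset_index()
    out = {}
    for z, g in daily.groupby("zip"):
        g = g.sort_values("date")
        baseline = g.loc[g["date"] < landfall, "amount"].mean()
        post = g[g["date"] >= landfall]
        recovered = post[post["amount"] >= threshold * baseline]
        out[z] = ((recovered["date"].min() - landfall).days
                  if not recovered.empty else None)  # None: not yet recovered
    return out
```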
  5. Software developers often repeat the same code changes within a project or across different projects. These repetitive changes are known as "code change patterns" (CPATs). Automating CPATs is crucial to expedite the software development process. While current Transformation by Example (TBE) techniques can automate CPATs, they are limited by the quality and quantity of the provided input examples. Thus, they miss transforming code variations that do not have the exact syntax, data flow, or control flow of the provided input examples, despite being semantically similar. Large Language Models (LLMs), pre-trained on extensive source code datasets, offer a potential solution. Harnessing the capability of LLMs to generate semantically equivalent, yet previously unseen, variants of the original CPAT could significantly increase the effectiveness of TBE systems. In this paper, we first discover best practices for harnessing LLMs to generate code variants that meet three criteria: correctness (semantic equivalence to the original CPAT), usefulness (reflecting what developers typically write), and applicability (aligning with the primary intent of the original CPAT). We then implement these practices in our tool PyCraft, which synergistically combines static code analysis, dynamic analysis, and LLM capabilities. By employing chain-of-thought reasoning, PyCraft generates variations of input examples and comprehensive test cases that identify correct variations with an F-measure of 96.6%. Our algorithm uses feedback iteration to expand the original input examples by an average factor of 58x. Using these richly generated examples, we inferred transformation rules and then automated the corresponding changes, resulting in an increase of up to 39x, with an average increase of 14x, in transformed target sites compared to a previous state-of-the-art tool that relies solely on static analysis. We submitted patches generated by PyCraft to a range of projects, including well-known ones such as microsoft/DeepSpeed and IBM/inFairness. Their developers accepted and merged 83% of the 86 CPAT instances submitted through 44 pull requests, confirming the usefulness of these changes.
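As a simplified sketch of the correctness criterion, the snippet below keeps an LLM-proposed variant only if it matches the original function's behavior on test inputs; llm_propose_variants is a hypothetical stand-in for PyCraft's chain-of-thought generation step, not its actual API.

```python
# Hedged sketch of a correctness filter: accept a generated variant only
# if it behaves like the original on the test inputs.
def behaves_alike(src_a, src_b, func_name, test_inputs):
    ns_a, ns_b = {}, {}
    exec(src_a, ns_a)                 # compile both candidate functions
    exec(src_b, ns_b)
    return all(ns_a[func_name](x) == ns_b[func_name](x) for x in test_inputs)

def correct_variants(original, func_name, test_inputs, llm_propose_variants):
    variants = llm_propose_variants(original)   # hypothetical LLM step
    return [v for v in variants
            if behaves_alike(original, v, func_name, test_inputs)]

orig = "def inc(x):\n    return x + 1\n"
fake_llm = lambda s: ["def inc(x):\n    return 1 + x\n",
                      "def inc(x):\n    return x - 1\n"]  # one wrong variant
print(correct_variants(orig, "inc", [0, 1, 5], fake_llm))  # keeps the first
```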