Title: Exploring Regular Expression Evolution
Although there are tools to help developers understand the matching behaviors between a regular expression and a string, regular-expression-related faults are still common. Learning developers’ behavior through the change history of regular expressions can identify common edit patterns, which can inform the creation of mutation and repair operators to assist with testing and fixing regular expressions. In this work, we explore how regular expressions evolve over time, focusing on the characteristics of regular expression edits, the syntactic and semantic differences of the edits, and the feature changes of edits. Our exploration uses two datasets. First, we look at GitHub projects that have a regular expression in their current version and look back through the commit logs to collect the regular expressions’ edit history. Second, we collect regular expressions composed by study participants during problem-solving tasks. Our results show that 1) 95% of the regular expressions from GitHub are not edited, 2) most edited regular expressions have a syntactic distance of 4-6 characters from their predecessors, 3) over 50% of the edits in GitHub tend to expand the scope of the regular expression, and 4) the number of features used indicates that regular expression language usage increases over time. This work has implications for supporting regular expression repair and mutation to ensure test suite quality.
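To make the notion of a syntactic distance measured in characters concrete, here is a minimal sketch that assumes the distance is a character-level Levenshtein edit distance between a regular expression and its predecessor. The abstract does not name the metric, so this choice, the function name `levenshtein`, and the sample pair of regular expressions are illustrative assumptions rather than the authors' method or data.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical edit (not taken from the study's data): a character class
# widened to accept uppercase letters as well.
old_regex = r"[a-z]+"
new_regex = r"[a-zA-Z]+"
distance = levenshtein(old_regex, new_regex)
print(distance)  # 3: the characters "A-Z" are inserted
```

In this made-up example the edit also expands the scope of the expression, since every string matched by the old version is still matched by the new one.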
Award ID(s):
1714699 1749936
NSF-PAR ID:
10100319
Journal Name:
IEEE 26th International Conference on Software Analysis, Evolution and Reengineering
Page Range / eLocation ID:
502 to 513
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    An efficient and scalable rule‐based syntactic differencing approach is presented. The tool srcDiff is built upon the srcML infrastructure. srcML adds abstract syntactic information into the code via an XML format. A syntactic difference of srcML documents is then taken. During this process, the differences are further refined using a set of rules that model typical editing patterns of source code by developers. Thus, the resulting deltas model edits that are programmer centric versus a purely syntactic tree edit view. Other syntactic differencing approaches focus on obtaining an optimal tree edit distance with the assumption that this will produce an accurate difference. While this may work well for small or simple changes, the differences quickly become unreadable for more complex changes. By contrast, the approach presented here purposely deviates from an optimal tree edit difference in order to create a delta that is both easier to understand and better models changes between the original and modified. To evaluate the approach, a comparison user study against a state‐of‐the‐art syntactic differencing approach and two line‐based differencing tools is conducted as an online within‐participant study with about 70 subjects on 14 sample changes. The results provide support that the rule‐based syntactic differencing produces more accurate and understandable deltas.

     
  2. Today, face editing is widely used to refine/alter photos in both professional and recreational settings. Yet it is also used to modify (and repost) existing online photos for cyberbullying. Our work considers an important open question: 'How can we support the collaborative use of face editing on social platforms while protecting against unacceptable edits and reposts by others?' This is challenging because, as our user study shows, users vary widely in their definition of what edits are (un)acceptable. Any global filter policy deployed by social platforms is unlikely to address the needs of all users, but hinders social interactions enabled by photo editing. Instead, we argue that face edit protection policies should be implemented by social platforms based on individual user preferences. When posting an original photo online, a user can choose to specify the types of face edits (dis)allowed on the photo. Social platforms use these per-photo edit policies to moderate future photo uploads, i.e., edited photos containing modifications that violate the original photo's policy are either blocked or shelved for user approval. Realizing this personalized protection, however, faces two immediate challenges: (1) how to accurately recognize specific modifications, if any, contained in a photo; and (2) how to associate an edited photo with its original photo (and thus the edit policy). We show that these challenges can be addressed by combining highly efficient hashing based image search and scalable semantic image comparison, and build a prototype protector (Alethia) covering nine edit types. Evaluations using IRB-approved user studies and data-driven experiments (on 839K face photos) show that Alethia accurately recognizes edited photos that violate user policies and induces a feeling of protection to study participants. This demonstrates the initial feasibility of personalized face edit protection. 
     We also discuss current limitations and future directions to push the concept forward.

     
  3. Developers report testing their regular expressions less than the rest of their code. In this work, we explore how thoroughly tested regular expressions are by examining open source projects. Using standard metrics of coverage, such as line and branch coverage, gives an incomplete picture of the test coverage of regular expressions. We adopt graph-based coverage metrics for the DFA representation of regular expressions, providing fine-grained test coverage metrics. Using over 15,000 tested regular expressions in 1,225 Java projects on GitHub, we measure node, edge, and edge-pair coverage. Our results show that only 17% of the regular expressions in the repositories are tested at all. For those that are tested, the median number of test inputs is two. For nearly 42% of the tested regular expressions, only one test input is used. Average node and edge coverage levels on the DFAs for tested regular expressions are 59% and 29%, respectively. Due to the lack of testing of regular expressions, we explore whether a string generation tool for regular expressions, Rex, achieves high coverage levels. With some exceptions, we found that tools such as Rex can be used to write test inputs with similar coverage to the developer tests.
  4. In this paper we define and investigate the Fréchet edit distance problem. Here, given two polygonal curves $\pi$ and $\sigma$ and a threshold value $\delta$, we seek the minimum number of edits to $\sigma$ such that the Fréchet distance between the edited curve and $\pi$ is at most $\delta$. For the edit operations we consider three cases, namely, deletion of vertices, insertion of vertices, or both. For this basic problem we consider a number of variants. Specifically, we provide polynomial time algorithms for both discrete and continuous Fréchet edit distance variants, as well as hardness results for weak Fréchet edit distance variants.
  5. Mulzer, Wolfgang ; Phillips, Jeff M (Ed.)
    We define and investigate the Fréchet edit distance problem. Given two polygonal curves π and σ and a non-negative threshold value δ, we seek the minimum number of edits to σ such that the Fréchet distance between the edited σ and π is at most δ. For the edit operations we consider three cases, namely, deletion of vertices, insertion of vertices, or both. For this basic problem we consider a number of variants. Specifically, we provide polynomial time algorithms for both discrete and continuous Fréchet edit distance variants, as well as hardness results for weak Fréchet edit distance variants.