skip to main content


Search for: All records

Award ID contains: 1161079

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Detecting “similar code” is useful for many software engineering tasks. Current tools can help detect code with statically similar syntactic and–or semantic features (code clones) and with dynamically similar functional input/output (simions). Unfortunately, some code fragments that behave similarly at the finer granularity of their execution traces may be ignored. In this paper, we propose the term “code relatives” to refer to code with similar execution behavior. We define code relatives and then present DyCLINK, our approach to detecting code relatives within and across codebases. DyCLINK records instruction-level traces from sample executions, organizes the traces into instruction-level dynamic dependence graphs, and employs our specialized subgraph matching algorithm to efficiently compare the executions of candidate code relatives. In our experiments, DyCLINK analyzed 422+ million prospective subgraph matches in only 43 minutes. We compared DyCLINK to one static code clone detector from the community and to our implementation of a dynamic simion detector. The results show that DyCLINK effectively detects code relatives with a reasonable analysis time. 
    more » « less
  2. Identifying similar code in software systems can assist many software engineering tasks such as program understanding and software refactoring. While most approaches focus on identifying code that looks alike, some techniques aim at detecting code that functions alike. Detecting these functional clones - code that functions alike - in object oriented languages remains an open question because of the difficulty in exposing and comparing programs' functionality effectively. We propose a novel technique, In-Vivo Clone Detection, that detects functional clones in arbitrary programs by identifying and mining their inputs and outputs. The key insight is to use existing workloads to execute programs and then measure functional similarities between programs based on their inputs and outputs, which mitigates the problems in object oriented languages reported by prior work. We implement such technique in our system, HitoshiIO, which is open source and freely available. Our experimental results show that HitoshiIO detects more than 800 functional clones across a corpus of 118 projects. In a random sample of the detected clones, HitoshiIO achieves 68+% true positive rate with only 15% false positive rate. 
    more » « less
  3. When software engineering researchers discuss "similar" code, we often mean code determined by static analysis to be textually, syntactically or structurally similar, known as code clones (looks alike). Ideally, we would like to also include code that is behaviorally or functionally similar, even if it looks completely different. The state of the art in detecting these behavioral clones focuses on checking the functional equivalence of the inputs and outputs of code fragments, regardless of its internal behavior (focusing only on input and output states). We argue that with an advance in dynamic code clone detection towards detecting behavioral clones (i.e., those with similar execution behavior), we can greatly increase the applications of behavioral clones as a whole for general program understanding tasks. 
    more » « less
  4. When software engineering researchers discuss "similar" code, we often mean code determined by static analysis to be textually, syntactically or structurally similar, known as code clones (looks alike). Ideally, we would like to also include code that is behaviorally or functionally similar, even if it looks completely different. The state of the art in detecting these behavioral clones focuses on checking the functional equivalence of the inputs and outputs of code fragments, regardless of its internal behavior (focusing only on input and output states). We argue that with an advance in dynamic code clone detection towards detecting behavioral clones (i.e., those with similar execution behavior), we can greatly increase the applications of behavioral clones as a whole for general program understanding tasks. 
    more » « less
  5. Slow builds remain a plague for software developers. The frequency with which code can be built (compiled, tested and packaged) directly impacts the productivity of developers: longer build times mean a longer wait before determining if a change to the application being built was successful. We have discovered that in the case of some languages, such as Java, the majority of build time is spent running tests, where dependencies between individual tests are complicated to discover, making many existing test acceleration techniques unsound to deploy in practice. Without knowledge of which tests are dependent on others, we cannot safely parallelize the execution of the tests, nor can we perform incremental testing (i.e., execute only a subset of an application's tests for each build). The previous techniques for detecting these dependencies did not scale to large test suites: given a test suite that normally ran in two hours, the best-case running scenario for the previous tool would have taken over 422 CPU days to find dependencies between all test methods (and would not soundly find all dependencies) — on the same project the exhaustive technique (to find all dependencies) would have taken over 1e300 years. We present a novel approach to detecting all dependencies between test cases in large projects that can enable safe exploitation of parallelism and test selection with a modest analysis cost. 
    more » « less
  6. Dynamic taint tracking is an information flow analysis that can be applied to many areas of testing. Phosphor is the first portable, accurate and performant dynamic taint tracking system for Java. While previous systems for performing general-purpose taint tracking in the JVM required specialized research JVMs, Phosphor works with standard off-the-shelf JVMs (such as Oracle's HotSpot and OpenJDK's IcedTea). Phosphor also differs from previous portable JVM taint tracking systems that were not general purpose (e.g. tracked only tags on Strings and no other type), in that it tracks tags on all variables. We have also made several enhancements to Phosphor, to track taint tags through control flow (in addition to data flow), as well as to track an arbitrary number of relationships between taint tags (rather than be limited to only 32 tags). In this demonstration, we show how developers writing testing tools can benefit from Phosphor, and explain briefly how to interact with it. 
    more » « less
  7. Metamorphic testing is an advanced technique to test programs without a true test oracle such as machine learning applications. Because these programs have no general oracle to identify their correctness, traditional testing techniques such as unit testing may not be helpful for developers to detect potential bugs. This paper presents a novel system, KABU, which can dynamically infer properties of methods' states in programs that describe the characteristics of a method before and after transforming its input. These Metamorphic Properties (MPs) are pivotal to detecting potential bugs in programs without test oracles, but most previous work relies solely on human effort to identify them and only considers MPs between input parameters and output result (return value) of a program or method. This paper also proposes a testing concept, Metamorphic Differential Testing (MDT). By detecting different sets of MPs between different versions for the same method, KABU reports potential bugs for human review. We have performed a preliminary evaluation of KABU by comparing the MPs detected by humans with the MPs detected by KABU. Our preliminary results are promising: KABU can find more MPs than human developers, and MDT is effective at detecting function changes in methods. 
    more » « less
  8. This chapter is based on earlier work: “Secret Ninja testing with HALO software engineering,” in 4th International Workshop on Social Software Engineering, 2011. Copyright ACM, 2011. http://doi.acm.org/10.1145/2024645.2024657. “A competitive-collaborative approach for introducing software engineering in a CS2 class”, in 26th IEEE Conference on Software Engineering Education and Training, 2013. Copyright IEEE, 2013. http://dx.doi.org/10.1109/CSEET.2013.6595235. 
    more » « less
  9. For some applications, it is impossible or impractical to know what the correct output should be for an arbitrary input, making testing difficult. Many machine-learning applications for “big data”, bioinformatics and cyberphysical systems fall in this scope: they do not have a test oracle. Metamorphic Testing, a simple testing technique that does not require a test oracle, has been shown to be effective for testing such applications. We present Metamorphic Runtime Checking, a novel approach that conducts metamorphic testing of both the entire application and individual functions during a program’s execution. We have applied Metamorphic Runtime Checking to 9 machine-learning applications, finding it to be on average 170% more effective than traditional metamorphic testing at only the full application level. 
    more » « less