SPG - Software Productivity Group : Experiments

Defining and implementing refactorings is a nontrivial task, since it is difficult to define all the preconditions needed to guarantee that a transformation preserves program behavior. As a result, refactoring engines may have overly weak preconditions, overly strong preconditions, and transformation issues related to the refactoring definition. In practice, developers manually write test cases to check their refactoring implementations. We find that 84% of the test suites of Eclipse and JRRT are concerned with these kinds of bugs. However, the engines still have them. Researchers have proposed a number of techniques for testing refactoring engines. Nevertheless, these techniques may have limitations related to the generation of input programs, time consumption, kinds of bugs, automation, and debugging. In this work, we propose a technique to scale the testing of refactoring engines by extending a previous technique. It automatically generates programs as test inputs using DOLLY, a Java and C program generator. We reduce the time needed to test the refactoring implementations by skipping some consecutive test inputs. We also add more Java constructs to DOLLY, such as abstract classes and methods, and interfaces. Our technique uses SAFEREFACTORIMPACT to identify failures related to behavioral changes. It generates test cases only for the methods impacted by a transformation. We also propose a new oracle that evaluates whether refactoring preconditions are overly strong by disabling a subset of them. Finally, we present a technique to identify transformation issues related to the refactoring definition. We evaluate our technique on 28 refactoring implementations for Java (Eclipse and JRRT) and C (Eclipse) and find 119 bugs related to compilation errors, behavioral changes, overly strong preconditions, and transformation issues. The technique reduces the time by 90% and 96% using skips of 10 and 25 in DOLLY, while missing only 3% and 6% of the bugs, respectively. Additionally, with skips it generally finds the first failure in a few seconds. Using the new oracle to identify overly strong preconditions, it detects more bugs and, unlike previous work, facilitates debugging. Finally, we evaluate the refactoring implementations of Eclipse and JRRT using the input programs of their own refactoring test suites and find 23 bugs not detected by the developers.
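The skip mechanism mentioned above can be illustrated with a minimal sketch. It assumes that a skip of n means keeping one out of every n consecutive generated instances, which is consistent with the reported 90% and 96% reductions but is not confirmed by this page. The names `with_skip` and `fake_generator` are hypothetical and are not part of DOLLY. The sketch measures only the reduction in the number of test inputs, not the actual testing time.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_skip(instances: Iterable[T], skip: int) -> Iterator[T]:
    """Yield one instance out of every `skip` consecutive ones.

    Assumption: this is how a skip value thins out the generated inputs.
    """
    if skip < 1:
        raise ValueError("skip must be >= 1")
    return islice(instances, 0, None, skip)


def fake_generator(total: int) -> Iterator[str]:
    # Hypothetical stand-in for a program generator that enumerates
    # instances in order; it only produces labels, not real programs.
    for i in range(total):
        yield f"program_{i}"


TOTAL = 1000
kept_10 = list(with_skip(fake_generator(TOTAL), 10))
kept_25 = list(with_skip(fake_generator(TOTAL), 25))

# Fraction of generated inputs that are no longer tested.
reduction_10 = 1 - len(kept_10) / TOTAL
reduction_25 = 1 - len(kept_25) / TOTAL

print(f"skip=10: {len(kept_10)} inputs kept, reduction {reduction_10:.0%}")
print(f"skip=25: {len(kept_25)} inputs kept, reduction {reduction_25:.0%}")
```

Under this assumption, the rationale for skipping is that consecutive instances produced by an enumerating generator tend to be similar, so dropping neighbours removes many near-duplicate inputs at the cost of a small number of missed bugs.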

Melina Mongiovi, Rohit Gheyi, Gustavo Soares, Marcio Ribeiro, Paulo Borba, and Leopoldo Teixeira

Table V: Summary of the time to find the first failure (TTFF). Bug Type = type of the first detected failure: compilation error (CE) or behavioral change (BC); Generated Programs = number of programs generated to find the first failure.

Table VI: Summary of the evaluation results of JRRT and Eclipse refactoring implementations; Refact. = Kind of Refactoring; Skip = Skip value used by DOLLY to reduce the number of generated programs; GP = Number of Generated Programs by DOLLY; CP = number of compilable programs (%); No of assessed preconditions = Number of assessed refactoring preconditions in our study; LOC added in the refactoring engine = Lines of Code added in the refactoring engine to disable the refactoring preconditions; Overly Strong Preconditions = Number of detected overly strong preconditions in the refactoring implementations; Time (hr) = Total time to evaluate the refactoring implementations in hours; Time to First Failure (min) = Time to find the first failure in minutes; "na" = not assessed.

Table VII: Summary of the evaluation results of the comparison of DP and DT techniques using input programs generated by DOLLY; Refact. = Kind of Refactoring; Skip = Skip value used by DOLLY to reduce the number of generated programs; GP = Number of Generated Programs by DOLLY; DP Tech. = DP Technique; DT Tech. = DT Technique; Overly Strong Preconditions = Number of detected overly strong preconditions in the refactoring implementations; "na" = not assessed.

Table VIII: Summary of the evaluation results of Eclipse and JRRT refactoring implementations with our technique using SCA oracle; Refactoring = kind of refactoring; Scope = scope used by DOLLY to generate programs; P = package; C = class; M = method; F = field; Alloy Instances = number of Alloy instances generated by the Alloy Analyzer; GP (using skip of 25) = number of generated programs using skip of 25 in DOLLY 2.0; CP = compilable generated programs; Transformation Issues = number of different kinds of issues related to incorrect transformations; Time = total time to evaluate the refactoring implementations.

We evaluated SafeRefactor on 45 pairs of programs. First, we evaluated it on 8 non-behavior-preserving transformations applied by Eclipse. Table VIII summarizes the experimental results. The subjects can be downloaded here.

Table VIII (a): Results using a time limit of 0.2s. Methods = number of methods passed to Randoop to generate tests; Time = the total time of the analysis in seconds; Change Coverage = the percentage of impacted methods covered; Relevant Tests = the percentage of relevant tests; Result = it states whether the transformation is behavior preserving.

We also analyzed 23 design patterns implemented in Java and AspectJ. Our tool identified that the AO and OO versions of the State pattern are not equivalent. Table IX summarizes the experimental results. The subjects can be found here.

Table IX: Results using a time limit of 0.5s. Impacted Methods = number of methods identified in the change impact analysis; Methods = number of methods passed to Randoop to generate tests; Time = the total time of the analysis in seconds; Change Coverage = the percentage of impacted methods covered; Relevant Tests = the percentage of relevant tests; Result = it states whether the transformation is behavior preserving.

We also used our tools to evaluate two JML compilers that generate AO code. Table X summarizes the experimental results. The subjects can be downloaded here.

Table X: Results using a time limit of 0.2s. Impacted Methods = number of methods identified in the change impact analysis; Methods = number of methods passed to Randoop to generate tests; Time = the total time of the analysis in seconds; Change Coverage = the percentage of impacted methods covered; Relevant Tests = the percentage of relevant tests; Result = it states whether the transformation is behavior preserving.

Additionally, we compared 12 transformations applied to real OO and AO programs. Table XI summarizes the experimental results. The subjects can be found here.

Table XI (a): Results using a time limit of 20s. Impacted Methods = number of methods identified in the change impact analysis; Methods = number of methods passed to Randoop to generate tests; Time = the total time of the analysis in seconds; Change Coverage = the percentage of impacted methods covered; Relevant Tests = the percentage of relevant tests; Result = it states whether the transformation is behavior preserving.

Alloy Specifications:
Java meta-model specified in Alloy
Java meta-model specified in Alloy (with abstract and interface)
C meta-model specified in Alloy

Tools used in our experiment:
SafeRefactor/SafeRefactorImpact
Dolly

Subjects used in the evaluation of SafeRefactorImpact:
Toy Examples
Design Patterns
JML Compilers
Larger Subjects

New Bugs of Eclipse JDT
434862: Push Down Field (compilation error)
434878: Pull Up Field (behavioral change)
434881: Move Method (overly strong condition)
434886: Move Method (overly strong condition)
462994: Pull Up Field (overly strong condition)
471952: Push Down Method (transformation issue)
473650: Push Down Method (transformation issue)
473651: Pull Up Method (transformation issue)
473653: Pull Up Method (transformation issue)
476293: Pull Up Method (transformation issue)
473655: Pull Up Method (transformation issue)
476256: Pull Up Method (transformation issue)
471953: Encapsulate Field (transformation issue)
471955: Encapsulate Field (transformation issue)
471961: Encapsulate Field (transformation issue)

New Bugs of Eclipse CDT
426896: Rename Global Variable (compilation error)
426895: Rename Global Variable (compilation error)
426894: Rename Global Variable (compilation error)
426893: Rename Global Variable (compilation error)
426899: Rename Global Variable (behavioral change)
435125: Rename Local Variable (compilation error)
425431: Rename Local Variable (compilation error)
425433: Rename Local Variable (compilation error)
435124: Rename Local Variable (compilation error)
425434: Rename Local Variable (behavioral change)
426891: Rename Define (compilation error)
426890: Rename Define (compilation error)
426889: Rename Define (compilation error)
425491: Rename Define (compilation error)
426925: Rename Function (compilation error)
426924: Rename Function (compilation error)
426922: Rename Function (compilation error)
425466: Rename Function (compilation error)
425465: Rename Function (compilation error)
425426: Rename Parameter (compilation error)
426635: Rename Parameter (compilation error)
425489: Rename Parameter (compilation error)
426636: Rename Parameter (compilation error)
425428: Rename Parameter (behavioral change)
426916: Extract Function (compilation error)
426915: Extract Function (compilation error)
426914: Extract Function (compilation error)
426913: Extract Function (compilation error)
426911: Extract Function (compilation error)
425438: Extract Function (compilation error)
396658: Extract Function (compilation error)
426918: Extract Function (behavioral change)
425446: Extract Local Variable (compilation error)
435122: Extract Local Variable (compilation error)
435121: Extract Local Variable (compilation error)
425445: Extract Local Variable (compilation error)
425444: Extract Local Variable (compilation error)
397278: Extract Local Variable (behavioral change)
397275: Extract Constant (compilation error)
425127: Extract Constant (compilation error)
435132: Extract Constant (compilation error)
435131: Extract Constant (compilation error)
435130: Extract Constant (compilation error)
435129: Extract Constant (compilation error)
435128: Extract Constant (compilation error)
425470: Extract Constant (compilation error)
425471: Extract Constant (behavioral change)

Bugs of Eclipse JDT detected by the DP technique using the programs generated by JDolly as test inputs
434881: Move Method (New Bug) Status: CLOSED DUPLICATE
434886: Move Method (New Bug) Status: ASSIGNED
486694: Move Method (New Bug) Status: NEW (They did not answer yet)
399350: Pull Up Method Status: ASSIGNED
399788: Pull Up Method Status: ASSIGNED
462994: Pull Up Field (New Bug) Status: RESOLVED INVALID
391715: Rename Method Status: NEW (They did not answer yet)
391713: Rename Method Status: ASSIGNED
399181: Rename Method Status: RESOLVED INVALID
486693: Encapsulate Field (New Bug) Status: NEW (They did not answer yet)
399789: Add Parameter Status: ASSIGNED
486692: Add Parameter (New Bug) Status: NEW (They did not answer yet)
399347: Push Down Method Status: ASSIGNED
391710: Rename Type Status: ASSIGNED
399183: Rename Type Status: CLOSED DUPLICATE

New Bugs of JRRT
1. Move Method (overly strong condition)
2. Move Method (overly strong condition)
Catalog of bugs detected by the DP technique (overly strong condition)
Catalog of bugs detected by the technique to identify transformation issues (transformation issues)

Transformation templates and aspects used to disable refactoring preconditions of Eclipse and JRRT:
Templates and Aspects

Scaling Testing of Refactoring Engines