Week 4

Week 4 Progress Report

Previous Goals:

  • Respond to Thomas’ PR reviews and help him merge all open PRs
  • Consult Mike and Thomas on the Unix diff implementation and try to move it further down the pipeline
  • Convert Flexeme and SmartCommit results to diff formats
  • Implement tangled lines/hunks: add new functionality
  • Resolve CI issues (with GitHub Actions caching, Docker)

This Week’s Progress:

  • I have modified the pipeline as follows:
    • V_{n-1}, V_buggy, and V_fixed are retrieved from the version control (VC) history, and all comments, blank lines, and import statements are removed from them (i.e. they are “filtered”).
    • Then, 3 diffs are computed from these filtered versions (the inverted patch from D4J is no longer needed).
    • I tried applying the patch more leniently (with the fuzz option set to 3): applying bug_fix.diff to original.java in order to obtain non_bug_fix.diff. I want to compare the non_bug_fix.diff obtained this way with the one obtained the original way, by diffing {V_{n-1}, V_buggy}.
  • Regarding the patch program, this is how it has behaved so far:
    • The patch is successful and correct on: bug files with a single commit on 1 code file, with no tangled hunks and no tangled lines.
    • The patch is successful, but incorrect on: bug files with tangled hunks.
    • The patch fails on bug files with tangled sequential changes in Defects4J (Closure 78, Lang 63). Thomas looked at these 2 examples and we thought this might be a potential bug in Defects4J: in these bug files, there are lines in the original VC diff that do not exist in the bug fix patch. I want Mike and Rene’s opinion on this.
    • Thus, I am blocked on this redesign direction.
  • I also debugged the clean_artifacts.py program and added unit tests for it. I separated the acquisition of the 3 diff artifacts and 3 source code artifacts from ground truth construction and moved it to the beginning of the pipeline, to ensure that they are ‘filtered’ for the rest of the benchmark run.
  • Regarding new functionality, I have written code for tangled hunk/line support but I’m not sure if it’s correct and robust. I need Thomas’ review for this.
    • I attempted to implement commit_metrics.py and ground_truth.py on top of the 3 diff artifacts. In commit_metrics.py, I think my implementation not only returns a Boolean indicating whether a line/hunk is tangled but also counts the number of tangled hunks and lines. This algorithm may be naïve/error-prone, as it has only been tested on the filtered diff pipeline so far.
    • Because of the patch program’s erroneous/unexpected behavior, I have yet to achieve the goal of repairing the diff lines using line numbers for identity. However, the 3 diff artifacts allow me to handle ground truth construction more elegantly, as I can start from VC.diff and classify the aligned lines as bug_fix or non_bug_fix.
    • The biggest flaw of this implementation is that it only deals with minimized bug-fixes on 1 single Java source code file.
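
To make the filtered three-diff idea above more concrete, here is a minimal Python sketch. It is not the actual clean_artifacts.py or commit_metrics.py code: the function names (filter_source, hunks, changed, count_tangled_hunks) are hypothetical, the comment stripping is a naive regex that would mishandle "//" inside string literals, and matching diff lines by their text (rather than by line number) is an assumption made only for illustration.

```python
import difflib
import re

# Naive patterns: they do not account for comment markers inside string literals.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//.*")


def filter_source(text):
    """Return the source lines without comments, blank lines, and imports.

    Indentation is stripped as well, so whitespace-only edits do not show up
    in the diffs (an assumption of this sketch).
    """
    text = BLOCK_COMMENT.sub("", text)
    kept = []
    for line in text.splitlines():
        line = LINE_COMMENT.sub("", line).strip()
        if line and not line.startswith("import "):
            kept.append(line)
    return kept


def hunks(old, new):
    """Return one list of (sign, line) pairs per changed region."""
    result = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed = [("-", line) for line in old[i1:i2]]
            added = [("+", line) for line in new[j1:j2]]
            result.append(removed + added)
    return result


def changed(old, new):
    """Return the set of all (sign, line) pairs that differ between versions."""
    return {pair for hunk in hunks(old, new) for pair in hunk}


def count_tangled_hunks(v_prev, v_buggy, v_fixed):
    """Count VC-diff hunks that mix bug-fix and non-bug-fix lines.

    All three arguments are already-filtered lists of lines.
    """
    bug_fix = changed(v_buggy, v_fixed)      # bug-fix diff
    non_bug_fix = changed(v_prev, v_buggy)   # non-bug-fix diff
    tangled = 0
    for hunk in hunks(v_prev, v_fixed):      # VC diff
        has_fix = any(pair in bug_fix for pair in hunk)
        has_other = any(pair in non_bug_fix for pair in hunk)
        if has_fix and has_other:
            tangled += 1
    return tangled
```

Matching on line text is the weakest part of this sketch: two identical lines in different places become indistinguishable, which is one reason to use line numbers for identity instead.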

Next Goals:

  • Get Thomas’ opinion and review on the implementation of tangled hunk and line support in the ground truth.
  • Look for ways to convert Flexeme and SmartCommit results to diff formats.
  • Read the paper on replicating tool runs on synthetic commits and the paper on the new tangled dataset.
  • Run experiments and complete TODOs as requested by Thomas and Mike.

Miscellaneous

For the teatime event at PLSE this week, I tried a Japanese dessert sandwich for the first time! [Image: teatime 1]

Here is another delicious pic :) [Image: teatime 2]

Written on June 30, 2023