Week 4
Week 4 Progress Report
Previous Goals:
- Respond to Thomas’ reviews about PRs and help him merge all open PRs
- Consult Mike and Thomas on Unix diff implementation and try to get it further down in the pipeline
- Convert Flexeme and SmartCommit results to diff formats
- Implement tangled lines/hunks: add new functionality
- Resolve CI issues (with Github Actions caching, Docker)
This Week’s Progress:
- I have modified the pipeline as follows:
- V_{n-1}, V_buggy, and V_fixed are retrieved from VC history, and all comments, blank lines, and import statements, are removed from them (i.e. they are “filtered”)
- Then, 3 diffs are computed from these versions (The inverted patch from D4J is not needed)
- I tried to apply the patch in a more relaxed way (with fuzz option set to 3): patch bug_fix.diff on original.java to obtain non_bug_fix.diff. I want to compare the non_bug_fix.diff obtained this way to the original way of diffing {V_{n-1}, V_buggy}
- Regarding the behavior of the patch program, this is how the patch program works until now:
- This patch is successful and correct on: bug files with a single commit on 1 code file no tangled hunks, no tangled lines
- This patch is successful, but incorrect on: bug files with tangled hunks
- This patch fails on bug files with tangled sequential changes in Defects4J (Closure 78, Lang 63). Thomas looked at these 2 examples and we thought this might be a potential bug in Defects4J: in these bug files, there are lines in original VC diff that are non-existent in the bug fix patch. I want Mike and Rene’s opinion on this.
- Thus, I am blocked on this re-designing direction.
- I also debugged the clean_artifacts.py program and unit-testing to it. I separated the acquisition of 3 diff artifacts and 3 source code artifacts from ground truth construction, moved it to the beginning of the pipeline, to ensure that we have them ‘filtered’ for the rest of running the benchmark.
- Regarding new functionality, I have written code for tangled hunk/line support but I’m not sure if it’s correct and robust. I need Thomas’ review for this.
- I attempt to implement commit_metrics.py and ground_truth.py from the 3 diff artifacts. On commit_metrics.py, I think my implementation not only returns a Boolean indicating whether a line/hunk is tangled but also counts the number of tangled hunks and lines. This algorithm may be naïve/error-prone, as it has only been tested on the filtered diff pipeline for now.
- Because of the patch program erroneous/unexpected behavior, I have yet achieved the goal of repairing the diff lines using line numbers for identity. However, the 3 diff artifacts allows me to handle ground truth construction more elegantly, as I can go from VC.diff and classify aligned bug_fix and non_bug_fix lines.
- The biggest flaw of this implementation is that it only deals with minimized bug-fixes on 1 single Java source code file.
Next Goals:
- Get Thomas’ opinion and review on the implementation for tangle hunk and line support in the ground truth.
- Look for ways to convert Flexeme and SmartCommit results to diff formats.
- Read paper of replicating running the tools on synthetic commits & Paper on new tangled dataset
- Run experiments and complete TODOs as requested by Thomas and Mike.
Miscellaneous
For teatime event in PLSE this week, I tried out Japanese dessert sandwich for the first time!
Here is another delicious pic :)
Written on June 30, 2023