Week 4

Week 4 Progress Report

Previous Goals:

Respond to Thomas’ PR reviews and help him merge all open PRs
Consult Mike and Thomas on Unix diff implementation and try to get it further down in the pipeline
Convert Flexeme and SmartCommit results to diff formats
Implement tangled lines/hunks: add new functionality
Resolve CI issues (with GitHub Actions caching, Docker)

This Week’s Progress:

I have modified the pipeline as follows:
- V_{n-1}, V_buggy, and V_fixed are retrieved from the VC history, and all comments, blank lines, and import statements are removed from them (i.e., they are “filtered”).
- Then, 3 diffs are computed from these versions (the inverted patch from D4J is not needed).
- I tried applying the patch in a more relaxed way (with the fuzz option set to 3): applying bug_fix.diff to original.java to obtain non_bug_fix.diff. I want to compare the non_bug_fix.diff obtained this way with the one from the original approach of diffing {V_{n-1}, V_buggy}.
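The filtering and three-diff steps above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline code: the function names are hypothetical, the regex-based comment stripping is a simplification (it ignores comment markers inside string literals), and the mapping of bug_fix.diff to {V_buggy, V_fixed} and VC.diff to {V_{n-1}, V_fixed} is my assumption; only non_bug_fix.diff = {V_{n-1}, V_buggy} is stated above.

```python
import difflib
import re


def filter_java_source(source: str) -> list[str]:
    """Drop comments, blank lines, and import statements (simplified)."""
    # Remove /* ... */ block comments first; '//' comments are handled per line.
    without_block = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    kept = []
    for line in without_block.splitlines():
        line = re.sub(r"//.*", "", line).rstrip()
        stripped = line.strip()
        if not stripped or stripped.startswith("import "):
            continue
        kept.append(line)
    return kept


def unified_diff(old: list[str], new: list[str], old_name: str, new_name: str) -> str:
    """Return a unified diff as one string ('' when the inputs are identical)."""
    return "\n".join(
        difflib.unified_diff(old, new, old_name, new_name, lineterm="")
    )


def three_diffs(v_prev: str, v_buggy: str, v_fixed: str) -> dict[str, str]:
    """Filter the three versions, then compute the three diffs (assumed mapping)."""
    prev, buggy, fixed = (filter_java_source(v) for v in (v_prev, v_buggy, v_fixed))
    return {
        "VC.diff": unified_diff(prev, fixed, "V_{n-1}", "V_fixed"),
        "bug_fix.diff": unified_diff(buggy, fixed, "V_buggy", "V_fixed"),
        "non_bug_fix.diff": unified_diff(prev, buggy, "V_{n-1}", "V_buggy"),
    }
```

Filtering before diffing means that changes touching only comments, blank lines, or imports never show up in any of the three diffs.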
Regarding the behavior of the patch program, this is what I have observed so far:
- The patch succeeds and is correct on bug files with a single commit on 1 code file, no tangled hunks, and no tangled lines.
- The patch succeeds but is incorrect on bug files with tangled hunks.
- The patch fails on bug files with tangled sequential changes in Defects4J (Closure 78, Lang 63). Thomas looked at these 2 examples and we thought this might be a potential bug in Defects4J: in these bug files, there are lines in the original VC diff that do not exist in the bug-fix patch. I want Mike and Rene’s opinions on this.
- Thus, I am blocked on this redesign direction.
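For reference, a relaxed patch invocation like the one described above could look like the sketch below. It assumes GNU patch is on the PATH; the helper names and file names are hypothetical, and this is not necessarily how the pipeline calls it.

```python
import subprocess


def build_patch_command(original: str, patch_file: str, output: str, fuzz: int = 3) -> list[str]:
    """Build a GNU patch command that writes the patched file to `output`."""
    return [
        "patch",
        f"--fuzz={fuzz}",         # let up to `fuzz` context lines be ignored when placing a hunk
        f"--output={output}",     # write the result elsewhere; the original stays untouched
        f"--input={patch_file}",  # the diff to apply
        original,
    ]


def apply_patch(original: str, patch_file: str, output: str, fuzz: int = 3) -> bool:
    """Run patch and report whether every hunk applied (exit status 0)."""
    result = subprocess.run(
        build_patch_command(original, patch_file, output, fuzz),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

As the cases above show, a successful exit status only means the hunks were placed somewhere, not that they were placed correctly, so the output still needs to be checked against the expected diff.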
I also debugged the clean_artifacts.py program and added unit tests for it. I separated the acquisition of the 3 diff artifacts and 3 source code artifacts from ground truth construction and moved it to the beginning of the pipeline, to ensure that they are ‘filtered’ for the rest of the benchmark run.
Regarding new functionality, I have written code for tangled hunk/line support but I’m not sure if it’s correct and robust. I need Thomas’ review for this.
- I attempted to implement commit_metrics.py and ground_truth.py from the 3 diff artifacts. In commit_metrics.py, I think my implementation not only returns a Boolean indicating whether a line/hunk is tangled but also counts the number of tangled hunks and lines. This algorithm may be naïve/error-prone, as it has only been tested on the filtered diff pipeline so far.
- Because of the erroneous/unexpected behavior of the patch program, I have not yet achieved the goal of repairing the diff lines using line numbers for identity. However, the 3 diff artifacts allow me to handle ground truth construction more elegantly, as I can start from VC.diff and classify the aligned bug_fix and non_bug_fix lines.
- The biggest flaw of this implementation is that it only deals with minimized bug fixes on a single Java source code file.
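To make the classification idea concrete, here is a minimal sketch of labelling VC.diff lines and counting tangled hunks. It is not the code in commit_metrics.py or ground_truth.py: the function names are hypothetical, lines are matched by content rather than by line number, only a single file is handled, and tangled lines are not covered.

```python
def changed_lines(diff_text: str) -> list[list[str]]:
    """Split a unified diff into hunks, keeping only the '+'/'-' lines."""
    hunks: list[list[str]] = []
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            hunks.append([])       # a new hunk starts at each '@@' header
        elif not hunks:
            continue               # file headers before the first hunk
        elif line.startswith(("+", "-")):
            hunks[-1].append(line)
    return hunks


def classify_vc_diff(vc_diff: str, bug_fix_diff: str) -> list[list[tuple[str, str]]]:
    """Label each changed VC.diff line as 'bug_fix' or 'non_bug_fix'."""
    bug_fix_changes = {line for hunk in changed_lines(bug_fix_diff) for line in hunk}
    return [
        [(line, "bug_fix" if line in bug_fix_changes else "non_bug_fix") for line in hunk]
        for hunk in changed_lines(vc_diff)
    ]


def count_tangled_hunks(labelled: list[list[tuple[str, str]]]) -> int:
    """A hunk counts as tangled when it mixes both labels."""
    return sum(1 for hunk in labelled if len({label for _, label in hunk}) == 2)
```

Matching by content can mislabel a line when identical text is changed in more than one place, which is one reason line-number identity would be the more robust choice.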

Next Goals:

Get Thomas’ opinion and review on the implementation of tangled hunk and line support in the ground truth.
Look for ways to convert Flexeme and SmartCommit results to diff formats.
Read the paper on replicating runs of the tools on synthetic commits and the paper on the new tangled dataset.
Run experiments and complete TODOs as requested by Thomas and Mike.

Miscellaneous

For the teatime event in PLSE this week, I tried a Japanese dessert sandwich for the first time! [Image: teatime 1]

Here is another delicious pic :) [Image: teatime 2]

Written on June 30, 2023