Week 5

Previous Goals:

Get Thomas’ opinion and review on the implementation for tangle hunk and line support in the ground truth.
Look for ways to convert Flexeme and SmartCommit results to diff formats.
Read paper of replicating running the tools on synthetic commits & Paper on new tangled dataset
Run experiments and complete TODOs as requested by Thomas and Mike.

This Week’s Progress:

After Thomas’ initial review, I realized that a big flaw in my implementation is that it can operate on only one Java file - which is the Java file that the bug fix patch is applied. I rewrote the diff commands in ‘generate_artifacts.sh’ to fix this.
I have also paused on trying to work on the Unix patch approach to obtain non-bug-fix diff from the fixed, uncommented code and final fixed code in VC. Based on the experiments I have ran (playing with fuzz and different numbers of context lines) I think this is too error-prone, and it would need cleaning (i.e. deleting comments, import statements, and blank lines) from all the Java source code files. Thus, Mike is implementing a script that checks out the bug file repository from Defects4J and cleans up all the source code at the start of the pipeline. We expect to use commit hashes from this repository to run Flexeme and SmartCommit.
I also implemented a rehunking algorithm that, in hindsight, did not respect the original hunking by Version Control diff. However, this doeas spark ideas about nice-to-have experiments about running diff with different number of context lines to evaluate untangling performance.
I found 2 bug files in Defects4J and reported it to the authors. As these 2 files happen to be on the d4j-5-bugs.csv example, I replaced them for the purpose of running end to end results.
Regarding functionality, my uncertainty from last week has been resolved, as Thomas gave me a review of the logic behind my implementation of tangled hunk and tangled line support. I argued that we should implement an integer count of the number of tangled hunks and lines instead of just Boolean metrics, since tangled-ness information might be lost with respect to the original version control diff. My program currently respects the hunking and original line numbers from VC diff, but I’m not happy still about mapping by just line contents.
By re-implementing ground truth construction, I also managed to eliminate excessive code and fixed a bug in repair_line_number in the original ground_truth.py file. I think this bug only reveals itself in some bug files (2 so far), but it is nontrivial. I re-ran the end to end results to obtain new goal files for metrics.csv (with tangled line and hunk count metrics) and scores.csv (with recomputed Rand Index scores).
I helped Thomas with GNU parallelization, and ran my branch end to end to test. It succeeded on my personal computer and reduces runtime by approximately half, but there seems to be a problem for the lab server.

Next Goals:

Integrate Mike’s implementation of cleaning repo into the pipeline.
Get Mike’s and Thomas’ code review on unix-patch branch.
Read paper of replicating running the tools on synthetic commits & Paper on new tangled dataset.
Run experiments and complete TODOs as requested by Thomas and Mike.

Miscellaneous

A fun July PLSE hangout at Big Time!

PLSE hangout

Written on July 7, 2023