Week 7

Previous Goals:

Create a Docker Image on Linux to finally set up end to end testing on Github Actions
Top priority: run new diff metrics program and ground truth construction on all compatible bug files. The paper needs results now!
Resolved end to end testing issue: Java being non-deterministic for SmartCommit
I will be volunteering at ISSTA, so I expect to be taken away from research for some time :)

This Week’s Progress:

I created a new branch, new-pipeline, to run on d4j-compatible-bugs.csv and debugged it. Thomas helped me with the program specification and investigated Defects4J compilation issues together. I discovered that now, the pipeline fails to run on 29 deprecated bugs as it fails to checkout the bug repository from Defects4J. I learned that their buggy repositories do not exist, only the Git history metadata, and thus the deprecated bugs must be removed.
- I learned that their buggy repositories do not exist, only the Git history metadata, and thus the deprecated bugs must be removed.
- I learned that some (aounrd ~8) bugs fail to run because of a Unicode ‘uft-8’ encoding issue. This leads to unidiff.PatchSet failing to be generated from the diff files. I changed this to ‘latin-1’ encoding after inspecting the VC diff for failing file, and I think this is a correct solution.
- I learned that the pre-condition for the new ground truth construction to work is for the order of the changed diff lines in the VC diff to be the same as in non-bug-fix diff, at least. However, inspecting Cli-29 led me to realize that removed lines have unexpected behavior, and their line numbers are very hard to fix - even with cleaning all diffs beforehand! Thus, I decided to handle this special scenarios by switching the truth classification scheme to content matching.
I got access to pipsqueak, thanks to Gus and Thomas! I managed to install the tool myself on the machine (that means the README.md instructions writeup is very good :)) and run the first 3 scripts of the pipeline: generate_artifacts.sh, compute_metrics.sh, and generate_ground_truth.sh. We have new results - metrics.csv and truth.csv - in home/thdang/untangling-tools-benchmark/benchmark-all/evaluation on pipsqueak.
Thank you Mike for setting up the Docker image for CI! This is incredibly helpful, as CI has been a pain for me, and it is bad to have to run end to end tests locally every time I want to push (my laptop actually ran out of memory, lol!). I helped Mike debugged Docker container, and the iterations are longer than I expected due to merge issues. However, the Docker successfully runs the tool and passes end-to-end tests now.
I wrote a paragraph on pipeline limitations on the current paper and submit a merge request on GitLab.

Next Goals:

Automate analysis script for table 3 in auto-analysis
Set up end to end tests on new-pipeline and respond to Thomas’ concerns
Generate all results on Defects4J, generate tables/figures, rewrite the analyses
Work on the paper

Written on July 21, 2023