ABSTRACT
We present DEAR, a deep learning (DL)-based approach that supports auto-fixing of bugs that require dependent changes made at once to one or multiple hunks and to one or multiple consecutive statements. We first design a novel fault localization (FL) technique for multi-hunk, multi-statement fixes that combines traditional spectrum-based (SB) FL with deep learning and data-flow analysis. It takes the buggy statements returned by SBFL, detects the buggy hunks to be fixed at once, and expands a buggy statement s in a hunk to include other suspicious statements around s. We enhance a two-tier, tree-based LSTM model that incorporates cycle training and uses a divide-and-conquer strategy to learn proper code transformations for fixing multiple statements in a suitable fixing context consisting of surrounding subtrees. We conducted several experiments to evaluate DEAR on three datasets: Defects4J (395 bugs), BigFix (26k+ bugs), and CPatMiner (44k+ bugs). On CPatMiner, DEAR fixes 71 and 164 more bugs, including 52 and 61 more multi-hunk/multi-statement bugs, than existing DL-based automated program repair (APR) tools. Among 667 fixed bugs, 169 (25.3%) are multi-hunk/multi-statement ones. On Defects4J, it outperforms the baselines by 42%–683% in the number of auto-fixed bugs with only Top-1 ranked patches.
Experimental Results
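As a reading aid for the headline numbers, the short sketch below recomputes the multi-hunk/multi-statement share from the counts reported in the abstract (169 of 667 fixed bugs) and shows how a relative improvement, such as the 42%–683% range, is usually derived. The function names and the counts in the second call are hypothetical placeholders chosen for illustration. They are not results from this work.

```python
def share_percent(part: int, total: int) -> float:
    """Return part / total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def relative_improvement_percent(ours: int, baseline: int) -> float:
    """Return how much larger `ours` is than `baseline`, in percent.

    Assumes the usual definition (ours - baseline) / baseline; the source
    does not spell out the formula behind its reported range.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (ours - baseline) / baseline


# Counts taken from the abstract: 169 multi-hunk/multi-statement fixes
# among 667 fixed bugs.
multi_share = share_percent(169, 667)
print(f"multi-hunk/multi-statement share: {multi_share:.1f}%")

# Hypothetical counts, for illustration only (not from the paper).
example_gain = relative_improvement_percent(ours=12, baseline=8)
print(f"relative improvement (hypothetical): {example_gain:.1f}%")
```

The first call reproduces the 25.3% figure quoted above. The second only demonstrates the formula on made-up inputs.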
RQ1. Comparison with DL APR Tools on Defects4J with Fault Localization
We compare DEAR with six state-of-the-art DL-based APR tools: DLFix [18], CoCoNuT [23], SequenceR [5], Tufano19 [38], CODIT [4], and CURE [14].
RQ2. Comparison with DL APRs on Large Datasets
DEAR fixes more bugs than any of the studied DL baselines on the two large datasets.
Raw Data Snapshot:
The test sets of the two large datasets contain 7009 bugs in total. To keep the results readable, we provide a snapshot that illustrates our work more concretely. The snapshot includes three of these 7009 bugs, two of which DEAR fixes correctly.