DEAR: A Novel Deep Learning-based Approach for Automated Program Repair

ABSTRACT

We present DEAR, a DL-based approach that supports auto-fixing of bugs that require dependent changes at once to one or multiple hunks and one or multiple consecutive statements. We first design a novel fault localization (FL) technique for multi-hunk, multi-statement fixes that combines traditional spectrum-based (SB) FL with deep learning and data-flow analysis. It takes the buggy statements returned by the SBFL, detects the buggy hunks to be fixed at once, and expands a buggy statement s in a hunk to include other suspicious statements around s. We enhance a two-tier, tree-based LSTM model that incorporates cycle training and uses a divide-and-conquer strategy to learn proper code transformations for fixing multiple statements in the suitable fixing context consisting of surrounding subtrees. We conducted several experiments to evaluate DEAR on three datasets: Defects4J (395 bugs), BigFix (+26k bugs), and CPatMiner (+44k bugs). In CPatMiner, DEAR fixes 71 and 164 more bugs, including 52 and 61 more multi-hunk/multi-statement bugs, than existing DL-based APR tools. Among 667 fixed bugs, there are 169 (25.3%) multi-hunk/multi-statement ones. On Defects4J, it outperforms the baselines by 42%–683% in terms of the number of auto-fixed bugs with only Top-1 ranked patches.
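To make the spectrum-based starting point of the fault localization step more concrete, here is a minimal sketch. It is not DEAR's implementation: it assumes the Ochiai formula as the SBFL metric (the abstract does not name one), and it groups suspicious statements into hunks purely by line adjacency, whereas DEAR uses deep learning and data-flow analysis for hunk detection and statement expansion. The function names, the `coverage` layout, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def ochiai(failed_cov: int, passed_cov: int, total_failed: int) -> float:
    """Ochiai suspiciousness score (assumed metric, not confirmed by the source)."""
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom > 0 else 0.0


def rank_statements(
    coverage: Dict[int, Tuple[int, int]], total_failed: int
) -> List[Tuple[int, float]]:
    """Rank lines by suspiciousness, most suspicious first.

    coverage maps a line number to (failing tests covering it,
    passing tests covering it). This layout is hypothetical.
    """
    scores = [(line, ochiai(f, p, total_failed)) for line, (f, p) in coverage.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


def group_into_hunks(lines: List[int]) -> List[List[int]]:
    """Naive stand-in for hunk detection: merge consecutive line numbers."""
    hunks: List[List[int]] = []
    for line in sorted(set(lines)):
        if hunks and line == hunks[-1][-1] + 1:
            hunks[-1].append(line)
        else:
            hunks.append([line])
    return hunks


# Toy spectrum: 2 failing tests in total.
coverage = {10: (2, 0), 11: (2, 1), 12: (0, 3), 20: (1, 0)}
ranked = rank_statements(coverage, total_failed=2)
suspicious = [line for line, score in ranked if score > 0.5]  # arbitrary cutoff
hunks = group_into_hunks(suspicious)  # [[10, 11], [20]]
```

In this toy example, lines 10 and 11 form one candidate hunk and line 20 another, so a fix would have to change two hunks at once, which is the multi-hunk, multi-statement scenario the abstract targets.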

Experimental Results

RQ1. Comparison with DL APR tools on Defects4J with Fault Localization

We compare DEAR with six state-of-the-art DL-based APR tools: DLFix [18], CoCoNuT [23], SequenceR [5], Tufano19 [38], CODIT [4], and CURE [14].

Raw data

RQ2. Comparison with DL APRs on Large Datasets

DEAR fixes more bugs than any studied DL baselines on the two large datasets.

Raw Data Snapshot:

The test sets of the two large datasets contain 7009 bugs in total. To keep the results readable, we provide a snapshot of three of these bugs, two of which DEAR fixes correctly.

RQ2. Comparison with DL APRs on Cross-Datasets

DEAR also outperforms the baselines in the cross-dataset setting (trained on CPatMiner and tested on BigFix, and vice versa).

RQ3. Comparison with Pattern-based APRs

DEAR fixes a comparable number of bugs to the top pattern-based tools Hercules and TBar.