Figure 1a: Histograms of commit message cross-entropies, initial model
Note the tall bins, which contain many auto-generated commit messages that were not anticipated when this model was trained.
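The histograms plot cross-entropy in bits. The captions do not give the exact formula, so the sketch below assumes one common definition for a language model: the average negative base-2 log probability per token. The function name `cross_entropy_bits` and the per-token probabilities are hypothetical and are not taken from the model used here.

```python
import math

def cross_entropy_bits(token_probs):
    """Average per-token cross-entropy in bits: -(1/N) * sum(log2 p_i).

    Assumes token_probs holds the probability a language model assigned to
    each token of one commit message (hypothetical input, for illustration).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    return -sum(math.log2(p) for p in token_probs) / len(token_probs)

# Hypothetical probabilities for a three-token message
print(cross_entropy_bits([0.5, 0.25, 0.125]))  # 2.0
```

Under this definition, a message made of tokens the model finds unlikely has a higher value, which is why cross-entropy is read as "unusualness."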
Figure 1b: Histograms of commit message cross-entropies, refined model
We recalculated the histogram after removing auto-generated commits and many non-English commit messages.
Figure 2: Empirical cumulative distribution functions of commits
Empirical cumulative distribution functions of the number of passed (green), failed (purple), and errored (orange) commits as cross-entropy (“unusualness”) increases. Note that the failed curve initially grows more slowly than the passed and errored curves; by 10 bits, however, it is indistinguishable from them.
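An empirical CDF evaluated at a cross-entropy x is the fraction of commits in a group whose cross-entropy is at most x. The sketch below builds one curve per build outcome. The grouping keys and values are made up for illustration and are not the data behind Figure 2.

```python
from bisect import bisect_right

def ecdf(values):
    """Return F(x) = fraction of values <= x (an empirical CDF)."""
    xs = sorted(values)
    n = len(xs)
    # bisect_right counts how many sorted values are <= x
    return lambda x: bisect_right(xs, x) / n

# Hypothetical cross-entropies (bits) grouped by build outcome
by_status = {
    "passed": [3.1, 4.0, 5.2, 6.8, 9.5],
    "failed": [4.5, 6.0, 7.2, 8.1, 9.9],
    "errored": [2.9, 4.2, 5.0, 7.0, 9.0],
}
curves = {status: ecdf(vals) for status, vals in by_status.items()}

for status, F in curves.items():
    print(status, F(5.0), F(10.0))
```

Comparing the curves at the same x shows which outcome accumulates commits faster. A curve that lies below the others, as "failed" does at 5 bits in this toy data, indicates that this outcome has relatively fewer low-cross-entropy commits.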