A few ideas to refine your Truck Factor heuristic:
1. Many developers might be very familiar with the code of a file even if they have never modified it. Some might have modified it in their unmerged forks, commented on its commits, or linked to one of its lines. All of these actions are visible on GitHub. People who are familiar with the code but don't use GitHub much are extremely difficult to track, of course.
2. Reducing complexity is a huge part of increasing the Truck Factor, and it is not taken into account in your study. Sometimes the code is so clean and so well-documented that any developer could lead the project after a day. This is difficult to calculate from GitHub data, of course.
3. In open source projects, lead developers often come and go. See for instance https://github.com/ankidroid/Anki-Android/graphs/contributors, where Edu wrote most of the files but has not been seen since 2010.
4. You might want to make it explicit that your numbers are about the source code only. The Truck Factor of an application also depends on questions like: Who owns the Web domain? Who owns the Google Play account? Who has the secret key used to sign the binaries? etc.
By the way, "Github applications" are things like "GitHub Desktop" and the "GitHub Android App". What you meant is probably "GitHub repositories" or "applications whose source code is hosted on GitHub".