A few ideas to refine your Truck Factor heuristic:
1. Many developers might be very familiar with the code of a file even if they have never modified it in the main repository. Some might have modified it in their unmerged forks, commented on its commits, or linked to one of its lines. All of these actions are visible on GitHub. People who are familiar with the code but don't use GitHub much are extremely difficult to track, of course.
2. Reducing complexity is a huge part of increasing the Truck Factor, and your study does not take it into account. Sometimes the code is so clean and so well-documented that any developer could lead the project after a day. This is difficult to calculate from GitHub data, of course.
3. Often in open source projects, lead developers come and go. See for instance https://github.com/ankidroid/Anki-Android/graphs/contributors, where Edu wrote most of the files but has not been seen since 2010.
4. You might want to make it explicit that your numbers are about the source code only. The Truck Factor of an application also depends on other questions: Who owns the web domain? Who owns the Google Play account? Who has the secret key used to sign the binaries? And so on.
By the way, "GitHub applications" are things like "GitHub Desktop" and the "GitHub Android App". What you meant is probably "GitHub repositories" or "applications whose source code is hosted on GitHub".
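To make ideas 1 and 3 more concrete, here is a minimal Python sketch of how they could be folded into a Truck Factor estimate. Everything in it is an assumption for illustration: the event tuples, the signal names and weights in `SIGNAL_WEIGHTS`, the familiarity threshold, the `active_since` cutoff, and the greedy "remove the most-covering developer until more than half of the files are orphaned" estimate. It is not meant to reproduce the heuristic used in your study.

```python
from collections import defaultdict
from datetime import date

# Hypothetical weights: how much each GitHub-visible action counts
# towards "familiarity" with a file. The values are arbitrary.
SIGNAL_WEIGHTS = {
    "commit": 1.0,          # modified the file in the main repository
    "fork_commit": 0.8,     # modified the file in an unmerged fork
    "commit_comment": 0.3,  # commented on a commit touching the file
    "line_link": 0.3,       # linked to a line of the file
}


def familiar_developers(events, active_since, threshold=0.5):
    """Map each file to the developers considered familiar with it.

    events: iterable of (developer, path, kind, date) tuples.
    Events older than active_since are ignored, so a lead developer
    who left long ago no longer counts.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for dev, path, kind, when in events:
        per_file = scores[path]  # register the file even if the event is stale
        if when >= active_since:
            per_file[dev] += SIGNAL_WEIGHTS.get(kind, 0.0)
    return {
        path: {dev for dev, score in devs.items() if score >= threshold}
        for path, devs in scores.items()
    }


def truck_factor(familiar):
    """Greedy estimate: remove the developer covering the most files
    until more than half of the files have nobody familiar left."""
    remaining = {path: set(devs) for path, devs in familiar.items()}
    total = len(remaining)
    removed = 0
    while total and 2 * sum(1 for d in remaining.values() if not d) <= total:
        counts = defaultdict(int)
        for devs in remaining.values():
            for dev in devs:
                counts[dev] += 1
        # sorted() makes tie-breaking deterministic
        top = max(sorted(counts), key=counts.get)
        for devs in remaining.values():
            devs.discard(top)
        removed += 1
    return removed


# Made-up example data, not taken from any real repository.
events = [
    ("edu", "a.py", "commit", date(2010, 1, 1)),
    ("edu", "b.py", "commit", date(2010, 1, 1)),
    ("ann", "a.py", "commit", date(2015, 1, 1)),
    ("ann", "b.py", "fork_commit", date(2015, 2, 1)),
    ("bob", "b.py", "commit_comment", date(2015, 3, 1)),
    ("bob", "b.py", "line_link", date(2015, 3, 2)),
    ("ann", "c.py", "commit", date(2015, 1, 1)),
]

recent = familiar_developers(events, active_since=date(2014, 1, 1))
all_time = familiar_developers(events, active_since=date(2000, 1, 1))
print(truck_factor(recent), truck_factor(all_time))
```

In this toy data set, ignoring the developer who has been inactive since 2010 lowers the estimate, while counting comments and line links adds a second person familiar with `b.py`. The right weights and cutoff would of course have to be calibrated against real projects.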