First of all, I want to state that the authors of this paper are tackling a worthwhile question, the health implications of cooperative behaviour, using data from what feels like an unprecedented number of individuals for a primate grooming paper. I think this work has great potential, even though I will point out some things that I think might limit our ability to interpret the results as they stand now. I appreciate the great effort that has gone into this project; if I focus on potential pitfalls below rather than on the many strong points of the manuscript, it is in the hope these comments will allow the authors to further improve this manuscript.
My ability to comment on many points is limited by a lack of information on sample sizes in the manuscript, specifically for grooming and huddling interactions. The manuscript states that there were 222 females; I do not know how they are spread over the groups, but if we assume they are spread evenly (about 55 per group), then the networks have 1485 possible dyadic connections for each group, so almost 6000 possible connections for all groups. The manuscript establishes who falls into the 'affiliative' and 'political' network by assigning them based on huddling interactions, with the assumption that huddling is more specific to friendships than grooming. A dyad falls into the affiliative network if they have been seen huddling at least once. But if the huddling database consists of fewer cases than there are dyads, then even if huddling was distributed at random, dyads will still be split between the two categories by chance alone. Because of the scan sampling approach, a dyad that huddled at the exact time of one of the scans will go into one network, and one that huddled 5 min later will go into the other, without any biological difference between them. We always make the assumption that our data collection is representative of all the time when we are not observing the individuals, but for this to be true the datasets we work with have to be really large, given the large data requirements of network analysis (e.g. Whitehead 2008 'Precision and power in the analysis of social structure using associations'). The same is true for the grooming networks: if they are based on fewer than one datapoint per dyad on average (i.e., fewer than 1485 grooming bouts per group in my estimation), then the indices will likely come with a really strong measurement error (again, see the Whitehead paper above, or any of the network analysis methods papers by Farine et al.). Because the sample sizes are not reported, reviewers cannot judge whether this is a problem.
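To make the arithmetic above concrete, here is a minimal sketch of the dyad count and of how many dyads would be labelled 'affiliative' if huddling were purely random. The group size of 55 is my even-split assumption from above; the figure of four groups is inferred from 222 females at about 55 each; the number of huddling records (500) is entirely hypothetical, since the manuscript does not report it; and the function names are mine, for illustration only.

```python
import random
from itertools import combinations


def possible_dyads(n_individuals):
    """Number of undirected dyads among n individuals: n * (n - 1) / 2."""
    return n_individuals * (n_individuals - 1) // 2


def expected_fraction_seen(n_dyads, n_records):
    """Expected share of dyads observed at least once when each record
    lands on a uniformly random dyad (i.e. no biological structure)."""
    return 1 - (1 - 1 / n_dyads) ** n_records


def simulated_fraction_seen(n_individuals, n_records, seed=0):
    """Simulate random huddling records and return the share of dyads
    that would be assigned to the 'affiliative' network by chance."""
    rng = random.Random(seed)
    dyads = list(combinations(range(n_individuals), 2))
    seen = {rng.choice(dyads) for _ in range(n_records)}
    return len(seen) / len(dyads)


group_size = 55          # assumption: 222 females spread evenly
n_groups = 4             # assumption: inferred from 222 / 55
n_huddle_records = 500   # hypothetical; not reported in the manuscript

dyads_per_group = possible_dyads(group_size)
print(dyads_per_group, n_groups * dyads_per_group)
print(expected_fraction_seen(dyads_per_group, n_huddle_records))
print(simulated_fraction_seen(group_size, n_huddle_records))
```

Under these assumptions, a sizeable minority of dyads ends up in the 'affiliative' network even though no dyad differs from any other, which is why the actual number of huddling and grooming records matters so much for judging the classification.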
The fundamental assumption here is that dyads who huddle are qualitatively different from those that do not huddle; this is a really strong and novel assumption, so it would need really strong data support. The authors would also have to explain why huddling cannot be used in a 'political' fashion the same way grooming is (sitting close to a high-ranking individual would be fantastic protection). As there is no real way to prove this point (see below), I would strongly suggest testing grooming in the kin vs non-kin networks, and also using this distinction for the framing.
Apart from the data requirements, I feel there might be a framing problem with this study; specifically, the claims about 'political' and 'affiliative' grooming as two distinct types of grooming that produce different networks. I think everyone working on grooming agrees that grooming can have many different functions (bond maintenance, bond formation, reconciliation, exchange for tolerance or for other services, etc.). However, we usually do not think of these as specific to dyads ('affiliative' dyads, 'political' dyads), but to contexts. So, a dyad will sometimes groom for reconciliation, sometimes for tolerance, sometimes to get access to a resource, to establish a bond, etc. There is a certain circularity to the central argument under this framing: dyads are assigned to one of two categories, based on a criterion (huddling and kinship); there are group differences in some continuous measure (inflammation markers); this is taken to mean that the established groups are meaningful. However, this is not necessarily the case, because the network positions in the two networks come with a host of other differences in individuals and dyads: assuming that most of the grooming happens in the kinship network, individuals who are central in this network will simply have more grooming than everyone else overall, which in itself can have health effects. This will be completely independent of network positions or of which network the behaviour occurred in. Overall grooming time of individuals is not controlled for in the models. Because of how the networks are defined, individuals who have a lot of huddling will have more possible connections in the affiliative network even if grooming itself is distributed randomly; thus, the group differences in biomarkers could be due to huddling itself, not grooming at all. As other cooperative behaviours might follow the same distribution as grooming, those behaviours might play a role (see for example Preis et al. 2018 'Urinary oxytocin levels in relation to post-conflict affiliations in wild male chimpanzees (Pan troglodytes verus)', showing that even single affiliative interactions affect urinary oxytocin levels). The results would also look the same if there was only a difference between individuals who like grooming kin and those who like grooming non-kin, and it just happens that some non-kin dyads were classified as 'affiliative' and others as 'political'. If the authors had assigned the non-kin dyads to 'affiliative' and 'political' based on a different variable, e.g. size difference, the result might have been the same because it was driven by the kin dyads. So we already have three very simple explanations that do not require the 'affiliative' vs 'political' distinction to be biologically meaningful: overall grooming time; huddling itself; and kin/non-kin effects.
The two networks are not independent of each other, both in terms of the amount of grooming assigned to each network and which dyad goes where, and this has potentially serious consequences for the network measures: I think most of the network measures, unless this is controlled for, assume that all dyads in the network can potentially interact with each other. This is not the case here: if individuals A and B are kin, and A and C huddle but B and C do not, the connection BC will still be assumed to be possible in the first network and assigned 0. When calculating the network indices, the missing connection BC in the first network is interpreted as 'no grooming occurring', which is not necessarily true: the information on the dyad is simply in a different network. Given that individuals differ in how many kin dyads they are in, etc., the centrality measures of individuals even under random conditions are not the same, and the network measures for the two networks would be polar opposites. Individuals who have more kin in the community will be more central in the 'affiliative' than in the 'political' network even if grooming is distributed randomly, so finding opposing effects might potentially be trivial (many kin == central in kin network == not central in non-kin network). All network measures would have to control for this bias, and I don't think this is the case at the moment.
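The bias described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions of my own, not a reconstruction of the authors' analysis: dyads are split by kinship alone (ignoring huddling), matriline sizes run from 1 to 10 (55 individuals in total), grooming bouts fall on uniformly random dyads, and the number of bouts is arbitrary. Strength (summed bouts per individual) stands in for the centrality measures used in the manuscript.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate_random_grooming(matriline_sizes=tuple(range(1, 11)),
                             n_bouts=15000, seed=1):
    """Distribute grooming bouts over random dyads, then split them into a
    kin and a non-kin network. Returns the correlation of each individual's
    number of kin with its strength in each network."""
    rng = random.Random(seed)
    # matriline[i] is the (hypothetical) matriline of individual i
    matriline = [m for m, size in enumerate(matriline_sizes)
                 for _ in range(size)]
    n = len(matriline)
    dyads = list(combinations(range(n), 2))
    kin_strength = [0] * n
    nonkin_strength = [0] * n
    for _ in range(n_bouts):
        a, b = rng.choice(dyads)  # grooming is random with respect to kinship
        if matriline[a] == matriline[b]:
            target = kin_strength
        else:
            target = nonkin_strength
        target[a] += 1
        target[b] += 1
    n_kin = [matriline_sizes[m] - 1 for m in matriline]
    return pearson(n_kin, kin_strength), pearson(n_kin, nonkin_strength)


r_kin, r_nonkin = simulate_random_grooming()
print(r_kin, r_nonkin)
```

In this setup the number of kin is positively correlated with strength in the kin network and negatively correlated with strength in the non-kin network, although grooming carries no structure at all. A permutation or null model that respects which dyads are eligible for each network would be one way to control for this.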
Lastly, I think the manuscript would be stronger if there were clear hypotheses as to why certain network measures would influence biomarkers. For most of the network measures (e.g. eigenvector centrality, information centrality) it seems to me that there is no clear mechanism that would explain why they influence an individual's glucocorticoids or inflammation, because they are not about the behaviour of the individual, but are influenced by the behaviour of other group members with each other. What is the biological reason (in the example given in the supplementary) to assume that individual H (5 grooming partners) differs from individual C (also 5 grooming partners)? The path length of the longest path going through an individual is a purely abstract measure that has no relevance for an individual's life. Combined with the statistical approach of using several of these measures in the same model, sometimes in interaction with rank variables, the methods and results feel exploratory at times.
I really think this project would benefit from focusing on kin vs non-kin grooming, because assessing 'political' and 'affiliative' dyads creates many problems as described above, and is potentially really contentious. Whatever the results, that analysis would still be really valuable for our understanding of cooperation and its impact on health and fitness, and much harder to criticise on data grounds alone.
Good luck with this manuscript going forward!