First of all, I want to state that the authors of this paper are tackling a worthwhile question, the health implications of cooperative behaviour, using data from what feels like an unprecedented number of individuals for a primate grooming paper. I think this work has great potential, even though I will point out some issues that I think limit our ability to interpret the results as they stand. I appreciate the great effort that has gone into this project; if I focus below on potential pitfalls rather than on the many strong points of the manuscript, it is in the hope that these comments will help the authors improve it further.
My ability to comment on many points is limited by the lack of information on sample sizes in the manuscript, specifically for grooming and huddling interactions. The manuscript states that there were 222 females; I do not know how they are spread over the groups, but if we assume they are spread evenly (about 55 per group), then each group network has 55 × 54 / 2 = 1485 possible dyadic connections, so almost 6000 possible connections across all groups. The manuscript assigns dyads to the 'affiliative' or the 'political' network based on huddling interactions, assuming that huddling is more specific to friendships than grooming is. A dyad falls into the affiliative network if its members have been seen huddling at least once. But if the huddling dataset contains fewer observations than there are dyads, then dyads will be split between the two categories even if huddling were distributed at random. Because of the scan sampling approach, a dyad that huddled at the exact time of a scan will go into one network, and one that huddled 5 min later will go into the other, without any biological difference between them. We always assume that our data collection is representative of the time when we are not observing the individuals, but for this to hold, the datasets have to be very large, given the heavy data requirements of network analysis (e.g. Whitehead 2008 'Precision and power in the analysis of social structure using associations'). The same is true for the grooming networks: if they are based on fewer than one data point per dyad on average (i.e., fewer than 1485 grooming bouts per group by my estimate), then the indices will likely carry substantial measurement error (again, see the Whitehead paper above, or any of the network analysis methods papers by Farine et al.). Because the sample sizes are not reported, reviewers cannot judge whether this is a problem.
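The arithmetic behind this concern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the manuscript's data: it assumes four groups of 55 females and a deliberately simple null model in which each observation lands on a dyad drawn uniformly at random. Under that model, the expected share of dyads with no observation at all is (1 - 1/D)^B for B observations spread over D dyads, which is roughly exp(-B/D). The function names and observation totals are hypothetical.

```python
import math


def n_dyads(n_individuals: int) -> int:
    """Undirected dyads among n individuals: n(n - 1) / 2."""
    return n_individuals * (n_individuals - 1) // 2


def expected_unobserved_fraction(n_observations: int, n_dyads_total: int) -> float:
    """Expected share of dyads with zero observations when every observation
    falls on a dyad drawn uniformly at random (simplifying null-model assumption)."""
    return (1 - 1 / n_dyads_total) ** n_observations


per_group = n_dyads(55)      # dyads per group, assuming 55 females per group
all_groups = 4 * per_group   # dyads across four groups (assumption)

for n_obs in (500, 1485, 5000):  # hypothetical observation totals, not study data
    share = expected_unobserved_fraction(n_obs, per_group)
    print(f"{n_obs} observations: {share:.2f} of dyads never observed "
          f"(approx. exp(-B/D) = {math.exp(-n_obs / per_group):.2f})")
```

Even with exactly as many observations as dyads (B = D = 1485), about e^-1, or roughly 37%, of dyads would have zero records purely by chance, so a zero in such a sparse network carries little information about the dyad.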
The fundamental assumption here is that dyads who huddle are qualitatively different from those that do not; this is a strong and novel assumption, so it would need strong empirical support. The authors would also have to explain why huddling cannot be used in a 'political' fashion in the same way as grooming (sitting close to a high-ranking individual would be fantastic protection). As there is no real way to prove this point (see below), I would strongly suggest testing grooming in kin vs non-kin networks, and also using this distinction for the framing.
Apart from the data requirements, I feel there might be a framing problem with this study, specifically the claims about 'political' and 'affiliative' grooming as two distinct types of grooming that produce different networks. I think everyone working on grooming agrees that grooming can have many different functions (bond maintenance, bond formation, reconciliation, exchange for tolerance or for other services, etc.). However, we do not usually think of these functions as specific to dyads ('affiliative' dyads, 'political' dyads), but to contexts: the same dyad will sometimes groom to reconcile, sometimes to gain tolerance or access to a resource, sometimes to establish a bond, and so on. There is also a certain circularity in the central argument: dyads are assigned to one of two categories based on a criterion (huddling and kinship); there are group differences in some continuous measure (inflammation markers); therefore the categories are taken to be meaningful. However, this does not necessarily follow, because network positions in the two networks come with a host of other differences between individuals and dyads. Assuming that most grooming happens in the kinship network, individuals who are central in this network will simply be involved in more grooming than everyone else overall, which in itself can have health effects, completely independently of network position or of which network the behaviour occurred in. Overall grooming time of individuals is not controlled for in the models. Because of how the networks are defined, individuals who huddle a lot will have more possible connections in the affiliative network even if grooming itself is distributed randomly; thus, the group differences in biomarkers could be due to huddling itself, not to grooming at all. As other cooperative behaviours might follow the same distribution as grooming, those behaviours might also play a role (see for example Preis et al. 2018 'Urinary oxytocin levels in relation to post-conflict affiliations in wild male chimpanzees (Pan troglodytes verus)', showing that even single affiliative interactions affect urinary oxytocin levels). The results would also look the same if the only difference were between individuals who like grooming kin and those who like grooming non-kin, and some non-kin dyads just happened to be classified as 'affiliative' and others as 'political'. Had the authors assigned the non-kin dyads to 'affiliative' and 'political' based on a different variable, e.g. size difference, the result might have been the same because it was driven by the kin dyads. So we already have three very simple explanations that do not require the 'affiliative' vs 'political' distinction to be biologically meaningful: overall grooming time, huddling itself, and kin/non-kin effects.
The two networks are not independent of each other, both in how much grooming is assigned to each network and in which dyad goes where, and this has potentially serious consequences for the network measures. Unless corrected, I think most network measures assume that all dyads in the network can potentially interact with each other. This is not the case here: if individuals A and B are kin, and A and C huddle but B and C do not, the connection BC is still treated as possible in the 'affiliative' network and assigned 0. When the network indices are calculated, the missing BC connection in that network is interpreted as 'no grooming occurring', which is not necessarily true: the information about that dyad is simply in the other network. Because individuals differ in how many kin dyads they belong to, their centrality measures are not equal even under random conditions, and the measures in the two networks will tend to mirror each other. Individuals with more kin in the community will be more central in the 'affiliative' than in the 'political' network even if grooming is distributed randomly, so finding opposing effects might be trivial (many kin → central in kin network → not central in non-kin network). All network measures would have to control for this bias, and I don't think this is the case at the moment.
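This non-independence argument can be checked with a small null-model simulation. The sketch below is hedged and uses hypothetical inputs throughout: 55 females split into ten matrilines of sizes 1 to 10, 'kin' meaning sharing a matriline (huddling is ignored for simplicity), and grooming bouts assigned to dyads completely at random. Strength (number of bouts per individual) stands in for the centrality measures used in the manuscript, which is a simplification. If the reasoning above holds, the number of kin should correlate positively with strength in the kin network and negatively with strength in the non-kin network, even though grooming is random.

```python
import itertools
import random


def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_random_grooming(matriline_sizes, n_bouts, seed=1):
    """Assign each grooming bout to a uniformly random dyad (null model) and
    tally each individual's bouts with kin versus non-kin partners."""
    rng = random.Random(seed)
    # Hypothetical kinship: individuals sharing a matriline ID count as kin.
    matriline = [m for m, size in enumerate(matriline_sizes) for _ in range(size)]
    n = len(matriline)
    dyads = list(itertools.combinations(range(n), 2))
    kin_strength = [0] * n
    nonkin_strength = [0] * n
    for _ in range(n_bouts):
        a, b = rng.choice(dyads)
        tally = kin_strength if matriline[a] == matriline[b] else nonkin_strength
        tally[a] += 1
        tally[b] += 1
    n_kin = [matriline_sizes[matriline[i]] - 1 for i in range(n)]
    return n_kin, kin_strength, nonkin_strength


# 55 females in matrilines of sizes 1..10 (hypothetical); 20,000 random bouts.
n_kin, kin_s, nonkin_s = simulate_random_grooming(list(range(1, 11)), n_bouts=20_000)
print(f"r(number of kin, kin-network strength)     = {pearson(n_kin, kin_s):.2f}")
print(f"r(number of kin, non-kin-network strength) = {pearson(n_kin, nonkin_s):.2f}")
```

Because every bout must land in one network or the other, an individual with many kin gains ties in one network at the expense of the other. A permutation test that keeps the kin (and huddling) structure fixed while shuffling grooming would be one way to obtain a baseline against which the observed measures could be compared.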
Lastly, I think the manuscript would be stronger with clear hypotheses about why particular network measures should influence biomarkers. For most of the network measures (e.g. eigenvector centrality, information centrality), I see no clear mechanism that would explain why they influence an individual's glucocorticoids or inflammation, because they do not describe the individual's own behaviour but depend on how other group members interact with each other. In the example given in the supplementary material, what is the biological reason to assume that individual H (5 grooming partners) differs from individual C (also 5 grooming partners)? The length of the longest path going through an individual is a purely abstract measure with no relevance to an individual's life. Combined with the statistical approach of including several of these measures in the same model, sometimes in interaction with rank variables, the methods and results feel exploratory at times.
I really think this project would benefit from focusing on kin vs non-kin grooming, because assessing 'political' and 'affiliative' dyads creates the many problems described above and is potentially very contentious. Whatever its results, that analysis would still be valuable for our understanding of cooperation and its impact on health and fitness, and much harder to criticise on data grounds alone.
Good luck with this manuscript going forward!
Alexander Mielke
We thank Dr. Mielke for his very thoughtful commentary on the manuscript we recently submitted to PeerJ. We appreciate his timely feedback and would like to respond to some overarching themes he raised under the following headings: (1) “overall” versus “kin/non-kin” versus “political/family-friend” grooming categorization schemes, (2) functional level of analysis: interactions versus relationships, and (3) rationale for network metric selection.
(1) “Overall” versus “kin/non-kin” versus “political/family-friend” grooming categorization schemes: As Dr. Mielke suggests, we initially analyzed overall grooming as well as kin versus non-kin grooming relationships and found no association with our health outcomes. This is what spurred us to consider alternative ways of categorizing grooming relationships. In thinking about alternatives, we came to view the kin versus non-kin categorization as functionally arbitrary, because it incorrectly assumes that all familial relationships are necessarily beneficial. In contrast, we know that many humans and nonhuman primates forge strong relationships with non-kin outside the family and can avoid, or have poor relationships with, kin. Thus, in our view, the kin/non-kin distinction is not necessarily a simpler or more biologically meaningful alternative and does not provide a good founding assumption. We would also point out that even if we had found the same difference using this kin/non-kin categorization (which we did not), this distinction would still not provide a functional explanation of why. That is, why would there be health benefits associated with grooming kin and health costs associated with grooming non-kin? Would this not come back to a “political”/“affiliative”-type functional explanation?
(2) Functional level of analysis: interactions versus relationships:
Dr. Mielke raises an important point in highlighting that a given dyad might exchange grooming for different reasons (political or affiliative) depending on the context. While there is evidence that grooming can serve multiple functions, it is nearly impossible for a human observer to determine the function of a single grooming interaction. We believe this difficulty is one of the main reasons that work such as ours has not been undertaken previously. Rather than attempting to discern the function of single grooming interactions, we categorized each dyad's relationship as predominantly “political” or “affiliative” based on data gathered across a 6-week period. We are thankful to Dr. Mielke for pointing out the need to clarify the distinction between interactions and relationships, and we wish to emphasize that our analysis is a first step toward understanding the impact of different grooming functions.
(3) Rationale for network metric selection: Although it might not have appeared so, we were in fact quite selective in the metrics used in our analyses, as they represent different types of “influence” in social networks. Eigenvector centrality is the degree to which an individual is connected to other highly connected individuals, and information centrality (a type of betweenness centrality) is the degree to which individuals connect others in the network; we think both map well onto structures seen in many types of political networks in humans. In contrast, closeness is how close an individual is to all others in a network, regardless of whether those others are central or not, which seems to fit strong affiliative networks in humans quite well. In our nonhuman primate context, we interpret high eigenvector and information centralities as being in the “hotspot” of the grooming network (a human analogy would be sitting deep in the political structure of government, business, or academia), while we interpret high closeness centrality as being quickly connected to all others in the network (a human analogy would be easy, quick access to family and close friends). The fact that these three network metrics emerged from the model selection approach as not just slightly but substantially better than other metrics (e.g., degree) at predicting positive and negative health outcomes suggests to us that the distinction between “political” and “friends/family” grooming (as operationally defined in our study) is both real and salient to the animals, and that we should question our assumptions about grooming as simply an affiliative behavior.
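The metric definitions above can be made concrete on a toy graph. The sketch below uses a hypothetical eight-node, unweighted network (not the example from the manuscript's supplementary material) in which nodes 4 and 5 each have three grooming partners. Eigenvector centrality is computed by power iteration on A + I, which has the same leading eigenvector as the adjacency matrix A but avoids oscillation; closeness is taken as (n - 1) divided by the sum of shortest-path distances, assuming a connected graph. Information centrality is omitted for brevity, and all names here are illustrative.

```python
from collections import deque

# Hypothetical toy network (unweighted, undirected); not study data.
EDGES = [
    (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),  # densely connected core
    (4, 0), (4, 1), (4, 2),                          # node 4: three partners, all in the core
    (5, 3), (5, 6), (5, 7),                          # node 5: three partners, two peripheral
]
N_NODES = 8


def adjacency(edges, n):
    """Build an adjacency list (one set of neighbours per node)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration on (A + I), scaled so the top node scores 1."""
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        new = [x[i] + sum(x[j] for j in adj[i]) for i in range(len(adj))]
        top = max(new)
        new = [value / top for value in new]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


def closeness(adj, source):
    """(n - 1) / sum of shortest-path distances via BFS; assumes a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj) - 1) / sum(dist.values())


adj = adjacency(EDGES, N_NODES)
ec = eigenvector_centrality(adj)
for node in (4, 5):
    print(f"node {node}: degree={len(adj[node])}, "
          f"eigenvector={ec[node]:.3f}, closeness={closeness(adj, node):.3f}")
```

Node 4, whose partners are all in the densely connected core, scores higher on eigenvector centrality, while node 5, which links the core to two peripheral individuals, scores higher on closeness (7/12 ≈ 0.58 versus 7/16 ≈ 0.44). Equal degree therefore does not imply equal values on either metric.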
Thanks again to Dr. Mielke for providing such quick and helpful feedback. We will definitely use his feedback to improve our manuscript by providing the descriptive statistics he suggests and clarifying our approach based upon his questions and concerns. We look forward to hearing more from other interested readers during this review process. Thanks to PeerJ for providing such an excellent forum for open discussion.