Review History

All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.



  • The initial submission of this article was received on March 4th, 2020 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on May 6th, 2020.
  • The first revision was submitted on July 31st, 2020 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on September 3rd, 2020.

Version 0.2 (accepted)

· Sep 3, 2020 · Academic Editor


I am happy with the additional revisions conducted on the manuscript.

[# PeerJ Staff Note - this decision was reviewed and approved by Jennifer Vonk, a PeerJ Section Editor covering this Section #]


Basic reporting


Experimental design


Validity of the findings


Additional comments

I don't feel like the authors have convinced me that their analysis method provides any real advantages over the widely used ones - we've discussed this already, and I'm not being hard line on it, as the raw data speak for themselves anyway.

The authors have addressed the concerns just fine, and I don't have any more substantive concerns to raise.

Version 0.1 (original submission)

· May 6, 2020 · Academic Editor

Major Revisions

We have received three reviews for your manuscript. All reviewers concurred that it was interesting and generally well written. However, they underlined some issues that deserve revisions.

In particular, the statistical approach (conditional inference tree) should be better justified, or a common alternative approach should be used. Several comments are related to the low sample size (and the possible lack of power), and thus this issue needs to be assessed convincingly. An important concern here is whether your study design/data allowed you to reach meaningful conclusions. Related to this point, one of the reviewers requested that more descriptive statistics be included. Reviewers also emphasized that the assumption of motivating factors vs trap types and the factors affecting captures need to be clarified. All other comments also need to be taken into account.

[# PeerJ Staff Note: Please ensure that all review comments are addressed in a rebuttal letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.  It is a common mistake to address reviewer questions in the rebuttal letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the rebuttal letter.  Directions on how to prepare a rebuttal letter can be found at: #]

[# PeerJ Staff Note: It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors are in agreement that they are relevant and useful #]


Basic reporting

I found the article's aims to be clearly stated, and well written. It was easy to follow the methods that the authors employed and the analysis (and associated rationale) for handling the data.

Raw data were easily retrieved and the text of the analysis was well documented.

Experimental design

The authors compare individuals caught using two common techniques - mistnets vs baited walk-in traps - to capture chickadees, and compare whether this creates a bias in the attributes of the individuals captured.

They caught up to six birds at each of ten sites, alternating between techniques: the first three birds at a site with either traps or mistnets, then a switch until they caught the next three birds. They used sufficient numbers of exemplars with predator model presentations to account for potential pseudoreplication issues in playback stimuli. They also pre-bait/train birds for Potter traps, which should reduce the initial neophobic response to the platforms, so in both cases they have employed standard capture procedures.

They then compare morphometric measures (wing, weight, tarsus), physiological (baseline and induced corticosteroid response) and behavioural attributes (boldness to predator model, and neophobia to novel object in PIT tagged birds coming to an RFID feeder) of birds caught using either technique. Analysis uses conditional inference trees to account for multiple comparisons. They found no difference in any measure between birds caught with different techniques, and so suggest that this could alleviate fears of capture-based biases.

Methodology is very thorough, but I have several queries that warrant a response from the authors.

1. I commend the detail of data collected for physiological/behavioural comparisons, but the overall sample size of 55 birds and ten sites is rather small for the comparisons (27 vs 28 with either technique). Given that some of the metrics, particularly behavioural assays, have a very broad data range, how does this affect power to detect differences?

2. Related to the above, we have typically experienced a marked drop off in capture rates (using single techniques) in the second half of the morning vs the first half. Did you see this decline in rate (increasing length of time between successive captures with time of morning)? You partially account for this by alternating between sites which of the two techniques you employed first. However, this would result in only a 5 vs 5 comparison even if you control for order as a random variable - do you feel this is sufficient to account for this variation affecting your overall results?

3. L190- Most procedures I have seen with induced corticosterone response recommend spinning the samples down and extracting plasma within 30-60 min. Your methods (L244) suggest you did this within 6 hrs of collection. Did you see any difference in the amount of corticosterone with time to extraction, and if so did you correct for this?

4. L217- an 80-90dB range is huge, as dB is a logarithmic scale. Did this range reflect variation across sites, or did all sites vary over the same upper and lower range?

5. L317 - One metric you don't measure is a typical CONDITION measure - residual mass/tarsus length. What is the reason this wasn't included (instead of measuring mass and tarsus independently)? If you do include this, I would recommend using a scaled mass index, which better accounts for the non-linear relationships between increasing mass/length than ordinary least squares residuals (e.g. Peig, J., A. J. Green, and C. Ame. 2009. New perspectives for estimating body condition from mass/length data: the scaled mass index as an alternative method. Oikos 118:1883–1891.)
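For readers unfamiliar with the scaled mass index mentioned in comment 5, the following is a minimal sketch of how it is usually computed, not the authors' or the reviewer's code. It assumes the standard formulation: each bird's mass is rescaled to a reference length L0 (here the sample mean) using the standardised major axis (SMA) slope of ln(mass) on ln(length). The function names and the example values are hypothetical.

```python
import math


def sma_slope(x, y):
    """Standardised major axis slope of y on x: sign(cov) * sd(y) / sd(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * math.sqrt(syy / sxx)


def scaled_mass_index(mass, length, l0=None):
    """Rescale each mass to the reference length l0: M * (l0 / L) ** b_sma."""
    log_length = [math.log(v) for v in length]
    log_mass = [math.log(v) for v in mass]
    b_sma = sma_slope(log_length, log_mass)
    if l0 is None:
        # Assumption: use the arithmetic mean length of the sample as L0.
        l0 = sum(length) / len(length)
    return [m * (l0 / l) ** b_sma for m, l in zip(mass, length)]


# Hypothetical masses (g) and tarsus lengths (mm); NOT data from the study.
mass_g = [10.8, 11.5, 12.1, 11.0, 12.6, 11.9]
tarsus_mm = [16.0, 16.4, 16.9, 16.1, 17.2, 16.6]
smi = scaled_mass_index(mass_g, tarsus_mm)
```

Because the index is expressed in mass units at a common body length, it could be compared between capture methods in the same way as the other morphometric measures.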

Validity of the findings

The authors find no differences in morphology/physiology or behavioural assays between birds caught using either technique, which is good news for those wanting to use these as an unbiased means of subsampling the population.

There are a couple of things the authors should address:
1. L120 - the assumption is that there may be two independent motivating factors which differ between response to the two trap types - reduced neophobia to walk-in traps, and increased boldness to mist nets. This would predict that boldness and neophobia are not negatively correlated (i.e., that bold individuals are not also the least neophobic). What evidence is there to make this assumption in chickadees?

2. L313-320 - One of the attributes that you suggest might affect capture is dominance rank. However you didn't assess this attribute among your banded birds through observation of interactions. However, you might be able to address this through proxies - both age and sex are correlated with rank. While you found no difference of sex on capture techniques, this can also be confounded by mate protection of high-ranked females having priority access to feeders. Did you assess whether the relative age of the birds (as assessed by tail shape for AHY/ASY or HY/SY) affected propensity to catch the birds with either technique, especially when controlling for sex?

Additional comments

Found this paper very interesting to read, and hope you can address the comments.


Basic reporting

Very good. No issues.

Experimental design

See general comments

Validity of the findings

Valid, see below

Additional comments

This study gives every indication of being a thoughtful and rigorously carried out study that investigates gear-specific sampling bias. They have been careful to contrast this with sampling bias more broadly, a related question, but something which is difficult or impossible to do in most instances as it requires knowing the underlying population frequency distribution. They find no evidence of any differences in traits between the two gear types, concluding each is equally good (or bad) at sampling birds in the wild. The MS is mostly well written, and covers relevant literature and ideas, so I don’t have substantive comments to make on that note. More generally, I don’t have any negative comments to make in particular, but rather raise a couple of issues.

I had no idea what this “conditional inference tree approach” was, and I have never heard of it, despite being somewhat of a stats nerd and having lots of statsy chats with people who are quantitatively savvy. At lines 280-284 they justify why this method is better (than say a familiar linear model) without any citation, and to my mind these arguments are either not correct or not compelling to me. The data are not particularly unusual in any way, don’t appear to be badly behaved, so it seems odd to me that they choose to use an analysis that likely won’t be familiar to readers. They state they also analysed the data using familiar linear models and with mixed effects and it gives the same results. So, why not use these? Or use a MV model? That should be equally parsimonious as the tree approach. I’m just trying to avoid you having a situation where some reader(s) dismisses your study in part or whole because they get hung up on the analysis.

All that said, a visual inspection of the individual plots of raw data indicates immediately that there are no differences between gear types for any trait. In a sense, given these data in the graphs, any analysis should return a null result! Thank you for plotting raw data; many studies do not do this most basic and crucial aspect of presenting results.

Related to analyses is one issue that I think you should acknowledge, and that is low sample size: at both the among- and within-individual levels. Several of your traits are labile, meaning that estimating an individual’s trait value with a single assay/measure often leads to downward bias of correlations and reductions in differences (see e.g. Adolph and Hardin); related to this is the additional issue of few individuals, and potential sex differences that you don’t have any power to test for. Behaviour and hormones are notoriously low repeatability traits (R = ca 0.3-0.4, Bell, Biro and Fanson), meaning we need both repeated measures and fairly high sample sizes (see Wolak, where he shows the data needed to estimate individual mean values with precision, though his paper focuses on estimating R; these are two sides of the same coin: if you estimate R with precision, you also estimate individual predicted mean values with precision, i.e. the BLUPs in a mixed model). To gain a sense of how variable a trait is when R is low (0.4), and how relatively indistinguishable individuals are even with large samples, look at graphs in Biro and Stamps.

Therefore, it's possible that a difference exists, yet you did not have good power to test for it in the case of behavioural and hormone traits. Please acknowledge this in your discussion.
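The attenuation argument above can be illustrated with a minimal sketch, offered under stated assumptions rather than as an analysis of the study's data. It assumes one measurement per individual, a true group difference expressed in units of the among-individual standard deviation, and equal within-individual variance in both groups; under those assumptions the observable standardised difference shrinks by sqrt(R), and an observed correlation between two labile traits shrinks by sqrt(R_x * R_y). The function names are hypothetical.

```python
import math


def attenuated_effect(true_d, repeatability):
    """Observable standardised difference when each individual is measured once.

    true_d is the difference between group means of individual mean values,
    in among-individual SD units; repeatability is R = V_among / V_total.
    """
    if not 0 < repeatability <= 1:
        raise ValueError("repeatability must be in (0, 1]")
    return true_d * math.sqrt(repeatability)


def attenuated_correlation(true_r, rep_x, rep_y):
    """Observable correlation between two traits with repeatabilities rep_x, rep_y."""
    if not (0 < rep_x <= 1 and 0 < rep_y <= 1):
        raise ValueError("repeatabilities must be in (0, 1]")
    return true_r * math.sqrt(rep_x * rep_y)


# With R in the 0.3-0.4 range quoted above, a true difference of one
# among-individual SD is observed as a noticeably smaller difference.
for rep in (0.3, 0.4):
    print(rep, round(attenuated_effect(1.0, rep), 3))
```

The practical reading is that a single assay of a low-repeatability trait makes a real difference between capture methods look smaller than it is, which compounds the effect of a small sample.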

Pete Biro

(Adolph and Hardin 2007, Bell et al. 2009, Wolak et al. 2012, Biro and Stamps 2015, Fanson and Biro 2018)
Adolph, S., and J. Hardin. 2007. Estimating phenotypic correlations: correcting for bias due to intraindividual variability. Functional Ecology 21:178-184.
Bell, A. M., S. J. Hankison, and K. L. Laskowski. 2009. The repeatability of behaviour: a meta-analysis. Animal Behaviour 77:771–783.
Biro, P. A., and J. A. Stamps. 2015. Using repeatability to study physiological and behavioural traits: ignore time-related change at your peril. Animal Behaviour 105:223-230.
Fanson, K. V., and P. A. Biro. 2018. Meta-analytic insights into factors influencing the repeatability of hormone levels in agricultural, ecological, and medical fields. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 316:R101-R109.
Wolak, M. E., D. J. Fairbairn, and Y. R. Paulsen. 2012. Guidelines for estimating repeatability. Methods in Ecology and Evolution 3:129-137.


Basic reporting

The paper uses professional English language that is clear for the most part.

The introduction and background were mostly well referenced and relevant. The rationale regarding the possible influences of morphology and sex on the likelihood of a bird being captured by one or the other of the methods was not well developed. Cited sources refer to problems regarding entering a trap (because an animal is too large, for example) but that should not have been an issue at all for chickadees entering a walk-in trap … and unless the mist nets had very large mesh, there is little chance that a bird’s morphology would have affected its chance of being captured in a mist net. That males and females might behave differently in the social conditions associated with trapping is plausible, but the authors did not thoroughly develop their assumptions about how that might work.

The submission conforms to PeerJ standards regarding manuscript structure. (The Results section is, however, exceedingly brief … and it omits information that would be important for fully assessing the validity of the study’s conclusions, as noted later).

The figures are well designed and labeled: the patterns they show (including lack of differences between birds in the two capture categories) are clear.

The authors supplied the raw data via the Open Science Framework, and I was able to access those. The raw data file provides metrics for the individual subjects in the study, but it omits information about site and sequence of capture method, which precludes checking of some analyses. The file also omits some variables; regarding corticosterone, for example, the file includes only the stress-induced metric, but not the bird’s initial value (it also omits covariates such as time after capture).

Experimental design

The manuscript presents original primary research within PeerJ’s scope.

The study’s main research question—about whether behavioral or other differences among individual animals might result in sampling bias depending on capture method—is well defined, relevant, and meaningful. The authors did state that their study fills a gap in knowledge that they identified about the possibility that using a trap to capture birds for behavioral or ecological studies might result in marking of a different set of birds than one would obtain using mist nets.

The authors seem to have performed their investigation rigorously and with appropriate attention to ethical standards about use of wild birds as subjects and about management of data and reporting of results.

The manuscript includes sufficient detail and explanation about the methods to make future replication possible.

Validity of the findings

The study does not replicate any prior work involving the study species (black-capped chickadee). It parallels in some respects prior work on two other songbirds, the great tit (a close chickadee relative) and both pied and collared flycatchers.

The study reports an overall negative result: the authors did not find evidence that morphological, behavioral, or physiological characteristics differed between birds captured using one method as opposed to the other. The authors conclude that the two capture methods (both of which are used widely in field ornithology) would not produce issues for downstream studies resulting from capture biases.

This (negative) conclusion is clearly stated and linked to the study’s results. I am not familiar with the details of the statistical approach (conditional inference tree analysis) on which the authors relied, but if I understand things correctly, the paper’s conclusions match the patterns and test results reported in Table 1.

While I do not dispute that the available data support a conclusion of a negative result, I was disappointed not to see discussion that considered questions about the power of the analyses. A main reason that the statistical analysis did not identify any factors discriminating between birds caught using the two methods was that some of the potential predictive measures (especially those concerning behavior) were highly variable among individuals, as indicated by the standard deviation values in Table 1. The manuscript does not include thorough consideration of sources of this variation and its effect on the primary conclusion (no difference depending on capture method), although the authors did mention dominance relationships (which they did not evaluate) as a potential complicating factor. Given that researchers aimed to assay behavior often considered part of an animal’s personality, more consideration about consistency of the metrics was merited. However, the study’s design precluded any quantitative consideration of consistency. Accordingly, most of the variability in the results—on which the conclusion about lack of bias deriving from the alternative capture methods largely depends—is unexplained.

Additional comments

I appreciate that while the study involved a relatively small number of subjects (28 females and 27 males), the project’s multiple elements (including especially netting/trapping at multiple field sites spread over a fairly large geographical area followed by later behavioral experimentation across that same range of sites) represented significant logistical challenges. The authors clearly put in a lot of time and effort on the project, yielding a good amount of information.

That said, I think that the authors can improve the paper in various ways. I address these in the comments below.

Title: Could and should be less vague; “differences” could refer to all sorts of things.

In the Introduction, the rationale about potential importance of morphology (lines 88-89) emphasizes mechanisms such as the physical constraints of a trap (preventing some animals from entering). However, no such constraint would be an issue for chickadees entering a walk-in trap or flying into a mist net. Testing morphology for its own sake (beyond its correlation with sex and therefore likely with dominance) needs better justification.

Similarly, the authors state the postulated effects of sex because of behavioral differences between males and females far too vaguely (“territoriality, offspring defense, and foraging”). Testing for differences by sex allows inclusion of a simple comparison … but without any clear expectation of why trappability would differ between male and female chickadees. Again, the authors need to provide clearer justification here.

Figure 1: absolute measures of body size/mass are not very meaningful. Why not consider body mass index (mass scaled to body size) as a measure of condition? Correlated with fat score? While the data from this study don’t provide support for the idea that a bird’s condition affects its tendency to enter a trap vs. get caught in a net, it would be good to cite more thoroughly studies in birds (including especially tits) showing that there are connections between condition, dominance, and risk-taking.

Table 1 is problematic regarding the RFID feeder data for the post-capture behavioral assays. The rows for number of feeder visits are labeled as having the units of seconds … but should these numbers not be just counts of independent visits to the feeder? Beyond this detail, I question the value of considering duration of feeder visits—as a separate metric from number of visits—because duration of each individual visit to a feeder probably didn’t vary much at all. Maybe it did … but the authors present no data about that.

To evaluate this, I checked the raw data and found that the “duration” at the feeder during a 2-hour sample period was predicted by the number of visits (P < 0.0001, r^2 = 0.86 for the data from the control time periods and the cup presentation time periods). The relationship was slightly weaker for visits during the owl presentation, but it was influenced by one extreme outlier (excessive duration: 437 seconds, vs. max for all others of 139 seconds; erroneous data point, or bird that became “frozen” to the perch??).
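A check of this kind can be sketched as follows. This is an illustrative sketch only: the visit counts and durations below are hypothetical placeholders, not values from the study's raw data file, and the function name is an assumption. It computes the squared Pearson correlation between number of visits and total duration, and the mean time per visit, which is the quantity that would show whether birds changed how long they stayed on each visit.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy ** 2) / (sxx * syy)


# Hypothetical per-bird totals for one 2-hour period; NOT the study's data.
visits = [3, 5, 8, 12, 15, 20]
duration_s = [10, 16, 22, 40, 44, 63]

r2 = r_squared(visits, duration_s)

# Mean seconds per visit for each bird: if this barely varies, total
# duration carries little information beyond the visit count.
seconds_per_visit = [d / v for d, v in zip(duration_s, visits)]
```

A high r^2 together with a nearly constant time per visit would support treating duration and visit count as one metric rather than two.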

If birds did change the time they spent at the feeder during each visit (to choose a seed, or to stand on the perch surveying surroundings before selecting a seed), that would be interesting and potentially important.

This brings me to a more general comment: The results section is exceedingly short, because it omits presentation about underlying aspects of the data like those I just discussed. (The analysis includes nothing about the duration of individual visits to the feeder.) More presentation about descriptive statistics for this and other various metrics would be helpful … BEFORE moving on to focus on comparisons between the birds with respect to how they were captured. Standard deviations were very large for many of the behavioral measures. The paper needs to include more information about variation, because it relates to statistical power: was there any reasonable chance of detecting the kind of difference that the authors hypothesized given the number of birds studied and the variability in the metrics that emerged?
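The power question raised above can be approximated with a short sketch. It uses a normal approximation to a two-sided, two-sample comparison of means, with the group sizes of 27 and 28 birds mentioned in the reviews; the function names, the 5% significance level, and the 80% power target are assumptions for illustration, and an exact t-based calculation would give slightly larger required effects.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(d, n1, n2, alpha=0.05):
    """Approximate power to detect a standardised mean difference d."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative (normal approximation).
    shift = d / (1 / n1 + 1 / n2) ** 0.5
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def min_detectable_d(n1, n2, power=0.8, alpha=0.05):
    """Smallest standardised difference detectable with the requested power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_crit + z_power) * (1 / n1 + 1 / n2) ** 0.5


# Group sizes taken from the reviews (27 vs 28 birds per capture method).
d_min = min_detectable_d(27, 28)
power_for_half_sd = approx_power(0.5, 27, 28)
```

Under these assumptions the smallest reliably detectable difference is roughly three-quarters of a standard deviation, so small or moderate differences between capture methods could plausibly have gone undetected, particularly for the highly variable behavioral measures.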

Additionally, it seems (if I interpreted the Methods correctly) that the authors conducted each behavioral assay only once at each site (2 h with owl one day, 2 h with novel object on the prior or subsequent day). This design provides no measure of consistency of behavioral responses for individual birds at a given site. Were there not RFID data available for other time periods that the authors could use to consider consistency of visitation (number of visits in 2-hour period; visit duration) for the control period, at least?

Without including analyses like these, the next best thing the authors could do is to review more thoroughly what’s known about the consistency of behavior measures associated with personality in songbirds, and particularly in tits (including chickadees). Including more discussion about that could reassure readers that the behavior measured was unlikely to have varied much (for the same bird) from day to day because personality-associated variables of that kind are known to be highly repeatable.

The design controlled for time of day, but not for other conditions that likely varied from day to day (e.g., temperature, wind, cloud cover). Ample prior research (not cited) has shown that those varying conditions affect visitation at feeders by chickadees … and thus there may have been a lot of extraneous variation injected that obscured any patterns (based on prior capture method) that the researchers were testing for.

Besides dominance affecting visitation at RFID feeder, it might be worth also citing studies showing that a bird’s social network position can be an influence, too (perhaps correlated with dominance, but not necessarily synonymous).

Lastly, on a general level, would the analyses have been sensitive to detecting a pattern whereby one capture method yields a set of birds whose traits represent a SUBSET of the subjects captured using the other method? The manuscript does not seem to address this possibility, as the analysis focuses (only) on whether the birds caught in the two ways differ in their average traits.

Some additional specific comments:

Line 280: The assertion that “the conditional inference tree approach is preferable” merits a citation.

Lines 324 – 327 “In chickadees, we expect dominance hierarchies could influence the direction of response to the stimuli, with lower-ranked individuals being more likely to come in to the feeders during stimulus presentation, while dominants are avoiding the perceived threat.” OK ... but what is the basis for this assertion? (Citation of sources would be helpful here.)

Lines 128 & 153: “comprised of” should be just “comprising” or “comprise”

Line 167: Poor grammar to use “however” as a conjunction. Precede “however” with semicolon to link two independent phrases.

Line 186: PIT tag company is now Eccel, in Leicester (no longer IB Technology in Bucks). Citing the company’s new name and location would be more helpful for readers wanting to buy similar gear.

Lines 294 – 302: excessive repetition of concepts from Introduction (no new information added here).

Lines 307 – 310: largely repeats concepts and citations from lines 57 – 61 (again, no new information added).

Line 329: write out numerals smaller than 10 (not just because the sentence starts with a number)

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.