To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
The authors satisfactorily addressed a minor issue pointed out by a reviewer.
Please address the comment of Reviewer 1 concerning the multi-label method.
OK. Clarity and English are improved as requested.
The added mention of Fodor/Lasseck's work is appropriate, but the reader would benefit from understanding WHY it is relevant - i.e. because it is also a template-matching paradigm.
The mention of multi-label methods added to the conclusion is a bit odd. I recognise that reviewer 2 requested it, but the authors present it as "a way to filter out sound activity not related to fauna", which is not really a good description of what multi-label provides. Also, note that the method presented in this paper is equivalent to the "binary relevance" approach to multi-label, namely, simplifying the problem to an independent binary problem per species. The paragraph should be reconsidered to provide a more meaningful reaction to reviewer 2's request.
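To illustrate the "binary relevance" reading mentioned above, here is a minimal sketch of how a multi-label annotation set reduces to one independent presence/absence target per species. The function name, species names and annotations are hypothetical and do not come from the manuscript.

```python
def binary_relevance_targets(annotations, species):
    """Turn multi-label annotations into one binary target list per species.

    annotations: list of sets, each holding the species heard in one recording.
    species: iterable of species names of interest.
    Returns a dict mapping species -> list of 0/1 labels, one per recording.
    """
    return {
        sp: [int(sp in labels) for labels in annotations]
        for sp in species
    }


# Hypothetical annotations for three recordings.
annotations = [{"frog_a", "bird_b"}, {"bird_b"}, set()]
targets = binary_relevance_targets(annotations, ["frog_a", "bird_b"])
print(targets)  # {'frog_a': [1, 0, 0], 'bird_b': [1, 1, 0]}
```

Each resulting list can then be used to train a separate binary detector, which is how the per-species setup in the paper can be read.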
OK. The inclusion of AUC as well as the other statistics improves confidence in the results.
The language issue has been well fixed by the author.
The experiment is well designed.
The paper provides a method, which is implemented in the ARBIMON web-based system. It is especially useful for those ecologists who want to study bioacoustics.
This paper is well documented and useful for interested readers. The author has addressed the proposed problem well. I would recommend that the paper be accepted.
The two domain experts consider the contribution interesting. They point to some core issues (reviewer 1) and questions (reviewer 2) which need to be thoroughly addressed in a major revision.
The article is in general clearly written and well-structured. The algorithm statements are commendably clear.
The paragraph between lines 167 and 168 is unclear. I believe that the "5 selected neighbourhoods" are 5 strongly-matching temporal regions. I also believe that 266% is a freely-chosen parameter. Please rewrite the paragraph to be clearer.
The article does not make enough connection with related literature on the topic.
* A very obvious point of comparison is the template-matching species classification introduced by Gabor Fodor and used by Mario Lasseck to perform species classification and get extremely strong results in the annual BirdCLEF contests. Since that method is currently a leading method in the exact topic of this paper, it should be discussed.
* The two-stage method in this paper (cross-correlation followed by SSIM refinement) is somewhat related to Ross & Allen (2013, Ecological Informatics). This comparison could be made in the discussion.
Line 292 claims that the method is "non-species specific" - this is not a good description of the method, since the method relies on templates for the target species and so is species-specific. It would be appropriate to call the method "generic" or "general-purpose".
Broadly fine. However some issues:
The step size used for moving through each spectrogram (in step 5 of "Algorithm to create the similarity vector") is potentially problematic. Moving by a jump of 16 steps rather than 1 carries quite a strong risk (in fact, a 94% chance) of missing the strongest-matching alignment. The authors should have tested for the procedure's sensitivity to this parameter choice.
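The 94% figure can be checked with a short calculation, assuming the best-matching offset is equally likely to fall at any position: a stride of 16 visits one offset in 16, so the chance of skipping the best one is 15/16 = 93.75%. The sketch below is illustrative only; the function names and the number of offsets are assumptions, not taken from the manuscript.

```python
from fractions import Fraction


def miss_probability(step):
    """Chance that a strided scan skips the best offset.

    Assumes the best offset is uniformly distributed over the
    residues modulo `step`.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    return 1 - Fraction(1, step)


def miss_rate_by_enumeration(n_offsets, step):
    """Same quantity, obtained by listing every possible best offset."""
    visited = set(range(0, n_offsets, step))
    missed = sum(1 for best in range(n_offsets) if best not in visited)
    return Fraction(missed, n_offsets)


print(miss_probability(16))                # 15/16
print(float(miss_probability(16)))         # 0.9375
print(miss_rate_by_enumeration(1600, 16))  # 15/16
```

Whether a skipped alignment matters in practice depends on how quickly the similarity score decays around its peak, which is what a sensitivity test on the step size would reveal.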
It would be better to use AUC (area under the ROC curve) rather than accuracy. This is widely-known and is especially important when the classes are unbalanced, as is the case here. The authors claim (line 196) that accuracy is a suitable proxy for AUC in the balanced case: this argument is misleading and must be removed, since the authors are not considering the balanced case.
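A small sketch shows why accuracy can mislead on unbalanced classes. The class counts (5 presences, 95 absences) are hypothetical, and AUC is computed here through its pairwise-ranking definition rather than with any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical unbalanced data set: 5 presences, 95 absences.
y_true = [1] * 5 + [0] * 95

# A detector that never reports the species and gives every clip the same score.
always_absent = [0] * 100
constant_scores = [0.0] * 100

print(accuracy(y_true, always_absent))   # 0.95
print(roc_auc(y_true, constant_scores))  # 0.5
```

The uninformative detector reaches 95% accuracy while its AUC stays at chance level, which is the distinction the comment above relies on.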
It is a shame that the authors only used around 900 recordings, when they had access to many more. Some of the folds must have as few as 3 examples, for some species. However, since the main outcomes appear to be significant, this is not a fatal flaw.
The findings appear to be valid, and significance is appropriately determined. The results would be more reliable if AUC rather than accuracy were used.
The title mentions both "detection" and "classification". I understand why. However, the introduction should clarify for the reader what task is being attempted here: it is to develop a binary classifier for the presence/absence of a single species, and then to apply it to each of a set of species of interest. At present the article does not make this explicit; the reader must infer it. I would argue that the title should really refer to a comparison of three detection algorithms, not three classification algorithms, since the three template-matching procedures used here are used for detection.
The introduction MUST tell the reader something about the three similarity measures that are being tested here, since that is the main comparison. The full algorithm statement comes later, but the intro needs to prepare the reader for what the paper is about. Why these three? What properties does one expect of them?
* "Hanning" -> "Hann"
* "lets define" -> "define"
* "algorithms have order of" -> "algorithms have time complexity of"
* "analize" -> "analyze"
The manuscript compares three template-based classification algorithms using random forests. A web-based cloud-hosted system is quite useful to the bioacoustic community. This paper is well organized, but has some grammatical errors, which need to be corrected.
Line 17, Line 24, Line 34, Line 63-64 1 to one, Line 69 id? Line 101 optimize....
In this paper, the template is created from all ROIs submitted by the user, so the template is highly affected by the training data. Since animal calls often differ between regions and times, can you propose a method to address this problem? Alternatively, can you update the template during the comparison to make it less sensitive to those factors? Environmental recordings often contain multiple species in a single recording, so a single-instance single-label classifier might not be suitable for the classification task. Multi-instance multi-label learning and multi-label learning have shown better performance in previous studies, such as "Acoustic classification of multiple simultaneous bird species: A multi-instance multi-label approach", "Using multi-label classification for acoustic pattern detection and assisting bird species surveys", and "Detecting Frog Calling Activity Based on Acoustic Event Detection and Multi-label Learning".
The novelty of this paper is limited, but the impact is high. The data is robust and the conclusion is well stated. However, the conclusion section is weak and should be expanded, and a future work section should be added.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.