All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Dear authors,
Congratulations on the nice work and looking forward to seeing it published.
[# PeerJ Staff Note - this decision was reviewed and approved by Monika Mortimer, a PeerJ Section Editor covering this Section #]
Dear authors, Though the other reviewers think this manuscript is ready to be accepted, reviewer #2 raised some major issues in this round and identified places that are confusing and need further clarification. Please address the review carefully in your next round of revision.
[# PeerJ Staff Note: Please ensure that all review and editorial comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. #]
See comments below
I think that the authors put a lot of effort into their study, and this is fully appreciated. However, understanding the results is even more important, even if some of them show that this effort did not bring the expected output.
In my opinion, there is no improvement in the presented methodology if the final results are much worse than those of other existing products and methodological approaches. As I pointed out previously, there are numerous methods that provide more reliable products. The authors claim that their aim is to produce a framework to map LULC changes rather than to produce reliable products. I am not going to argue that point. But for me it is the quality of the product that matters, not a complicated framework that processes a very heavy set of data and produces output which cannot be trusted.
I became thoroughly confused reading the rebuttal letter. The authors admitted that their cross-validation resulted in a very low accuracy, but that comparison with data from the S2GLC dataset showed promising results. But this is the point. Do the authors want to admit that their cross-validation is not as valuable as comparison with independent data? If so, there is no way to assess the products with 44 classes. So why is there so much emphasis on a methodological approach that classifies LC with so many classes if this dataset cannot be properly assessed? On the other hand, the authors used the independent S2GLC dataset incorrectly. According to Malinowski et al. (2020), the validation data were prepared for validating a land cover product with a resolution of 10 m, and in the current study they were used to validate a 30 m product. It is difficult to predict what difference this will make to the final result. Moreover, the good results indicated by validation with the S2GLC dataset are achieved for a product aggregated to 14 classes. This again confirms that relatively high accuracy is achievable only when the legend is considerably reduced. It is valuable that the authors tried to map all CLC classes. But some of the CLC classes are so heterogeneous that they may be mapped only by visual interpretation with the use of ancillary data providing contextual meaning. This is how the CLC data were produced and are updated. Numerous studies have shown this problem, and in the current work the authors seem to be ignoring this issue completely.
Therefore, I want to emphasize that I appreciate the authors’ effort in preparing the whole methodology and processing all the data. I can see that a lot of work was done. But in my opinion the work can be published only with the right interpretation of the results. The work should highlight the multitemporal aspect of the study, but it should also consider all the limitations that were encountered and draw the right conclusions. The work cannot, however, be praised for delivering products that are difficult to trust. The level of quality that the product represents may be met with considerably reduced effort and less sophisticated methods.
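The point made above, that relatively high accuracy appears only once the legend is reduced, can be illustrated with a toy calculation. The sketch below is not the authors' validation procedure: the class codes, the sample labels and the aggregation table are all made up for illustration. It only shows that confusion between sibling classes disappears when detailed classes are merged into a coarser legend.

```python
# Toy illustration only: labels, codes and the aggregation table are hypothetical.

def overall_accuracy(reference, predicted):
    """Share of samples whose predicted label equals the reference label."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def aggregate(labels, mapping):
    """Translate detailed class codes into coarser legend entries."""
    return [mapping[label] for label in labels]


# Hypothetical detailed labels for eight validation points.
reference = [311, 312, 313, 311, 211, 231, 211, 312]
predicted = [312, 312, 311, 311, 231, 231, 211, 313]

# Hypothetical aggregation of the detailed codes into two coarse classes.
to_coarse = {
    311: "forest", 312: "forest", 313: "forest",
    211: "agriculture", 231: "agriculture",
}

detailed_accuracy = overall_accuracy(reference, predicted)
coarse_accuracy = overall_accuracy(
    aggregate(reference, to_coarse), aggregate(predicted, to_coarse)
)

print(f"detailed legend: {detailed_accuracy:.2f}")
print(f"aggregated legend: {coarse_accuracy:.2f}")
```

In this constructed case every error is a confusion within the same coarse class, so aggregation removes all of them; real data would show a smaller, but same-direction, effect.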
Dear authors, your manuscript received two reviews, and this time one recommended minor and one recommended major revisions. Reviewer #2 raised the issue of providing too many classes (44) and the low accuracies. More discussion of this issue, or aggregated classes, is recommended to make the map more meaningful. I have provided all reviews here and hope they will be helpful for you to revise and improve. Thanks.
The paper was much improved by the authors. It is easier to read and is much more consistent. Most of the issues that I highlighted have been addressed. I still have a concern about the NDVI trend analysis presented here, and I suggest that the authors consider removing it from the publication. I am leaving this decision at the authors’ discretion.
NDVI trend analysis issue.
Fig. 16. The map of NDVI trends clearly shows some artifacts in the data. For example, the visible NDVI gain over one Landsat path/row in Poland. The change from positive to negative NDVI trend by latitude at around 59 degrees N may be an artifact. The NDVI decline in the Alps is also a possible artifact. I still believe that this analysis may be compromised by the ARD data quality and the method of trend extraction. I think fig 12 and fig 17 are adequate to represent dominant land cover change trends. I suggest considering removing NDVI trend analysis and fig 16 altogether.
Fig 19 shows that the average NDVI after 2013 is lower than before 2013. This requires additional investigation. In particular, a comparison with other NDVI products from MODIS and VIIRS will be informative.
NA
Other recommendations.
Check the reference format. Using the first name (J. Chen et al., 2015) is not recommended by the journal’s formatting rules (https://peerj.com/about/author-instructions/).
L318. Edit for clarity. I suggest using parentheses: (1) the same season; (2) ... , etc.
L396. I am confused... Here, you say “our model was not trained to predict peat bogs”. Yet, you have peat bogs as a class in Table 3, and you stated that Peat bogs have high accuracy (L516). Can you explain this inconsistency?
L515-525. You probably should explain that while you use the water frequency from Pekel et al. as the input, it is not correct to say that your model “predicts” water class. The class label is clearly identified by one of the model predictors, so you have a circularity here.
L627: replace “:” with a period.
L 642. Capitalize “Europe”
L710-711. Hansen et al. does not provide data on balanced forest loss and gain 2001-2018. Where does the data come from? I think you are interpreting the GFW data incorrectly.
L771 Should be spelled “stakeholder”.
The already relatively good structure of the first version of the manuscript has been improved and now contains all important parts. The English and various typos have also been corrected.
I regret, however, that the authors provided a document called ‘peerj-61104-PeerJ_land_cover_dataset_EU_article_diff.tif’, which is so messy that it cannot be used during revision. I do not know whether this results from the journal’s requirements or whether it was up to the authors.
No comments
This is a very lengthy manuscript which, even though written in proper English, requires several readings to understand its contents. In general, most findings are correctly identified and supported by results. However, it seems that one of the very important messages coming from this research is not highlighted clearly enough. In my opinion, this research confirms that AUTOMATIC mapping of land cover and/or land use at a large scale, such as continental, is still very difficult when considering a large number of classes. This most often results in low mapping accuracy. The authors highlighted the issue of the relatively small number of classes in existing maps, even for products with a high resolution of 10 m. Indeed this is an issue; however, the authors only confirm this problem. After reading this manuscript, I feel that the mapping product with 44 classes is still being praised despite its VERY WEAK mapping accuracy. Most researchers are aware that mapping 44 or so often very heterogeneous classes is pointless, and this is why most large-scale land cover products contain only between 10 and 20 classes. In this project the authors spent a lot of time (months) and worked on a project that provides simply qualitatively useless products. Besides that, this manuscript tries to persuade readers that the approach is very innovative and novel. The truth is, however, that only the product with five classes is worthy of attention considering its accuracy and quality. On the other hand, accuracy similar to or higher than that of this product with few classes was achieved in other projects, which still mapped more classes and provided the same or even higher spatial resolution. Because of this, I think that the manuscript needs to be rearranged and the proper conclusions should be drawn. I already pointed out this issue in the first round.
It is also quite strange that, after receiving relatively low accuracy when mapping 33 classes as described in the first version of the manuscript, and after my comment on the weak performance of this approach, the authors decided to extend the legend of their map even further, to 44 classes. The authors addressed this point a bit in their rebuttal, but this only confused me even more. If producing a reliable and accurate method for land cover mapping is not the authors’ aim, then I think the discussion may be finished here. There are already too many poor methods openly available. If the authors want to add one more, which at the same time produces an enormous amount of output data, this is simply a dead-end road. I cannot see any point in publishing a manuscript that describes a methodology or a framework resulting in less than 50% accuracy (different types of accuracy).
Some more comments:
- Line 416 – as pointed out in the previous revision, both ‘spatial’ and ‘spatiotemporal’ models use multi-temporal data. The fact that the former uses data from a single year does not change the fact that it uses multi-seasonal information. This in turn is multi-temporal data.
- Line 537 – very confusing: was the S2GLC data used for validation or classification?
- 553 – classes 323 and 322 were used in S2GLC (compare with table 6). BTW, what are the percentages in brackets?
- 574-578 – these sentences state that the NEURAL NETWORK was better than the proposed meta-learner run on three different classifiers. These lines are not in agreement with lines 672-677
- 764 – 767 – a very nice idea; however, the current study only confirms that the authors are far from this goal
- another important issue is the fact that the presented framework produces a huge amount of output data, which in most cases is never used or is difficult to interpret. This concept needs to be rethought.
Dear authors,
Your manuscript received two reviews as follows. Though both reviewers think it has the potential to be published in PeerJ, they also raised many critical issues which need to be well addressed before it is considered for publication in PeerJ. I have provided their reviews here and hope they will be helpful for you to revise and improve. Thanks!
[# PeerJ Staff Note: Please ensure that all review comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. It is a common mistake to address reviewer questions in the response letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the response letter. Directions on how to prepare a response letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]
The manuscript presents an approach for continental-scale annual land-cover mapping by integrating several machine learning tools and existing independently collected training data. The manuscript is hard to follow. Consider adding hierarchical section numbering (1, 1.1, 1.1.1). It is very hard to distinguish a section from a sub-section, which makes the document structure unclear. I have concerns about project methodology (which is not always fully presented in the paper). I also have concerns about the findings and interpretation of the results, which contradicts previously published reports. Overall, I do not think that this manuscript is ready for publication in PeerJ.
1. Project methodology
L 151 – I do not see “m” in Equation 1; Y and X are not explained (land cover and annual covariates?)
It is not clear why such a complex model was implemented rather than using a single approach (e.g., CART or CNN). The authors suggest that the proposed EML model is beneficial compared to a single machine learning approach; however, no valid comparison is presented to prove this point. It would be beneficial to compare the authors’ results with a result from a single approach. Specifically, it is important to confirm whether the class probability based on a single model (i.e. CART ensemble) will be substantially different from the obtained results. It is also important to understand why such a complex model was needed: due to temporal inconsistency of source data, or errors in training data, or inconsistency between land cover and land use thematic classes?
The EML model design and application are described very superficially, precluding detailed understanding and possibilities for replication (L185-198). Such a description may be suitable for dataset metadata, but not for a scientific publication. It is not clear how the results of all the different machine learning tools were integrated and what weight each of the outputs had in the final decision.
The purpose of “prevalent change mapping” is not clear. Why was it done and how was it validated and used?
Table 2: I think, the journal standard is to use comma, not period, as a thousands separator. Please correct.
Table 4 – Is it 30 m^2 (equivalent to 5.5 m spatial resolution), or 900 m^2?
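The ambiguity raised for Table 4 is purely arithmetic, and a short check makes the two readings explicit. The sketch below assumes square pixels; the function names are illustrative and not taken from the manuscript.

```python
import math


def pixel_side_from_area(area_m2):
    """Side length (m) of a square pixel with the given area (m^2)."""
    return math.sqrt(area_m2)


def pixel_area_from_side(side_m):
    """Area (m^2) of a square pixel with the given side length (m)."""
    return side_m ** 2


# Reading 1: the table literally means an area of 30 m^2 per pixel.
side_if_area_is_30 = pixel_side_from_area(30.0)

# Reading 2: the table means pixels of 30 m by 30 m.
area_if_side_is_30 = pixel_area_from_side(30.0)

print(f"30 m^2 pixel -> side of {side_if_area_is_30:.2f} m")
print(f"30 m pixel   -> area of {area_if_side_is_30:.0f} m^2")
```

The first reading gives a side of roughly 5.5 m and the second an area of 900 m^2, which is why the unit in the table needs to be stated unambiguously.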
L515 – I disagree. First, each algorithm represents an ensemble of models calibrated using random subsets of training data. Did you check that the covariate’s weight stays the same if a new random training set is used? Did you check the difference between the models in each ensemble? Second, most of the Landsat-based metrics are correlated with each other. The selection of one metric over another may be a function of the model parameters or code specifics. Finally, even if these models differ in the weights of the covariates, that does not prove that integrating results from multiple models is more effective and accurate than implementing a single algorithm.
The conclusion from the test results for comparison of the spatial and spatiotemporal models is not clear. Can you provide an explanation on why the results of the model trained using multiple years are better? Is it because of Landsat data availability, or incorrect normalization of the ARD products, or training data errors?
2. Uncertainty Metrics and Validation
The authors suggest that the per-pixel class prediction uncertainty is an important feature of their method and that the lack of per-pixel uncertainty reporting is a limitation of similar methods (L97). I disagree with this position. The model-based per-class uncertainty depends on multiple factors, including the annual cloud-free data availability and training site quality and distribution. By training data quality, I understand both temporal consistency (land cover changes correctly represented by the training class) and thematic consistency (training data correctly represent a class that has a characteristic land cover and that can be identified using multispectral data). The per-pixel uncertainty reflects the ability of the model to replicate the training data, not to predict the actual class distribution. When the training data contain thematic or temporal uncertainties, the per-pixel uncertainty may be inappropriate for quantifying the map accuracy. That is why I see limited benefit from per-pixel uncertainty reporting in this or other similar projects.
The “accuracy” term is not always used correctly. For example, L98-99: “Mapping accuracy is provided as a general number (average performance) for the whole area, although in practice prediction accuracy often varies from class to class.” I see a contradiction here: if the mapping accuracy is provided for the whole area, by each class, that should be sufficient to describe per-class accuracy variation, correct?
I suggest that the authors provide conventional User’s and Producer’s accuracy metrics. This way, the accuracy of your product can be compared with earlier publications. Also, were the validation data collected using a probability sampling design?
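For reference, the conventional User’s and Producer’s accuracies requested above can be derived from a confusion matrix as in the following minimal sketch. The matrix values and class names are invented for illustration, and the simple count-based formulas assume that every validation sample has the same inclusion probability; under a stratified design, area-weighted estimators would be needed instead.

```python
def users_and_producers_accuracy(matrix, classes):
    """Per-class (user's, producer's) accuracy.

    matrix[i][j] is the number of validation samples mapped as classes[i]
    whose reference label is classes[j] (rows = map, columns = reference).
    """
    n = len(classes)
    result = {}
    for k, name in enumerate(classes):
        correct = matrix[k][k]
        mapped_total = sum(matrix[k])                          # row total
        reference_total = sum(matrix[i][k] for i in range(n))  # column total
        users = correct / mapped_total if mapped_total else float("nan")
        producers = correct / reference_total if reference_total else float("nan")
        result[name] = (users, producers)
    return result


# Hypothetical confusion matrix for three classes (100 samples in total).
classes = ["forest", "cropland", "water"]
matrix = [
    [45, 4, 1],   # mapped as forest
    [10, 30, 0],  # mapped as cropland
    [0, 2, 8],    # mapped as water
]

accuracy = users_and_producers_accuracy(matrix, classes)
overall = sum(matrix[k][k] for k in range(len(classes))) / sum(map(sum, matrix))

for name, (users, producers) in accuracy.items():
    print(f"{name}: user's = {users:.2f}, producer's = {producers:.2f}")
print(f"overall accuracy = {overall:.2f}")
```

User’s accuracy is the complement of commission error and Producer’s accuracy the complement of omission error, so reporting both per class allows direct comparison with earlier products.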
Table 5. The reclassification table seems very superficial. Why have you selected vineyards as a separate category, but omitted orchards? Why did you exclude peat bogs, and which category do they fall into on your map? Which category did you assign the mixed forests to?
The average accuracy of the method (Tab. 8) is below most of the published European products (Tab. 7). What is the benefit of the presented approach?
3. NDVI trend
The Landsat ARD product provides normalized reflectance, not surface reflectance calculated using the radiative transfer function. As such, the NDVI calculated from the ARD should not be directly used to estimate vegetation vigor. Another issue relates to changes in Landsat data availability during the 2000-2019 interval. The data in the early years (before the Landsat 8 launch) were collected following a data acquisition plan which prioritized growing-season observations. As such, the annual NDVI metrics may overestimate vegetation vigor at the beginning of the interval. Finally, the analysis of long-term trends in vegetation cover and/or NDVI requires statistical analysis to quantify trend significance (see Song et al., 2018, for example). The trend maps (Fig. 8) display both significant and subtle insignificant changes. All these factors may be responsible for the observed NDVI decrease in the northern latitudes, which contradicts earlier findings that show an NDVI increase (Myers-Smith et al., 2020). Fig. 8 also demonstrates artifacts (e.g., a Landsat scene outline visible over Poland) in the NDVI trend. I suggest removing the NDVI trend analysis or redoing this analysis using temporally consistent surface reflectance data.
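One common way to quantify trend significance for a per-pixel annual series is a Mann-Kendall test combined with a Theil-Sen slope. The sketch below is an illustration of that general technique, not the method of the manuscript or of the cited studies. The NDVI values are invented, the test uses the normal approximation without a tie correction, and with series this short the p-value is only approximate.

```python
import math
from statistics import median


def mann_kendall(values):
    """Mann-Kendall trend test (normal approximation, no tie correction).

    Returns (S, z, two_sided_p). Assumes the series has no tied values.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return s, z, p


def theil_sen_slope(years, values):
    """Median of all pairwise slopes (change in value per year)."""
    slopes = [
        (values[j] - values[i]) / (years[j] - years[i])
        for i in range(len(values) - 1)
        for j in range(i + 1, len(values))
    ]
    return median(slopes)


# Hypothetical annual NDVI for two pixels (values are made up).
years_a = list(range(2000, 2010))
ndvi_a = [0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69]
years_b = list(range(2000, 2006))
ndvi_b = [0.63, 0.61, 0.64, 0.62, 0.66, 0.65]

s_a, z_a, p_a = mann_kendall(ndvi_a)
s_b, z_b, p_b = mann_kendall(ndvi_b)
slope_a = theil_sen_slope(years_a, ndvi_a)

print(f"pixel A: S = {s_a}, p = {p_a:.5f}, slope = {slope_a:.3f} per year")
print(f"pixel B: S = {s_b}, p = {p_b:.2f}")
```

A trend map could then display only pixels whose p-value falls below a chosen threshold, which separates significant changes from subtle insignificant ones.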
4. “Deforestation” in Scandinavia
The term “deforestation” used in the manuscript is misleading. It suggests that the forest cover in part of Europe (specifically, in Scandinavia) is decreasing. This is not true and contradicts other findings.
The term “deforestation” is usually used for land-use conversion, while forest change in Scandinavia is (mostly) due to logging operations in managed forests. The net loss of forests in Scandinavia is not validated and is most probably an artifact of the method. It seems that forest regeneration was not quantified correctly by the authors. As such, it may demonstrate the weakness of the present approach and its unsuitability for continental-scale land cover change assessment.
5. Incorrectly used and omitted references
L 105-106. That is incorrect. Several publications specifically address the drivers of land cover change. See Tyukavina et al. 2017; 2018, for example.
The reference to Hansen et al., 2013 for this comparison (L 690-691) is wrong. The GFW product and the referenced paper specifically pointed out that the forest loss and gain products are not balanced, and the gain is underestimated. This comparison here is misleading.
When the authors refer to Ceccherini et al. (2020) to confirm their results, they should also refer to the recent replies to this publication (Picard et al., 2021; Palahi et al., 2021) that disprove the results published by Ceccherini et al.
see above
The provided manuscript agrees with PeerJ policies and conforms to the required format and structure. However, it lacks a Conclusion section that clearly and concisely summarises the stated objectives and the findings that support them. This could be done by selecting appropriate subsections of the Discussion part.
It is also written in good English, with some typos that should be corrected. It is advised that the manuscript be checked by a native English speaker to ensure flawless reading.
The manuscript provides a sufficient introduction to the topic and includes the most important and necessary references to the literature. Most of the included figures and tables are very useful and allow better understanding of the content. However, some of them were not cited in the text. They should be checked and either removed as unnecessary material or referenced correctly in the text.
To my knowledge, the presented research is novel and a similar approach has not been published elsewhere. The authors identified the aims of the work and indicated the direction of the study clearly. It seems that the authors performed a lot of hard work to put all the pieces together. However, the benefits of this work compared to other studies addressing a similar topic should be clearly stated.
The description of the work is long, but there are still many issues that are not explained clearly enough. They are pointed out in the section ‘General comments for the author’.
The authors describe real findings, which are in most cases correctly identified and supported by results. However, I provide some comments and ideas that may require some additional changes in the manuscript, and I think they should be considered. There are also numerous questions in my comments which should be answered in order to make the manuscript easier to understand and follow. This would make the findings more consistent with the conclusions and comments from the authors.
Generally speaking, I appreciate the presented work as very interesting, but some aspects should be corrected and explained more clearly. More detailed comments are provided below:
- line 76 – 7 years of what?
- lines 78-81 – the sentence is not grammatically correct
- Table 1 – why is the table not fully filled in? Some information is missing.
- EO is not defined
- line 138 – 20TiB – I would suggest using the most commonly used units: TB
- equation 1 – first, adding such an equation seems to be useless in such a manuscript; second, the notation ‘m’ is not present in the equation
- lines 170-180 – I would put this and other code in supplementary materials (appendix) so as not to make the manuscript too long.
- line 196 – it is not clear how the tuning was performed. Was it carried out on the whole of Europe? Or only on some selected 30x30 km tiles? Where do the 30x30 tiles come from?
- lines 205 – 208 – what is the difference between point 1 and 3? How was p. 1 derived? How is p. 3 derived if the three classifiers used find different classes in the same pixels? This is not clear
- general questions – how the Landsat images were analysed? Per tile? Or were they mosaicked?
- line 245 – ‘covariates’ – why introduce a new name? Usually terms like input variables or classification features are used in the field of remote sensing. The term ‘variable’ is also used in Figure 1. I would suggest not complicating it.
- line 247 – which years of Landsat data were analysed? 1999-2020 or 2000-2019??
- lines 252 – 255 – what is the reasoning behind this division? Is there any scientific basis? The seasons differ between regions of Europe, so where do the ones used here come from?
- lines 259-266 – this description is not clear; a figure would make it easier to understand
- lines 297-300 – this is also unclear: why did LUCAS get 100% and CLC 85%? And how were the weights used? The lines 229-300 are totally unclear
- lines 312-317 – this section and the use of OSM is difficult to understand
- line 321 – where do the values 101-200 come from?
- Table 4 caption – 30 m2? It should rather be 30 m pixel size
- lines 362 – 366 – section quite unclear and there is repetition in lines 395-400;
- line 362 – ‘For each dataset’ – what dataset??
- Lines - 365-366 – both ‘spatial’ and ‘spatiotemporal’ models use multi-temporal data!
- lines – 380-394 – all description of the validation methods and comparison strategy is quite confusing and should be supported with some figure/workflow to make it clear, there are too many combinations and years mixed
- lines 404-406 – the sentence here is too long and at least some commas should be used.
- line 413 – OLS??? Please provide full explanation of this abbreviation
- lines 420 – 446 – I suggest adding this as supplementary material
- Figure 4 – is this illustration representing analysis for only one class??
- lines 459 – 463 – not very informative
- lines 467-470 – very confusing sentence
- Table 7 – why are different numbers of classes given in the table for the same land cover products when compared with training data from different years?
- line 509 – ‘This proves…..’ what exactly means ‘this’?
- line 520 – please correct the structure; it should be ‘…accuracy for 33 classes at level 3, …’. By the way, which classification are these results for? Spatial or spatiotemporal? On which training dataset? CLC, LUCAS or a combination of them?
- line 529 – what is the positive rate??
- line 541 – reclassifying…
- Table 9 – it is a nice summary of the results. But it is not clear which results. Which year? Which set of training data was used? What is the ‘Accuracy’ at the bottom of the table? Please also use the same units throughout. In the text you write in %, e.g. 62%, but in the table you provide numbers with decimal places. Why? The table shows that most of the classes at level 3 received very weak accuracies and cannot be considered reliable results. Just six out of 33 classes exceed 70% F1-score; therefore a clear statement should be provided addressing this issue and informing readers that the map is very erroneous and should not be used in other analyses.
- line 560 – which results exactly are described here?
- lines – 568-569 – correct the numbers of figure…
- Figure 7 – not referenced in the text.. remove if not necessary. The pictures and notations are too small.
- Table 12 – not referenced in the text, remove if not useful
- line 575 – Fig. 4??,
- Figure 8 – not referenced in the text!! Pictures too small
- Figure 10 – not referenced in the text
- line 623-626 – how the findings mentioned here can be seen in table 8?
- line 630 – it must be mentioned here that the result of 87% was met only for the classification with level 1 – five classes!!!
- lines 632-634 – very confusing sentence. It is actually not so comparable. Similar results are met only for level 1, with fewer classes. And the threshold of 85% was met for level 1, so this is another source of confusion. If you mention 10 m resolution, then you cannot do this, because your methods produce no product with such resolution.
- line 729 – 731 – this is obvious, because land use classes with heterogeneous cover are very difficult to map automatically with ML, and in many cases such an attempt is not logical. This is why you received such low accuracy for the level 3 classification. Many of these classes are too complex to be resolved with ML and Landsat data.
- figure 11 – not referenced in the text!!! By the way, it is a bit unclear what is presented here. If the figure stays in the manuscript, please explain its content.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.