This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
The natural sciences, such as ecology and earth science, study complex interactions between biotic and abiotic systems in order to understand those systems and make predictions. Machine-learning-based methods have an advantage over traditional statistical methods in studying these systems because they do not impose unrealistic assumptions (such as linearity), are capable of inferring missing data, and can reduce long-term expert annotation burden. Thus, wider adoption of machine learning (ML) methods in ecology and earth science has the potential to greatly accelerate the pace and improve the quality of science. Despite these advantages, ML techniques have not been widely adopted in ecology and earth science. This is largely due to 1) a lack of communication and collaboration between the machine learning research community and natural scientists, 2) a lack of easily accessible tools and services, and 3) the requirement for robust training and test data sets. These impediments can be overcome through financial support for collaborative work and the development of tools and services that facilitate ML use. Natural scientists who have not yet used machine learning methods can be introduced to these techniques through Random Forest, a method that is easy to implement and performs well. This manuscript will 1) briefly describe several popular ML methods and their application to ecology and earth science, 2) discuss why ML methods are underutilized in natural science, and 3) propose solutions to the barriers preventing wider ML adoption.