The implementation of rare events logistic regression to predict the distribution of mesophotic hard corals across the main Hawaiian Islands
 Published
 Accepted
 Received
 Academic Editor
 James Reimer
 Subject Areas
 Ecology, Ecosystem Science, Environmental Sciences, Marine Biology
 Keywords
 Mesophotic, Rare events corrected regression, Species distribution model, Hawaii, Scleractinian corals, Leptoseris, Montipora, Theta threshold selection, Predictive modeling
 Copyright
 © 2016 Veazey et al.
 Licence
 This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
 Cite this article
 2016. The implementation of rare events logistic regression to predict the distribution of mesophotic hard corals across the main Hawaiian Islands. PeerJ 4:e2189 https://doi.org/10.7717/peerj.2189
Abstract
Predictive habitat suitability models are powerful tools for costeffective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoserisand Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30–180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3%) for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta (“presence”) threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora) were translated to spatiallyexplicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai‘i.
Introduction
Consistent and pervasive deterioration of marine ecosystems worldwide highlights significant gaps in current management of ocean resources (Foley et al., 2010; Douvere, 2008; Crowder & Norse, 2008). One such gap is the data required for informed marine spatial planning, a management approach that synthesizes information about the location, anthropogenic use, and value of ocean resources to achieve better management practices such as defining marine protected areas and implementing harvesting restrictions (Jackson, Trebitz & Cottingham, 2000; Larsen et al., 2004). The creation of spatial predictive models for improved marine planning is a relatively lowcost and noninvasive technique for projecting the effects of presentday human activities on the health and geographic distribution of marine ecosystems.
Defining and managing the biological and physical boundaries of ecosystems is a complicated but essential component of marine spatial planning (McLeod et al., 2005). The heterogeneous nature of ecological datasets can require the timeintensive development of problemspecific ecosystem models (Cramer et al., 2001; Tyedmers, Watson & Pauly, 2005). Scientists frequently use straightforward, easytoimplement regression methods to analyze complex datasets. The development of software accessible to relative novices has contributed to the growing popularity of regression methods (e.g., Lambert et al., 2005; Tomz, King & Zeng, 2003).
Here, we employ a logistic regression with rare events corrections (King & Zeng, 2001) to analyze the presence and absence data of two coral genera (Leptoseris and Montipora) and, thus, develop a predictive framework for the geographic mapping of mesophotic coral reef ecosystems (MCEs) across the main Hawaiian Islands. Mesophotic coral ecosystems, located at depths of 30–180 m, are considered to be extensions of shallow reefs because they harbor many of the same reef species present at shallower depths, and are also oases of endemism in their own right (Grigg, 2006; Lesser et al., 2010; Kane, Kosaki & Wagner, 2014; Hurley et al., 2016). MCE habitats are formed primarily by macroalgae, sponges, and hard corals tolerant of low light levels (Lesser, Slattery & Leichter, 2009). Corals of genus Montipora colonize primarily the shallow reef zone (<30 m), but some species, particularly Montipora capitata (Rooney et al., 2010), are able to extend their settlement into mesophotic depths. Corals of genus Leptoseris construct extremely efficient, lightcapturing skeletons that facilitate their habitation of the lower mesophotic zone (Kahng et al., 2012) and are considered to be exclusively mesophotic dwellers (Kahng & Kelley, 2007).
Ecological studies in the mesophotic zone are sharply limited in contrast to the shallower photic zone more accessible by open circuit SCUBA, but steady advances in diving, computing, and remotely operated vehicle technologies continue to facilitate interdisciplinary mesophotic research (Pyle, 2000; Puglise et al., 2009). Mesophotic research in Hawai‘i has been conducted primarily in the ‘Au‘au Channel, Maui, a relatively shallow, semienclosed waterway between the islands of Maui and Lāna‘i that is among the most geographically sheltered and accessible areas in the Hawaiian Archipelago, and, as a result, much of the existing video and photo records of MCEs are from this area. This concentration of historic surveys highlights the importance of creating a panHawai‘i predictive habitat model to identify likely areas of MCEs across unexplored areas of Hawai‘i’s mesophotic zone. Increasing our knowledge about the habitat preferences of the deep extensions of shallow coral species is critical given that approximately 40% of shallow (<20 m) reefbuilding corals face a heightened extinction risk from the effects of climate change (Carpenter et al., 2008). Here, we model the habitat associations of mesophotic scleractinian corals because of both their intrinsic biological value as well as their potential to recolonize globally threatened shallow reef areas and serve as a refuge to mobile reef organisms (Bongaerts et al., 2010; Kahng, Copus & Wagner, 2014).
Previous research about the environmental variables driving mesophotic scleractinian colonization in Hawai‘i suggests that distinct variation in community structure exists between the upper (30–50 m) and mid to lower mesophotic (50–180 m) depths (Rooney et al., 2010; Kahng et al., 2010; Kahng, Copus & Wagner, 2014). Potentially influential environmental variables include photosynthetically active radiation (PAR) levels (Goreau & Goreau, 1973; Fricke, Vareschi & Schlichter, 1987; Kahng & Kelley, 2007; Kahng et al., 2010), isotherms (Grigg, 1981; Kahng & Kelley, 2007; Rooney et al., 2010), and hard substrate availability (Kahng & Kelley, 2007; Costa et al., 2012). Rooney et al. (2010) noted that hard coral abundance declined dramatically below 100 m despite high (>25%) availability of colonizable substrate; this sudden reduction in coral cover occurs at increasingly shallower depths across the northwestern Hawaiian Ridge and may be driven by the synchronously shallower occurrence of isotherms.
Light and temperature intensity (Jokiel & Coles, 1977; Rogers, 1990), physical stress (e.g., wave energy or uncontrolled tourism) (Dollar, 1982; Nyström, Folke & Moberg, 2000; Franklin, Jokiel & Donahue, 2013), and availability of colonizable substrate (Jokiel et al., 2004; Franklin, Jokiel & Donahue, 2013) are known drivers of shallow (<30 m) reef coral distributions across the world. We expect that our model will capture the influence of these abiotic variables on the distribution of mesophotic corals, especially in the shallower mesophotic zone. We speculate that our model may detect unexpected drivers of Leptoseris distribution, particularly because Leptoseris is known to colonize deeper depths that bear little resemblance to shallow reefs (Lesser, Slattery & Leichter, 2009; Rooney et al., 2010). Finally, previous predictive modeling research about the drivers of Hawaiian mesophotic coral colonization identified depth, distance from shore, euphotic depth, and sea surface temperature as potentially influential environmental variables (Costa et al., 2012; Costa et al., 2015). Our novel modeling approach utilizes all observational data (corals present and absent), which we believe will offer more insight into the dynamics that facilitate and inhibit coral colonization across the mesophotic zone.
Materials and Methods
Organismal and environmental data
The Hawai‘i Undersea Research Laboratory (HURL) and the Pacific Islands Fisheries Science Center (PIFSC) provided video and photo records from MCEs in the Hawaiian Islands for our analyses. This imagery came from 19 dives conducted using submersibles, remotely operated vehicles (ROVs), autonomous underwater vehicles (AUVs), and tethered optical assessment devices (TOADs) in the ‘Au‘au Channel, Maui (13 dives) and two other geographically distinct regions: south O‘ahu (5 dives) and southeast Kaua‘i (1 dive). These dives were conducted between 2001–2013. We analyzed dive video using the Coral Point Count with Excel extensions (CPCe) tool (Kohler & Gill, 2006) in combination with a modified PIFSC 2011 mapping protocol (Pacific Islands Benthic Habitat Monitoring Center (PIBHMC), 2015). PIFSC has used this type of combined analysis, referred to as the random five point overlay method (RFPOM), to process coral reef ecosystem benthic imagery throughout the U.S. Pacific Islands Region since August 2011, and our use of it ensures database consistency with regions processed prior to this study. The CPCe software placed five points randomly on each snapshot, which we then assessed for coral presence. If any of the five points was on coral, that observation was recorded as a “presence.” In an effort to evaluate the accuracy of RFPOM, we counted all corals in 200 randomly selected screengrabs and found that this method misses 2.4% of coral observations recorded in these images. We categorized corals by genus, because both Montipora (Forsman et al., 2010) and Leptoseris (Luck et al., 2013) contain species complexes that remain the subject of taxonomic uncertainty which prevent us from being able to reliably identify corals to the species level from photographs.
We recorded snapshots every 30 s for the duration of each dive video. In addition to an existing database of 40,193 records from dives in the ‘Au‘au Channel, 3517 new snapshots were collected from the additional dives across south O‘ahu and Kaua‘i (Fig. 1). Of these 43,710 total images, 20,980 were discarded because either: (1) crucial observational data were absent; (2) they were redundant due to an extended stationary period; or (3) they fell outside the study depth range of 30–180 m. Of the remaining 22,714 records, we analyzed 2,757 unprocessed images using the RFPOM (Table 1).
Source  No. observations  Leptoseris  Montipora 

O‘ahu  2,645  192  0 
Kaua‘i  112  38  3 
Maui  19,957  708  791 
Total  22,714  938  794 
We selected our environmental covariates, listed in Table 2, based on the sufficiency of the data and the potential significance of each variable as an indicator of hard coral habitat suitability (e.g., Dolan et al., 2008; Rooney et al., 2010; Costa et al., 2012). We defined summer and winter seasons as May–September and October–April, respectively (Kay, 1994; Rooney et al., 2010). We delineated significant wave height estimates and mean current velocities by season and direction. We extracted and averaged significant wave height data from 144 days per season of twentyfour hour PacIOOS Simulating WAves Nearshore (SWAN) regional wave models estimated values for 2011–2015 (see website: http://oos.soest.Hawaii.edu/las/). Mean current velocity values were available from 0:00–21:00 every three hours for all months from 2013–2015; for each season and direction, 48 mean current velocity values were extracted and averaged from the PacIOOS Regional Ocean Modeling System (see website: http://oos.soest.Hawaii.edu/las/). This model has a 4 km horizontal resolution with 30 vertical levels across seafloor terrain. We sourced monthly MODIS Aqua Chlorophyll a averages for the year 2012 from the NOAA PIFSC OceanWatch Live Access Server (see website: http://oceanwatch.pifsc.noaa.gov/). Using the Morel (2007) method, we applied the following cubic polynomial equation to obtain logged euphotic depth: (1)${log}_{10}{Z}_{eu}=1.5240.436x0.0145{x}^{2}+0.0186{x}^{3},$ where x represents the measured Chlorophyll a concentrations (mg/m^{3}) at sea surface. Euphotic depth is the depth at which the level of photosynthetically active radiation (PAR), a limiting factor for many heterotrophic mesophotic corals, is at 1% of surface PAR. In total, we used 14 environmental predictor variables to shape our model (Table 2) (Figs. S1–S5).
Variable  Category  Variable description  Source  Resolution  Variable 

Biological (response)  Hard coral  Presence/absence between 30–180 m in depth  PIFSC, HURL optical validation data  NA  Leptoseris 
Montipora  
Environmental (predictor)  Light availability  The depth of the euphotic zone (PAR 1%) determined using the Morel method (2007)  NOAA Oceanwatch Live Access Server; NASA, 2014  4 km × 4 km  Mean euphotic depth (m) 
Topography  Seafloor complexity calculated with the ArcGIS BTM Terrain Ruggedness tool  USGS, 1998; University of Hawaii SOEST, 2014  50 m × 50 m resampled to 25 m × 25 m  Rugosity (unitless)  
Depth of seafloor  Depth (m)  
Rate of change calculated with the ArcGIS BTM Slope tool  Slope (degrees)  
Curvature of the seafloor calculated using the ArcGIS Curvature tool  Curvature (degrees of degrees)  
Hardness of seafloor detected by acoustic backscatter  Substrate hardness (unitless)  
Distance of observation point to nearest coastline  Distance to coastline (m)  
Compass direction of maximum slope calculated using the ArcGIS Aspect tool  Aspect (degrees)  
Waves/currents  Mean current velocity data obtained per season (winter/summer) for depths: 200, 150, 125, 100, 75, 50, 30 m  PacIOOS Hawaii Regional Ocean Model  4 km × 4 km  Mean cur. vel. (northward/summer) (m s^{1})  
Mean cur. vel. (northward/winter) (m s^{1})  
Mean cur. vel. (eastward/summer) (m s^{1})  
Mean cur. vel. (eastward/winter)(m s^{1})  
Sea surface mean significant wave height  PacIOOS Hawaii SWAN Wave Model  0.5 km × 0.5 km and 1 km × 1 km  Sig. wave height (summer) (m)  
Sig. wave height (winter) (m) 
The spatial resolution of the bathymetry data was 50 m × 50 m for all islands. We resampled the bathymetry raster to a cell size of 25 × 25 m consistent with a conservatively estimated ±25 m positioning error margin observed at a depth of ∼800 m. We estimated an average camera swath value of 3.24 m (range 2.45–4.54 m) based on previous measurements from 19 still image screenshots taken when the submersible was located at different heights above the seafloor. Our geopositional error for the images is ±5 m and we can expect that the location data are within a circle with a 10 m diameter. Our observation sampling area is projected out from the location area a distance of ≤5 m. Addition of a conservative 5 m observation area buffer to the location error area produces an observational data position of ±20 m from the given coordinates of a data point.
We removed all subsampling within cells due to slight variations in camera angles or vessel speed through a pointtoraster conversion. We categorized all cells with ≥1 presence observation as “present” cells and all cells with only absence observations as “absent” cells. This removal of multiple observations within the same 25 × 25 m pixel effectively eliminates pseudoreplication within the data. We used ArcToolbox and the Benthic Terrain Modeler Toolbox to calculate slope, curvature, rugosity, and aspect (compass direction) values (Wright et al., 2012). We performed a spatial join based on proximity to observation point data to assign values for surface Chlorophyll a concentration, mean current velocities, distance from shore, and significant wave heights.
Regression methods
In describing the relationship between a response variable and one or more predictor variables, we use a logistic regression model because the response variable is dichotomous (Hosmer & Lemeshow, 2004). The ordinary logistic regression (OLR) model is defined as: (2)$\theta =\text{expit}\left(\mathrm{\mu}\right)=\frac{1}{1+exp\left(\mathrm{\mu}\right)},$
where θ is the probability that the species of interest is present (y = 1), and 1 − θ is the probability it is absent (y = 0). The logit function is the inverse of the expit function, and (3)$\text{logit}\left(\theta \right)=\mathrm{\mu}={\beta}_{0}+{\beta}_{1}{x}_{1}+\cdots +{\beta}_{n}{x}_{n}$
is the linear sum of predictor variables, x_{1}, x_{2}, …, x_{n}, with intercept β_{0} andregression coefficients β_{1}, β_{2}, …, β_{n}. In the language of generalized linear models (GLM), OLR is said to have the logit function as its link function and the expit function as its inverse link function. Logistic regression provides a straightforward, meaningful interpretation of the relationship between a dichotomous dependent variable y and a set of predictor variables (Allison, 2001).
Despite the popularity of OLR, it may yield extremely biased results when an imbalance exists in the proportion of the response variable data (e.g., such as in our case, when y = 0 ≫ y = 1) (Van Den Eeckhaut et al., 2006). King & Zeng (2001) coined the term “rare events logistic regression” to describe their corrective methodology in dealing with unbalanced binary event data:

1.
The first step requires the selection of a representative sample. Though researchers generally prefer to work with more uniform response data (e.g., Liu et al., 2005), selection of an unusually high proportion of the rare event (in this case, y = 1) to “balance” the dataset and increase θ estimates will yield nonsensical results. We divided the data in half to create our training and testing datasets and checked that each set of observations had an approximately equal proportion ($\overline{y}$) of presence observations to better reflect the “true state” of the full dataset.

2.
The second step rectifies any bias that might be introduced when dividing the dataset. This prior correction on the intercept (β_{0}) can be calculated as: (4)${\stackrel{\u02c6}{\beta}}_{0}={\stackrel{\u0303}{\beta}}_{0}ln\left[\left(\frac{1\tau}{\tau}\right)\left(\frac{\overline{y}}{1\overline{y}}\right)\right];$
here, ${\stackrel{\u02c6}{\beta}}_{0}$ is the corrected intercept, ${\stackrel{\u0303}{\beta}}_{0}$ is the uncorrected intercept, τ is the true proportion of 1s in the population; and $\overline{y}$ is the observed proportion of 1s in the training sample.

3.
The third step rectifies any underestimation of the probabilities of the independent variables β_{1…n} from the substitution of the intercept value, obtained as: (5)$P\left({y}_{i}=1\right)={\stackrel{\u0303}{\theta}}_{i}+{C}_{i},$
where the correction factor C_{i} is given by: (6)${C}_{i}=\left(0.5{\stackrel{\u0303}{\theta}}_{i}\right){\stackrel{\u0303}{\theta}}_{i}\left(1{\stackrel{\u0303}{\theta}}_{i}\right)XV\left({\stackrel{\u0303}{\beta}}_{i}\right){X}^{\prime},$
where X is a 1 × (n + 1) vector of values for each independent variable β_{i}, X′ isthe transpose of X, and $V\left({\stackrel{\u0303}{\beta}}_{i}\right)$ is the variance covariance matrix. We obtained the improved probability estimates through estimation of β_{i} via ${\stackrel{\u0303}{\beta}}_{i}$ , thereby considered “mostly” Bayesian (King & Zeng, 2001). Our priors in this case would be uninformative, which means that we lack sufficient knowledge to estimate the probability distributions of our data and our parameter of interest, θ. This is often the case when working with sparse ecological datasets. As the uninformative prior for a regression coefficient with domain (∞, − ∞) is uniform, a full Bayesian estimation with uninformative priors is equivalent to a traditional logistic regression. Therefore, this correction is effectively a correction to the approximate Bayesian estimator, and its addition improves our regression by lowering the mean squared error of our estimates. We implemented this rare events logistic regression using the ‘Zelig’ package run in R (Imai, King & Lau, 2008; Choirat et al., 2015).
We constructed a correlation scatterplot matrix per coral genus to observe correlation levels between all variables. In choosing which highly correlated variables to exclude from the analyses, we followed the criteria outlined by Dancey & Reidy (2004) and Tabachnick & Fidell (1996), who suggest a cutoff correlation value of 0.7. Only mean significant wave height parsed by season consistently overreached this threshold; the covariate that was least correlated with the response variable was removed. We excluded predictors that lacked a clear distribution pattern or correlated minimally (<0.05) with the response variable.
One of the more studied habitat preferences of Leptoseris and Montipora is the influence of depth on their distribution (Rooney et al., 2010; Costa et al., 2012; Kahng et al., 2010). Increasing depths often correlate with greater distance from shore. The inclusion of squared terms (e.g., ${x}_{2}={x}_{1}^{2}$) in our regression equation expit(θ) = β_{0} + β_{1}x_{1} + ⋯ + β_{n}x_{n} permits the logistic curve to reflect the bell curve shape expected in plotting the distribution of these animals across a range of depths or distance from shore. In order to account for these trends, we added Depth Squared and Distance Squared as potential variables for consideration in our final model. As depth or distance increases, its square increases even more rapidly, allowing the squared term to eventually dominate and “pull down” the probability curve.
We withheld 50% of our information per genus as testing (i.e., validation) data. Using the remaining 50% (our training data), we performed the rare events corrected logistic regression described above. Using an exhaustive iterative algorithm (Calcagno, Mazancourt & Claire, 2010), we modeled all possible combinations of included covariates. We ranked models using the corrected Akaike information criterion (AICc) (Hurvich & Tsai, 1989), which is considered an excellent comparative measurement of model strength, especially for sparse datasets. For both genera, the models with the lowest (lowest = best) AICc scores were lower than the “second best” AICc scores by at least 2 (i.e., Δ AICc ≥2), indicating strong preference for the best model (e.g., (Hayward et al., 2007)).
In an ideal and unrealistic study, all biotic and abiotic components of a model would be homogenous and evenly distributed across a sampling space. Our sampling design includes overlapping submarine dive tracks and the inherent heterogeneity of the marine environment, which could problematically violate our model’s underlying assumption regarding the independence of our biological and environmental data. We removed all instances of pseudoreplication (multiple observations in one grid cell) when we assigned each grid cell to a category of “corals present” or “corals absent.” After we removed subsampling within our observational data, we checked for the presence of clustering, or spatial autocorrelation, within these data. Uncorrected spatial autocorrelation between observational data points confounds and undermines any biological inferences drawn from model predictions.
We checked smallscale, local spatial autocorrelation using Geary’s C statistic (Geary, 1954), based on the deviations in the responses of observation points with one another: (7)$C=\frac{n1}{2{S}_{0}}\frac{\sum _{i}\sum _{j}{w}_{ij}{\left({x}_{i}{x}_{j}\right)}^{2}}{\sum _{i}{\left({x}_{i}\overline{x}\right)}^{2}}.$
Here, x is the variable of interest, i and j are locations (where i ≠ j), w_{ij}represents the components of the weight matrix, and S_{0} is the sum of the components of the weight matrix. Geary’s C ranges from 0 (maximal positive autocorrelation) to 2 for high negative autocorrelation. In the absence of autocorrelation, its expectation is 1 (Sokal & Oden, 1978).
We also examined global spatial autocorrelation using Moran’s I statistic, which measures crossproducts of deviations from the mean (Moran, 1950): (8)$I=\frac{n}{{S}_{0}}\frac{\sum _{i}\sum _{j}{w}_{ij}\left({x}_{i}\overline{x}\right)\left({x}_{j}\overline{x}\right)}{\sum _{i}{\left({x}_{i}\overline{x}\right)}^{2}}.$
Moran’s I values generally range from –1 to 1, with 0 as the expectation when no spatial autocorrelation is present.
We also verified the spatial independence of our observational point data using a semivariogram, which is a graphical method of quantifying spatial correlation in a set of points (Figs. 2–3). We selected our theoretical semivariogram to fit the empirical semivariance using the ordinary least squares (OLS) method (Jian, Olea & Yu, 1996; Kendall et al., 2005). The spherical model had the best quantitative fit based on OLS estimates (Table 3). For each dataset, the low thresholds at which semivariance stopped increasing indicated the almost complete absence of spatial autocorrelation for each genus.
Genus  Sum of squares  Input σ^{2}  Input ψ  Actual σ^{2}  Actual ψ  Actual τ^{2} 

Leptoseris  2940.671  0.055  218  0.051  206.909  0 
Montipora  14013.610  0.020  390  0.032  390.000  0.003 
Model assessment
Evaluation of the rare events logistic regression model output is more complicated than for the typical linear model. For example, R^{2} values, although calculated, have little applicability to logistic regressions and are therefore ignored (Menard, 2000; Peng, Lee & Ingersoll, 2002). Sample size and selected threshold largely influence the results of the Hosmer and Lemeshow goodnessoffit test (Hosmer et al., 1997). Accordingly, we use model classification accuracy as a second measure of goodnessoffit (in addition to ΔAICc). We want to maximize true positives (TP) and true negatives (TN) while minimizing false positives (FP) and false negatives (FN). The sensitivityspecificity sum maximization approach (Cantor et al., 1999) therefore maximizes (9)$S{S}_{\mathrm{max}}=\frac{TP}{TP+FN}+\frac{TN}{TN+FP},$
which is equivalent to finding the point on the ROC (receiver operating characteristics) curve at which the tangent slope is 1, indicating the optimal cutoff point at which “cost” (here, the number of FN and FP) and “benefit” (the number of TN and TP) is balanced. We chose this technique because we aim to identify regions devoid of hard corals as well as regions deemed potentially suitable for habitation.
ROC curves plot the true positive test rate against the false positive test rate across different theta cutoff points (Hadley & McNeil, 1982). We calculated values for sensitivity and specificity for threshold increments of 0.005 ± 1 standard deviation of the rounded mean for each model. Because each theta threshold value varied based on the genus and model, the thresholdindependent area under the curve (AUC) test statistic best reflects the predictive accuracy of the model.
In addition to creating ROC curves, we also took into account the overall prediction success of each model, given as: (10)$OPS=\frac{TP+TN}{TP+TN+FP+FN}.$
Overall prediction success is a measure of total correct classification of both present and absent observations. While this is a good final assessment of model classification error, consideration of the prediction success alone is not a viable evaluation method when binary data is highly imbalanced, as a value given by this method may primarily represent model success in identifying the most common observation type (Fielding & Bell, 1997). We plotted our sensitivity and specificity values on a ROC curve to show how each model performed relative to chance (Fig. 4). All models fall in the range 0.7 ≤ AUC < 0.9, which indicates good discrimination and reliability of model predictions (Hosmer & Lemeshow, 2004).
We also created maps of individual and summed predicted occurrence probabilities of both coral genera across the main Hawaiian Islands and ran a hotspot analysis using the ArcGIS GetisOrd ${G}_{i}^{\ast}$ Hotspot Analysis tool. We constructed a polygon fishnet composed of 1 × 1 km cells which encompassed all islands. We summed each 25 × 25 m raster cell value for probability of Leptoseris occurrence and probability of Montipora occurrence. We performed a spatial join of raster cell values within each polygon for an average value of summed probabilities. The GetisOrd ${G}_{i}^{\ast}$ statistic identifies clusters within these polygons that display values higher in magnitude than random chance would permit. The GetisOrd local statistic is given as: (11)${G}_{i}^{\ast}=\frac{\sum _{j=1}^{n}{w}_{i,j}{x}_{j}\overline{X}\sum _{j=1}^{n}{w}_{i,j}}{S\sqrt{\frac{1}{n1}\left[n\sum _{j=1}^{n}{w}_{i,j}^{2}{\left(\sum _{j=1}^{n}{w}_{i,j}\right)}^{2}\right]}}.$
Here, w_{i,j} represents the spatial weights between features i and j; n represents the total number of features;x_{j} is the attribute value for feature j; $\overline{X}=\frac{1}{n}{\sum}_{j=1}^{n}{x}_{j}$; and $S=\sqrt{\frac{1}{n}{\sum}_{j=1}^{n}{x}_{j}^{2}{\left(\overline{X}\right)}^{2}}$.
Results
Geary’s C test statistic is a measure of local (smallscale) spatial autocorrelation; in the absence of correlation, 1 is the expected value of Geary’s C. Moran’s I is a measure of global (largescale) spatial autocorrelation; in the absence of correlation, a value of 0 is expected for the Moran’s I test statistic. For our Leptoseris dataset, Geary’s C = 0.990; for our Montipora dataset, Geary’s C = 0.996. For our Leptoseris dataset, Moran’s I = 0.006; for our Montipora dataset, Moran’s I = 0.003. These values do not indicate any local clustering or global spatial autocorrelation within either dataset. We observed negligible levels of autocorrelation up to ∼100 m for Montipora (Fig. 3). By ensuring that spatial autocorrelation is not present in our data, we do not violate the assumption that our response data are independently observed, which enables us to draw robust conclusions about the ecological factors influencing the distribution of these coral genera within the mesophotic zone across the main Hawaiian Islands.
Genus  θ threshold  TP  TN  FP  FN  TP_{c−v}  TN_{c−v}  FP_{c−v}  FN_{c−v}  Sensitivity  Specificity  SS_{max}  OPS  OPS_{c−v}  AUC 

Leptoseris  θ_{OLR} = 0.065  223  4,133  1,525  84  219  3,757  1,894  94  0.7264  0.7305  1.4569  73.0%  66.7%  0.782 
θ_{corr} = 0.067  220  4,168  1,490  87  182  4,000  1,651  131  0.7166  0.7367  1.4533  73.6%  70.1%  0.780  
Montipora  θ_{OLR} = 0.064  200  4,299  1,536  69  159  4,453  1,406  86  0.7435  0.7368  1.4803  73.7%  75.6%  0.808 
θ_{corr} = 0.0625  198  4,336  1,499  71  165  4,406  1,453  80  0.7361  0.7465  1.4792  74.3%  74.9%  0.809 
The OLR covariate coefficients were modified using the rare events corrections proposed by King & Zeng (2001), resulting in a change in predictive power (Table 4). Rare events corrected models usually performed better than the uncorrected models, in terms of improved specificity and prediction success. Our sensitivity values for both corrected models were slightly lower than the corresponding OLR sensitivities, but in each case, specificity and prediction success values were improved. Additionally, standard errors of the coefficient estimates were lower for corrected models than for uncorrected models (Tables S1–S4).
Leptoseris corals inhabit mesophotic regions with high slope and rugosity values, high to moderate perennial current flow, and their occurrence peaks around 100 m (Table S3, Figs. S6–S10). Montipora corals peak in occurrence around 60 m and colonize regions less exposed to high energy winter swells (Table S4, Figs. S11–S12). Predicted presence probability values (θ) averaged 0.051 for Leptoseris and 0.040 for Montipora models in the validation data (Figs. 5–6). These values agree well with the actual presence frequencies in that data (0.052, 0.042). To better interpret these realistically low theta values, we chose a theta threshold to transform the probability estimates to presence/absence values. This is standard practice when examining the results of a rare events logistic regression, but less common when performing OLR (Liu et al., 2005). Objective selection of a theta threshold on a permodel basis is more scientifically sound than, for example, an arbitrary assignment of 0.5 (Cramer, 2003). The transformed model is valid if a threshold value yields a high percentage of correctly classified observations and a low number of FP and FN observations (Gobin, Campling & Feyen, 2001). We selected an appropriate threshold for each model (Table 4) in order to maximize SS_{max} (Liu et al., 2005).
Our final hotspot maps show the results of our analysis for Leptoseris, Montipora, and both genera combined across all islands (Figs. 7–9). We show hotspots of habitat suitability for both coral genera in red for areas of highest suitability and blue for areas of lowest suitability. We identify a cell as a hotspot when the sum of its value and the values of its nearest neighbors is much higher or lower than the mean over all cells. When the local sum of a cluster is very different from the expected value, a statistically significant hotspot is identified (${G}_{i}^{\ast}$ statistic ≥1.96 or ${G}_{i}^{\ast}$ statistic ≤ − 1.96). Neither genus clearly dominated the summed probabilities hotspot identification across any of the islands. Large Leptoseris hotspots were identified in southwest Moloka‘i, northeast O‘ahu, west Hawai‘i, and the central ‘Au‘au Channel. Montipora hotspots were identified in east Ni‘ihau, southwest Kaua‘i, west and south O‘ahu, west Hawai‘i, and the central ‘Au‘au Channel.
Discussion
In this study, we used logistic regression with rare events corrections to predict the habitat preferences of two dominant scleractinian coral genera across the entire mesophotic zone surrounding the main Hawaiian Islands. The habitat preferences of Montipora in the mesophotic zone appear distinct from those of Leptoseris. Montipora prefers the middle mesophotic zone (50–80 m) of reefs less exposed to highenergy winter swells. Leptoseris prefers steep, rugose slopes and the lower mesophotic zone (>80 m) in regions of high year–round current flow.
Important environmental covariates
Predicted Montipora presence peaks at about 60 m (median occurrence probability = 7.5%); Leptoseris presence peaks at about 100 m (median occurrence probability = 7.5%). These predictions are consistent with the inferences of Rooney et al. (2010), which separates mesophotic reefs into three distinct depth sections: upper (30–50 m), branching/plate dominated (50–80 m), and Leptoseris dominated (≥80 m). The depth at which suitability peaks for Leptoseris occurs at a range where steep ridges and dropoffs are plentiful in our study region, and therefore the mean preferred depth may be prone to slight overestimation.
In addition to depth, four environmental covariates appeared to influence the distribution of Leptoseris: rugosity, slope, summer mean current velocity (northward), and winter mean current velocity (eastward). Scleractinians easily colonize environments that are relatively calm and rugose due to the larger amount of available surface area, and this positive correlation was reflected in our model. Leptoseris habitat preference was also positively associated with slope, which was not observed for Montipora. Corals that inhabit the upper mesophotic zone may be more susceptible to damage from debris displaced by high wave energy, and are therefore less likely to colonize steep slopes (e.g., HarmelinVivien & Laboute, 1986; Bridge & Guinotte, 2013). The deeper distribution of Leptoseris may protect it from damage related to wave intensity, allowing it to colonize slopes (e.g., White et al., 2013). Another possibility is that the model is picking up dropoffs from masses accreted during the last glacial maximum. These steep dropoffs are present between 90–120 m in the Leptoserisdominated lower mesophotic zone (Yokoyama et al., 2001; Webster et al., 2004).
Leptoseris also favors wellflushed areas exposed to yearround moderate current flow (i.e., up to 0.3 m/s). The platelike morphology of Leptoseris corals effectively boosts sunlight capture by its symbiotic zooxanthellae and zooplankton capture by the corals themselves, but it also makes the coral vulnerable to smothering by sediment accumulation (Bak, Nieuwland & Meesters, 2005; Bongaerts et al., 2010; Marcelino et al., 2013). The success of Leptoseris corals in areas of moderate current flow may be related to the improbability of sediment settlement and accumulation. While the model did not capture the same effect of current flow on Montipora distribution, we recognize that the morphology of some Montipora species is extremely similar to that of Leptoseris. We do not expect either genus to readily colonize highly turbid regions, especially given that certain species of heterotrophic Montipora are thought to exploit strong currents to meet their energy requirements (Grottoli, Rodrigues & Palardy, 2006; Rooney et al., 2010).
Substrate hardness, a variable known to influence coral colonization, was notably absent from each model. Substrate hardness values were derived from acoustic backscatter imagery readings. The base resolution of these readings (50 m × 50 m) was not sufficiently detailed for purposes of this analysis. We noted plentiful coral colonization along larger surfaces like lava fingers, the hardness of which would be detectable by backscatter surveys, as well as across small rock fragments strewn across a sand flat, which would be obscured by the softness of the surrounding benthos. We can conclude that measurements of benthic hardness are not detailed enough for predictive modeling purposes at a 25 × 25 m resolution.
We emphasize that the purpose of this study was to build a panHawai‘i predictive habitat map for two dominant coral genera within the mesophotic zone. Because the scope of this study included all main Hawaiian Islands, we were constrained by the coarseness of available fullcoverage environmental data. As we build on this analysis, we plan to use our maps to identify areas of interest for further study at higher resolution and to include additional variables currently only available in certain regions, such as light intensity and temperature at depth. For example, our predictive and hotspot maps identify Penguin Bank (southwest Moloka‘i) as particularly suitable for Leptoseris colonization, which has not been verified by video or photo records. High resolution backscatter data (1 × 1 m) exist for this region, and incorporation of these data into new analyses of subsets of our study area may refine our conclusions.
Error sources and model reliability
We examined two types of error (false negatives and false positives) and analyzed our models without giving preference to either one. This approach is widely accepted as the best method of overall error minimization (e.g., Liu et al., 2005; Fielding & Bell, 1997). Rare events corrected models for both Leptoseris and Montipora achieved levels of specificity and sensitivity well above the null, indicating good predictive power. Additionally, both models attained about 74% overall prediction success. We assumed coral detectability was constant across the study region and that we can therefore consider the true absence observations to be reliable indicators of a potentially unsuitable habitat for corals. For each genus, the model tended to slightly overpredict presence observations; large numbers of false positives lowered sensitivity values. This is inevitable in the analysis of severely imbalanced or sparse binary data; the ongoing addition of presence observations to the dataset will improve overall model classification accuracy.
While the consistent identification of southern coastal areas as suitable is reliable, the comparatively infrequent selection of northern coasts is likely due to the source of the modelbuilding observations. The vast majority of mesophotic exploration has been along southern coastlines, which is often where waters are calmest in Hawai‘i. It is speculated that because mesophotic corals are more shielded from winter longperiod wave energy than their shallow water counterparts, they are able to flourish at depth along northern coastlines (Grigg, 1998; Rooney et al., 2010). The addition of data sourced from northern expeditions would likely improve predictive power of the model across northfacing coastlines (Alin, 2010).
We acknowledge that the original data were not collected in a standardized fashion (e.g., variation in vessel traveling speed or differences in data collection vessel and/or quality). Our careful exclusion of overlapping observation points within each 25 × 25 m rectified this sampling design flaw as much as possible and eliminated pseudoreplication.
Distinctions between coral genera
Our Montipora model was simpler than the Leptoseris model in that the only variable included other than depth was winter significant wave height. Though uncertainty was highest at lower values of significant wave height, Montipora demonstrated a preference in colonizing habitats that experience lower significant wave height during winter. This preference contrasts with Montipora species in shallow waters that were more likely to be observed in higher wave height environments (Franklin, Jokiel & Donahue, 2013). This likely influenced the inability of the model to identify any suitable habitat around Ni‘ihau, where the average winter significant wave height equaled 1.78 m, almost double the mean significant wave height of our model training data (0.91 m). Though mesophotic corals are generally thought to be exempt from the growth limitations faced by shallow water corals in regions of high wave energy, prolonged wave intensity has been shown to negatively affect the colonization of upper mesophotic scleractinians, especially in sloping areas prone to debris avalanches (Bridge & Guinotte, 2013; Kahng, Copus & Wagner, 2014). Continuation of this work might include a more indepth examination of the relationship of this coral genus with the combined effects of slope of available substrate and exposure to wave energy.
We found no records of Montipora presence when processing our O‘ahu dataset, which probably contributed to the very low predicted mean probability of Montipora occurrence there (0.1%). We believe this is due in part to the sampling pattern across south O‘ahu; we recorded 62.3% of all observations processed for this region at a depth of 75 m or greater. Montipora prevalence is greater in the uppertomiddle mesophotic zone, and the relative deepness of the O‘ahu dives likely influenced their nonappearance in this portion of the dataset. We emphasize that the dearth of Montipora observations around O‘ahu is an artifact of the dataset we used to construct our model; Montipora corals have been observed in mesophotic depths across O‘ahu (e.g., Fig. 4B, Rooney et al., 2010). The mean significant wave height across the mesophotic zone was lower across the southern and western coasts (1.50 m) than that observed across the northern and eastern coasts (2.37 m) of the island. As at Ni‘ihau, we assume that this high northern and eastern average height, coupled with the absence of Montipora presences in O‘ahu in the training dataset, greatly impacted our model’s ability to detect areas of suitable habitat around the island. The results of our Getis–Ord ${G}_{i}^{\ast}$ Hotspot Analysis corroborate the findings of Costa et al. (2015), who used Maximum Entropy software to predict the highest occurrence probability of Leptoseris and Montipora in the middle and midcoastal ‘Au‘au Channel, respectively (Costa et al., 2015).
The factors influencing the distribution of coral species in shallow and mesophotic habitats differ. One of the fundamental drivers of the occurrence and abundance of coral species on shallow reefs in Hawaiian waters is wave stress (Dollar, 1982; Grigg, 1983; Franklin, Jokiel & Donahue, 2013). Given the depth range of MCEs, wave stress is unlikely to serve as a direct influence on coral occurrence but may provide secondary effects as wave events lead to debris reaching MCEs (Kahng, Copus & Wagner, 2014). Furthermore, the decoupled effects of environmental drivers on shallow and mesophotic zones extend between the islands. In shallow reef communities Montipora species become relatively more dominant from Hawaii Island to Ni‘ihau (Franklin, Jokiel & Donahue, 2013), but appear to peak in occurrence in the mesophotic zone of Maui Nui. While strong environmental drivers influence the distributions of shallow corals, the occurrence patterns of mesophotic corals may reflect a more stable environment with an increased influence of biotic factors such as interspecific competition in a habitat zone with limited light and space resources available.
Conclusions
We implemented a rare events corrected logistic regression to determine the most influential environmental predictors of Montipora and Leptoseris colonization in the mesophotic zone. Habitat preference differences between these genera appear distinct and multifaceted. Montipora thrives in the middle mesophotic zone in areas sheltered from high intensity winter swells, while Leptoseris tends to colonize steep, rugose, wellflushed areas in the lower mesophotic zone. Improved understanding of the distribution of mesophotic corals will enable resource managers to propose the construction of seafloor power cables and other offshore infrastructure in areas less likely to contain coral communities. Results will likewise facilitate efforts to protect these communities by supplementing scientific dive planning and strategies for conservation, such as marine spatial planning.