Do longer sequences improve the accuracy of identification of forensically important Calliphoridae species?

Species identification is a crucial step in forensic entomology. In several cases the calculation of the larval age allows the estimation of the minimum Post-Mortem Interval (mPMI), and a correct identification of the species is the first step towards a correct mPMI estimation. To overcome the difficulties of morphological identification, especially of the immature stages, a molecular approach can be applied. However, the separation of closely related species remains an unsolved problem. Sequences of four different genes (COI, ND5, EF-1α, PER) from 13 different fly species collected during forensic experiments (Calliphora vicina, Calliphora vomitoria, Lucilia sericata, Lucilia illustris, Lucilia caesar, Chrysomya albiceps, Phormia regina, Cynomya mortuorum, Sarcophaga sp., Hydrotaea sp., Fannia scalaris, Piophila sp., Megaselia scalaris) were evaluated for their ability to identify the species correctly. Three concatenated sequences were obtained by combining the four genes in order to verify whether longer sequences increase the probability of a correct identification. The results showed that this rule does not hold for the species L. caesar and L. illustris. Future work on other DNA regions is suggested to resolve this taxonomic issue.

Previous works indicate that the combination of nuclear and mitochondrial markers is a much more accurate approach for species identification. In a recent paper, the study of Caribbean blow-flies through DNA markers highlighted the possibility of resolving phylogenetic relationships using a combination of the COI and ITS2 genes. In fact, COI failed to demonstrate monophyly in recently diverged species, leading to uncertain identification. The addition of a second, nuclear marker, such as ITS2, increases certainty in species identification (Yusseff-Vanegas & Agnarsson, 2017). McDonagh and Stevens tested a multi-locus approach (28S rRNA, COI and EF-1α), finding that multiple-gene phylogenies permit the use of genes that have evolved at different rates, and also allow the detection of experimental errors in species identification and sequencing (McDonagh & Stevens, 2011). Zaidi et al. (2011) based the identification of Diptera species on five genes and demonstrated that such a multi-gene approach makes it possible to overcome and clarify the misdiagnoses produced by single-gene identification.
We focused our attention on morphologically identified dipteran specimens in order to evaluate the accuracy of the molecular approach in the identification of forensically important species. Sequences of four different markers, two mitochondrial (COI and ND5) and two nuclear (EF-1α and PER), were used. According to the literature, identification based on a single gene has shown outcomes discordant with morphological results (Meier et al., 2006; Vilgalys, 2003), especially in the case of closely related species. In order to clarify the accuracy of a molecular multi-locus approach in the identification of forensically important species, we built concatenated sequences using the four different markers.

MATERIALS AND METHODS
Eighty specimens (Table 1) were collected between 2011 and 2014 in Italy (Emilia Romagna, Veneto and Calabria), England (West Yorkshire) and Belgium, and preserved in absolute ethanol. The specimens were observed under the microscope and identified using taxonomic keys (Table 2). DNA extraction from adult insects was performed on carefully dissected abdominal tissues, to prevent external contamination and to preserve the external structure of the insect for future examination. Full puparia and larvae were instead processed whole, after photographic documentation to allow further observations. DNA was extracted using the QIAamp DNA Mini Kit® (QIAGEN, Germantown, MD, USA), following the manufacturer's protocol "DNA Purification from Tissue" (QIAGEN). Sterile deionized water was used to elute the DNA. The amplification of DNA was carried out on selected regions of four genes. In particular, the barcoding region of the COI gene and portions of the ND5, EF-1α and PER genes were amplified. The COI gene was selected as the mainstream component of the analysis, whereas the ND5, EF-1α and PER genes were selected because only a little information is available on these DNA regions. The primers used and their specifications are reported in Table 3.
PCR was performed using 4 µl of the DNA extract as template in a 40 µl final reaction volume, with 0.5 µl of GoTaq® Flexi DNA Polymerase (Promega, Madison, WI, USA) per reaction. Each 40 µl reaction also contained 8 µl of 5X Colorless GoTaq® Flexi Buffer (Promega), 4 µl of MgCl2 (25 mM), 1 µl of each of the two primers (10 pmol/µl), 1 µl of 10 mM nucleotide mix (Promega), and 20.5 µl of sterile distilled water. The thermal cycler program used for the amplification consisted of an initial denaturation step at 95 °C for 1 min, followed by 35 cycles of 1 min at 95 °C, 1 min at the annealing temperature and 1 min at 72 °C, with a final extension at 72 °C for 10 min. Annealing temperatures were 49.8 °C for COI, 53 °C for ND5, 55 °C for EF-1α and 58 °C for PER. Amplifications were confirmed by standard gel electrophoresis, using 2% w/v agarose/TBE gels stained with ethidium bromide. Thirty-five µl of PCR products were purified using the QIAquick PCR Purification Kit® (QIAGEN, Germantown, MD, USA) following the manufacturer's protocol and were sequenced by Eurofins Genomics (Ebersberg, Germany). Sequences were used for species identification with nBLAST® (Altschul et al., 1990) to confirm the previous morphological identification. A total of 309 sequences were analysed, of which 257 were sequenced in this work (Table 4) and 52 were downloaded from GenBank (Table 5). Analyses based on the phylogenetic relationships between the studied species were carried out to confirm the identification. To obtain consistent blocks of nucleotides for all the species, the sequences were processed with Gblocks and manually checked (Talavera & Castresana, 2007; Castresana, 2000). Subsequently, sequences were aligned using Clustal Omega (Sievers et al., 2011) and concatenated with FASconCAT v1.0 (Kück & Meusemann, 2010). Trees were built using the Neighbour Joining and the Maximum Likelihood methods in MEGA 7.0 (Kumar, Stecher & Tamura, 2016) under the Kimura 2-parameter (K2P) evolutionary model (Čandek & Kuntner, 2015). A bootstrap of 1,000 replicates was used for the phylogenetic reconstructions. Trees were visualised with iTOL (Letunic & Bork, 2016). In the tree reconstructions, Piophilidae and Muscidae species were considered as outgroups.
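As an illustration of the K2P model mentioned above, the following minimal sketch computes the Kimura 2-parameter distance between two aligned sequences from the proportions of transitions (P) and transversions (Q), using d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. It is not the MEGA implementation; the function name `k2p_distance` and the choice to skip gapped or ambiguous sites (pairwise deletion) are assumptions made for this example.

```python
import math

# A transition is a substitution within one of these two classes;
# a transversion swaps a purine for a pyrimidine or vice versa.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; sites with gaps or ambiguous
    bases are ignored (an assumption, not a detail taken from the paper).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # math.log raises ValueError if the sequences are too divergent
    # for the model (non-positive argument).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Calling `k2p_distance` on every pair of aligned sequences would give the kind of distance matrix a Neighbour Joining reconstruction starts from.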

RESULTS
The analysed specimens belonged to fourteen species, with Calliphora vicina Robineau-Desvoidy, 1830 and Lucilia sericata (Meigen, 1826) (Diptera, Calliphoridae) as the most abundant taxa, representing 29.8 and 27.4% respectively. The first analysis step was based on a local alignment using GenBank BLAST (Altschul et al., 1990), and the percentage of correct identifications was evaluated. In particular, the molecular one-gene identification was compared with the morphological identification, giving a match of 87.5% for COI, 72.5% for ND5, 77.1% for EF-1α and 67.9% for PER. For Calliphoridae alone, the percentages were 77.5, 64.1, 71.2 and 64.2% respectively. A phylogenetic approach was used to verify the efficiency of the molecular identification; however, the sequencing of the EF-1α and PER regions was successful in only 72.6% and 69.1% of the specimens, respectively. Independent analysis of COI (Fig. 1A, Fig. S1) recovered the monophyly of all families. All the subfamilies (Calliphorinae, Luciliinae and Chrysomyinae) are separated with robust bootstrap values ranging from 0.8 to 1 on a scale between 0 and 1. Within the genus Lucilia, L. sericata sequences cluster together and are clearly distinct from the other congeneric species (bootstrap 1), while the pattern of Lucilia illustris Meigen, 1826 and Lucilia caesar (Linnaeus, 1758) is not clearly resolved, with L. illustris showing a paraphyletic pattern. The genus Calliphora Robineau-Desvoidy, 1830 was also recovered as paraphyletic: in this case C. vicina branches with Cynomya mortuorum (Linnaeus, 1761), although with weak support, instead of branching with C. vomitoria. Phylogenetic reconstruction based on the ND5 marker (Fig. 1B, Fig. S2) shows an unresolved topology with problems of determination at all taxonomic levels (family, subfamily, genus and species). Lucilia caesar and L. illustris are not clearly distinct and, in addition, the Protophormia terraenovae Robineau-Desvoidy, 1830 sequence clusters with L. sericata sequences. Only a small number of sequences was available for both the EF-1α (Fig. 1C, Fig. S3) and PER genes (Fig. 1D, Fig. S4). Both phylogenetic reconstructions obtained using these markers showed the same problems reported for COI and ND5, with L. illustris and L. caesar not clearly distinct.
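The per-gene match percentages reported above compare a molecular identification with the morphological one, specimen by specimen. A minimal sketch of that kind of calculation is given below. The function name, the specimen labels and the toy data are all hypothetical, and counting only the specimens that have a molecular result is an assumption of this example rather than a detail stated in the text.

```python
def match_percentage(morphological: dict, molecular: dict) -> float:
    """Percentage of specimens whose molecular ID equals the morphological ID.

    Both arguments map a specimen label to a species name. Only specimens
    present in both mappings are counted (assumption for this sketch).
    """
    shared = [specimen for specimen in morphological if specimen in molecular]
    if not shared:
        raise ValueError("no specimen has both identifications")
    matches = sum(morphological[s] == molecular[s] for s in shared)
    return 100.0 * matches / len(shared)


# Toy data, invented for illustration only (not taken from the study).
morpho_ids = {
    "sp01": "Calliphora vicina",
    "sp02": "Lucilia sericata",
    "sp03": "Lucilia caesar",
    "sp04": "Lucilia illustris",
    "sp05": "Chrysomya albiceps",  # no sequence obtained for this specimen
}
blast_ids = {
    "sp01": "Calliphora vicina",
    "sp02": "Lucilia sericata",
    "sp03": "Lucilia illustris",  # mismatch between two close species
    "sp04": "Lucilia illustris",
}

toy_match = match_percentage(morpho_ids, blast_ids)
```

Restricting both mappings to Calliphoridae specimens before the call would give a family-level figure in the same way.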
In order to increase the molecular information, three concatenated sequences were generated using the previous genes (DeSalle, Egan & Siddall, 2005). The phylogenetic reconstruction based on the chimeric sequence generated from the two mitochondrial genes (COI and ND5) (Fig. 2A, Fig. S5) does not provide a better resolution for the L. illustris/L. caesar species pair, nor for the position of C. mortuorum among the Calliphorinae. These two points are not clarified further when the nuclear sequences are included and two more chimeric sequences are generated, with three (COI, ND5 and EF-1α) (Fig. 2B, Fig. S6) and four (COI, ND5, EF-1α and PER) (Fig. 2C, Fig. S7) genes. Table 6 summarizes the information on the sequences used in the phylogenetic reconstructions.
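The concatenation step can be sketched as joining, for each specimen, its aligned sequences gene by gene in a fixed order. The example below is a minimal illustration and not the FASconCAT implementation: the function name `concatenate_alignments` is hypothetical, and padding a missing gene with gap characters is an assumption, since the text does not state how specimens lacking a marker were handled.

```python
def concatenate_alignments(alignments: dict, gene_order: list) -> dict:
    """Join per-gene alignments into one concatenated sequence per specimen.

    alignments maps a gene name to a dict of {specimen: aligned sequence}.
    A specimen missing from a gene is padded with '-' over that gene's
    alignment length (assumption made for this sketch).
    """
    lengths = {}
    for gene in gene_order:
        seq_lengths = {len(seq) for seq in alignments[gene].values()}
        if len(seq_lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        lengths[gene] = seq_lengths.pop()

    specimens = sorted({sp for gene in gene_order for sp in alignments[gene]})
    return {
        sp: "".join(
            alignments[gene].get(sp, "-" * lengths[gene]) for gene in gene_order
        )
        for sp in specimens
    }


# Toy alignments, invented for illustration only.
toy = {
    "COI": {"sp01": "ACGT", "sp02": "ACGA"},
    "ND5": {"sp01": "TT"},  # sp02 has no ND5 sequence
}
supermatrix = concatenate_alignments(toy, ["COI", "ND5"])
```

Passing `["COI", "ND5"]`, then three and then four gene names as `gene_order` would mirror the two-, three- and four-gene datasets described above.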

DISCUSSION
The results obtained with a local alignment demonstrate that the match between the molecular and the morphological identification of the specimens was never higher than 90%, even for the COI gene (87.5%), which is currently used for species identification (DNA Barcoding Project (http://www.barcodeoflife.org/)). The analysis of ND5, a mitochondrial gene, was difficult for Calliphora vomitoria (Linnaeus, 1758) due to a complete lack of ND5 sequences from this species in the database (GenBank) at the time of the analysis. The molecular analysis of the closely related, though morphologically distinct, species L. illustris and L. caesar does not allow their unambiguous identification, as already reported in previous works where different phylogenetic approaches (e.g., Maximum Parsimony) were also used (GilArriortua et al., 2015; Wells, Wall & Stevens, 2007). In fact, GilArriortua and co-workers (GilArriortua et al., 2015) indicated that L. caesar and L. illustris appear to share mitochondrial genomes, with a divergence value lower than the minimum inter-specific threshold value for mitochondrial loci. The ND5 gene showed the same problem in the discrimination of the close Lucilia species. To our knowledge, the analysis of closely related blowfly species using the ND5 gene was only reported by Zaidi and co-workers, who showed a good identification performance using this gene (Zaidi et al., 2011). In addition, the same mitochondrial region was used to analyse the evolutionary relationships between flesh flies with good resolution (Zehner et al., 2004). The analysis of the EF-1α gene is in agreement with a previous study (McDonagh & Stevens, 2011) that demonstrated a good ability of this gene to separate blowflies according to morphological classification. However, in our reconstruction the positions of both the Lucilia species and Cynomya, within Calliphorinae, are not well resolved. To our knowledge, the PER gene was studied for identification purposes only in flesh flies (Guo et al., 2014). That work showed that the PER gene can be used successfully for identification purposes, although public datasets need to be enriched with further DNA sequences belonging to different families of Diptera. The analysis of the concatenated sequences generated with the COI, ND5, EF-1α and PER markers unfortunately does not improve the resolution of the investigation, although previous works indicate that the combination of nuclear and mitochondrial genes for species identification is a much more accurate approach. In fact, the combination of markers that have different evolutionary histories, fast and slow evolving genes, allows a better resolution of the phylogenies. In particular, the multi-locus analysis of the COI, EF-1α and 28S rRNA genes and the combined analysis of the COI, CYTB, ND5, ITS1 and ITS2 genes have proved more successful than single-locus phylogenies, leading to a better grouping of species belonging to the same family (Zaidi et al., 2011; McDonagh & Stevens, 2011; Grzywacz, Wallman & Piwczyński, 2017). However, as underlined by Sonet et al. (2012), the addition of more genes with different evolutionary histories does not always resolve the monophyly of closely related species such as L. illustris and L. caesar. The monophyly of these two species was clearly demonstrated by only two research groups: one working with the gene Bicoid (Park et al., 2013) and another using the AFLP (Amplified Fragment Length Polymorphism) approach (Picard et al., 2018). In both cases the two species were well resolved with strong basal support, confirming the conclusions obtained from the morphological analysis of male and female specimens of both species. At the moment, because of the small number of sequences available for these two species, we cannot exclude hybridization, at least in some parts of the species' distribution area, but this point needs further investigation and the analysis of a larger dataset.
A complete and correct dataset is crucial for reaching a correct species identification, with both local alignment systems and phylogenetic methods. The molecular approach depends strongly on the quality of the information stored in databases, and increasing the number of genetic markers from different specimens from different geographical locations is important to recover the best resolution in phylogenetic trees. The availability of genetic data from different populations provides information about the intraspecific variability that, in closely related species, can affect the phylogenetic reconstruction. The use of a single-gene approach to identify animal species is an open question, especially for closely related species. In particular, mtDNA does not seem to differ significantly from any other marker group, showing an overall success rate of 71% (Dupuis, Roe & Sperling, 2012). In fact, the nature of mitochondrial evolution reduces the applicability of mtDNA to detailed systematic or taxonomic analysis of closely related species (Dupuis, Roe & Sperling, 2012; Will, Mishler & Wheeler, 2005; De Carvalho et al., 2008). Dupuis and co-workers (Dupuis, Roe & Sperling, 2012) highlighted two main results: (i) marker classes (mtDNA, ribosomal DNA, autosomal loci, sex-linked loci, and anonymous loci) were moderately successful in delimiting closely related species when used as the sole identifier, and (ii) multi-locus power analysis data support the investigation and use of multiple markers for species delimitation. Several papers have discussed multi-locus analysis as a species identification method in the animal kingdom. In particular, sex-linked markers showed a high success ratio in delimiting closely related species in Diptera and Lepidoptera (Coyne & Orr, 1989; Roe & Sperling, 2007). The improvement of genetic datasets and the concatenation of different mitochondrial and nuclear loci could improve the capability of the molecular approach to identify closely related species, but this aspect has to be explored further, also taking the specificity of each taxon into account.
It is also worth mentioning that in this kind of study the choice of species and the intra-specific sampling scheme can strongly affect the level of resolution of the analysis. In our study, a further investigation including a larger sequence dataset of species in the genus Lucilia from different geographical contexts would better clarify the results reported here and the conclusions derived from them.

CONCLUSIONS
At present, in forensic entomology, the morphological identification approach cannot be completely replaced for some species by the molecular one if the latter is based on a single gene. The two methodologies can complement each other. In addition, because of the lack of information in databases, a phylogenetic approach can increase the ability to identify species when the molecular approach is used. The analysis of mitochondrial genes is considered the best approach because of the peculiarities of this kind of DNA, in terms of haploidy, high copy number, low recombination and lack of introns (Hebert et al., 2003). However, considering the nature of mitochondrial evolution and the results of this and previous studies, the use of mtDNA does not provide a good level of resolution for some of the Lucilia species. In addition, the analysis of nuclear genes such as EF-1α and PER does not improve on this point. Additional work using mtDNA in association with other genetic markers (i.e., sex-linked loci) could clarify and resolve the relationships within the genus Lucilia as well as among other closely related species. It is worth mentioning that the search for the best marker has to be carried out at the genus level; in fact, some markers that have been suggested in addition to COI (e.g., ITS2) work for the resolution of certain taxa but not for others. In addition, given the problems in the resolution of several genera and species in the family Calliphoridae, also highlighted in this paper, an approach based on NGS technologies (e.g., WGS, whole genome shotgun) will probably provide enough information to distinguish the taxa.