There is no magic bullet: the importance of testing reference gene stability in RT-qPCR experiments across multiple closely related species

Reverse Transcriptase quantitative Polymerase Chain Reaction (RT-qPCR) is the current gold standard tool for the study of gene expression. This technique is highly dependent on the validation of reference genes, which exhibit stable expression levels among experimental conditions. Often, reference genes are assumed to be stable a priori without a rigorous test of gene stability. However, such an oversight can easily lead to misinterpreting expression levels of target genes if the references genes are in fact not stable across experimental conditions. Even though most gene expression studies focus on just one species, comparative studies of gene expression among closely related species can be very informative from an evolutionary perspective. In our study, we have attempted to find stable reference genes for four closely related species of grasshoppers (Orthoptera: Acrididae) that together exhibit a spectrum of density-dependent phenotypic plasticity. Gene stability was assessed for eight reference genes in two tissues, two experimental conditions and all four species. We observed clear differences in the stability ranking of these reference genes, both between tissues and between species. Additionally, the choice of reference genes clearly influenced the results of a gene expression experiment. We offer suggestions for the use of reference genes in further studies using these four species, which should be taken as a cautionary tale for future studies involving RT-qPCR in a comparative framework.


INTRODUCTION
The current gold standard tool for studying gene expression at the RNA level is Reverse Transcriptase quantitative Polymerase Chain Reaction (RT-qPCR or simply qPCR), due to its high sensitivity and speed of analysis (Gachon, Mingam & Charrier, 2004;Thellin et al., 2009). Nonetheless, qPCR accuracy is highly dependent upon the normalization of target gene expression with reference genes. An optimal reference gene should show minimal variability in its expression level between tissues, be unaffected by tested experimental factors, and exhibit similar expression levels as target genes (Vandesompele et al., 2002). Often those genes that are critical for maintaining basic cellular functions and expressed in all cell types, commonly referred to as housekeeping genes, are used as reference genes for qPCR experiments. Although qPCR is one of the basic tools employed in functional genetic research, mistakes in the experimental setup for qPCR experiments are surprisingly common, including the use of an inappropriate number of reference genes or the lack of accurate testing of reference gene performance under specific experimental conditions (Bustin et al., 2013;Gutierrez et al., 2008;Kozera & Rapacz, 2013). It is often the case that certain reference genes are selected for a particular qPCR experiment simply because they have been used previously, either for other experimental conditions or even in other tissues and species. This type of blind adoption of reference genes can result in inaccurate normalization of target gene expression, and ultimately in an incorrect interpretation of the results (Bustin et al., 2013;Dheda et al., 2004;Gutierrez et al., 2008;Nicot et al., 2005;Tricarico et al., 2002;Vandesompele et al., 2002). In response to these issues, the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE)-guidelines, aiming to enhance the consistency of performing, interpreting, and reporting qPCR data, were formulated (Bustin et al., 2009). Additionally, several statistical algorithms have been developed to identify the best reference genes to use under certain experimental conditions (Andersen, Jensen & Orntoft, 2004;Hellemans et al., 2007;Pfaffl et al., 2004;Vandesompele et al., 2002). Therefore, it is critical that a thorough investigation of reference gene stability is needed prior to setting up any qPCR experiment.
While a vast majority of qPCR-based studies focus on gene expressions on a single species, there is a recognition that a comparative gene expression study across multiple closely related species that may have distinct biological or ecological differences can reveal important insights into the evolution of gene functions (e.g., Salazar-Jaramillo et al., 2017;Sørensen et al., 2019;Wittkopp, Vaccaro & Carroll, 2002). However, one of the assumptions in this type of comparative studies is that the reference genes that work well for one species must also work well for another closely related species. Although the reference genes are supposed to be stable within a species (Vandesompele et al., 2002), there is no a priori reason to believe that the same pattern is found in another species. If this assumption does not hold true, the subsequent inferences about the expression level of the gene of interest can be incorrect. Nevertheless, this assumption is rarely tested.
In this study, we explore the merit of this assumption by testing reference gene stability in qPCR experiments in four closely related species of grasshoppers in the genus Schistocerca (Orthoptera: Acrididae). Our initial motivation for this work comes from our long-term interest in understanding the molecular basis of density-dependent phenotypic plasticity in locusts. In short, locusts are grasshoppers that show an extreme form of density-dependent phenotypic plasticity in which relatively inactive and solitary individuals can transform into highly active and gregarious individuals in response to change in local population density (Cullen et al., 2017;Pener, 1983;Pener & Simpson, 2009;Uvarov, 1921). When the high density persists, locusts exhibit collective movements, which can lead to locust plagues (Pener & Simpson, 2009). These two density-dependent phenotypes are called solitarious and gregarious phases (Pener & Simpson, 2009;Uvarov, 1921), and understanding the molecular basis of this phenomenon has been considered the last frontier in locust research (Cullen et al., 2017;Pener & Simpson, 2009). We have been studying the Central American locust, Schistocerca piceifrons (Walker), an important swarming locust species affecting Mexico and Central America (Barrientos Lozano et al, 1992;Bredo, 1963;Harvey, 1983), as a model system, which shows behavioral, morphological, physiological, ecological, and molecular plasticity in response to change in density, similar to its more well-studied congener, the desert locust S. gregaria (Forskål). Our research has shown that the Central American locust is more closely related to non-swarming grasshoppers than to other locust species within the genus (Song et al., 2017), and that these non-swarming relatives also show density-dependent phenotypic plasticity, reminiscent of the swarming locusts (Gotham & Song, 2013;Song et al., 2017). Therefore, over the past several years, we have been investigating the evolution of density-dependent phenotypic plasticity in a comparative framework by comparing transcriptomes of S. piceifrons and three other closely related Schistocerca species, which have led to the identification of some candidate genes that might be relevant for the expression of density-dependent phenotypes. It is in this context that we ask a question about the suitability of using the same reference genes for qPCR experiments across the four closely related species. In this study, we have designed and analyzed the stability of eight potential reference genes across the four species reared in two density conditions (isolated vs. crowded), and we demonstrate that the assumption of reference gene stability is not supported in our study system. We also validate our choice of reference genes with a set of four candidate genes in S. piceifrons by performing qPCR experiments. Finally, we provide a set of recommendations for selecting appropriate reference genes in a comparative analysis involving multiple closely related species.

Study insects and rearing regime
We used four closely related species in the genus Schistocerca, maintained as laboratory colonies in the Department of Entomology at Texas A&M University. The four species were the Central American locust, S. piceifrons, and three sedentary species, S. americana (Drury), S. serialis cubense (Saussure), and S. nitens (Thunberg). For conciseness, we use the species epithet (piceifrons, americana, cubense, and nitens) throughout the paper. The piceifrons colony was established from an outbreak population in Yucatan, Mexico collected in October 2015, and imported under a USDA permit (USDA APHIS PPQ P526P-15-03851). The americana colony was established from a population in Brooksville, Florida, collected in September 2010. The cubense colony was established from a population in Islamorada in the Florida Keys collected in January 2011. Finally, the nitens colony was established from a population in Terlingua, Texas, collected in May, 2015. For this study, the colonies of these four species were reared under crowded and isolated conditions. For the isolated condition, nymphs were isolated as hatchlings and reared in separate plastic cages ( we observed that the insects would die in the density used for other species. In both density conditions, the insects were reared at 12 h of light and 12 h of darkness at 30 • C, and were fed daily Romaine lettuce and wheat bran. We reared the insects until they molted to the last nymphal instars to conduct qPCR experiments.

RNA extraction and cDNA synthesis
Sample collection, RNA extraction and RNA quality assessment were performed as previously described (Wang et al., 2020). Briefly, crowded-reared and isolated-reared female nymphs were marked with a ceramic marker on the abdomen after molting to their last nymphal instar, and were dissected around 72 h later. Only specimens that molted before 10 AM were used, and all dissections occurred between 8 and 9 AM. After removing gut tissues, head and thorax were dissected using sterilized dissection scissors. Both tissues were preserved in RNAlater (ThermoFisher Scientific) at −20 • C, following the manufacturer's guideline. A total of 10 nymphs/density/species were dissected. Half of these was used for RNA sequencing, and the other half was used for the qPCR experiment. Tissues were placed in MagNA Lyser Green Beads (Roche) sample tubes and were homogenized for 30 s in one mL of Trizol (ThermoFisher Scientific) using a MagNa Lyser instrument (Roche) at 6,500 rounds per minute. Whole-tissue RNA was extracted using a Trizol-chloroform extraction, followed by clean-up with a RNeasy mini kit (Qiagen) using an on-column DNAse treatment with a RNase-free DNAse set (Qiagen). RNA concentrations were measured with a Denovix DS-11 spectrophotometer; 260/280 and 260/230 values were above 2.0 for all samples. Additionally, microcapillary electrophoresis with a Fragment Analyzer (Agilent Technologies RNA) was used to analyze RNA integrity of samples destined for RNA sequencing. Only those samples with a RNA Quality Number (RQN) over 3.9 were used. Due to differences in 28S ribosomal RNA structure compared to other eukaryotic species, RQN values for insects are often lower than what is generally considered a valid threshold in mammalian samples (Escobar & Hunt, 2017;Macharia, Ombura & Aroko, 2015;Winnebeck, Millar & Warman, 2010). Samples used for qPCR were diluted to a concentration of 100 ng*ml −1 and subsequently used to synthesize cDNA using the iScript cDNA synthesis kit (Bio-Rad) following the manufacturer's guideline.

Sequence assembly and expression analysis
All eight transcriptomes were imported into Geneious (R10.2.6; BioMatters, Ltd.). We selected nine potential reference genes: actin5C (Act5C), α-tubulin (Tub), succinate dehydrogenase (SDH ), elongation factor 2 (EF2), ribosomal protein L5 (RIBL5), glyceraldehyde-3-phosphate dehydrogenase (GAPDH ), annexin IX (Ann), armadillo (Arm) and heat shock protein 70 (Hsp70). We also selected four target genes: pacifastinrelated peptide 4, pacifastin-related peptide 5, allatostatin (ast ), and allatotropin (at ). For consistency, we followed suggestions by Simonet et al. (2004) for naming pacifastin-related peptides and named the Schistocerca piceifrons sequences SPPP-4 and SPPP-5. We obtained nucleotide sequences for all 13 genes from other orthopteran insects from Genbank (https://www.ncbi.nlm.nih.gov/genbank/) and used Megablast or tblastx with default settings in Geneious to find related sequences in our four species. For each gene, sequences of all four species were aligned using MUSCLE version 3.8.24 (Edgar, 2004) in Geneious, with a maximum of nine iterations and default settings. Subsequently, sequences were manually curated, resulting in full-length coding sequences for each species. The identity of each gene was confirmed using the standard nucleotide Basic Local Alignment Search Tool (blastn) at the NCBI website, using the nr/nt database as reference (Altschul et al., 1990;Johnson et al., 2008). Sequence information can be found in Table 1. Additionally, mapping reads were counted as previously described (Wang et al., 2020), using bowtie2 (very sensitive end-to-end, disable no-mixed and no-discordant behavior) and SAMtools' idxstats (Li et al., 2009) in Galaxy. Differential expression analysis was performed with DEseq2 (Love, Huber & Anders, 2014) within SARTools 1.7.1 (Varet et al., 2016), using R 3.5.3 (R Core Team, 2017). All settings were kept at their defaults; the expression in crowded-reared individuals was compared to that of conspecific isolated-reared individuals.

Primer design
Primers for reference genes (Table 2) were designed using Primer3 (Koressaar & Remm, 2007;Untergasser et al., 2012), based on conserved regions, identified using the nucleotide alignment in Geneious. In this way, we aimed to generate primers that would work in all four species. Standard settings were altered to an optimal melting temperature set at 60 • C, and the amplicon length at 150-200 bp. For the four target genes, primers were designed for just piceifrons, with identical settings in Primer3 as described above (Table 2). We designed three primer pairs for each gene. All sequences were ordered from Integrated DNA Technologies.

Real time quantitative PCR and statistics
For each qPCR reaction, 5 µl of cDNA was added to 10 µl of SsoAdvanced TM Universal SYBR R Green Supermix (Bio-Rad #1725275) and 5 µl of primers at a final concentration of 500 nM. Every reaction was performed in duplicate or triplicate. To determine primer efficiency for reference genes, a dilution series of one sample was generated for each species by performing serial 10-fold dilutions ranging from dilutions of 1/1 to 1/10,000. For the target genes, a 5-fold dilution series from 1/1 to 1/3,125 was used instead for determining primer efficiency. Based on these data, the most efficient primer pair was used in all further analyses. All other reactions were run on 96-well plates, with 10 samples (five isolated-reared individuals and five crowded-reared individuals) run on the same plate for a particular gene following the sample maximization method (Hellemans et al., 2007).
Reactions were run on a CFX connect real time system (Bio-Rad) using the following thermal cycling profile: 3 min at 95 • C, 40 cycles of (1) 15 s at 95 • C and (2) 45 s at 60 • C, and a melting curve from 65 • C to 95 • C. C q values were exported from the Bio-Rad CFX manager using the default threshold. Table 2 Primer information for qPCR experiment. Primer sequences, amplicon melt temperature and primer efficiencies are given. Primer efficiencies were obtained with a 10-fold dilution series ranging from 1/1 to 1/1,000,000.

piceifrons americana cubense nitens
Gene code Primer sequences For the analysis of reference gene stability, a total of five crowded-reared and five isolated-reared individuals per species were used for both head and thorax tissues. Gene stability was analyzed using three different programs: geNorm in qbase+ (Mestdagh et al., 2009;Vandesompele et al., 2002), NormFinder (Andersen, Jensen & Orntoft, 2004), and BestKeeper (Pfaffl et al., 2004). Raw C q values were used as input for BestKeeper and geNorm, while they were transformed to a linear scale for NormFinder. Gene efficiencies were assumed to be 100% for all tested genes (see Table 2 for actual gene efficiencies). To obtain a non-arbitrary ranking of reference genes, they were first ranked for each program separately, and subsequently the geometric mean of these scores was calculated as an accurate estimate of which reference genes were the most stable (Chen et al., 2011;Chen et al., 2013;Gong et al., 2016;Pereira-Fantini et al., 2016). The amount of reference genes needed to standardize gene expression was assessed using geNorm. Normalization factors were calculated in a stepwise manner, starting by taking the two most stable reference genes into account and adding another reference gene one by one. Subsequently, the variation between these normalization factors were compared. If the difference in variation is lower than 0.15, the addition of an additional reference gene was deemed unnecessary (Mestdagh et al., 2009;Vandesompele et al., 2002).
For the quantification of target gene expression, we used thorax tissues of five isolatedreared and five crowded-reared individuals of piceifrons. Relative expression of target genes (ast,at, for thorax tissue in piceifrons was calculated with the C q -method (Livak & Schmittgen, 2001), using different sets of reference genes to normalize gene expression. P-values were calculated with a two-tailed student t -test in R (version 3.6.2) based on non-transformed C q -values. All raw qPCR data can be found in Data S1-S3, and the code for the student t -test is present in Data S1.

Selection of potential reference genes
A total of 9 potential reference genes were initially selected based on previous studies using qPCR for locust gene expression research (Chapuis et al., 2011;Van Hiel et al., 2009;Yang et al., 2014). All of these genes were members of different functional categories (Table 1), with the exception of the cytoskeleton genes, Tub and Act5C, which are both commonly used as reference genes in insect gene expression studies (Lü et al., 2018). We chose EF2 and RIBL5 over elongation factor 1α and other ribosomal genes, respectively, based on the low amount of variation for these genes in our transcriptome data. For 7 out of 9 genes, primer efficiencies for the selected primer pairs varied between 96 and 106% (Table 2). Two exceptions were Arm with an efficiency of 109.8% for piceifrons and 92.1% for nitens, and SDH for which we were unable to find primers with sufficient efficiency values in all four species. We attribute this to a surprisingly high amount of sequence differences among the four species for SDH, effectively restricting the sequence regions that were conserved enough for primer design. As a result, SDH was removed from all further analyses, and thus, only eight reference genes were used in further analyses. All included primer pairs showed melt curves with a single peak, suggesting the absence of aspecific amplification and contamination. C q values varied between 17.5 and 28.5, with GAPDH and Hsp70 having the lowest values and Arm having the highest.

Comparing reference gene stability
We subsequently compared the stability of these eight potential reference genes by performing qPCR reactions in five isolated-reared and five crowded-reared individuals for head and thorax tissues in all four species (Fig. 1). In americana, cubense, and nitens, all eight genes showed similar expression levels under the tested rearing conditions. However, in piceifrons, several genes showed a trend towards differential expression between the two density conditions. Specifically, Ann, Act5C, Tub, and Hsp70 showed a trend towards lower expression in the isolated condition for both tissues, while Arm and EF2 showed this trend only in the thorax tissue. RIBL5 was the only gene showing no difference between the two density conditions. We were able to confirm most of these observed trends with our transcriptome data ( Table 3). The only discrepancies between our transcriptome data and the qPCR data were found for Hsp70 in head tissue, GAPDH for thorax tissue, and  Tub for both tissues. For thorax tissue, Act5C and Ann were even found to be significantly upregulated in crowded-reared individuals compared to isolated-reared individuals. NormFinder (Andersen, Jensen & Orntoft, 2004) calculates both intra-and inter-group variations of gene expression, and combines these into a stability value, and thus, genes with a lower rank are considered to be more stably expressed. geNorm (Mestdagh et al., 2009;Vandesompele et al., 2002), the most popular algorithm, assumes that expression values of real reference genes are perfectly correlated with each other in all tested samples. Based on this correlation, genes get a stability value (M value), with genes with M < 1.5 considered to be stable and the most stable gene obtaining the lowest score. BestKeeper (Pfaffl et al., 2004), on the other hand, outputs the standard deviation for each reference gene, expected to be lower than 1, and a correlation coefficient for each gene with the so-called BestKeeper-index, which is essentially the geometric mean of all stable reference genes. Before comparing the stability values, we estimated the amount of necessary reference genes for each species and tissue using geNorm's V-values. For americana, cubense, and nitens, geNorm suggested that using two reference genes was sufficient as its values for V 2/3 were below 0.15 (Fig. 2). However, for piceifrons, the difference in variation between using two or three reference genes was higher than 0.15 for the head tissue and very close to 0.15 for the thorax tissue (Fig. 2). Thus, the use of three reference genes is suggested for this species. Overall, the three programs listed genes in a similar fashion, with only a few exceptions (Tables 4 and 5). Tub and GAPDH were the two least stable genes, and both were listed as the last in 3 out of the 8 tested species/tissue combinations. Act5C be the most stable reference gene overall, as it was generally listed in the top half of stability values by all three programs. Arm was also often listed in the top three of most stable genes, but was also listed as the seventh most stable gene in some cases (Normfinder, Bestkeeper in piceifrons head; Normfinder in nitens head).
For the americana head tissue, Act5C and EF2 were the two most stable genes overall, with Act5C rated the most stable gene by each program, and EF2 obtaining a second place for both geNorm and NormFinder. RIBL5 and Arm, which were also consistently ranked as stable reference genes, obtained a third and a fourth place. For the americana thorax tissue, RIBL5 and EF2 were our two reference genes of choice, as they obtained a first and a second place overall. Additionally, Arm and Act5C were ranked as third and fourth overall, but obtained very similar stability scores to EF2 and were ranked in the first half by all three programs. For the cubense head tissue, both Act5C and Hsp70 were ranked within the top 3 most stable reference genes by all three programs and as a result, obtained top overall rankings. For the cubense thorax tissue, Hsp70 obtained the best ranking, followed by Ann ranked second. Nonetheless, it should be noted that for this species and tissue combination, several reference genes obtained very similar scores for both NormFinder and geNorm, resulting in low rankings for some genes that were actually quite stable. For instance, Act5C was ranked as only fifth overall even though it performed very similar to Ann, which was ranked second. For the nitens head tissue, the rankings suggested by the three different programs disagreed strongly with each other, as Ann and Act5C were ranked first and second by geNorm and NormFinder, but fifth and fourth by BestKeeper, while Arm was ranked first by BestKeeper and sixth and seventh by the other two programs. Overall, Ann and Act5C were ranked first and second, and Hsp70, which was the only gene ranked within the first half by all three programs, was ranked as third. For the nitens thorax tissue, Ann and Arm were clearly the preferred reference genes as they were consistently ranked as the two most stable genes by all three programs. Finally, for both tissue of piceifrons, Act5C and Ann were ranked in the top 4 of most stable genes by all programs. RIBL5 and Arm were also ranked in the top 4 for head and thorax tissue respectively.  Figure 2 Pairwise variation analysis for qPCR reference genes. All data were generated in qbase+ using the geNorm algorithm, based on five biological replicates per group. Values represent the pairwise variation analysis between normalization factors NF n and NF n+1 , or the decrease in variation when adding an additional reference gene. Using 0.15 as a threshold, the optimal number of reference genes required for accurate normalization can be determined.

Validation of reference genes
To explore how the choice of reference genes would affect the outcome of qPCR experiments, we quantified and compared the expression levels of the four target genes (ast, at, SPPP-4, and SPPP-5) using different reference genes in the thorax tissue of piceifrons reared in two density conditions. As Tub was the least stable reference gene, it was not included for this analysis. Our data showed the choice of reference genes strongly affected the qPCR results (Fig. 3). Regardless of the reference genes used, ast showed a significantly higher expression level in isolated-reared individuals compared to crowdedreared individuals, and SPPP-5 never came close to reaching significant differences. When all of the reference genes (except Tub) were chosen, at and SPPP-4 showed trends toward decreased and increased expression, respectively, in crowded-reared individuals (p = 0.068 and p = 0.061, respectively). We subsequently removed RIBL5 from the analysis and thus only used reference genes of which the expression varied between both rearing conditions. As a result, the p-value for at dropped below 0.05, while the trend observed for SPPP-4 partially disappeared. Lastly, we included just Hsp70, RIBL5 and GAPDH as reference genes, which was our preferred set of reference genes based on the earlier data. As a result, SPPP-4 showed a significantly higher expression level in the crowded-reared condition, while no significant difference was observed for at anymore (p = 0.19). Three out of the four of these qPCR results agreed with our transcriptome data (Table 6).

DISCUSSION
The overall goal of this study is to address the suitability of using the same reference genes for qPCR experiments across multiple closely related species. So far, only a handful of studies have compared reference gene stability between different species (e.g., Axtner & Sommer, 2009;Bonnet et al., 2013;Giménez, Pistón & Atienza, 2011;Lin et al., 2009; Weyrich, Axtner Relative concentrations were calculated with the C q -method, with values shown in the graph representing the exponentialized values. Error bars represent the standard error of mean, calculated on exponentialized values. pvalues were calculated based on the C q values. All data is based on the expression in five isolated-reared and five crowded-reared individuals. Significant differences between isolated-reared and crowded-reared individuals were marked for p < 0.05 (*) or p < 0.01 (**). (A) All available reference genes were used; Act5C, Ann, Arm, Hsp70, EF2, GAPDH and RIBL5, (B) only suggested reference genes were used; RIBL5, GAPDH and Hsp70, (C) All reference genes showing a trend towards differential expression in thorax were used; every reference gene except RIBL5.
Full-size DOI: 10.7717/peerj.9618/fig-3 Table 6 Comparison of target gene expression using RNA-Seq and using RT-qPCR. Fold changes for RNA sequencing study were obtained with Deseq2 in SARTools, adjusted pvalues were obtained using default settings. Fold changes for RT-qPCR study were obtained using C q method, p-values were obtained with a student t -test in R. For each analysis, five isolated-reared and five crowded-reared last instar nymphs were used. For both analyses, the isolated-reared condition was used as reference.  Sommer, 2010). Many qPCR-based studies have simply used the reference genes designed and optimized for one species in a study involving another species without specifically testing for their gene stability (Bustin et al., 2013;Lin et al., 2009;Lü et al., 2018). In this study, we have compared the stability of eight potential reference genes across four Schistocerca species reared in two density conditions, and demonstrated that the reference gene stability cannot be assumed a priori in a comparative analysis across multiple species. The choice of reference genes used in qPCR experiments is often guided by the genes used in prior studies, rather than by a comprehensive assessment of all available candidates. Lü et al. (2018) performed a meta-analysis on the selection of reference genes in insect qPCR experiments based on 90 papers published between 2013-2017, and showed that the same 10 reference genes were repeatedly used across 78 insects belonging to 10 different orders. There are certainly more than 10 reference genes available and we think that there could be hundreds or even thousands of potential reference genes in any given species. For example, there are over 3,800 genes that are expressed uniformly across tissues in humans (Eisenberg & Levanon, 2013), and hundreds of reference genes have been utilized in human gene expression studies. The reason why only a small number of reference genes are used in insect gene expression studies is probably because we do not fully understand gene contents and their expression patterns in most insects. Therefore, when designing qPCR experiments for non-model insect species, one has no other option but to rely on the previous studies to identify well-known reference genes. Our study was no exception. We selected our reference genes based on studies performed in other locust species, including the desert locust Schistocerca gregaria (Van Hiel et al., 2009), the migratory locust Locusta migratoria (Yang et al., 2014), and the Australian plague locust Chortoicetes terminifera (Chapuis et al., 2011). However, only one of these (Chapuis et al., 2011) included rearing density as an experimental factor, and found Arm and Elongation Factor 1α as the most stable reference genes, while GAPDH and SDH were found to be the least stable. Even though we did not test exactly the same genes, GAPDH is overall also one of the least stable reference genes, while Arm is often found within the top 3 of most stable genes in our study. This suggests that, to a certain degree, stable reference genes in one species can be expected to be stable in a related species, even though there is a clear need for scrutiny as the reference gene stability varied widely across both species and tissues in our study.

RNA sequencing RT-qPCR
We have designed primers for eight reference genes that can be used across four species of Schistocerca. In doing so, we observed an interesting trend when comparing primer efficiencies between different species. As the four Schistocerca species used in our study are closely related, we designed and tested a single primer pair per each gene that would work for all four species. We consistently selected conserved gene regions for the primer design, and made sure that the selected regions did not overlap with sites showing single nucleotide polymorphisms. Nonetheless, the primer efficiencies for some primer pairs varied widely ( Table 2). As such, it seems that a base pair difference in the amplicon, or maybe even upstream or downstream of it, can influence the primer efficiency. This indicates that primer efficiencies should always be tested for every species (or even subspecies/populations) included in a comparative study, even if those species are very closely related to each other.
After designing the primers for the reference genes and testing their efficiencies, we assayed their gene stability using three commonly used programs (NormFinder, geNorm, and BestKeeper). Interestingly, we found that different programs selected different genes as their top choices, and the results also depended on both tissue types and species (Tables  4 and 5). This discrepancy is probably due to the different algorithms employed by these programs.
We have also identified additional issues when these reference genes are used for a species exhibiting an extreme form of phenotypic plasticity. Of the four species analyzed in this study, piceifrons is the only swarming locust species, and can be expected to exhibit larger molecular differences between both density conditions than the other three species. It is thus feasible that several of the tested reference genes could also be influenced by the extreme form of density-dependent phenotypic plasticity. In fact, we report that several reference genes in piceifrons showed a trend towards differential expression. Additionally, geNorm predicted that the use of two reference genes would not be sufficient to perform a qPCR experiment for piceifrons, in contrast to what the program suggested for the other three species. All of the samples were given exactly the same treatment independent of rearing condition and were all diluted to 100 ng/µl before the cDNA synthesis. As such, we consider it unlikely that all isolated-reared samples somehow had lower RNA concentrations than the crowded-reared samples. Given the extreme form of densitydependent phenotypic plasticity in piceifrons, it is conceivable that some of the tested reference genes actually respond differentially to the rearing conditions. This idea was supported by our transcriptome data, where most genes exhibited a similar trend towards differential expression as found in our qPCR, and two genes even were significantly different between both rearing conditions (Table 3). These observations make the selection of suitable reference genes for piceifrons a lot more complicated compared to that for the other three species. To make the situation even worse, the algorithms used to select the most stable reference genes failed to select the right reference genes. This was especially evident for the thorax tissue in piceifrons, as the only gene not exhibiting different expression levels between both rearing conditions, RIBL5, was ranked as the 8th most stable reference gene. Both geNorm and BestKeeper assume that perfect reference genes are strongly correlated to each other, and as a result they are susceptible to errors when the majority of the tested genes is actually influenced by the tested experimental conditions. Likewise, even though NormFinder takes both inter-and intra-group variation into account and should thus perform better under such conditions, this program ranked several genes showing a clear trend towards differential expression between rearing conditions as the most stable in piceifrons. Therefore, it is clearly not sufficient to only consider the overall stability values in selecting reference genes, but it is also important to take the actual C q values into account. Additionally, these findings suggest that we cannot simply use the same set of reference genes blindly across the species, but we must carefully consider the type of reference genes to be used for each species.
For all species, we followed the suggestions of the geNorm analysis for determining the amount of reference genes. For piceifrons, we thus suggest three genes should be used, while for the other species, two reference genes should be sufficient. In piceifrons, Ann, Arm and Act5C should not be selected as the reference genes when studying differential gene expression between isolated-reared and crowded-reared individuals. For such a study, we suggest the use of RIBL5, Hsp70 and EF2 for the head tissue, and RIBL5, Hsp70 and GAPDH for the thorax tissue instead. Even though RIBL5 was listed as the last-ranked reference gene by all three programs for the thorax tissue, we consider its inclusion is still valid for two reasons. First, it is the only gene that shows no differential expression between the density conditions. Second, the standard deviation on this gene, reported by BestKeeper, is also one of the lowest, even though this variable was not included in our ranking. After selecting the reference genes according to what is presented above to normalize gene expression of the four target genes, our results matched well with the results of RNA-seq experiments. For the other three species, the choice of reference genes is more straight-forward, as there was no discrepancy between the stability values and the C q values. Thus, we suggest the use of the two genes with the best overall stability ranking as reference genes in further studies. Our suggestions for each species are summarized in Table 7. Two of the selected target genes for validation were pacifastins (SPPP-4 and SPPP-5), which are members of a family of peptidase inhibitors that exhibit differential expression between isolated-reared and crowded-reared individuals in S. gregaria (Badisco et al., 2011;Breugelmans et al., 2008;Breugelmans et al., 2009;Simonet et al., 2005;Simonet et al., 2004). We show that SPPP-4 expression in the thorax tissue from crowded-reared individuals is higher compared to isolated-reared individuals, and SPPP-5 shows a similar but a non-significant trend, consistent with earlier studies in S. gregaria (Badisco et al., 2011;Breugelmans et al., 2008;Simonet et al., 2005). The other two target genes were allatotropin, a pleiotropic hormone that is best known as a stimulator of juvenile hormone (JH) production and a myostimulatory peptide (Kataoka et al., 1989;Lismont et al., 2015), and allatostatin, known to repress JH production in insects even though this function could not be confirmed in S. gregaria (Stay, Tobe & Bendena, 1995;Verlinden et al., 2009). We report that ast is significantly downregulated in crowded-reared locusts, and observed a similar but non-significant trend for at. We also note that these observations were not confirmed by our RNA sequencing data, where no significant differences were obtained for either of these two genes. Both ast and at exhibited very low expression levels, and the difference between both techniques might thus represent a difference in sensitivity between both techniques. Other differences between both technique (e.g., RNA sequencing being more prone to PCR artefacts and positional bias) might play an additional role in the observed differences. An earlier study in S. gregaria reported no significant difference between isolated-reared and crowded-reared individuals for at (Lismont et al., 2015), even though the data were not shown. Together, our data might suggest differences in JH regulation between isolated-reared and crowded-reared locusts.

CONCLUSIONS
Our study is one of only few studies testing the stability of reference genes for qPCR in multiple species. When ranking reference genes based on their stability, we found clear differences between the four tested species. Additionally, all three used algorithms, geNorm, BestKeeper, and NormFinder, seemed to struggle when the majority of tested reference genes showed a trend towards differential expression in the tested experimental conditions. As such, stability values obtained by these algorithms should always be analyzed side by side with the raw C q values. Additionally, our study suggests that both qPCR primer efficiency and reference gene stability should always be tested separately for each species of interest. Not doing so might result in differences in amplification efficiencies and incorrect interpretation of the results.