Evolution of digestive enzymes and dietary diversification in birds

As the most species-rich class of tetrapod vertebrates, Aves possesses diverse feeding habits, with multiple origins of insectivory, carnivory, frugivory, nectarivory, granivory and omnivory. Since digestive enzymes mediate and limit energy and nutrient uptake, we hypothesized that genes encoding digestive enzymes have undergone adaptive evolution in birds. To test this general hypothesis, we identified 16 digestive enzyme genes (including seven carbohydrase genes (hepatic amy, pancreatic amy, salivary amy, agl, g6pc, gaa and gck), three lipase genes (cyp7a1, lipf and pnlip), two protease genes (ctrc and pgc), two lysozyme genes (lyz and lyg) and two chitinase genes (chia and chit1)) from the available genomes of 48 bird species. Among these 16 genes, three (salivary amy, lipf and chit1) were not found in all 48 avian genomes, which was further supported by our synteny analysis. Of the remaining 13 genes, eight were single-copy and five (chia, gaa, lyz, lyg and pgc) were multi-copy. Moreover, the multi-copy genes gaa, lyg and pgc were predicted to exhibit functional divergence among copies. Positively selected sites were detected in all of the analyzed digestive enzyme genes, except agl, g6pc, gaa and gck, suggesting that different diets may have favored differences in catalytic capacities of these enzymes. Furthermore, the analysis also revealed that the pancreatic amylase gene and one of the lipase genes (cyp7a1) have higher ω (the ratio of nonsynonymous to the synonymous substitution rates) values in species consuming a larger amount of seeds and meat, respectively, indicating an intense selection. In addition, the gck carbohydrase gene in species consuming a smaller amount of seeds, fruits or nectar, and a lipase gene (pnlip) in species consuming less meat were found to be under relaxed selection. Thus, gene loss, gene duplication, functional divergence, positive selection and relaxed selection have collectively shaped the evolution of digestive enzymes in birds, and the evolutionary flexibility of these enzymes may have facilitated their dietary diversification.


INTRODUCTION
Aves, as the largest class of tetrapod vertebrates, consists of approximately ten thousand known living species, of which more than half are passerines (Gill, 1995). Along with the rich diversity of species, birds have developed a diverse range of dietary habits. In terms of diet, birds can be roughly divided into seven categories: insectivores (referring to species that predominantly feed on insects, such as cuckoos and swifts), frugivores (species that mostly feed on fruits, such as mousebirds and manakins), granivores (birds Swanson, Irwin & Wilson, 1991;Kornegay, Schilling & Wilson, 1994). Functional assays in insectivorous bats suggested that of the two duplicates of lysozyme, one was expressed widely across different tissues, while the other was highly expressed in the tongue exclusively, and had a higher activity in degrading glycol chitin (Liu et al., 2014). In general, despite the increased attention that has been given to digestive enzymes in other vertebrates, molecular evolution of digestive enzymes across the avian phylogeny remains largely unknown.
In this study, we undertook evolutionary analyses of 16 digestive enzyme genes identified from 48 available avian genomes that represent nearly all avian orders (Hillier et al., 2004;Dalloul et al., 2010;Warren et al., 2010;Jarvis et al., 2014;Zhang et al., 2014), with an aim to reconstruct the evolutionary history of digestive enzyme genes across the avian phylogeny. Furthermore, selective pressure analyses were conducted to determine whether these digestive enzyme genes have undergone adaptive evolution associated with dietary diversification in birds, with an aim to test a series of evolutionary hypotheses (Table 1). We estimate ω (the ratio of nonsynonymous to synonymous substitution rates) as an indicator to examine changes in selective pressure among amino acid sites and along different lineages. A higher ω indicates a molecular signature of adaptive evolution. In the present study, we hypothesized that genes encoding amylases have a higher ω value in species consuming more grains, and that genes encoding carbohydrases (excluding amylases) have a higher ω value in species consuming more grains, fruits, or nectar. In a similar fashion, genes encoding lipases and proteases were hypothesized to have a higher ω value in species consuming more meat, and genes encoding lysozymes and chitinases were hypothesized to have a higher ω value in species consuming more insects (Table 1).

Dietary data
Dietary information of 48 bird species ( Fig. 1 and Table S1) was derived from a comprehensive diet database. This database transformed the verbal dietary description derived from many key literature sources, such as handbooks and monographs, into standardized and semiquantitative information about relative importance of different food categories (Wilman et al., 2014). Five food categories consumed by birds were classified (including insects, meat, seeds, fruits or nectar, and other plant materials) ( Fig. 1 and Table  S1). The mean of consumption for a particular food component in all 48 birds with available genome sequences (Jarvis et al., 2014;Zhang et al., 2014) was calculated as 12.71% (seeds), 31.04% (meat), 34.38% (insects), and 25.42% (seeds, fruits and nectar), respectively. We assigned a species to a high or low group depending on whether its consumption is higher or lower than the mean. For details, see Supplemental Material S1.

Identification of digestive enzyme genes
From the same 48 avian species whose dietary data were collected, genomes sequences were retrieved from the Avian Phylogenomics Project (http://avian.genomics.cn/en/) (Jarvis et al., 2014;Zhang et al., 2014), and the basic statistics for each assembly are listed in Table S2. To identify the 16 digestive enzyme genes from each genome, we followed a method used in an earlier study (Wang & Zhao, 2015) with minor modifications. Synteny analysis was undertaken to further determine whether those undetected digestive enzyme genes are truly lost by querying their adjacent genes, with human and chicken genomes as references. The deduced nucleotide sequences of digestive enzyme genes, which contain all pseudogenes as well as partial and intact genes, are provided in Dataset S1. See also Supplemental Material S1.

Phylogenetic analysis
Phylogenetic trees based on each multi-copy gene were reconstructed with the Bayesian approach using the program MrBayes version 3.1.2 (Ronquist et al., 2012) with five million generations. The best-fitting models of nucleotide substitution were selected with the smallest Akaike Information Criterion (AIC) (Akaike, 1974) for each multi-copy gene, using the program MrModelTest version 2.3 (Nylander, 2004). See also Supplemental Material S1.

Evolutionary changes of gene family sizes
In order to understand evolutionary changes in the number of multi-copy digestive enzyme genes across the avian phylogeny, we used the program Computational Analysis of gene Family Evolution (CAFE) version 4.0 (De Bie et al., 2006), which also estimates the error rate associated with the gene copy number in a gene family and evaluates the quality of the genomic information (e.g., assembly and annotation) (Hahn et al., 2005).
Using the phylogenetic tree derived from a recent study (Prum et al., 2015) and the data of gene family sizes in the five multi-copy digestive enzyme genes (gaa, pgc, chia, lyz, and lyg ) from the 48 extant bird species with a sequenced genome, we inferred the global birth and death rates of the five gene families and the gene family sizes at each ancestral node, and identified rapidly evolved branches with accelerated rates of gene gain and loss (P value < 0.01) (De Bie et al., 2006).

Selection pressure analyses
To test whether differences in dietary habits have driven adaptive evolution of digestive enzyme genes in birds, we undertook a series of analyses to detect positive selection and differences in selection intensity.
First, two site-specific models in the Phylogenetic Analysis by Maximum Likelihood (PAML) package (Yang, 2007), M8a and M8 (Swanson, Nielsen & Yang, 2003), were used to detect positively selected sites in each gene, whereby ω (the ratio of nonsynonymous to synonymous substitution rates) can vary among amino acid sites, with ω < 1, ω = 1 and ω > 1 indicating negative selection, neutral evolution and positive selection, respectively. M8 is the alternative model, which assumes a beta distribution for ω among sites (0 < ω < 1) and an extra class of sites with positive selection (ω>1), while M8a is the null model, which assumes a beta distribution for ω among sites (0 < ω < 1) and an extra class of sites with neutral evolution (ω = 1). Furthermore, three improved likelihood methods in the Datamonkey web server, with both nonsynonymous and synonymous substitution rates under consideration (Pond & Frost, 2005), were used to further evaluate the signals of positive selection. Sites that were predicted to be under positive selection in this study were detected by at least two of the four likelihood methods. See also Supplemental Material S1.
Second, the branch model performed through CODEML in the PAML package (Yang, 2007) was used to estimate ω values among branches. Different ω values were compared between the two groups of branches, one containing species with a higher consumption of particular food items and the other containing species with a lower consumption of the same particular food items (Table S3). See also Supplemental Material S1.
In addition to several approaches described above for detecting branches or sites under positive selection, the RELAX method (Wertheim et al., 2015) in the HyPhy package (Pond, Frost & Muse, 2005) was used to test whether natural selection was intensified or relaxed along any of the branches in the avian phylogeny. Here, we hypothesized that selection intensity was relaxed on lineages with (1) a lower meat consumption for proteases and lipases, (2) a lower seed consumption for amylases, (3) a lower consumption of seeds, fruits or nectar for carbohydrases (excluding amylases), and (4) a lower insect consumption for chitinases and lysozymes. See also Supplemental Materials S1.

Functional divergence testing
Among the five multi-copy genes (chia, gaa, lyc, lyg and pgc), the gene tree reconstructed with chia sequences was unable to form distinct sub-clusters ( Fig. S1), thus functional divergence between sub-clusters of each of the four multi-copy genes (gaa, lyz, lyg and pgc) was inferred by the program DIVERGE version 3 (Gu & Vander Velden, 2002). Functional divergence can be divided into two types: type-I and type-II. Type-I suggests altered functional constraints (such as different evolutionary rates), whereas type-II indicates radical changes in amino acid biochemical properties (such as charge positive/negative, hydrophilic/hydrophobic) between two gene clusters. See also Supplemental Material S1.

Ancestral state reconstruction
We reconstructed the ancestral characters in Mesquite version 3.6 (Maddison & Maddison, 2018) using the parsimony model to determine the dietary composition of ancestral lineages and infer the evolutionary history of dietary diversification across the avian phylogeny. The proportions of a specified dietary component such as seeds, meat, fruits and nectar, or insects, were regarded as continuous characters. The phylogenetic tree representing the branching history of descent avian taxa was derived from Prum et al. (2015).

Identification of digestive enzyme genes
Using published vertebrate protein sequences of each digestive enzyme gene as queries, we executed tblastn searches and identified target genes from 48 avian genome sequences ( Fig. 1), which cover all but three orders of the Aves (Jarvis et al., 2014;Zhang et al., 2014).
For convenience, all identified sequences of each gene were classified into three categories: intact genes, partial genes and pseudogenes, among which the first two categories were putatively functional and pseudogenes resulting from nonsense or frame-shifting mutations were regarded as possibly nonfunctional.
Of the 16 digestive enzyme genes, three (chit1, lipf and salivary amy) were not found in 48 avian genome sequences. Synteny analysis on each of these three genes showed that although both upstream and downstream genes were found to be located on the same scaffold or chromosome in most species, the corresponding digestive enzyme gene could not be detected, suggesting an absence in the common ancestor of birds regardless of their dietary habits (Table S4). The remaining 13 genes are all composed of multiple exons with exon number ranging from 4 in lyz to 33 in agl, and lengths of coding regions range from 444 bp in lyz to 4,593 bp in agl. Among these, eight (agl, ctrc, cyp7a1, g6pc, gck, hepatic amy, pancreatic amy and pnlip) are single-copy genes, while the remaining five (chia, gaa, lyg, lyz and pgc) are multi-copy genes, with copy numbers ranging from two to four (Fig. 1). Notably, due to incomplete genome sequencing or possible gene loss, we were able to identify only 6 and 12 intact sequences in the ctrc and chia genes, respectively, with the former being absent in 27 avian genome sequences and the latter being incomplete in 37 avian genome sequences (Fig. 1).

Loss and duplication of digestive enzyme genes
Each digestive enzyme gene identified from avian genome sequences showed a high level of sequence similarity. All identified sequences of one digestive enzyme gene possessed an identical number of exons, except the triplicated lyg gene, wherein one copy lygC appears to have lost the first exon compared to the other two copies (lygA and lygB) (Fig. 1); this observation was first identified through immunoblots by Nile et al. (2004). These findings suggested that all digestive enzyme genes are evolutionarily conserved and thus functionally important in birds.
Phylogenetic analyses were undertaken to examine events of gene duplication and gene loss for multi-copy genes (chia, gaa, lyg, lyz and pgc). One digestive enzyme gene chia, which is involved in breaking down chitin of insects, was found to be multi-copied with the copy number varying significantly among species (Fig. S1). We noticed that the common cuckoo (Cuculus canorus), whose diet is mostly composed of insects, possessed the highest gene copy number of chia (n = 4). Phylogenetic analysis showed that multiple events of gene duplication and gene loss have occurred in chia over the course of avian evolution (Fig. S1). Similarly, phylogenetic tree of the lyz genes showed that the lyz genes of birds formed two distinct clusters, of which one belonged to the conventional lyz and the other belonged to the Ca 2+ -binding lyz (Fig. S2). Notably, both the Peking duck (Anas platyrhynchos) and common cuckoo were found to have two intact lyz genes, and phylogenetic analysis suggested that the two copies in duck belonged separately to the conventional and Ca 2+ -binding type, whereas both copies from common cuckoo belonged to the Ca 2+ -binding group (Fig. S2). For the lyg gene, our phylogenetic tree showed that the lyg genes from birds were classified into three groups with high posterior probabilities (Fig. S3), being consistent with a previous study where three categories of lyg (lygA, lygB and lygC) were proposed (Irwin, 2014). The occurrence of all three types of lyg genes in the Chinese softshell turtle (Pelodiscus sinensis ) and Chinese alligator (Alligator sinensis ) suggested that the duplication of lyg predates the divergence between birds and reptiles (Irwin, 2014). Additionally, the Australaves clade containing nine species from Cariamiformes, Falconiformes, Psittaciformes and Passeriformes was found to possess the single copy lygA without the other two copies (lygB and lygC) (Fig. 1), thus, we speculated that lygB and lygC losses may have independently occurred in the common ancestor of Australaves after divergence from Coraciimorphae ( Fig. 1 and Fig. S3). For the gaa gene, our phylogenetic tree showed that gaa genes of birds formed two distinct clusters (Fig. S4), which were denoted as gaa i and gaa ii by Kunita et al. (1997), who first discovered the two gaa copies in the Japanese quail (Coturnix coturnix japonica). The phylogenetic tree of pgc genes showed three distinct clusters: the basal cluster (pgc) included species from Galloanserae and Palaeognathae, where phylogenetic relationship agreed with the avian species tree (Prum et al., 2015) (Fig. S5); the other two clusters (pgb1 and pgb2) contained all species from Neoaves. Additionally, most species such as the golden-collared manakin (Manacus vitellinus), Anna's hummingbird (Calypte anna), and emperor penguin (Aptenodytes forsteri) possess two copies of pgc with one in each cluster (Fig. S5). Given that a pgb gene was identified in both chicken (Gallus gallus) and turkey (Meleagris gallopavo) (Castro et al., 2012), we speculated that the loss of pgc might have occurred in the common ancestor of Neoaves since their separation from Galloanserae, resulting in only species from Galloanserae and Palaeognathae harboring the pgc gene.
Gene family analyses on five datasets of multi-copy genes indicated that the global birth and death rate (λ) for all datasets was 0.001, and the global error rate was calculated to be 0.248. For each gene family, gene copies of all ancestral lineage nodes are shown in Fig. S6. The most recent common ancestor of the 48 bird species was estimated to have one gaa, two pgc, two chia, one lyz, and three lyg gene copies, respectively (Fig. S6). Notably, for the chia gene, reduction of copy number in lineages leading to the yellow-throated sandgrouse and white-tailed eagle, and copy number gains in the lineage leading to the common cuckoo were inferred to be rapidly evolved with P values <0.01 (Fig. S6). Moreover, an additional rapidly evolved branch with a reduction of gaa gene copies has occurred in the lineage leading to the white-tailed eagle (Fig. S6).

Notes.
a The natural logarithm of likelihood value. b P values denoted with two asterisks (**) when less than 0.01. c Positively selected sites with posterior probabilities ≥ 0.9. Posterior probability values are shown in parentheses next to site numbers, which follow chicken genes.
four methods were regarded as candidates under positive selection. As a result, a total of 148 positively selected sites were identified, comprising of: 14 (hepatic amy), nine (pancreatic amy), 16 (cyp7a1), 40 (pnlip), 12 (pgb1), 19 (pgb2), four (chia), 11 (lygA), 11 (lygB), four (lygC), and eight (lyz) (Table S5). Furthermore, 97.97% (145/148) of the putative positively selected sites were predicted to have experienced radical changes in structural or biochemical properties at the amino acid level by TreeSAAP (Table S5), implying that these digestive enzymes play important roles in shaping avian dietary diversity. For amylases, species were divided into two groups with contrasting seed ingestion; for carbohydrases (excluding the amylases), species were divided into two groups with contrasting ingestion of seeds, fruits and nectar; for lipases and proteases, species were divided into two groups with contrasting meat ingestion; for chitinases and lysozymes, species were divided into two groups with contrasting insect ingestion. ω for each group was estimated by the branch model implemented in PAML (Yang, 2007). An asterisk (*) indicates that ω estimated from the group with higher consumption of a particular food component is significantly greater than that from the other group with lower consumption. Full-size DOI: 10.7717/peerj.6840/ fig-2 We used a pair of branch models to test how selection pressure acted on digestive enzyme genes among bird species with different dietary habits. Through comparisons between nested models, we found that two genes (pancreatic amy and cyp7a1) derived from birds with specific dietary preferences have experienced adaptive evolution ( Fig. 2 and Table S6). Specifically, pancreatic amy genes in species with a higher seed consumption (ω = 0.23) were shown to have a higher ω value than those in species with a lower seed consumption (ω = 0.15), with an FDR corrected P value at 0.01 (Table S6). Since the pancreatic amylase is important in digesting starch and is thus unlikely to undergo a relaxation from functional constraint, we suggested that a short burst of positive selection rather than a relaxation of selective pressure occurred on the pancreatic amy gene in birds consuming higher amount of seeds. The cyp7a1 lipase gene was found to have evolved with a higher ω value in species with a higher meat consumption (ω = 0.21) than species with a lower meat consumption (ω = 0.14) (P = 0.01 after FDR adjustment) (Table S6). By contrast, genes coding for proteases and other enzymes involved in digesting insects did not show any signals of higher ω values in species with a higher meat or insect consumption ( Fig. 2 and Table S6).
In addition, for these two genes (pancreatic amy and cyp7a1) that have undergone positive selection detected by branch models, positively selected sites identified by site models were labeled on specific branches where amino acid changes could lead to radical changes in structural or biochemical properties (Fig. S7). Notably, we found that pancreatic amy at the ancestral lineage of chicken and turkey experienced the same amino acid change (T455D) as the pigeon (Columbia livia) and the ancestral lineage of medium ground-finch (Geospiza fortis) and zebra finch (Taeniopygia guttata) (Fig. S7A). All five species mentioned above have a higher seed consumption (Fig. S7A); this is suggestive of convergent evolution. Similarly, an amino acid change (Q30R) in the cyp7a1 gene was found to have occurred in ancestral or extant lineages with a higher meat consumption (Fig. S7B).
We subsequently tested in each gene whether the intensity of purifying selection is relaxed along test branches by the RELAX method implemented in HyPhy (Table 3). Our results showed that for the gck carbohydrase gene, when compared with species with a higher consumption of seeds, fruits, and nectar, test branches representing species with a lower consumption of those food items were shown to be significantly relaxed (Table 2). In addition, for the pnlip lipase gene, test branches representing species with a lower meat consumption were also shown to be significantly relaxed compared to those with a higher meat consumption (Table 3). Meanwhile, ω values of test branches estimated for both gck and pnlip were found to shift towards neutrality (ω = 1) (Fig. 3). The selection intensity parameter (k) in gck and pnlip was 0.75 and 0.59, with P values of 0.0049 and 0.0001, respectively (Table 3). However, the selection intensity of most digestive enzyme genes showed no significant differences between test and reference branches.

Functional divergence analysis
We next estimated functional divergence between clusters of four multi-copy genes (gaa, lyz, lyg and pgc), and the results are summarized in Table 4. Besides the comparison between Ca 2+ -binding lyz and conventional lyz, type-I functional divergence detecting altered functional constraints showed that the coefficients of type-I functional divergence (θ I ) in any pairs of gaa, lyg and pgc were significantly greater than 0, suggestive of site-specific rate shift after gene duplication occurred in them (Table 4). Moreover, type-II functional divergence that detects radical changes in amino acid biochemical properties was inferred. Statistical significance of coefficients of type-II functional divergence (θ II ) estimated between four gene pairs (gaa i/gaa ii, lygA/lygB, lygA/lygC and pgb1/pgb2) indicated that radical changes in amino acid properties might contribute to functional divergence (Table  4). To further identify the critical amino acid sites (CAASs) that might be responsible for functional divergence, the cutoff of posterior probability (Q k ) >0.95 was used. We found that for six gene pairs gaa i/gaa ii, Ca 2+ -binding lyz/conventional lyz, lygA/lygB, lygA/lygC, lygB/lygC and pgb1/pgb2, a total of 20, 0, 36, 45, 5 and 34 CAASs were identified for type-I functional divergence, respectively, and a total of 190, 27, 63, 56, 16 and 99 CAASs, respectively, were identified for type-II functional divergence (Table 4). No CAASs were identified for type-I functional divergence between Ca 2+ -binding lyz/conventional lyz, whereas 27 sites were detected for type-II functional divergence (Table 4). Thus, we Table 3 Analyses on selection intensity for avian digestive enzyme genes conducted by RELAX (Wertheim et al., 2015). Below average consumption of grain, fruit and nectar (9) Above average consumption of grain, fruit and nectar (5 Notes. a The natural logarithm of likelihood value. b Twice the difference in ln L between two models compared. c An asterisk (*) denotes P < 0.05, whiletwo asterisks (**) indicate P < 0.01. d Significant P values with k < 1 or k > 1 indicate that selection strength is relaxed or intensified along the test branches, respectively. In columns ''Gene'', ''Test branch'' and ''Reference branch'', numbers in parentheses represent total numbers of intact sequences. Two genes (gck and pnlip) were predicted to be under relaxed selection, which simultaneously require P < 0.05 and k < 1 (both are underlined and shown in bold). ( speculated that these 27 sites between clusters of lyz might serve as the main driver to facilitate the shift in evolutionary rate.

Ancestral state reconstruction
Based on the dietary data from Wilman et al. (2014), we conducted the reconstruction of ancestral states in dietary composition and the proportions of consuming seeds, meat, fruit or nectar, and invertebrates were chosen as four continuous characters. This reconstruction showed that the most recent common ancestor of extant birds tended to feed mainly on plant materials (e.g., seeds, fruits, nectar and other plant tissues) and invertebrates (e.g., insects) (Fig. S8). Furthermore, we found that meat gradually became the dominant dietary item in two distinct ancestral lineages: one leading to Aequorlitornithes, and the other leading to Accipitriformes (Fig. S8). In addition, ancestral lineages leading to several species consuming more fruit or nectar, such as Anna's hummingbird, red-crested turaco (Tauraco erythrolophus), speckled mousebird (Colius striatus), golden-collared manakin, rhinoceros hornbill (Buceros rhinoceros) and white-throated tinamou (Tinamus guttatus), did not show a higher consumption of fruits or nectar (Fig. S8).

DISCUSSION
Birds have enormous taxonomic and ecological diversity. Understanding the diversity of dietary habits in birds is of great importance for evaluating their physiological adaptations to food resources and their surrounding environments over the course of avian evolution. Considering the irreplaceable roles of digestive enzymes in energy and nutrient uptake, we hypothesized that digestive enzyme genes may have experienced adaptive evolution among birds with different dietary habits. In this study, based on available genome sequences, we performed a large-scale evolutionary study on 16 digestive enzyme genes from 48 bird species with diverse diets to reconstruct the evolutionary history of digestive enzyme genes across the avian phylogeny.

Gene loss and duplication
Among the 16 digestive enzyme genes, three (chit1, lipf, and salivary amy) were not found in all 48 avian genome sequences, and synteny analysis further indicated that these genes may be absent in the common ancestor of birds. For the 13 genes found in birds, the discovery that most birds possess one pancreatic amy and one hepatic amy is consistent with an earlier study by Benkel et al. (2005), who identified two distinct amy loci in chicken using probe hybridization, with one locus expressed in the pancreas and the second locus detected in the liver. For lipases, despite the absence of lipf, birds still have the pancreatic lipase PNLIP, which was regarded as the primary lipases for breaking down dietary fat molecules during digestion (Lowe, 1997). For the other undetected gene chit1, earlier reports suggested that in mammals, the firstly determined functional chitinase CHIT1 and the secondly discovered functional CHIA, belonging to the family 18 of glycosyl hydrolases, had the same chitinolytic activity and resulted from an earlier gene duplication event (Renkema et al., 1998;Boot et al., 2001). Since frogs were observed to have two copies of active chitinases, of which one copy was clustered with mammalian CHIA (Fujimoto et al., 2002), the gene duplication generating the two copies of chitinases occurred before the divergence of tetrapods accompanied by the development of the acidic stomach (Bussink et al., 2007). Consequently, the absence of chit1 in birds may represent a subsequent loss after the duplication of the two chitinase copies. We speculated that the deficiency in physiological activity caused by these three undetected genes (salivary amy, lipf and chit1) might be compensated through other candidate genes with similar functions. With regard to multi-copy digestive enzyme genes, the g-type lysozyme gene identified in birds is mostly triplicated, with three copies tandemly distributed at the same scaffold or chromosome, which were previously named lygA, lygB and lygC (Irwin, 2014). Through analysis on the reconstructed phylogenetic tree, we discovered that nine bird species from the clade Australaves only have lygA without both lygB and lygC ( Fig. 1 and Fig. S3), suggesting that the losses of lygB and lygC have independently occurred in the ancestor of Australaves after divergence from Coraciimorphae. Phylogenetic relationships of both lyz and gaa (Figs. S2 and S4) showed two distinct gene clusters, which were consistent with earlier discoveries (Kunita et al., 1997;Irwin, Biegel & Stewart, 2011). Moreover, the evolution of avian pepsinogen gene (pgc) turned out to be complex and the pgc and pgb1 genes we referred to here were respectively named as pgc2 and pgb by Castro et al. (2012), who studied the evolution of pepsinogen C genes in vertebrates and found only one pgb copy in both chicken and turkey. However, by searching 48 avian genomes for pgc and reconstructing their phylogenetic relationships, we discovered that apart from some species (chicken, turkey, ostrich and duck) with a single pgc gene, most species possess two distinct pgc genes (pgb1 and pgb2), with each type of pgc comprising an independent cluster (Fig. S5). Considering that all species from Neoaves lack one copy of pgc, it is suggested that the loss of pgc has occurred after the divergence of Galloanserae from Palaeognathae and before Neoaves split from Galloanserae.
Gene family analyses on five multi-copy genes indicated that gene copies of the most recent common ancestor of extant birds were estimated to have one gaa, two pgc, two chia, one lyz, and three lyg gene copies, and the chia gene had rapidly evolved in several lineages such as those leading to the sandgrouse, the white-tailed eagle and the common cuckoo (Fig. S6). Given that gene copy numbers of chia in mammals were suggested to be positively correlated with dietary insects (Janiak, Chaney & Tosi, 2017), we predicted that the four chia copies detected in the common cuckoo might be related to their specialized diet of insects.

Positive selection
Positively selected sites were detected in all digestive enzyme genes except agl, g6pc, gaa and gck, and a total of 148 codons were identified to be under positive selection by at least two methods. Furthermore, adaptive evolution was further supported by evidence that a high percentage (97.97%) of the putative positively selected sites was under radical changes in structural or biochemical properties at the amino acid level, suggesting that the properties and functions of digestive enzymes might have been influenced by radical amino acid changes. Overall, positive selection on digestive enzymes may serve as the major evolutionary force driving the formation of diverse diets in birds.
By contrast, we did not detect any avian lineages showing an ω greater than one, a strong signature of positive selection. Instead, we found multiple lineages showing a significantly higher ω, suggesting a short burst of positive selection or a relaxation of selective pressure. The pancreatic amy encoding amylase, which plays an essential role in hydrolyzing dietary starch by acting on α-1,4-glycosidic bonds, was detected to have a higher ω value in species with a higher seed consumption ( Fig. 2 and Table S6). We predicted that this signature was a short burst of positive selection rather than a relaxation of selective pressure, because the pancreatic amylase is important in digesting starch and is thus unlikely to undergo a relaxation from functional constraint. Indeed, previous experimental evidences have pointed that the activities of pancreatic amylase in six passerines were positively correlated with their dietary starches (Kohl et al., 2010;Kohl et al., 2011). Similar patterns of amylase activity associated with dietary carbohydrates were also found in other groups of animals, such as fruit flies (Hickey & Benkel, 1982), fishes (Hidalgo, Urea & Sanz, 1999), dogs (Axelsson et al., 2013. The higher ω value of the pancreatic amy detected in birds consuming a higher amount of seeds may indicate a higher efficiency in enzymatic hydrolysis of pancreatic amylase. The carbohydrase gene gck, with no positively selected sites detected by the M8 model (Table 2), was found to be under relaxed selection in species consuming a lower amount of seeds, fruits and nectar, when compared with those with consuming a higher amount (Table 3). The enzyme glucokinase (GCK), encoded by gene gck, participates in the phosphorylation of glucose to glucose-6-phosphate, which is the first step of both glycolysis and glycogen synthesis, and plays a major role in the regulation of carbohydrate metabolism (Cardenas, 1995). Enzymatic assays showed that compared to fasted chickens, fed chickens were found to have significantly increased GCK activity in liver homogenates (Berradi et al., 2005). Our findings in gck implied that the bird species consuming more seeds, fruits, and nectar might have evolved a distinct digestive strategy to adapt to higher dietary carbohydrates and facilitate the absorption of glucose and glycolysis.
Lipases play key roles in catalyzing dietary triglyceride into monoglycerides and fatty acids. In this study we found that the cyp7a1 lipase gene has a higher ω value in species eating more meat (Fig. 2), suggesting a short burst of positive selection. Lipase CYP7A1, belonging to the superfamily Cytochrome P450, is an oxidoreductase and can limit the rate of bile acid synthesis (Russell & Setchell, 1992). In animals, bile acids are mainly synthesized in the liver and function as surfactants that can emulsify dietary fats into micelles and keep them suspended before further digestion and absorption (Hofmann & Borgstroem, 1964). Another lipase gene, pnlip, which was found to be under relaxed selection in species eating less meat, has a function in the small intestine for hydrolyzation of dietary triacylglyceride (Hegele et al., 2001). The higher ω value of cyp7a1 and the relatively intensified selection of pnlip detected in birds eating more meat indicate that lipases might be under greater selective pressure for digesting lipids. Furthermore, we found that one codon (site 10) located in the predicted helical transmembrane segment of cyp7a1 was positively selected, the same codon was also found to be positively selected in cetaceans (Wang et al., 2016), implying that this site might be of functional importance for enzyme CYP7A1 in efficiently digesting lipids.
Previous enzymatic assays of one peptidase (APN) and two proteases (TRY and CTRC) in six passerines showed that activities of the three digestive enzymes were not positively correlated with the contents of dietary protein (Kohl et al., 2010). This finding was consistent with our evolutionary analysis on proteases and several enzymes involved in digesting insects, wherein we did not find significant differences in ω values among different bird lineages, considering the abundant protein contents in insects. We hypothesized that proteases and enzymes involved in digesting insects may have not generated significant differences in selective pressure among avian lineages at the sequence level, given that all birds require proteins to maintain their normal life activities and most birds prey on insects during their breeding seasons (Del Hoyo, Elliott & Sargatal, 1994).

Ancestral state reconstruction
Reconstruction of ancestral states indicated that the most recent common ancestor of extant birds were omnivorous and mainly fed on invertebrates and plant materials (seeds, fruits, nectar, or other plant materials) (Fig. S8). However, several groups among living birds had evolved to be specialists feeding on a particular food, such as hoatzins and hummingbirds. Previous studies suggested that omnivory in birds acted as a macroevolutionary sink with higher extinction rates and lower speciation rates (Burin et al., 2016), and omnivorous birds could be favored at places or times with low abundance of a preferred resource or when resource availability was not highly predictable (Macarthur & Levins, 1967), while those birds with specific diets might only survive when food resources were sustainable and highly predictable (Karr, 1976). Considering the unstable climates in the late-Cretaceous that led to a mass extinction of most species (Longrich, Tokaryk & Field, 2011) and the subsequent relatively stable climates in the Cenozoic (Kissling et al., 2012), we speculated that omnivorous birds might have survived from the late-Cretaceous, and the stable and highly predictable resources in the following Cenozoic might function as the main evolutionary force driving dietary specialization. Coincided with our prediction that seeds were one of the ancestral food components in the most recent common ancestor of extant birds, fossil evidence also suggested that seeds were a key factor for the survival of edentulous grain-consuming birds through the late-Cretaceous mass extinction (Larson, Brown & Evans, 2016).

Limitations
There are two limitations in this study. First, some inherent copies of digestive enzyme genes cannot be identified from the avian genomes and some coding sequences are not complete. For instance, with most coding sequences being incomplete or inherent copies being missing, only six intact ctrc (single-copy) and 12 intact chia (multi-copy) genes were identified from 48 genome sequences. Due to the incompleteness of some coding sequences, we only used intact or longer partial coding sequences for evolutionary analyses, which might have caused bias. Meanwhile, in order to evaluate the influence of avian genome information (assembly and annotation) on the number of gene copies obtained, we conducted gene family analyses which estimated the value of global error rate to be 0.248. Second, it is possible that the evaluation of food composition was not entirely accurate, given that the standardized and semiquantitative numeric information from the comprehensive avian diet database was converted from avian dietary data or verbal description. Therefore, high-quality genome sequences and more accurate dietary data will help understand the evolution of digestive enzymes and dietary diversification in birds in the future.

CONCLUSIONS
We detected positively selected sites in most examined digestive enzyme genes, suggesting that different diets may have favored differences in catalytic capacities of these enzymes. The pancreatic amy and the cyp7a1 lipase gene were found to have a higher ω in species consuming more seeds and meat, respectively, suggesting a short burst of positive selection on the corresponding lineages. The gck carbohydrase gene in species consuming less seeds, fruits, or nectar, and the pnlip lipase gene in species consuming less meat, were found to have undergone relaxed selection, suggesting that the two genes have become less important in the corresponding lineages. Together with our functional divergence and gene family analyses, our results document that multiple evolutionary processes have shaped the evolution of digestive enzymes in birds, and suggest that the evolutionary flexibility of these enzymes may have facilitated their dietary diversification.