Comparative transcriptome and flavonoids components analysis reveal the structural genes responsible for the yellow seed coat color of Brassica rapa L.

Background Seed coat color is an important horticultural trait in Brassica crops, which is divided into two categories: brown/black and yellow. Seeds with yellow seed coat color have higher oil quality, higher protein content and lower fiber content. Yellow seed coat color is therefore considered a desirable trait in hybrid breeding of Brassica rapa, Brassica juncea and Brassica napus. Methods Comprehensive analysis of the abundance transcripts for seed coat color at three development stages by RNA-sequencing (RNA-seq) and corresponding flavonoids compounds by liquid chromatography-tandem mass spectrometry (LC-MS/MS) were carried out in B. rapa. Results We identified 41,286 unigenes with 4,989 differentially expressed genes between brown seeds (B147) and yellow seeds (B80) at the same development stage. Kyoto Encyclopedia of Genes and Genomes enrichment analysis identified 19 unigenes associated with the phenylpropanoid, flavonoid, flavone and flavonol biosynthetic pathways as involved in seed coat color formation. Interestingly, expression levels of early biosynthetic genes (BrCHS, BrCHI, BrF3H, BrF3’H and BrFLS) in the flavonoid biosynthetic pathway were down-regulated while late biosynthetic genes (BrDFR, BrLDOX and BrBAN) were hardly or not expressed in seeds of B80. At the same time, BrTT8 and BrMYB5 were down-regulated in B80. Results of LC-MS also showed that epicatechin was not detected in seeds of B80. We validated the accuracy of our RNA-seq data by RT-qPCR of nine critical genes. Epicatechin was not detected in seeds of B80 by LC-MS/MS. Conclusions The expression levels of flavonoid biosynthetic pathway genes and the relative content of flavonoid biosynthetic pathway metabolites clearly explained yellow seed color formation in B. rapa. This study provides a foundation for further research on the molecular mechanism of seed coat color formation.


INTRODUCTION
Brassica rapa (AA = 20), an original parent species of Brassica napus (AACC = 38) and Brassica juncea (AABB = 36), is a major rapeseed crop in the Tibetan Plateau and northern part of China with a short growth period and with resistance to barren field. Its seed coat color can be divided into two categories, brown/black and yellow. Yellow seed coat color confers higher oil quality, higher protein content and lower fiber contents, and thus is considered a desirable trait for breeding new hybrids of B. rapa, B. napus and B. juncea (Li et al., 2012;Wang et al., 2016).
Seed coat color formation is mainly due to the accumulation of proanthocyanidins (PAs), the end product of the flavonoid biosynthetic pathway, which are mainly deposited in the innermost cell layer of the testa (chalaza, micropyle and endothelium) (Debeaujon et al., 2003;Dixon, Xie & Sharma, 2005;Lepiniec et al., 2006). Different results of many studies on the seed coat color of Brassica species showed that seed coat color was controlled by different transcription factors and structural genes, such as TT10 (Fu et al., 2007), TT8 (Li et al., 2012), TT2 , TT6 (Huang et al., 2016), TT1  and TTG1 .
Flavonoid biosynthesis genes and regulatory genes are involved in seed coat color and PAs formation. Qu et al. (2016) analyzed the expression of flavonoid biosynthesis genes in six yellow-and black-seeded B. napus inbred lines with different genetic background and found that three structural genes (BnTT3, BnTT18 and BnBAN), two regulatory genes (BnTTG2 and BnTT16) and three genes encoding transfer proteins (BnTT12, BnTT19 and BnAHA10) might play crucial roles in the formation of different seed coat colors. Hong et al. (2017) also found that down-regulation of 12 biosynthetic genes and three regulatory genes involved in the flavonoid pathway mainly contributed to the reduction of PAs in yellow seed coats. Qu et al. (2020) confirmed that BnTT3, BnTT18, BnTT10, BnTT12 and BnBAN involved in the PA pathway had lower expression levels in yellow-seeded rapeseed, strongly suggesting that seed coat color is mainly determined by levels of epicatechin and its derivatives. Zhai et al. (2020) proved that BnTT8 is vital for seed color by creating yellow-seeded mutants in rapeseed using CRISPR/Cas9. In B. juncea, Liu et al. (2013Liu et al. ( , 2016 found that BjuDFR, BjuLDOX and BjuANR, BjuTT3, BjuTT18, BjuTT4-2, BjuTT4-3, BjuTT19-1 and BjuTT19-3 were transcriptionally regulated, and either not expressed or down-regulated in yellow-seeded testa, implying that these genes for PA biosynthesis are associated with seed color. Although many studies have reported transcript differences between yellow and brown seeds of B. napus and B. juncea, there are few studies on yellow and brown seeds coats development using in B. rapa using RNA-sequencing (RNA-seq) technology. Previous studies have focused on oil accumulation (Wan et al., 2017), fatty acid degradation, nutritional functions, lipid biosynthesis (Chen et al., 2015a) and seed oil content (Huang et al., 2017). As a method of revealing gene expression levels through high-throughput sequencing techniques, RNA-seq is an effective approach for studying the mechanisms underlying metabolite variation. Nevertheless, it is still difficult to relate phenotypic traits to transcript abundance. Thus, we analyzed the color component of seed coats by liquid chromatography-tandem mass spectrometry (LC-MS/MS). By combining transcript abundance from RNA-seq with color components in the seed coats of brown and yellow seeds, it was possible to determine the genes related to seed coat color variation.
To investigate the genes involved in seed coat color formation in B. rapa, we used seeds of the brown-seeded inbred line B147 and its yellow-seeded near-isogenic line B80, at 10, 14 and 28 days after flowering (DAF), each with three biological replicates to construct 18 libraries for analysis of RNA-Seq and for LC-MS. We performed comprehensive analysis of abundant transcripts associated with seed coat color and corresponding flavonoid compounds. Results obtained in this study will provide transcript information for elucidating the molecular mechanism underlying seed coat formation in Brassica species.

Plant materials
Two B. rapa lines, the brown-seeded inbred line B147 and its yellow-seeded near-isogenic line B80, were provided by the Cruciferae Vegetable Breeding Team at Northwest A&F University, planted at the experimental station of Northwest A&F University, Yangling, China. Fresh seeds were collected at 10, 14 and 28 DAF by artificial pollination at the flowering stage and stored at −80 C for RNA extraction and flavonoid components analysis (Figs. 1A-1C).

RNA extraction, library construction and illumina sequencing
Total RNA from each sample was extracted using TRIzol Reagent (Life technologies, Carlsbad, CA, USA) following the manufacturer's instructions, qualified using a NanoDrop 2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and then assessed for quality using an Agilent 2100 Bioanalyzer (Santa Clara, CA, USA). Eukaryotic mRNA was enriched from total RNA using oligo (dT) beads, while prokaryotic mRNA was enriched by removing ribosomal RNA (rRNA) using a Ribo-Zero TM Magnetic Kit (Epicentre). The enriched mRNA was fragmented into short fragments using a fragmentation buffer and reverse transcribed into cDNA using random primers. Second-strand cDNA was synthesized using DNA polymerase I, RNase H, dNTPs and buffer. CDNA fragments were purified using a QiaQuick PCR extraction kit prior to end repair, poly (A) addition, and ligation to Illumina sequencing adapters. The ligation products were size selected by agarose gel electrophoresis, PCR amplified, and sequenced using an Illumina HiSeqTM 2500 by Gene Denovo Biotechnology Co. (Guangzhou, China) .

Transcriptome data processing
High-quality clean reads were obtained by removing adapter reads, reads containing more than 10% of unknown nucleotides (N) and low-quality reads containing more than 50% of low-quality (Q-value ≤ 20) bases. Reads mapping to rRNA, detected by the short reads alignment tool Bowtie2 (Langmead & Salzberg, 2012), were removed, and the remaining reads for each sample were mapped to the reference genome (http://Brassicadb. org/brad/) by TopHat2 (Kim et al., 2013), respectively. Alignment parameters were as follows: maximum read mismatch, 2; distance between mate-pair reads, 50bp; error of distance between mate-pair reads, ±80 bp. During the last step of assembly, all of the reassembled fragments were aligned with reference genes and similar fragments were removed.

Gene annotations, classification and enrichment analysis
Gene expression level was normalized using the FPKM (Fragments Per Kilobase of transcript per Million mapped reads) method and calculated using the following formula: FPKM = (10 6 × C × 10 3 )/NL, where C is the number of fragments mapped to a specific unigene, N is the total number of fragments mapped to reference genes, and L is the number of bases in this unigene. The FPKM method is able to eliminate the influence of different gene lengths and sequencing data amounts on the calculation of gene expression. The edgeR package (http://www.rproject.org/) was used to identify differentially expressed genes (DEGs) across samples. Genes with a fold change ≥2 and a false discovery rate (FDR) < 0.05 in a comparison were identified as significant DEGs. DEGs were then subjected to enrichment analysis of GO functions and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Calculated P-values were subjected to FDR correction, taking FDR ≤ 0.05 as a threshold. GO terms meeting this condition were defined as significantly enriched GO terms among DEGs. This analysis was able to recognize the main biological functions that DEGs exercise. KEGG enrichment analysis identified metabolic pathways or signal transduction pathways significantly enriched with DEGs, comparing with the whole genome background. The formula for calculation was the same as that in GO analysis. Similarly, the calculated P-value was subjected to FDR correction, taking FDR ≤ 0.05 as a threshold. Pathways meeting this condition were defined as pathways significantly enriched in DEGs.

Weighted gene co-expression network analysis
Weighted gene co-expression network analysis (WGCNA) is a typical systematic biological arithmetic for describing the gene co-expression network of multiple samples and was based on the data of Brassica rapa in the STRING protein interaction database by using the interaction relationships (Langfelder & Horvath, 2008;Cheng, Ping & Chen, 2017). It can not only cluster highly correlated genes modules and generalize such modules by the module eigengene or an intramodular hub gene, but also relate modules to one another and to external sample traits (Langfelder & Horvath, 2008). WGCNA was used to analyze the correlation between the transcription factor TTG1, the candidate gene for seed coat color , and those of structural genes in the flavonoid biosynthetic pathway.

Reverse-transcription quantitative PCR analysis
Reverse-transcription quantitative PCR (RT-qPCR) was conducted to verify the reliability of the transcriptome data. Specific PCR primers were designed using Primer Premier 5. Total RNA was extracted as described above. First-strand cDNA synthesis was performed using a PrimeScript TM RT reagent Kit (TaKaRa, Tokyo, Japan) following the manufacturer's instructions, and the final cDNA concentration was adjusted to 100 ng/mL with RNase free water. RT-qPCR was performed in 20 mL volume in a QuantStudio 5 Real-Time PCR Detection System with three replications, according to previous reports ; a housekeeping gene encoding glyceraldehyde-3phosphate dehydrogenase (GAPDH, GO0048316) was used as an internal control . The relative expression levels of genes were measured using the 2 −ΔΔCt method (Livak & Schmittgen, 2001).

Measurement of seed flavonoids, reagents and standards
Extraction of seed flavonoids was performed as described in a previous report (Qu et al., 2013) with minor modification. Frozen fresh pulverous seed samples (0.2 g fresh weight) were soaked in 1.3 mL 75% methanol, and the suspension was placed in an ultrasonic bath for 1 h (Qu et al., 2013). The extract was then centrifuged (12,000 rpm, 20 min) and the limpid solution was collected. The extract was filtered through a 0.22 mm filtration membrane and immediately subjected to LC-MS analysis. Liquid chromatography separations were performed using an XTerra MS C18 column (125 Å pore size, 5 mm, 150 mm × 20 mm; Waters, Milford, MA, USA). The mobile phase consisted of water containing 0.1% (v/v) formic acid (A) and acetonitrile (B) using the following binary gradient: 0-3 min, isocratic 100% A; 3-7 min, isocratic 95% A and 5% B; 7-12 min, isocratic 10% B; 12-19 min, isocratic 25% B; 19-30 min, isocratic 45% B; 30-42 min, isocratic 60% B; 42-50 min, isocratic 95% A and 5% B. The flow rate was 1.2 ml min -1 and the temperature of the column was maintained at 25 C. Samples were analyzed using an LTQ XL linear ion trap mass spectrometer (Thermo, Wilmington, MA, USA) with an ion source voltage of 3.5 kV, a counter current nitrogen flow set at a pressure of 12 psi, and a capillary temperature of 350 C. Mass spectra were recorded over the range 75-2,000 m/z. Tandem mass spectra were obtained in manual mode for targeted masses using an isolation width of 2.0, fragmentation amplitude of 2.2, and threshold set at 6,000. Chromatographic pure methanol (Kermel, Qingdao, China) was used for extraction of seed flavonoids, and ultra-pure water was obtained using a MilliQ Reference system (Billerica, MA, USA). Solvents for LC-MS were also chromatography grade. Flavonoid standards, kaempferol, quercetin and epicatechin were from Sigma-Aldrich (St. Louis, MO, USA).

RNA sequencing and assembly
To explain the molecular basis of seed coat color formation in developing seeds of B. rapa, we used seeds from brown-seeded (B147) and yellow-seeded (B80) lines at 10, 14 and 28 DAF with three biological replicates each, to build 18 libraries for sequencing by Illumina RNA-seq. Between 4,004.21 and 5,871.15 Mbytes of raw data and 3,969.87 and 5,815.16 Mbytes of high-quality clean data were generated from the 18 libraries, with Q20 percentages ranging from 97.50% to 98.13%, Q30 percentages ranging from 94.24% to 95.46%, and GC content ranging from 46.91% to 49.75% (Table S1). The filtered clean data were assembled into clean reads. The number of clean reads ranged from 30,086,000 to 34,020,000. After removal of rRNA, an average of 73.74% reads were mapped to the reference database (http://Brassicadb.org/brad/) (Table S2). Finally, a total of 41,286 unigenes were identified from the 18 libraries of B147 and B80 at 10, 14 and 28 DAF (Online Addition File 1). Gene annotations were provided by BLASTx alignment with a threshold E-value of 1.0E−05, and genes were matched with sequences in the NCBI non-redundant Protein (NR), Swiss-Prot and KEGG database, respectively.

Identification of DEGs in seeds at different developmental stages and enrichment analysis
Transcript abundances were quantified using the software RSEM (Li & Dewey, 2011) and expression levels of genes were calculated using the FPKM method. DEGs across samples were estimated using the edgeR package (http://www.rproject.org/). A total of 4,989 DEGs by multiple comparisons between B147 and B80 at the same developmental period were identified (Online Addition File 2). To identify DEGs between B147 and B80, we performed multiple comparisons of DEGs between B147 and B80 at the diverse development stage of 10, 14 and 28 DAF (Figs. 1D-1F). We identified large numbers of DEGs between B147 and B80; 2,596 DEGs (1,217 up-regulated and 1,379 downregulated) at 10 DAF, 2051 DEGs (806 up-regulated and 1,245 down-regulated) at 14 DAF and 2,599 DEGs (833 up-regulated and 1,766 down-regulated) at 28 DAF. After identifying the DEGs in pairwise comparisons of B147-10 vs. B80-10, B147-14 vs. B80-14 and B147-28 vs. B80-28, we next identified 608 common DEGs in all three pairwise comparisons (Fig. 1G). The DEGs between all the pairwise comparisons were showed in the Fig. 1H.
To better classify the biological functions of these DEGs, we used GO enrichment analysis to identify all GO terms significantly enriched among DEGs compared to the genome background and to filter the DEGs according to their corresponding biological functions. DEGs at 10 DAF were classified into 41 GO categories including 18 biological processes, 14 cellular components and nine molecular functions; DEGs at 14 DAF were classified into 40 GO categories comprising 18 biological processes, 14 cellular components and eight molecular functions and DEGs at 28 DAF were also categorized into 40 GO categories comprising 18 biological processes, 14 cellular components and eight molecular functions. The GO term 'transcription factor activity, protein binding' (GO:0000988) in the molecular function category was unique at 10 DAF. In the biological process category, 'metabolic process' (GO:0008152), 'cellular process' (GO:0009987) and 'single-organism process' (GO:0044699) were the most abundant subgroups while 'biological adhesion' (GO:0022610), 'immune system process' (GO:0002376) and 'locomotion' (GO:0040011) comprised the smallest proportion. In the cellular components category, the dominant groups were 'cell' (GO:005623), 'cell part' (GO:0044464), 'organelle' (GO:0043226) and 'membrane' (GO:0016020), while 'extracellular matrix' (GO:0031012), 'extracellular region part' (GO:0044421), 'cell junction' (GO:0030054), 'virion' (GO:0019012) and 'virion part' (GO:0044423) were the least dominant groups. Within the molecular functions category, most DEGs were sorted into 'binding' (GO:0005488) and 'catalytic activity' (GO:0003824), while the fewest DEGs were sorted into 'transcription factor activity, protein binding' (GO:0000988) at 10 DAF, and 'molecular transducer activity' (GO:0060089) at 14 and 28 DAF (Online Addition Files 3-5).
Based on the FPKM values of these unigenes, different copies of genes showed different expression levels, possibly indicating that these different copies possess different biological functions. Different expression levels of five copies of PAL in seeds of B147 and B80 revealed that Bra003126 and Bra005221 possibly have similar functions. Among five copies of 4CL, the functions of Bra001820 and Bra031263 might be the same; however, transcription levels showed a continuous decline during seed development in B147 and B80, indicating that these two copies show little correlation with seed coat color. Three copies of C4H, Bra022803, Bra018311 and Bra021637, might possess a similar function. The functions of three copies on CHS are possibly the same on account of the similar transcription levels of these copies in seeds of B147 and B80. Among three copies of CHI, the functions of Bra007142 and Bra007145 could be the same. The functions of two copies of FLS are likely different because of the significantly different transcription levels of these two copies in seeds of B147 and B80. The functions of two copies of LDOX and two copies of BAN might be the same on account of their similar transcription levels in seeds of B147 and B80.
We generated a heatmap of transcript expression levels for these 30 genes using the log 2 FPKM values ( Fig. 2A). Based on their different transcription levels, these 30 genes were clustered into seven groups, I to VII. Of these seven groups, three contained only one gene, PAL (Bra003216) in group I, CHI (Bra003209) in group IV and FLS (Bra009358) in group VII. The remaining 27 genes were clustered into four groups. Group II contained six genes: PAL (Bra006985 and Bra005221), C4H (Bra022803 and Bra018311) and 4CL (Bra004109 and Bra030429). All six of these genes are involved in the phenylpropanoid biosynthetic pathway and had the transcript expression levels that increased from 10 to 14 DAF and decreased from 14 to 28 DAF in seeds of B147, but increased from 10 to 28 DAF in seeds of B80 at a low transcription level. Group III contained 11 genes which were divided into four sub-groups, IIIa, IIIb, IIIc and IIId. Group IIIa and group IIIb consisted of one gene, C4H (Bra021636) and CHS (Bra006224), respectively. Group IIIc consisted of four genes, CHS (Bra008792 and Bra023441) and CHI (Bra007142 and Bra007145), which are the EBGs involved in the flavonoid biosynthetic pathway. Transcripts expression levels of these four genes increased from 10 to 14 DAF and decreased from 14 to 28 DAF in seeds of B147, but decreased from 10 to 28 DAF in seeds of B80 at a low transcription level. Group IIId consisted of five genes, DFR (Bra027457), LDOX (Bra013652 and Bra019350) and BAN (Bra021318 and Bra031403), which are LBGs involved in the flavonoid biosynthetic pathway. The transcript expression levels of these five genes increased from 10 to 14 DAF and decreased from 14 to 28 DAF in seeds of B147, but showed little expression in seeds of B80. Group V contained six genes: PAL (Bra039777 and Bra017210), C4H (Bra021637), FLS (Bra037747), F3H (Bra036828) and F3′H (Bra009312). The transcript expression levels of these six genes increased from 10 to 14 DAF and decreased from 14 to 28 DAF in seeds of B147. In seeds of B80, the transcript levels of these six genes showed no regular expression pattern from 10 to 14 DAF, but expression levels were higher than those in seeds at 28 DAF. Group VI consisted of four genes: 4CL (Bra001819, Bra001820 and Bra031263) and C4H (Bra022802). The transcript expression levels of these four genes showed irregular expression patterns in seeds of B147 and B80, indicating that these four genes were likely not related to the formation of seed coat color. Comprehensive analysis of these seven groups revealed that genes in Group III had a meaningful relationship with the seed coat color formation.

WGCNA of DEGs involved in the seed coat color with TTG1
Based on a previous study , we considered the transcription factor TTG1 to be the candidate gene controlling seed coat color. Hence, WGCNA was used to analyze the association between DEGs involved in seed coat color and TTG1. A WGCNA map showed that 10 genes were co-expressed with TTG1 (Fig. 2B), including C4H (Bra022803, Bra018311 and Bra021637), CHS (Bra023411, Bra008792 and Bra006224), CHI (Bra003209), F3H (Bra036828), F3′H (Bra009312) and FLS (Bra037747). All these structural genes are EBGs in the flavonoid biosynthetic pathway. The closed orbicular WGCNA map with complex interactions between genes indicated that all these structural genes were highly correlated. Three copies of C4H and three copies of CHS were clustered together in the WGCNA map, suggesting that the three copies of C4H, Bra022803, Bra018311and Bra021637, and the three copies of CHS, Bra023411, Bra008792 and Bra006224, have a synergic relationship. All these structural genes were clustered into different groups (II, III, IV, V), which indicated that TTG1 was not the only one transcription factor regulating these genes.

Structural genes involved in seed coat color
Based on different numbers of gene copies involved in seed coat color formation, the heatmap of transcript expression levels for 30 DEGs and the WGCNA map, 19 structural genes were speculated to be involved in seed coat color formation (Table S5). These genes included three copies of PAL (Bra003126, Bra005221 and Bra018311), one copy of 4CL (Bra004109), two copies of C4H (Bra022803 and Bra021637), three copies of CHS (Bra006224, Bra008792 and Bra023441), two copies of CHI (Bra007142 and Bra007145), F3H (Bra036828), F3′H (Bra009312), one copy of FLS (Bra037747), one copy of DFR (Bra027457), two copies of LDOX (Bra013652 and Bra019350) and two copies of BAN (Bra021318 and Bra031403).

Transcription factors involved in seed coat color
We analyzed transcription levels of four transcription factors (BrTT2, BrEGL3, BrTT8 and BrMYB5) in seeds of B147 and B80 at different developmental stages (10, 14 and 28 DAF) (Fig. 3). Transcription levels of BrTT2 and BrEGL3 at 10, 14 and 28 DAF showed no significant difference between B147 and B80. The transcription level of BrTT2 showed a continuous decline from 10 DAF to 28 DAF, while that of BrEGL3 showed a continuous increase. The transcription levels of BrTT8 and BrMYB5 at 10, 14 and 28 DAF were four-fold higher in B147 than those in B80. In combination with the results of a previous paper , we propose that a truncated TTG1 without WD40 structures could affect the transcription levels of TT8 and MYB5.

Expression analysis by RT-qPCR
To confirm the validity of RNA-seq data, we selected nine genes, representing transcription factors (TTG1, TT2, TT8 and MYB5), structural genes involved in the flavonoid biosynthetic pathway (CHS, CHI, DFR and BAN) and a gene encoding glutathione-S-transferase (TT19), for RT-qPCR analysis. The sequences of RT-qPCR primers are listed in Table S6. We analyzed the fold change in expression level of these genes at 10, 14 and 28 DAF with reference to B80 were analyzed (Fig. 4). Linear regression analysis indicated that the fold-change in gene expression ratios measured by RT-qPCR and RNA-Seq data were significantly positively correlated (R 2 = 0.7902), which suggested that the transcriptome data were reliable.

Flavonoids accumulation in seeds of B147 and B80
To detect the difference in flavonoid compounds between B147 and B80, we analyzed seeds at 10, 14 and 28 DAF using LC-MS. Referencing a previous report (Qu et al., 2013), we used three types of flavonoid standards, epicatechin, kaempferol and quercetin, and detected their corresponding flavonol derivatives. Identification of flavonoid compounds was determined as in the previous studies (Qu et al., 2013;Ma et al., 2014;Chen et al. 2015b;De Camargo et al., 2017;Pelvan et al., 2018;Rahman, De Camargo & Shahidi, 2018). We detected 31 flavonoid compounds in B147 and 21 compounds in B80. These flavonoid compounds were divided into two groups, those compounds detected both in B147 and B80, and compounds detected only in B147 (Table 1). Compounds were numbered as the order of retention time.
As expected, the three flavonoid compounds (epicatechin, kaempferol and quercetin) and their derivatives were all detected in three seed samples of B147 at three different developmental stages; however, only two compounds and their derivatives (kaempferol and quercetin) were detected in seed samples of B80. No epicatechin, epicatechin derivatives or PAs were found in seeds samples of B80 at any of the three developmental stages, confirming the significant roles of epicatechin and PAs in brown seed coat color  formation. Interestingly, PAs with different polymerized forms were found in seeds of B147 at 28 DAF, demonstrating that PA content gradually increased during the process of seed maturation from 14 DAF to 28 DAF. We examined the content of three flavonoid standards, epicatechin, kaempferol and quercetin in seeds samples of B147 and B80 at 10, 14 and 28 DAF (Fig. 5). The content of these compounds in seeds showed an upward trend from 10 to 14 DAF and a downward trend from 14 to 28 DAF in B147. Epicatechin, the end product of flavonoid biosynthesis catalyzed by oxidoreductase, was not detected in B80, whereas its content peaked at 143 ± 3.97 µg/g fresh weight (µg/g FW) in seeds of B147 at 14 DAF (Fig. 5A). Kaempferol, which is catalyzed by flavonol synthase, was detected in seeds of both B147 and B80. Its content showed no difference between B147 and B80 at 10 and 28 DAF; however, its content was significantly higher in seeds of B147 (0.44 ± 0.02 µg/g FW) than that in seeds of B80 at 14 DAF (Fig. 5B). Quercetin, which is catalyzed by flavonol synthase, was detected in seeds of both B147 and B80, but its content showed no difference at 10 DAF; in seeds at 14 DAF, the quercetin content of B147 (1.24 ± 0.02 µg/g FW) was significantly higher than that of B80; in seeds at 28 DAF, the quercetin content of B147 was lower than that of B80 (Fig. 5C).

Regulation of yellow seed coat color variation
At the first dedicated step of the flavonoid biosynthetic pathway, except for one unigene (BrCHI,Bra003209), expression levels of the other seven genes BrCHS (Bra006224, Bra008792 and Bra023441), BrCHI (Bra007142, Bra007145), BrF3H (Bra036828) and BrF3'H (Bra009312) were all higher in seeds of B147 than those of in B80 at 10, 14 and 28 DAF, which indicated that these seven genes are important factors affecting seed coat formation in B. rapa. The downstream of flavonoid biosynthesis is divided into two biosynthetic branches. One branch is initiated by BrFLS, an enzyme that converts dihydrokaempferol and dihydroquercetin into kaempferol and quercetin, respectively (Liu et al., 2013). As our results suggested that kaempferol, quercetin and its corresponding derivatives were not important in seed coat color variation, we suppose that BrFLS is independent of seed coat color formation in B. rapa. On the other branch, dihydrokaempferol and dihydroquercetin are catalyzed by DFR, LDOX and BAN, successively (Vanderkrol et al., 1990;Boss, Davies & Robinson, 1996;Pelletier, Burbulis & Winkel-Shirley, 1999;Abrahams et al., 2003;Debeaujon et al., 2003;Xie et al., 2003). Transcriptome data showed that these five flavonoid biosynthesis pathway genes were expressed very little or not at all in seeds of B80, and the corresponding flavonoid pathway metabolites, epicatechin, was also not accumulated. We therefore considered that the low expression of these genes resulted in a deficiency of epicatechin and its derivatives, which made the B80 seeds unable to form brown color, resulting in a yellow coat.
A previous study showed that TTG1 was truncated without WD40 structures in B80 . We therefore considered that the low expression of BAN in seeds of B80 was due to release of binding in the TT2-TT8-TTG1 protein complex. In addition, DFR, LDOX and TT8 are direct targets of TTG1 in Arabidopsis (Gonzalez et al., 2008), which might explain the lower expression of DFR, LDOX and TT8 in seeds of B80. In summary, truncated TTG1 without WD40 structures in B80 not only led to the lower expression of DFR, LDOX, TT8 and MYB5, but also resulted in unbinding of the TT2-TT8-TTG1 protein complex, which could then not induce expression of BAN. The low expression of DFR, LDOX and BAN successfully prevented the production of epicatechin, the monomer of PAs, which cause the formation of yellow seeds in B. rapa B80 (Fig. 6). In summary, lower expression of six unigenes involving phenylpropanoid biosynthetic pathway, eight EBGs involved in flavonoid biosynthetic pathway and three transcription factors (TTG1, TT2 and MYB5), little or no expression of five LBGs, and release of binding in the TT2-TT8-TTG1 protein complex together regulate the formation of yellow seed coat color in B80 ( Fig. 6; Online Addition File 6). Figure 6 Schematic map of genes involved in seed coat color formation in B. rapa. PAL, phenylalanine ammonia-lyase; 4CL, 4-coumarate-CoA ligase; C4H, trans-cinnamate 4-monooxygenase; CHS, naringenin-chalcone synthase; CHI, chalcone isomerase; F3H, naringenin 3-dioxygenase; F3′H, flavonoid 3′-monooxygenase; FLS, flavonol synthase; DFR, dihydrokaempferol 4-reductase; LDOX, leucocyanidin oxygenase; BAN, anthocyanidin oxidoreductase. DFR, LDOX and BAN in grey represent genes with low levels of expression in the yellow-seeded material B80. Green rectangle represents the down-regulated expression level and the gene ID in the green rectangle represents the down-regulated expressed gene in B80. The ellipse represents the normal structure of transcription factor TTG1. The irregular shape represents mutated transcription factor TTG1 without complete WD40 structure. The grey shape indicates the hypothesis for seed coat color formation.
PAs are oligomers and polymers of catechin or epicatechin, the end products biosynthesized via the flavonoid biosynthetic pathway (Dixon, Xie & Sharma, 2005;Saito et al., 2013;Xu et al., 2014), and are the main flavonoid compounds affecting the seed coat color in Brassica species. In plants, the flavonoid biosynthetic pathway is the second biosynthetic branch from the phenylpropanoid biosynthetic pathway . Thus, we analyzed significant DEGs involved in the phenylpropanoid, flavonoid, flavone and flavonol biosynthetic pathways related to seed coat color. We determined 19 out of 33 DEGs to be involved in the seed coat formation in B80, because the expression of these 19 genes in B80 was always lower than that in B147. Xu et al. (2014) reported that the promoters of CHS, CHI and F3H remains unchanged in the seed coat of ttg1 mutants in Arabidopsis seed. Similarly, mRNA accumulation of CHS, CHI and F3H was found in the yellow-seeded line B80, a mutant exhibiting truncated TTG1 without WD40 structures. All these results suggested that the lower transcript expression levels of the above EBGs were affected by a lower transcription level of TTG1 in seeds of B80. Little expression of LBGs was similar to a report of B. juncea, which indicated that these LBGs play crucial roles in the formation of pigment in seed coat in Brassica species.
In this study, the peak content of epicatechin appeared in seeds at 14 DAF, which was consistent with the strong expression of BAN gene transcript in seeds at 14 DAF. At this stage, large amounts of epicatechin accumulated in seeds and a small quantity was polymerized into PAs. Meanwhile, in seeds at 28 DAF, a large amount of epicatechin was polymerized into PAs, and thus its content was reduced. The variation in content epicatechin from 10 DAF to 28 DAF again confirmed that the period of PA accumulation was 14 DAF. The reduction of kaempferol content in B147seeds at 28 DAF may be due to the formation of kaempferol derivatives via flavone and flavonol biosynthetic pathways and the formation of quercetin catalyzed by flavonoid 3'-monooxygenase. Alternatively, the increased kaempferol content in yellow seeds (B80) at 28 DAF was possibly due to the weak transcript expression level of F3′H. Thus, the content of kaempferol appeared to increase from 14 to 28 DAF in yellow seeds of B80. The trend of quercetin content during seed development was consistent with the trend of transcript expression levels of one FLS gene, Bra037747. Overall, at least two genes (FLS, F3′H) together affected the variation in kaempferol and quercetin content.
PAs, the oligomeric and polymeric forms of catechin or epicatechin, are considered to be the main pigments affecting seed coat color in Brassica species. However, whether the two other main components in seed coat, kaempferol and quercetin, are related to seed coat color is unknown. We detected the flavonoid compounds kaempferol, quercetin and their corresponding derivatives by LC-MS in both B147 and B80, which indicated that kaempferol, quercetin and their corresponding derivatives are not significant in seed coat color variation.
In our comprehensive analysis of component differences in flavonoids between B147 and B80, epicatechin was unique to B147 at 10, 14 and 28 DAF, indicating that the color difference between brown and yellow seeds of B. rapa is due to the loss of epicatechin, the monomeric form of PAs. Many studies have reported that regulation of PAs is due to the combined action of MYB and R/B-like basic helix-loop-helix (bHLH) transcription factors, together with TTG1, in a MYB-bHLH-WDR (MBW) ternary protein complex (Heim et al., 2003;Baudry et al., 2004;Zimmermann et al., 2004;Lepiniec et al., 2006;Dubos et al., 2008;Xu, Dubos & Lepiniec, 2015) and that the TT2-TT8-TTG1 protein complex directly regulates the expression of BAN in Arabidopsis seed (Baudry et al., 2004). In this study, we found that mutant TTG1 not only regulates the expression of TT8 and MYB5, but also affects the combination of the MBW complex. All these factors (structural unigenes and transcription factors) together regulate the variation of seed coat color.
MYB-bHLH-WDR complexes formed by transcription factors are known to regulate the seed coat color and WDR has been previously identified, while the identification of the partners, MYB and bHLH members has not been reported in B. rapa hitheto. Additional studies on the members of MYB and bHLH families should be continued. The results of this study provide a general framework for future research into the molecular mechanism of seed coat color formation of Brassica crops.

CONCLUSION
In this study, we used seeds of the brown-seeded inbred line B147, and its yellow-seeded near-isogenic line B80, at 10, 14 and 28 DAF to construct 18 libraries for analysis of RNA-Seq and LC-MS. We identified a total of 41,286 unigenes and 4,989 DEGs by multiple comparisons between B147 and B80 at the same developmental period. KEGG enrichment analysis and WGCNA showed that 19 unigene sequences being associated with the phenylpropanoid, flavonoid, flavone and flavonol biosynthetic pathways were involved in seed coat color formation. Truncated TTG1 without WD40 structures in B80 not only led to lower expression of DFR, LDOX, TT8 and MYB5, but also results in unbinding of the TT2-TT8-TTG1 protein complex, which could then not induce the expression of BAN. LC-MS also showed that epicatechin, a product at the end of the DFR-LDOX-BAN metabolism flow, was not detected in seeds of B80. These findings will facilitate better understanding of the regulatory mechanism of seed coat color formation in Chinese cabbage.