Improved haplotype resolution of highly duplicated MHC genes in a long-read genome assembly using MiSeq amplicons

Zoological Science


Introduction

Materials and Methods

Genomic reference of the great reed warbler MHC region

Amplicon sequencing

Filtering method and allele selection

Mapping of MHC-I and MHC-IIB HTS-amplicon alleles

Coding sequence similarity of MHC-IIB haplotypes

Results

MHC diversity in the long-read genome assemblies

MHC diversity based on amplicon HTS

MHC diversity in amplicons compared to MHC diversity in the GRW Falcon-2017 assembly

MHC diversity in amplicons compared to MHC diversity in the Purge Haplotigs assembly

Haplotype sorting of tandemly duplicated MHC genes in the Purge Haplotigs assembly

Discussion

Conclusion

Supplemental Information

Primers used for MHC-I and MHC-IIB genotyping in the great reed warbler family

The numbers of unique amplicon alleles of expected length and of post-trimming length are reported only for the focal individual. For each primer pair (PP) combination, the number of MHC-I amplicon alleles with deletions causing a shift in the open reading frame in exon 3 is given in brackets (in total, seven MHC-I alleles had such deletions when sequences from both MHC-I PPs are considered). Likewise, the number of MHC-IIB amplicon alleles containing stop codons is given in brackets for each PP combination (in total, ten MHC-IIB alleles contained premature stop codons when sequences from both MHC-IIB PPs are considered). Two MHC-IIB PPs (PP2 and PP3) recovered fewer unique amplicon alleles, and data from both were discarded before the trimming and mapping steps.

DOI: 10.7717/peerj.15480/supp-1
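The caption separates MHC-I alleles with frame-shifting deletions from MHC-IIB alleles with premature stop codons. The Python sketch below shows one way such flags could be computed; it is not the screening procedure used by the authors. It assumes ungapped amplicon sequences whose reading frame starts at a known offset (`frame`) and a known expected in-frame length. The function names are hypothetical. Because an exon 2 or exon 3 amplicon does not include the gene's terminal stop codon, any stop codon found in frame counts as premature.

```python
from typing import List

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_frameshift(seq: str, expected_len: int) -> bool:
    """Flag a net indel whose length is not a multiple of three.

    Assumes `expected_len` is the in-frame length of the amplicon
    (e.g. the length of a reference allele for the same primer pair).
    """
    return (expected_len - len(seq)) % 3 != 0


def stop_codon_positions(seq: str, frame: int = 0) -> List[int]:
    """Return 0-based codon indices of stop codons read from `frame`."""
    seq = seq.upper()
    positions = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i, start in enumerate(range(frame, len(seq) - 2, 3)):
        if seq[start:start + 3] in STOP_CODONS:
            positions.append(i)
    return positions


if __name__ == "__main__":
    allele = "ATGGCCTGACCC"  # toy sequence, not a real MHC allele
    print(stop_codon_positions(allele))       # [2]
    print(has_frameshift("ATGGCCTGACC", 12))  # True (1-bp deletion)
```

The two checks are kept separate because they answer different questions: a deletion of three bases keeps the frame intact, yet it can still create or remove a stop codon.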

Number of MHC amplicon alleles (HTS) mapping to genome assemblies

Number of MHC-I and MHC-IIB amplicon allele sequences that match annotated MHC alleles in the genome assemblies (GRW Falcon-2017 and Purge Haplotigs) of the focal individual. The number of mapped amplicon alleles was assessed using the standard Geneious RNA mapper (Custom Sensitivity, allowing 0% to 4% mismatches per read).

DOI: 10.7717/peerj.15480/supp-2
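The mismatch tolerance can be read as a threshold on the fraction of mismatching sites between an amplicon allele and an annotated allele. The Geneious RNA mapper is a gapped read mapper with its own scoring, so the sketch below is only a toy illustration under simplifying assumptions: ungapped placement of the amplicon along each annotated allele, and a hit whenever the best placement has at most 4% mismatches. The function names are hypothetical. An amplicon allele that returns more than one name corresponds to an allele "mapping multiple times".

```python
from typing import Dict, List


def min_mismatch_fraction(read: str, ref: str) -> float:
    """Smallest fraction of mismatching sites over all ungapped placements
    of `read` along `ref`; returns 1.0 if `ref` is shorter than `read`."""
    read, ref = read.upper(), ref.upper()
    n = len(read)
    if n == 0 or n > len(ref):
        return 1.0
    best = n
    for offset in range(len(ref) - n + 1):
        window = ref[offset:offset + n]
        mismatches = sum(a != b for a, b in zip(read, window))
        best = min(best, mismatches)
    return best / n


def mapped_targets(read: str, annotated: Dict[str, str],
                   max_mismatch: float = 0.04) -> List[str]:
    """Names of annotated alleles that `read` matches within `max_mismatch`."""
    return sorted(name for name, ref in annotated.items()
                  if min_mismatch_fraction(read, ref) <= max_mismatch)


if __name__ == "__main__":
    annotated = {  # toy sequences, not real MHC alleles
        "A1": "TTACGTACGTACTT",
        "A2": "ACGTACGTAA",
        "A3": "GGGGGGGGGGGG",
    }
    read = "ACGTACGTAC"
    print(mapped_targets(read, annotated))                    # ['A1']
    print(mapped_targets(read, annotated, max_mismatch=0.1))  # ['A1', 'A2']
```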

Mapping results for MHC-I amplicon alleles sequenced in the focal individual

All primary and associated MHC-I scaffolds (Aaru-UA scaffold) included in the GRW Falcon-2017 assembly are presented. Scaffolds marked with # were discarded after the post-assembly procedure was performed in the Purge Haplotigs assembly. Amplicon alleles were separated into three categories according to their inheritance in the focal individual: paternal alleles (P, blue), maternal alleles (M, yellow) and unresolved alleles (U, turquoise). Annotated alleles not detected by amplicon mapping are highlighted in grey. One amplicon allele was successfully amplified in the focal individual (in both replicates) but was not found in its parents; it has the prefix “Seq”. The mapping procedure was performed in Geneious Prime® (Geneious RNA mapper). Non-functional genes are indicated with the symbol Ψ.

DOI: 10.7717/peerj.15480/supp-3

Mapping results for MHC-IIB amplicon alleles sequenced in the focal individual

All primary and associated MHC-IIB scaffolds (Aaru-DAB scaffold) included in the GRW Falcon-2017 assembly are presented (containing 96 and four annotated MHC-IIB alleles, respectively). Scaffolds marked with # were discarded after the post-assembly procedure was performed in the Purge Haplotigs assembly. Amplicon alleles were separated into three categories according to their inheritance in the focal individual: paternal alleles (P, blue), maternal alleles (M, yellow) and unresolved alleles (U, turquoise). Additional amplicon alleles that corresponded to multiple hits with more mismatches are indicated with numbers (based on a similarity matrix). Annotated MHC alleles not detected or not assigned by amplicon mapping are highlighted in grey. The mapping procedure was performed in Geneious Prime® (Geneious RNA mapper). Non-functional genes are indicated with the symbol Ψ.

DOI: 10.7717/peerj.15480/supp-4

Mean amino-acid pairwise distances (p-distances) between scaffolds including >5 tandemly duplicated MHC-IIB genes in open reading frame

Mean amino-acid pairwise distances (below the diagonal) and standard errors (above the diagonal, in blue) between annotated scaffolds including tandemly duplicated MHC-IIB gene copies in open reading frame (ORF) in the focal individual: Acar-DAB*120_2;4–10;12–14;16–18, Acar-DAB*301_1;2;4;6, Acar-DAB*357_1–4;6–9, Acar-DAB*45_2–9 and Acar-DAB*554_1–9.

DOI: 10.7717/peerj.15480/supp-5
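A mean between-scaffold p-distance is the average, over every pair formed by one gene copy from each scaffold, of the proportion of aligned sites that differ. The Python sketch below illustrates that definition, assuming the translated gene copies are already aligned; skipping gap sites (pairwise deletion) is an assumption, and the function names are hypothetical. The standard errors in the table are not computed here. They are commonly obtained by bootstrapping over sites, a step this sketch leaves out.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping sites with a gap in either sequence (pairwise deletion)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_between(group1: List[str], group2: List[str]) -> float:
    """Mean p-distance over all pairs with one sequence from each group."""
    dists = [p_distance(a, b) for a in group1 for b in group2]
    return sum(dists) / len(dists)


def between_matrix(groups: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Mean between-group p-distance for every unordered pair of groups."""
    return {(g1, g2): mean_between(groups[g1], groups[g2])
            for g1, g2 in combinations(groups, 2)}


if __name__ == "__main__":
    scaffolds = {  # toy amino-acid alignments, not real MHC-IIB copies
        "S1": ["ACDE", "ACDF"],
        "S2": ["ACDE"],
    }
    print(between_matrix(scaffolds))  # {('S1', 'S2'): 0.125}
```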

Comparison of MHC-I and MHC-IIB allelic diversity recovered in the focal individual with amplicon HTS and in the GRW Falcon-2017 assembly

Of the 29 MHC-I amplicon alleles, 16 mapped to annotated alleles, and of the 95 MHC-IIB amplicon alleles, 58 mapped. The upper panel represents amplicon alleles that mapped to the assembly, separated into two categories: alleles mapping once (numbers in black, percentages in orange) and alleles mapping multiple times (numbers in blue, percentages in brown). Of the 25 annotated MHC-I alleles, 21 were detected by amplicon alleles, and of the 100 annotated MHC-IIB alleles, 78 were detected. The lower panel describes the annotated MHC alleles in the GRW Falcon-2017 assembly that were detected by amplicon alleles, separated into two categories: alleles detected uniquely (numbers in black, percentages in orange) and different alleles detected by the same amplicon allele (i.e., shared; numbers in blue, percentages in brown).

DOI: 10.7717/peerj.15480/supp-6
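The overall proportions behind the two panels follow directly from the counts in the caption (rounded to one decimal). The per-category percentages shown in the figure itself cannot be derived from the caption alone.

$$
\begin{aligned}
\text{MHC-I:}\quad & \tfrac{16}{29} \approx 55.2\%\ \text{of amplicon alleles mapped}, && \tfrac{21}{25} = 84\%\ \text{of annotated alleles detected}\\
\text{MHC-IIB:}\quad & \tfrac{58}{95} \approx 61.1\%\ \text{of amplicon alleles mapped}, && \tfrac{78}{100} = 78\%\ \text{of annotated alleles detected}
\end{aligned}
$$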

Mean nucleotide pairwise distances (p-distances) between each MHC-IIB gene copy from scaffold Aaru-DAB*554 and all MHC-IIB gene copies on four scaffolds with >5 tandemly duplicated genes (Purge Haplotigs assembly)

Mean nucleotide p-distances computed between each MHC-IIB gene copy from scaffold Aaru-DAB*554 (Acar-DAB*554_1–9) and all MHC-IIB gene copies on four scaffolds with tandemly duplicated genes in open reading frame (ORF) in the focal individual: Aaru-DAB*357 (purple), Aaru-DAB*45 (light grey), Aaru-DAB*301 (medium grey) and Aaru-DAB*120 (dark grey).

DOI: 10.7717/peerj.15480/supp-7
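Unlike the between-scaffold means, this figure keeps one value per gene copy: each copy on Aaru-DAB*554 is compared against every copy on each of the other scaffolds. The Python sketch below computes such per-copy profiles and reports the scaffold each copy is closest to. It assumes equal-length, gap-free nucleotide alignments, uses hypothetical function names, and does not claim to reproduce the authors' exact computation.

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length, aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def copy_profiles(focal: Dict[str, str],
                  scaffolds: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """For each focal gene copy, the mean p-distance to all copies on each scaffold."""
    return {
        copy: {name: sum(p_distance(seq, other) for other in copies) / len(copies)
               for name, copies in scaffolds.items()}
        for copy, seq in focal.items()
    }


def closest_scaffold(profile: Dict[str, float]) -> str:
    """Scaffold with the smallest mean distance (ties broken by name order)."""
    return min(sorted(profile), key=profile.get)


if __name__ == "__main__":
    focal = {"c1": "AAAA", "c2": "AATT"}                  # toy gene copies
    others = {"S1": ["AAAA", "AAAT"], "S2": ["TTTT"]}     # toy scaffolds
    profiles = copy_profiles(focal, others)
    print(profiles["c2"])  # {'S1': 0.375, 'S2': 0.5}
    print({c: closest_scaffold(p) for c, p in profiles.items()})  # {'c1': 'S1', 'c2': 'S1'}
```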

MHC-I amplicon alleles in the focal individual

List of MHC-I (exon 3) amplicon alleles in the focal individual, the corresponding sequences obtained with both MHC-I primer pair combinations (HNalla/NH46; HNalla-1/R3Ex3b) and the matching MHC-I allele in GenBank (Sequence ID provided for each MHC-I amplicon allele).

DOI: 10.7717/peerj.15480/supp-8

MHC-IIB amplicon alleles in the focal individual

List of MHC-IIB (exon 2) amplicon alleles in the focal individual, the corresponding nucleotide sequences obtained with both MHC-IIB primer pair combinations (PP1 (trimmed-length sequences) and PP5) and the matching MHC-IIB allele in GenBank (Sequence ID provided for each MHC-IIB amplicon allele).

DOI: 10.7717/peerj.15480/supp-9

Additional Information and Declarations

Competing Interests

The authors declare there are no competing interests.

Author Contributions

Samantha Mellinger conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Martin Stervander performed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Max Lundberg analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Anna Drews performed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Helena Westerdahl conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Animal Ethics

The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers):

The focal individual of this study was sacrificed in 1996, and all individuals in this study were blood-sampled with permission from the Swedish Environmental Protection Agency.

DNA Deposition

The following information was supplied regarding the deposition of DNA sequences:

Seventeen sequences of the same length (263 bp) are available at GenBank: MH468838; MH468849; MH468857; MH468874; MH468954; MH468960; MH468992; MH469005; MH469010; MH469033; MH469055; MH469097; MH469099; MH469109; MH469111; MH469115; MH469150.

The remaining 12 amplicon sequences for MHC-I and 95 amplicon sequences for MHC-IIB used in the mapping procedure are available in the Supplementary Files and GenBank: OP524014 to OP524120.
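As a consistency check, the accession range OP524014 to OP524120 contains 524120 − 524014 + 1 = 107 accessions. That equals the 12 MHC-I plus 95 MHC-IIB sequences listed. The Python sketch below expands such a range. It assumes the accessions are consecutive and share one alphabetic prefix; `expand_accession_range` is a hypothetical helper, not an NCBI tool.

```python
import re
from typing import List

ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_accession_range(first: str, last: str) -> List[str]:
    """List every accession from `first` to `last` inclusive (same prefix assumed)."""
    m1, m2 = ACCESSION.fullmatch(first), ACCESSION.fullmatch(last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError("accessions must share the same alphabetic prefix")
    prefix, width = m1.group(1), len(m1.group(2))
    start, end = int(m1.group(2)), int(m2.group(2))
    # Zero-pad to the original digit width so the accessions keep their format.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = expand_accession_range("OP524014", "OP524120")
print(len(accessions))  # 107 = 12 MHC-I + 95 MHC-IIB
```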

Data Availability

The following information was supplied regarding data availability:

The great reed warbler genome assembly acrAru1 is available at BioProject: PRJNA765537 (Sigeman et al., 2021).

The MHC contigs are available at Dryad along with manually curated fasta-files of full-length MHC-I and MHC-IIB genes in open reading frame: Westerdahl, Helena (2022), The genomic architecture of the passerine MHC region: high repeat content and contrasting evolutionary histories of single copy and tandemly duplicated MHC genes, Dryad, Dataset, https://doi.org/10.5061/dryad.fqz612jv6.

The raw sequencing data from amplicons (Illumina MiSeq) are available at NCBI BioProject: PRJNA913109.

Funding

This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (grant number 679799 to Helena Westerdahl), the Swedish Research Council (grant numbers 2015-05149, 2020-04285 to Helena Westerdahl) and by the Jörgen Lindström’s Foundation (grant number 137301 attributed to Samantha Mellinger). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
