Recent population expansion of longtail tuna Thunnus tonggol (Bleeker, 1851) inferred from the mitochondrial DNA markers

The population genetic diversity and demographic history of the longtail tuna Thunnus tonggol in Malaysian waters was investigated using mitochondrial DNA D-loop and NADH dehydrogenase subunit 5 (ND5). A total of 203 (D-loop) and 208 (ND5) individuals of T. tonggol were sampled from 11 localities around the Malaysian coastal waters. Low genetic differentiation between populations was found, possibly due to the past demographic history, dispersal potential during egg and larval stages, seasonal migration in adults, and lack of geographical barriers. The gene trees, constructed based on the maximum likelihood method, revealed a single panmictic population with unsupported internal clades, indicating an absence of structure among the populations studied. Analysis on population pairwise comparison ФST suggested the absence of limited gene flow among study sites. Taken all together, high haplotype diversity (D-loop = 0.989–1.000; ND5 = 0.848–0.965), coupled with a low level of nucleotide diversity (D-loop = 0.019–0.025; ND5 = 0.0017–0.003), “star-like” haplotype network, and unimodal mismatch distribution, suggests a recent population expansion for populations of T. tonggol in Malaysia. Furthermore, neutrality and goodness of fit tests supported the signature of a relatively recent population expansion during the Pleistocene epoch. To provide additional insight into the phylogeographic pattern of the species within the Indo-Pacific Ocean, we included haplotypes from GenBank and a few samples from Taiwan. Preliminary analyses suggest a more complex genetic demarcation of the species than an explicit Indian Ocean versus Pacific Ocean delineation.


INTRODUCTION
Thunnus tonggol, locally known as aya/tongkol hitam or longtail tuna, is a pelagic-neritic marine fish classified in the subgenus Neothunnus from the tribe Thunini within family Scombridae along with blackfin (T. atlanticus) and yellowfin (T. albacares) tuna (Chow & Kishino, 1995). It is the second smallest Thunnus species (Griffiths et al., 2019) but is also reported as the largest growing species among neritic tuna (Koya et al., 2018). T. tonggol is distributed exclusively in the Indo-Pacific region between 47 N and 33 S (Froese & Pauly, 2011) and is one of the most economically important species in Southeast Asia (Itoh, Tsuji & Chow, 1996;Hidayat & Noegroho, 2018). It represents essential, artisanal, and sustenance fisheries as one of the biggest sources of wild caught food (Frimodt, 1995;Collette, 2001). It is also considered an important sport-fish due to its large size and fighting ability (Griffiths et al., 2019).
Basic population parameters, such as the number and distribution of stocks, as well as population genetic diversity are very much needed for a sound management program. For example, population stock data is essential to support resource recovery and to aid in delineating and monitoring populations for fishery management (Roldan et al., 2000;Kunal et al., 2014). Despite its important contributions, coupled with steadily increasing demands in recent years, there is little research on T. tonggol; it has, in fact, received less attention than other pelagic species in Southeast Asian waters (Willette, Santos & Leadbitter, 2016), including Malaysia. A previous study on population genetics of T. tonggol from the northwest coast of India, based on the mitochondrial displacement loop (D-loop) marker, revealed low genetic differentiation between localities, which suggested a panmictic stock structure (Kunal et al., 2014). However, a study across a wider spatial coverage, also based on the D-loop marker, suggested geographical segregation for T. tonggol within the Indo-Pacific region (Willette, Santos & Leadbitter, 2016). In other tuna species, such as the bigeye tuna (T. obesus), populations from the South China Sea, the Philippines Sea, and western Pacific Ocean consist of a single intermixing identity (Chiang et al., 2006), while yellowfin tuna (T. albacore) along the Indian coast, displayed multiple geographically distinct stocks (Kunal et al., 2013).
Population genetics is an essential tool to improve knowledge on stock delineation and population dynamics of exploited fish (Hauser & Seeb, 2008). Genetic markers, like mitochondrial DNA (mtDNA), has proven to be one of the most efficient tools for evaluating intraspecific genetic variation and to describe population genetics study (Menezes, Kumar & Kunal, 2012;Tan, Jamsari & Siti Azizah, 2012;Nabilsyafiq et al., 2019). Moreover, it is also widely used in evolutionary genetics as markers for population history and to estimate divergence times among taxa (Tan et al., 2020). MtDNA is considered a sensitive and reliable marker (Hoolihan, Anandh & Herwerden, 2006;Habib & Sulaiman, 2016Tan et al., 2019) due to its large quantity in the cell and elevated mutation rate (11.6 times higher than nuclear DNA (Allio et al., 2017)), a consequence of a non-existent or inefficient repair system. In addition, the lack of recombination in mtDNA, coupled with relatively infrequent gene arrangement, makes it a good choice for population genetics study (De Mandal et al., 2014).
The mtDNA NADH dehydrogenase subunit 5 (ND5) and the non-coding and highly polymorphic D-loop markers were adopted to conduct the first population genetics and phylogeographic studies among wild populations of T. tonggol in Malaysian coastal areas. The utilization of D-loop as a population genetic marker has been widely documented in a plethora of marine species, including in tuna (Durand et al., 2005;Chiang et al., 2006;Carpenter et al., 2011). However, mtDNA ND5 gene is rarely used in marine species. Nevertheless, this protein-coding gene contains both slow and rapid evolving regions that permit its application in elucidating the genetic relationships among populations; for instance, six populations of the Persian sturgeon Acipenser persicus from the south Caspian Sea were found to be genetically differentiated, as inferred from the ND5 polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) assay (Pourkazemi et al., 2012); meanwhile, moderate genetic differentiation among populations of masu salmon Oncorhynchus masou masou from Japan, Russia, and Korea was apparent based on ND5 and microsatellites markers (Yu et al., 2010). However, low genetic differentiation with high gene flow was detected between sampling locations of the Pearse's mudskipper (Periophthalmus novemradiatus) that inhabits Setiu Wetland, Terengganu, Malaysia (Nabilsyafiq et al., 2019).
The present study aimed at elucidating the population genetics of T. tonggol in Malaysian waters based on mtDNA D-loop and ND5 markers, as well as further investigating the phylogeography of T. tonggol within the Indo-Pacific region, where samples from Taiwan (provided by National Taiwan University) and India (west coast: Kochi, Veraval, and Ratnagiri, and east coast: Andaman Sea; all haplotype sequences retrieved from GenBank) were included in molecular analyses. The output from this study would be beneficial for the conservation and management of this species.

MATERIALS AND METHOD Ethical statement
Only a small clipping of the pectoral fin from each individual fish was collected from the local wet markets. This species is not in the IUCN list of endangered or protected species. As only dead specimens were sampled, no permit was required and no ethical consideration was linked to the study.

Sample collection
The specimens were collected from 11 fish landing sites in Malaysia and were morphologically identified following identification keys as described in Lim et al. (2018). This species can be differentiated from other Thunnus species by having a moderate length pectoral fin, reaching the origin of the second dorsal fin and blackish caudal fin (Lim et al., 2018). A small portion of pectoral fin of each individual was cut and preserved in 95% ethanol and stored in a 1.5 mL centrifuge tube at 4-8 C until further analysis. Each catch locality was confirmed to be non-overlapping (discrete geographical entity) based on feedback from the fishermen and was divided into four regions following Akib et al. (2015) ( Table 1; Fig. 1A). Additionally, four samples from Taiwan and 153 GenBank sequences of T. tonggol from the Indian waters ( Fig. 1B) were included for phylogeographic analysis.

Sequence editing and alignment
Multiple sequences were aligned and edited using ClustalW implemented in MEGA 6.0 (Tamura et al., 2013). DNA sequences were verified for correct identity by using the Basic Local Alignment Search Tool (BLAST) in the National Center for Biotechnology Information (NCBI) database (http://blast.ncbi.nlm.nih.gov/Blast.cgi) before further

Genetic diversity
The complete aligned datasets were used to estimate the number of haplotypes, haplotype diversity (H), and nucleotide diversity (π) in DnaSP 6.0 (Rozas et al., 2017). The polymorphic and parsimony informative sites were examined in MEGA 6.0.

Phylogenetic and population level analyses
The phylogenetic relationships among haplotypes were determined based on the maximum likelihood (ML) method implemented in MEGA 6.0. The best nucleotide substitution models with the lowest BIC score (Bayesian Information Criterion) for the D-loop and ND5 sequences were Tamura 3-parameter (T92) (Tamura, 1992) and Hasegawa-Kishino-Yano with Gamma distribution and invariant sites (HKY+G+1), respectively, as identified in MEGA 6.0. In the case that the T92 and HKY models were unavailable in the BEAST and Arlequin software packages (see below), the TN93 model (Tamura & Nei, 1993) was used instead. The confidence level for each node was assessed by 1,000 bootstrap replications (Felsenstein, 1985). The Pacific bluefin tuna T. orientalis (AB933631) was included as an out-group taxon for D-loop sequences, while the Yellowfin tuna, T. albacares (KM588080) was included as out-group taxon for ND5 gene sequences.
To infer the relationships among haplotypes from Malaysian waters, a minimum spanning network (MSN) was constructed by using the median-joining method implemented in NETWORK version 5.0.1.1 (Bandelt, Forster & Rohl, 1999). The population pairwise comparisons Ф ST for both datasets were determined using Arlequin 3.5 software (Excoffier & Lischer, 2010) and the statistically significant pairwise comparisons were tested with 10,000 permutations. Significant probability values were corrected by performing the False Discovery Rate Procedure (FDR) at a = 0.05 (Benjamini & Hochberg, 1995). Further analysis of genetic differentiation among populations was extended for haplotype-based statistics (H ST ), sequence-based statistics (N ST ) (Lynch & Crease, 1990), and K ST Ã with significance levels assessed using permutation tests with 1,000 replicates (Hudson, Boos & Kaplan, 1992) in DnaSP 6.0. Using the same program, the estimation of gene flow (Nm) based on both haplotype and sequence statistics were calculated following Nei (1973) and Hudson, Boos & Kaplan (1992), respectively. Genetic distance estimates between sampled populations were calculated in MEGA 6.0. Analysis of molecular variance (AMOVA) was performed to infer the population subdivision with three hierarchical levels, including genes within individuals, individuals within demes, and demes within groups of demes (Excoffier, Laval & Schneider, 2005), by using Arlequin 3.5 software. The Mantel test in IBD v 1.52 (Isolation by Distance) (Mantel, 1967;Bohonak, 2002) was used to investigate the correlation between genetic and geographical distance. Genetic distance was represented by population pairwise Ф ST values while geographical distances between sampling locations were measured by using Google Earth. Geographic distance was ln transformed and the strength of the relationship was examined with reduced major axis regression (10,000 randomizations) in IBD v1.52.

Demographic history
Historical demographic and spatial expansions were inspected in the T. tonggol populations. Fu's F S (Fu, 1997) and Tajima's D (Tajima, 1989) were adopted to analyze deviation from neutrality. Historical demographic parameters, including the population before expansion (ϴ 0 ), after expansion (ϴ 1 ), and relative time since population expansion (τ), were computed in Arlequin 3.5. The values of time (τ) were transformed to estimate the actual time (T) since population expansion, using the equation τ = 2mt, where t is the age of the population in generations and µ is the sequence mutation rate per generation. In the present study, a mutation rate of 3.6 × 10 −8 mutation per site/year was applied for the D-loop (Donaldson & Wilson, 1999) and 2% per million years for the ND5 (Brown, George & Wilson, 1979). Bayesian skyline analyses were plotted using BEAST version 2.2.0 (Bouckaert et al., 2019), where the changes in effective population size (Ne) over time were tested. This enabled past demographic changes of T. tonggol to be inferred from the current patterns of genetic diversity within a population (Drummond et al., 2005). Since there was the absence of a population structure (see "Results"), a single population was modeled. The input was prepared in BEAUti. The analysis was run for 10 8 iterations with a burn-in of 10 7 with sampling every 10 4 and a strict molecular clock. All operators were automatically optimized and the results were generated using Tracer version 1.7.1 (Rambaut et al., 2018). Harpending's (1994) raggedness index (Hri) and sum of squared deviations (SSD) were computed in Arlequin 3.5 to evaluate if the sequence data significantly diverged from the assumptions of a population expansion model. The raggedness index has been shown to be a powerful tool in quantifying population growth with limited sample sizes (Ramos-Onsins & Rozas, 2002). In addition, the mismatch distribution was calculated in DnaSP 6.0. The pattern could be used to provide an insight of the past population demography (Chen et al., 2015). A population that has undergone recent expansion shows a unimodal distribution pattern, while a population in equilibrium shows a multimodal distribution pattern (Slatkin & Hudson, 1991;Rogers & Harpending, 1992).

Phylogenetic and population level analyses
The phylogenetic reconstruction inferred from the mtDNA D-loop region and ND5 gene revealed a gene tree with mainly unsupported clades (<50%) and obscure patterns of geographical segregation associated with genetic distribution (Fig. 2; Supplemental 1). This was aligned with the MSN haplotype network that showed no geographical partitioning among the populations studied. Specifically, 180 D-loop haplotypes showed a complex reticulated network (Fig. 3), while 60 ND5 haplotypes revealed a more clarified network pattern (Fig. 4). No dominant haplotype was detected based on the D-loop marker, however, Hap004 and Hap005 were considered the most abundant and common haplotypes, followed by Hap036 and Hap116. Among ND5 haplotypes, Hap01 was the most dominant haplotype followed by Hap03, Hap15, Hap06, and Hap10. Hap01 was found at all sampling sites and was considered the ancestral haplotype. A network with an ancestral haplotype typically shows a star-like or star-burst appearance with the ancestral haplotype centered in it (Ferreri, Qu & Han, 2011).
Pairwise comparison Ф ST analysis corroborated the low and non-significant population structure of T. tonggol from Malaysian waters (D-loop: −0.0324 to 0.1191 (Table 3); ND5: −0.0254 to 0.0739 (Table 4) . Correspondingly, the pairwise genetic distances among populations also exhibit relatively low values ranging from 0.0197 to 0.0251 (D-loop) and 0.0020 to 0.0038 (ND5). The hierarchical AMOVA indicated that more than 99% of the total genetic variation in Malaysian T. tonggol was contributed by genetic differences within populations. Attempts to identify if population subdivisions exist among the T. tonggol populations (BT vs. other populations) returned a non-significant F CT value with less than 1% contribution to the total genetic variation, while more than 99% of the total genetic variation was contributed within populations, based on both datasets. The Mantel Test also supported earlier findings, demonstrating no correlation between genetic differentiation (pairwise Ф ST value) and geographical distance (D-loop: r = −0.3763, P = 0.08 and ND5: r = −0.0050, P = 0.43) among Malaysian populations. D-loop sequences of T. tonggol from Taiwan (TW), the east coast of the Indian Ocean (ECIO), and some of the west coast of the Indian Ocean (WCIO) were clustered with the Malaysian haplotypes, while some other WCIO haplotypes were placed into another clade with high bootstrap support ( Fig. 2A; Supplemental 1). ML tree partitioning was partly in agreement with the pairwise comparisons Ф ST , where TW was not significantly structured for Malaysia, ECIO nor WCIO. In contrast, all pairwise comparisons involving ECIO and WCIO against Malaysian's populations were statistically significant after FDR correction at a = 0.05, except for WCIO-KT (Table 3). Meanwhile, WCIO and ECIO were not genetically subdivided from each other (P > 0.05) ( Table 3). A hierarchical AMOVA revealed the existence of genetic subdivision between the Indian Ocean and Malaysian waters (F CT : 0.09, P < 0.05), yet 90.01% of the genetic variation within the Indo-Pacific region was contributed by genetic differences within populations. Pairwise genetic distances between TW and Malaysian populations ranged from 0.0181 to 0.0219, while the genetic distances were relatively higher for pairwise comparisons involving ECIO and WCIO, that is, from 0.0238 to 0.0371 (Table 3).

Demographic history
Negative values of Tajima's D and Fu's F S (all significant at P < 0.05) neutrality tests were detected in all populations inferred from both the mtDNA D-loop and ND5 gene  (Table 2). Large differences in population sizes before (θ 0 ) and after expansion (θ 1 ) were detected, i.e (on average) 0.116 and 54,915.300 based on D-loop sequences, while 0.103 and 81,838.400 were based on the ND5 gene marker (Table 2). Corresponding to the τ     (Table 2), the calculated expansion time for T. tonggol in Malaysian waters was 67,865 and 284,252 years ago inferred by ND5 and D-loop markers, respectively Bayesian skyline plot (BSP) analysis revealed two significant increases in effective population size that occurred 200,000 and 950,000 years ago based on the D-loop marker (Fig. 5A), while continuous expansion started 150,000 years ago with a more recent expansion around 100,000 years ago based on the ND5 gene marker (Fig. 5B). Goodness of fit tests (Hri and SSD) exhibited non-significant values for the overall samples (P > 0.05) ( Table 2). Population demographic analysis of T. tonggol matched a unimodal distribution for overall samples (Fig. 6).

Phylogenetic and population level analyses
Populations of T. tonggol from Malaysian waters exhibited an absence of geographical structure associated with mtDNA sequences, as evidenced by the single-clade gene trees ( Fig. 2; Supplemental 1), ambiguous genetic partitioning of haplotype networks (Figs. 3 and 4), low and non-significant values of pairwise Ф ST (Tables 3 and 4) (except for several comparisons involving BT population), high contribution of within population variation through AMOVA, and non-significant correlation between genetic differentiation and geographical distance. These results strongly suggest that the T. tonggol populations in Malaysian waters were panmictic with shallow genetic structure due to high gene flow, similarly reported in other studies of the same species (Kunal et al., 2014;Willette, Santos & Leadbitter, 2016) and several other species (Carpenter et al., 2011). Wright (1931) suggested that the level of genetic differentiation among populations is related to the rate of evolutionary processes, like migration, mutation, and drift. Thus, a highly migratory species with a large population size, such as T. tonggol, is predicted to show limited population partitioning. BT population showed significant genetic structure from the rest (except TG and MR) and SB, inferred from the D-loop and ND5 sequences respectively, based on the population pairwise Ф ST analysis, however, all other analyses suggested genetic homogeneity with other T. tonggol populations in Malaysian waters. We believe that this could be due the different weightage of algorithms or characters used in the various analyses, perhaps a different emphasis on nucleotide versus haplotype diversity. Pelagic fish in the marine realm are well-known to exhibit little genetic divergence (Ely et al., 2005). The weak genetic structure observed in T. tonggol within the pelagic environment is typical of pelagic fish due to their biological and life histories (Fauvelot & Borsa, 2011;Pedrosa-Gerasmio, Agmata & Santos, 2014). T. tonggol spawns during the monsoon season (Koya et al., 2018) where the ocean circulation shift (upwelling and down welling) during this season would enrich the water at the surface and thus lead to the growth of plankton (Yohannan & Abdurrahiman, 1998). Although T. tonggol is not a plankton feeder, the plankton bloom somehow enriches the food resources for other fish that become the prey of T. tonggol, hence, creating an optimal spawning ground for the species. T. tonggol is believed to spawn close to coastal waters (Nishikawa & Ueyanagi, 1991), thus the dynamic movement of waters during monsoon, not only helps in circulation of rich nutrients, but also in the larvae dispersal that could span a larger area (Madhavi & Lakshmi, 2012). Another possible explanation for the absence of limited gene flow among populations is the pattern of migration in the adult stage. In addition to the high dispersal potential during egg and larval stages, adults are characterized by high maneuverability during seasonal migration.
An earlier study by Willette, Santos & Leadbitter (2016) showed delineation of the Indian Ocean versus the South China Sea (samples were collected from Vietnam, Indonesia, and the Philippines) but without representatives from Malaysian waters. In the present study, based on the increased sample populations on a finer scale within this biogeographical region and GenBank sequences, we hypothesize that the genetic barrier lies within the Andaman Sea, which hindered partial gene flow between the coast of India and Malaysian waters, as evidenced in the ML tree ( Fig. 2A; Supplemental 1), pairwise comparisons Ф ST (Table 3), and hierarchical AMOVA. There was also another possible break between ECIO and WCIO based on results in the ML tree ( Fig. 2A; Supplemental 1) and pairwise comparisons Ф ST, though the moderate pairwise Ф ST value is not significant (Table 3). However, the phylogenetic relationship of T. tonggol from Malaysian waters and other regions of the South China Sea remains unknown due to the limited genetic data and unavailability of the haplotype sequences from the study by Willette, Santos & Leadbitter (2016) in the public database. We postulated that close genetic relationships would be expected, based on the recent findings regarding the absence of genetic subdivision between TW and Malaysian T. tonggol. Future studies should include more detailed sampling within the Andaman Sea and adjacent waters to substantiate this.

Demographic history
Historical events during the Pleistocene epoch could have shaped the genetic diversification of T. tonggol populations observed in the present study. All relevant statistical tests implied a scenario of past population/demographic expansion in the absence of background selection. Negative Fu's F S values signified the alterations caused by population expansion and/or selection (Fu, 1997), which was further supported by Tajima's D that implied a notable population growth or genetic hitchhiking in a background of recent excess mutations (Tajima, 1989). Likewise, the non-significant sum of squared deviations (SSD) and Harpending's raggedness index (Hri) indicated the occurrence of population expansion in T. tonggol (Kunal et al., 2014) that inhabits Malaysian waters. Furthermore, the star-like pattern of the median-joining network (Fig. 4) and unimodal pattern of mismatch distribution (Fig. 6) further support the occurrence of a sudden demographic expansion during recent history of the taxa (Slatkin & Hudson, 1991;Rogers & Harpending, 1992;Ferreri, Qu & Han, 2011;Kunal et al., 2014;Chen et al., 2015;Pedrosa-Gerasmio, Agmata & Santos, 2014).
The large population size differences before (θ 0 ) and after expansion (θ 1 ) also suggested a rapid population expansion of T. tonggol in the past as also reported in a previous study of populations from India (Kunal et al., 2014). The overall τ value observed in Malaysia was much lower than T. tonggol populations from Indian waters, which was 21.26. The estimated time for population expansion in Indian waters was 593,334 years before the present (Kunal et al., 2014), as compared to 67,865 and 284,252 years ago inferred by the ND5 and D-loop markers, respectively, for T. tonggol from Malaysian waters. This suggests that T. tonggol in the Indian region underwent earlier expansion, with subsequent large population retention. In this study, the ND5 gene marker was able to detect a more recent population expansion of T. tonggol populations in Malaysian waters around 70,000 years ago (based on tau value) and 100,000 years ago (based on BSP analysis). Based on the D-loop marker, two expansion events were detected, where the first round occurred around 200,000 years ago (based on BSP analysis) or 284,000 years ago (based on tau value) (during middle Pleistocene (126,000-781,000 years before present) (Saul, 2016)) and the second round occurred around 950,000 years ago (based on BSP analysis) (during early Pleistocene) (Fig. 5). In general, both markers were able to detect population expansion that occurred around 200,000 years ago (ND5 gene marker indicated population expansion started before 150,000 years ago).

Implication for fisheries management
To date, there is limited information on the population structure of T. tonggol, especially in Malaysia, and from the management point of view, this is a critical issue. The present study provides the first baseline population genetic data on T. tonggol populations in Malaysian waters, which is important information for management planning by authorities.
Managing fishery resources takes a significant effort to protect and replenish the genetic pools for a sustainable harvest. Molecular analyses, in complement with other approaches, may serve as a reliable measurement for an efficient preservation strategy (BjØrnstad & Ried, 2002;Toro, Barragan & Ovilo, 2003). The genetic data suggest that T. tonggol in Malaysia forms a panmictic population as observed by the wide distributional range of this species and non-significant low Ф ST values among the populations studied, thus suggesting a single evolutionary significant unit (ESU). However, this study was based on mitochondrial markers and therefore, restricted to only the pattern of maternal inheritance. For a holistic genetic perspective of bi-parental inheritance, co-dominant markers, such as microsatellites, should be included.

CONCLUSIONS
The T. tonggol populations in Malaysian waters revealed the absence of population structure as inferred by both mtDNA markers and therefore, could be regarded as a single stock unit for management purposes based on the current data. Their inferred demographic history suggests that T. tonggol populations expanded significantly during the middle and early Pleistocene. Overall, this study is a critical first baseline, providing insights for stock management of this neritic species in Malaysian coastal areas. Coupled with other related information, the assimilation of this genetic information could aid the development of effective management plans in the future, not only in Malaysia but also in neighboring countries sharing the same waters. Finally, this has contributed further insights into genetic locality, delineating the species within the Indo-Pacific biogeographical region.