title: PeerJ
description: Articles published in PeerJ
link: https://peerj.com/articles/index.rss3?journal=peerj&page=854
creator: info@peerj.com PeerJ
errorsTo: info@peerj.com PeerJ
language: en

title: Validation and analysis of the geographical origin of Angelica sinensis (Oliv.) Diels using multi-element and stable isotopes
link: https://peerj.com/articles/11928
last-modified: 2021-08-06
description: Background: Place of origin is an important factor when determining the quality and authenticity of Angelica sinensis for medicinal use. It is important to trace the origin and confirm the regional characteristics of medicinal products for sustainable industrial development. Effectively tracing and confirming the material’s origin may be accomplished by detecting stable isotopes and mineral elements. Methods: We studied 25 A. sinensis samples collected from three main producing areas (Linxia, Gannan, and Dingxi) in southeastern Gansu Province, China, to better identify its origin. We used inductively coupled plasma mass spectrometry (ICP-MS) and stable isotope ratio mass spectrometry (IRMS) to determine eight mineral elements (K, Mg, Ca, Zn, Cu, Mn, Cr, Al) and three stable isotopes (δ13C, δ15N, δ18O). Principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA) and linear discriminant analysis (LDA) were used to verify the validity of its geographical origin. Results: K, Ca/Al, δ13C, δ15N and δ18O are important variables for distinguishing A. sinensis sampled from Linxia, Gannan and Dingxi. We used an unsupervised PCA model to reduce the dimensionality of the mineral element and stable isotope data, which could distinguish the A. sinensis from Linxia. However, it could not easily distinguish A. sinensis sampled from Gannan and Dingxi. The supervised PLS-DA and LDA models could effectively distinguish samples taken from all three regions and were assessed by cross-validation.
The cross-validation accuracy of PLS-DA using mineral elements and stable isotopes was 84%, which was higher than that of LDA using the same variables. Conclusions: The PLS-DA and LDA models provide a theoretical basis for tracing the origin of A. sinensis in three regions (Linxia, Gannan and Dingxi). This is significant for protecting consumers’ health, rights and interests.
creator: Shanjia Li
creator: Hui Wang
creator: Ling Jin
creator: James F. White
creator: Kathryn L. Kingsley
creator: Wei Gou
creator: Lijuan Cui
creator: Fuxiang Wang
creator: Zihao Wang
creator: Guoqiang Wu
uri: https://doi.org/10.7717/peerj.11928
license: https://creativecommons.org/licenses/by/4.0/
rights: ©2021 Li et al.

title: Establishment of a ferroptosis-related gene signature for prognosis in lung adenocarcinoma patients
link: https://peerj.com/articles/11931
last-modified: 2021-08-06
description: Objective: Lung cancer is the most common malignancy worldwide and exhibits both high morbidity and mortality. In recent years, scientists have made substantial breakthroughs in the early diagnosis and treatment of lung adenocarcinoma (LUAD); however, patient prognosis still shows vast individual differences. In this study, bioinformatics methods were used to identify and analyze ferroptosis-related genes to establish an effective signature for predicting prognosis in LUAD patients. Methods: The gene expression profiles of LUAD patients with complete clinical and follow-up information were downloaded from two public databases. Univariate and multivariate Cox regression analyses were used to obtain ferroptosis-related genes for constructing the prognostic risk model. The AUC and calibration plots were used to evaluate the predictive accuracy of the ferroptosis-related gene signature (FRGS) and nomogram. Results: A total of 74 ferroptosis-related differentially expressed genes (DEGs) were identified between LUAD and normal tissues from The Cancer Genome Atlas (TCGA) database.
A five-gene panel for prediction of LUAD prognosis was established by multivariate regression and was verified using the GSE68465 cohort from the Gene Expression Omnibus (GEO) database. Patients were divided into two risk groups according to the median risk score of the five genes. Based on Kaplan-Meier (KM) analysis, the overall survival (OS) rate of the high-risk group was markedly worse than that of the low-risk group. We also found that risk score was an independent prognostic indicator. The receiver operating characteristic (ROC) curve showed that the proposed model had good prediction ability. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional analyses indicated that risk score was prominently enriched in ferroptosis processes. Moreover, the scores of immune-associated gene sets differed significantly between the two risk groups. Conclusions: This study demonstrated that ferroptosis-related gene signatures can be used as a potential predictor for the prognosis of LUAD, thus providing a novel strategy for individualized treatment in LUAD patients.
creator: Jingjing Cai
creator: Chunyan Li
creator: Hongsheng Li
creator: Xiaoxiong Wang
creator: Yongchun Zhou
uri: https://doi.org/10.7717/peerj.11931
license: https://creativecommons.org/licenses/by/4.0/
rights: ©2021 Cai et al.

title: Optimizing the power of human performed audio surveys for monitoring the endangered Houston toad using automated recording devices
link: https://peerj.com/articles/11935
last-modified: 2021-08-06
description: Knowledge regarding the locations of populations of endangered species is a critical part of recovery and facilitates land use planning that avoids unnecessary impacts. Regulatory agencies often support the development of survey guidelines designed to standardize the methods and maximize the probability of detection, thereby avoiding incorrectly concluding a species is absent from a site.
Here, using simulations with data collected using automated recording devices (ARDs), we evaluated the efficacy of the existing U.S. Fish and Wildlife Service’s survey requirements for the endangered Houston Toad (Anaxyrus houstonensis). We explored the effect of (1) increasing survey duration, (2) increasing the number of surveys, and (3) combinations of environmental conditions (e.g., temperature, humidity, rainfall) on the detection probability and the number of surveys needed to be 95% confident of absence. We found that increases in both the duration of the survey and the number of surveys conducted decreased the likelihood of incorrectly concluding the species was absent from the site, and that the number of surveys required to be 95% confident greatly exceeded the existing survey requirements. Targeting specific environmental conditions was also an effective way to decrease the number of surveys required, but the infrequency with which these conditions occurred might make application difficult in some years. Overall, we suggest that the survey effort necessary to achieve confidence in the absence of Houston Toads at a site is more practically achievable with the use of ARDs, but this may not be suitable in all monitoring scenarios.
creator: Andrew R. MacLaren
creator: Paul S. Crump
creator: Michael R.J. Forstner
uri: https://doi.org/10.7717/peerj.11935
license: https://creativecommons.org/licenses/by/4.0/
rights: ©2021 MacLaren et al.

title: Rephine.r: a pipeline for correcting gene calls and clusters to improve phage pangenomes and phylogenies
link: https://peerj.com/articles/11950
last-modified: 2021-08-06
description: Background: A pangenome is the collection of all genes found in a set of related genomes. For microbes, these genomes are often different strains of the same species, and the pangenome offers a means to compare gene content variation with differences in phenotypes, ecology, and phylogenetic relatedness.
Though most frequently applied to bacteria, there is growing interest in adapting pangenome analysis to bacteriophages. However, working with phage genomes presents new challenges. First, most phage families are under-sampled, and homologous genes in related viruses can be difficult to identify. Second, homing endonucleases and intron-like sequences may be present, resulting in fragmented gene calls. Each of these issues can reduce the accuracy of standard pangenome analysis tools. Methods: We developed an R pipeline called Rephine.r that takes as input the gene clusters produced by an initial pangenomics workflow. Rephine.r then proceeds in two primary steps. First, it identifies three common causes of fragmented gene calls: (1) indels creating early stop codons and new start codons; (2) interruption by a selfish genetic element; and (3) splitting at the ends of the reported genome. Fragmented genes are then fused to create new sequence alignments. In tandem, Rephine.r searches for distant homologs separated into different gene families using Hidden Markov Models. Significant hits are used to merge families into larger clusters. A final round of fragment identification is then run, and results may be used to infer single-copy core genomes and phylogenetic trees. Results: We applied Rephine.r to three well-studied phage groups: the Tevenvirinae (e.g., T4), the Studiervirinae (e.g., T7), and the Pbunaviruses (e.g., PB1). In each case, Rephine.r recovered additional members of the single-copy core genome and increased the overall bootstrap support of the phylogeny. The Rephine.r pipeline is provided through GitHub (https://www.github.com/coevoeco/Rephine.r) as a single script for automated analysis and with utility functions to assist in building single-copy core genomes and predicting the sources of fragmented genes.
creator: Jason W. Shapiro
creator: Catherine Putonti
uri: https://doi.org/10.7717/peerj.11950
license: https://creativecommons.org/licenses/by/4.0/
rights: ©2021 Shapiro and Putonti

title: Genome-wide analysis of the lignin toolbox for morus and the roles of lignin related genes in response to zinc stress
link: https://peerj.com/articles/11964
last-modified: 2021-08-06
description: Mulberry (Morus, Moraceae) is an important economic plant with nutritional, medicinal, and ecological values. Lignin in mulberry can affect the quality of forage and the saccharification efficiency of mulberry twigs. The availability of the Morus notabilis genome makes it possible to perform a systematic analysis of the genes encoding the 11 protein families specific to the lignin branch of the phenylpropanoid pathway, providing the core genes for the lignin toolbox in mulberry. We performed genome-wide screening, which was combined with de novo transcriptome data for Morus notabilis and Morus alba variety Fengchi, to identify putative members of the lignin gene families, followed by phylogenetic and expression profile analyses. We focused on bona fide clade genes, and their responses to zinc stress were further distinguished based on expression profiles using RNA-seq and RT-qPCR. We finally identified 31 bona fide genes in Morus notabilis and 25 bona fide genes in Fengchi. The putative function of these bona fide genes was proposed, and a lignin toolbox that comprised 19 genes in mulberry was provided, which will be convenient for researchers to explore and modify the monolignol biosynthesis pathway in mulberry. We also observed changes in the expression of some of these lignin biosynthetic genes in response to stress caused by excess zinc in Fengchi and proposed that the enhanced lignin biosynthesis in lignified organs and inhibition of lignin biosynthesis in leaf is an important response to zinc stress in mulberry.
creator: Nan Chao
creator: Ting Yu
creator: Chong Hou
creator: Li Liu
creator: Lin Zhang
uri: https://doi.org/10.7717/peerj.11964
license: https://creativecommons.org/licenses/by/4.0/
rights: © 2021 Chao et al.

title: The Erlang distribution approximates the age distribution of incidence of childhood and young adulthood cancers
link: https://peerj.com/articles/11976
last-modified: 2021-08-06
description: Background: It is widely believed that cancers develop upon acquiring a particular number of (epi)mutations in driver genes, but the law governing the kinetics of this process is not known. We have previously shown that the age distribution of incidence for the 20 most prevalent cancers of old age is best approximated by the Erlang probability distribution. The Erlang distribution describes the probability of several successive random events occurring by the given time according to the Poisson process, which allows an estimate for the number of critical driver events. Methods: Here we employ a computational grid search method to find global parameter optima for five probability distributions on the CDC WONDER dataset of the age distribution of childhood and young adulthood cancer incidence. Results: We show that the Erlang distribution is the only classical probability distribution we found that can adequately model the age distribution of incidence for all studied childhood and young adulthood cancers, in addition to cancers of old age. Conclusions: This suggests that the Poisson process governs driver accumulation at any age and that the Erlang distribution can be used to determine the number of driver events for any cancer type. The Poisson process implies the fundamentally random timing of driver events and their constant average rate.
As waiting times for the occurrence of the required number of driver events are counted in decades, and most cells do not live this long, it suggests that driver mutations accumulate silently in the longest-living dividing cells in the body: the stem cells.
creator: Aleksey V. Belikov
creator: Alexey Vyatkin
creator: Sergey V. Leonov
uri: https://doi.org/10.7717/peerj.11976
license: https://creativecommons.org/licenses/by/4.0/
rights: ©2021 Belikov et al.

title: Gene selection for studying frugivore-plant interactions: a review and an example using Queensland fruit fly in tomato
link: https://peerj.com/articles/11762
last-modified: 2021-08-05
description: Fruit production is negatively affected by a wide range of frugivorous insects, of which tephritid fruit flies are one of the most important groups. As a replacement for pesticide-based controls, enhancing natural fruit resistance through biotechnology approaches is a poorly researched but promising alternative. Quantitative reverse transcription PCR (RT-qPCR) is an approach to studying gene expression that has been widely used in studying plant resistance to pathogens and non-frugivorous insect herbivores, and it offers a starting point for fruit fly studies. In this paper, we develop a gene selection pipeline for known induced-defense genes in tomato fruit, Solanum lycopersicum, and putative detoxification genes in Queensland fruit fly, Bactrocera tryoni, as a basis for future RT-qPCR research. The pipeline started with a literature review on plant/herbivore and plant/pathogen molecular interactions. With respect to the fly, this was then followed by the identification of gene families known to be associated with insect resistance to toxins, and then individual genes through reference to annotated B. tryoni transcriptomes and gene identity matching with related species. In contrast, for tomato, a much better studied species, individual defense genes could be identified directly through literature research.
For B. tryoni, gene selection was then further refined through gene expression studies. Ultimately, 28 putative detoxification genes from the cytochrome P450 (P450), carboxylesterase (CarE), glutathione S-transferase (GST), and ATP binding cassette transporter (ABC) gene families were identified for B. tryoni, and 15 induced defense genes from receptor-like kinase (RLK), D-mannose/L-galactose, mitogen-activated protein kinase (MAPK), lipoxygenase (LOX), gamma-aminobutyric acid (GABA) pathways and polyphenol oxidase (PPO), proteinase inhibitors (PI) and resistance (R) gene families were identified from tomato fruit. The developed gene selection process for B. tryoni can be applied to other herbivorous and frugivorous insect pests so long as the minimum necessary genomic information, an annotated transcriptome, is available.
creator: Shirin Roohigohar
creator: Anthony R. Clarke
creator: Peter J. Prentis
uri: https://doi.org/10.7717/peerj.11762
license: https://creativecommons.org/licenses/by/4.0/
rights: ©2021 Roohigohar et al.

title: The phylogenetic relationships of geoemydid turtles from the Eocene Messel Pit Quarry: a first assessment using methods for continuous and discrete characters
link: https://peerj.com/articles/11805
last-modified: 2021-08-05
description: The geoemydid turtles of the Eocene Messel Pit Quarry of Hesse, Germany, are part of a rich Western European fossil record of testudinoids. Originally referred to as “Ocadia” kehreri and “Ocadia” messeliana, their systematic relationships remain unclear. A previous study proposed that a majority of the Western European geoemydids, including the Messel geoemydids, are closely related to the Recent European representatives of the clade Mauremys. Another study hypothesised that the Western European geoemydid fauna is more phylogenetically diverse, and that the Messel geoemydids are closely related to the East Asian turtles Orlitia and Malayemys.
Here we present the first quantitative analyses to date that investigate this question. We use continuous characters in the form of ratios to estimate the placement of the Messel geoemydids in a reference tree that was estimated from molecular data. We explore the placement error obtained from that data with maximum likelihood and Bayesian methods, as well as linear parsimony in combination with discrete characters. We find good overall performance with Bayesian and parsimony analyses. Parsimony performs even better when we also incorporate discrete characters. Yet, we cannot pin down the position of the Messel geoemydids with high confidence. Depending on how intraspecific variation of the ratio characters is treated, parsimony favours a placement of the Messel fossils sister to Orlitia borneensis or sister to Geoemyda spengleri, with weak bootstrap support. The latter placement is suspect because G. spengleri is a phylogenetically problematic species with molecular and morphological data. There is even less support for placements within the Mauremys clade.
creator: Eduardo Ascarrunz
creator: Julien Claude
creator: Walter G. Joyce
uri: https://doi.org/10.7717/peerj.11805
license: https://creativecommons.org/licenses/by/4.0/
rights: © 2021 Ascarrunz et al.

title: Congruence between morphology-based species and Barcode Index Numbers (BINs) in Neotropical Eumaeini (Lycaenidae)
link: https://peerj.com/articles/11843
last-modified: 2021-08-05
description: Background: With about 1,000 species in the Neotropics, the Eumaeini (Theclinae) are one of the most diverse butterfly tribes. Correct morphology-based identifications are challenging in many genera due to relatively small interspecific differences in wing patterns. Geographic infraspecific variation is sometimes more substantial than variation between species. In this paper we present a large DNA barcode dataset of South American Lycaenidae.
We analyze how well DNA barcode BINs match morphologically delimited species. Methods: We compare morphology-based species identifications with the clustering of molecular operational taxonomic units (MOTUs) delimited by the RESL algorithm in BOLD, which assigns Barcode Index Numbers (BINs). We examine intra- and interspecific divergences for genera represented by at least four morphospecies. We discuss the existence of local barcode gaps in a genus-by-genus analysis. We also note differences in the percentage of species with barcode gaps in groups of lowland and high mountain genera. Results: We identified 2,213 specimens and obtained 1,839 sequences of 512 species in 90 genera. Overall, the mean intraspecific divergence value of COI sequences was 1.20%, while the mean interspecific divergence between nearest congeneric neighbors was 4.89%, demonstrating the presence of a barcode gap. However, the gap seemed to disappear from the entire set when comparing the maximum intraspecific distance (8.40%) with the minimum interspecific distance (0.40%). Clear barcode gaps are present in many genera but absent in others. From the set of specimens that yielded COI fragment lengths of at least 650 bp, 75% of the a priori morphology-based identifications were unambiguously assigned to a single Barcode Index Number (BIN). However, after a taxonomic a posteriori review, the percentage of matched identifications rose to 85%. BIN splitting was observed for 17% of the species and BIN sharing for 9%. We found that genera that contain primarily lowland species show higher percentages of local barcode gaps and congruence between BINs and morphology than genera that contain exclusively high montane species. The divergence values to the nearest neighbors were significantly lower in high Andean species, while the intraspecific divergence values were significantly lower in the lowland species.
These results raise questions regarding the causes of the observed low inter- and high intraspecific genetic variation. We discuss incomplete lineage sorting and hybridization as the most likely causes of this phenomenon, as the montane species concerned are relatively young and hybridization is probable. The release of our data set represents an essential baseline for a reference library for biological assessment studies of butterflies in megadiverse countries using modern high-throughput technologies and highlights the necessity of taxonomic revisions for various genera combining both molecular and morphological data.
creator: Carlos Prieto
creator: Christophe Faynel
creator: Robert Robbins
creator: Axel Hausmann
uri: https://doi.org/10.7717/peerj.11843
license: https://creativecommons.org/licenses/by/4.0/
rights: © 2021 Prieto et al.

title: An efficient sorghum transformation system using embryogenic calli derived from mature seeds
link: https://peerj.com/articles/11849
last-modified: 2021-08-05
description: Significant progress has been made on sorghum transformation in the last decades; however, the transformation process has been constrained by the availability of immature embryos because most researchers have used immature embryos as the preferred explants. Although immature embryos have been proven to be optimal for tissue culture and transformation, isolation of immature embryos is time-consuming, labor-intensive, and limited by warm weather. In this study, we developed an efficient genetic transformation system using mature seeds as explants. The nptII and gus genes, used as the selectable marker and reporter gene, respectively, were co-transformed by particle bombardment. After optimization of tissue culture, the G418 concentration, and the transgenic procedure, an average transformation frequency of 13.33% was achieved routinely. The transgenic events and transgene copy numbers were determined by PCR and RT-PCR, respectively.
The geneticin selection and GUS staining on T1 seedlings confirmed that the transgenes were heritable. Our results demonstrate that an efficient sorghum transformation system has been established using mature seeds as explants. This transformation system will promote sorghum research on genetic engineering and genome editing without restrictions from seasonal weather conditions or explant resources.
creator: Lihua Wang
creator: Li Gao
creator: Guoquan Liu
creator: Ruirui Meng
creator: Yanlong Liu
creator: Jieqin Li
uri: https://doi.org/10.7717/peerj.11849
license: https://creativecommons.org/licenses/by/4.0/
rights: ©2021 Wang et al.