PeerJ Preprints: Computational Biology
https://peerj.com/preprints/index.atom?journal=peerj&subject=900
Computational Biology articles published in PeerJ Preprints

Seven myths on crowding and peripheral vision
https://peerj.com/preprints/27353
2019-12-06
Hans Strasburger
Crowding has become a hot topic in vision research and some fundamentals are now widely agreed upon. For the classical crowding task, one would likely agree with the following statements. (1) Bouma’s law can, succinctly and unequivocally, be stated as saying that critical distance for crowding is about half the target’s eccentricity. (2) Crowding is predominantly a peripheral phenomenon. (3) Peripheral vision extends to at most 90° eccentricity. (4) Resolution threshold (the minimal angle of resolution, MAR) increases strongly and linearly with eccentricity. Crowding increases at an even steeper rate. (5) Crowding is asymmetric as Bouma has shown. For that inner-outer asymmetry, the peripheral flanker has more effect. (6) Critical crowding distance corresponds to a constant cortical distance in primary visual areas like V1. (7) Except for Bouma’s seminal paper in 1970, crowding research mostly became prominent starting in the 2000s. I propose the answer is ‘not really’ or ‘not quite’ to these assertions. So should we care? I think we should, before we write the textbook chapters for the next generation.
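For reference, statement (1), the textbook form of Bouma's law, amounts to a one-line rule. The sketch below is deliberately simplistic; the abstract's point is precisely that this succinct form is "not quite" right:

```python
def bouma_critical_distance(eccentricity_deg, bouma_factor=0.5):
    """Textbook statement of Bouma's law: the critical target-flanker
    spacing for crowding is roughly half the target's eccentricity
    (both in degrees of visual angle)."""
    return bouma_factor * eccentricity_deg

# A target at 10 degrees eccentricity is crowded by flankers
# closer than about 5 degrees:
spacing = bouma_critical_distance(10.0)
```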
The investigation of 2D monolayers as potential chelation agents in Alzheimer's disease
https://peerj.com/preprints/27942
2019-11-20
Neha Pavuluru, Xuan Luo
In this study, we conducted Density Functional Theory calculations comparing the binding energy of the copper-Amyloid-beta complex to the binding energies of potential chelation materials. We used the first coordination sphere of the truncated high-pH Amyloid-beta protein, subject to computational limits. Binding energy and charge transfer calculations were evaluated for copper's interaction with potential chelators: monolayer boron nitride, monolayer molybdenum disulfide, and monolayer silicene. Silicene produced the highest binding energies to copper, and the evidence of charge transfer between copper and the monolayer indicates a strong ionic bond. Although our three monolayers did not directly present chelation potential, the absolute differences between the binding energies of the silicene binding sites and the Amyloid-beta binding site were minimal, suggesting that further research into silicene chelators may be useful for therapy in Alzheimer's disease.
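The binding-energy comparison described above follows the usual DFT convention: the energy of the combined system minus the energies of its isolated parts. The sketch below uses hypothetical total energies for illustration only, not values from this study:

```python
def binding_energy(e_complex, e_substrate, e_cu):
    """Standard DFT binding energy (eV): energy of the combined
    system minus the energies of the isolated substrate and copper
    atom. More negative means stronger binding."""
    return e_complex - (e_substrate + e_cu)

# Hypothetical total energies in eV (illustrative, not from the study):
e_silicene_cu = binding_energy(-105.2, -101.0, -1.9)  # about -2.3 eV
e_abeta_cu    = binding_energy(-210.7, -206.9, -1.9)  # about -1.9 eV

# A chelator is promising when it binds copper at least as strongly
# as the Amyloid-beta site does:
prefers_chelator = e_silicene_cu < e_abeta_cu
```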
Cell lineage trees: the central structure plus key dynamics of biological aging and formulating the limiting problem of comprehensive organismal rejuvenation
https://peerj.com/preprints/27821
2019-09-19
Attila Csordas
This argument makes the case for cell lineage trees and cell-tree dynamics as the central structure and process for understanding organismal-level, multicellular biological aging. The limiting challenge in counteracting biological aging is comprehensive organismal rejuvenation. The central theoretical problem of comprehensive biological rejuvenation is to find an algorithm that restores the balance and maintains the healthy dynamics of the aging organismal cell lineage tree. The most comprehensive medical solution to biological aging needs to use individual cell lineage trees as a central tool for diagnosis and treatment.
A guide to carrying out a phylogenomic target sequence capture project
https://peerj.com/preprints/27968
2019-09-18
Tobias Andermann, Maria Fernanda Torres Jimenez, Pável Matos-Maraví, Romina Batista, José L Blanco-Pastor, A. Lovisa S Gustafsson, Logan Kistler, Isabel M Liberal, Bengt Oxelman, Christine D Bacon, Alexandre Antonelli
High-throughput DNA sequencing techniques enable time- and cost-effective sequencing of large portions of the genome. Instead of sequencing and annotating whole genomes, many phylogenetic studies focus sequencing efforts on large sets of pre-selected loci, which further reduces costs and bioinformatic challenges while increasing sequencing depth. One common approach that enriches loci before sequencing is often referred to as target sequence capture. This technique has been shown to be applicable to phylogenetic studies of greatly varying evolutionary depth and has proven to produce powerful, large multi-locus DNA sequence datasets of selected loci, suitable for phylogenetic analyses. However, target capture requires careful theoretical and practical considerations, which will greatly affect the success of the experiment. Here we provide an easy-to-follow flowchart for adequately designing phylogenomic target capture experiments, and we discuss necessary considerations and decisions from the first steps in the lab to the final bioinformatic processing of the sequence data. We particularly discuss issues and challenges related to the taxonomic scope, sample quality, and available genomic resources of target capture projects and how these issues affect all steps from bait design to the bioinformatic processing of the data. Altogether this review outlines a roadmap for future target capture experiments and is intended to assist researchers with making informed decisions for designing and carrying out successful phylogenetic target capture studies.
Hole in One: an element reduction approach to modeling bone porosity in finite element analysis
https://peerj.com/preprints/27909
2019-08-19
Beatriz L Santaella, Z. Jack Tseng
Finite element analysis has become an increasingly widely used tool in many science and engineering fields over the last decade. In the biological sciences, there are many examples of its use in areas such as paleontology and functional morphology. Despite this common use, the modeling of porous structures such as trabecular bone remains a key issue because of the difficulty of meshing such highly complex geometries during the modeling process. A common practice is to mathematically adjust the boundary conditions (i.e. model material properties) of whole or portions of models that represent trabecular bone. In this study we aimed to demonstrate that a physical, element reduction approach constitutes a valid protocol for this problem in addition to the mathematical approach. We tested a new element reduction modeling script on five exemplar trabecular geometry models of carnivoran temporomandibular joints, and compared stress results of both physical and mathematical approaches to trabecular modeling against models incorporating actual trabecular geometry. Simulation results indicate that the physical, element reduction approach generally outperformed the mathematical approach. Physical changes in the internal structure of experimental cylindrical models had a major influence on the recorded stress values throughout the model, and more closely approximated values obtained in models containing actual trabecular geometry than did solid models with modified trabecular material properties. Therefore, we conclude that for modeling trabecular bone in finite element simulations, maintaining or mimicking the internal porosity of a trabecular structure is recommended as a fast and effective method in place of, or alongside, modification of material property parameters to better approximate trabecular bone behavior observed in models containing actual trabecular geometry.
Human influence on predator-prey relationship: Red Panda and Snow Leopard
https://peerj.com/preprints/27896
2019-08-13
Jagat Kafle
This paper presents a mathematical model of human influence on the predator-prey relationship between the red panda and the snow leopard, two major species in the mountain ecosystem. It explores whether these species go extinct in a given area due to imbalance in their interaction. First, a simple model of each species is discussed with no interaction between the species. An interactive model is then introduced to simulate their populations when they interact with each other. Human influence is then added to the interactive model to observe whether the species go extinct. The data in this paper are approximated for a particular area based upon the population density and habitat of the two species. All the models are simulated in Python using Euler's method.
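An Euler-method predator-prey simulation of the kind described above can be sketched in a few lines of Python. The Lotka-Volterra form and every parameter value below are illustrative assumptions, not the paper's actual model or data; human influence is guessed at here as an extra per-capita removal rate on each species:

```python
def simulate(x0, y0, alpha, beta, delta, gamma, dt, steps, hx=0.0, hy=0.0):
    """Forward-Euler integration of a Lotka-Volterra predator-prey model:
      prey x (red panda):      dx/dt = alpha*x - beta*x*y  - hx*x
      predator y (snow leopard): dy/dt = delta*x*y - gamma*y - hy*y
    hx and hy model human pressure as extra per-capita removal rates."""
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(steps):
        dx = (alpha * x - beta * x * y - hx * x) * dt
        dy = (delta * x * y - gamma * y - hy * y) * dt
        x, y = max(x + dx, 0.0), max(y + dy, 0.0)  # populations stay non-negative
        xs.append(x)
        ys.append(y)
    return xs, ys

# Illustrative run with hypothetical parameters:
prey, pred = simulate(x0=40.0, y0=9.0, alpha=0.1, beta=0.02,
                      delta=0.01, gamma=0.1, dt=0.1, steps=2000)
```

Forward Euler is simple but only first-order accurate, which is presumably why the paper uses small time steps; a stiff or long-horizon run would call for a higher-order integrator.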
MATLAB software for extracting protein name and sequence information from FASTA formatted proteome file
https://peerj.com/preprints/27856
2019-07-16
Wenfa Ng
The FASTA file format is a common file type for distributing proteome information, especially from UniProt. While MATLAB can automatically read FASTA files using the built-in function fastaread, important information such as the protein name and organism name remains enmeshed in a character array. Hence, it is difficult to automatically extract protein names from a FASTA proteome file when building a database with fields comprising each protein name and its amino acid sequence. The objective of this work was to develop MATLAB software that automatically extracts protein name and amino acid sequence information from a FASTA proteome file and assigns them to a new database comprising fields such as protein name, amino acid sequence, number of amino acid residues, molecular weight of the protein, and nucleotide sequence of the protein. The number of amino acid residues was obtained with the built-in MATLAB function length applied to the amino acid sequence of each protein. The final two fields were provided by the MATLAB built-in functions molweight and aa2nt, respectively. The molecular weight of a protein is useful for a variety of applications, while the nucleotide sequence is essential for gene synthesis in molecular cloning. Finally, the software is also equipped with an error check to detect letters in the amino acid sequence that are not part of the family of 20 natural amino acids; sequences with such letters would constitute erroneous inputs to molweight and aa2nt and are not processed. Collectively, given that important information such as the protein name is enmeshed in a character array in a FASTA proteome file, this work develops MATLAB software that automatically extracts protein name and amino acid sequence information and assigns them to a new protein database. Using built-in functions, the number of amino acid residues, molecular weight, and nucleotide sequence of each protein were calculated, yielding a new protein database with improved functionality that could support a variety of biology workflows ranging from sequence alignment to molecular cloning.
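Although the tool above is written in MATLAB, the extraction logic it describes is easy to illustrate. The sketch below is a hypothetical pure-Python analogue, not the author's code: it parses UniProt-style FASTA headers (e.g. ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens ...") and applies the same 20-amino-acid error check:

```python
import re

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 natural amino acids

def parse_fasta(text):
    """Parse FASTA text into records with name, sequence, residue
    count, and a validity flag for non-standard letters."""
    records = []
    for block in text.lstrip(">").split("\n>"):
        header, _, seq = block.partition("\n")
        seq = seq.replace("\n", "")
        # Protein name: text between the accession field and the
        # first "XX=" tag (UniProt header convention).
        m = re.search(r"^\S+\s+(.*?)(?:\s+\w\w=|$)", header)
        name = m.group(1) if m else header
        records.append({
            "name": name,
            "sequence": seq,
            "n_residues": len(seq),
            "valid": set(seq) <= VALID_RESIDUES,  # error check
        })
    return records
```

As in the MATLAB tool, records flagged invalid would be skipped before any downstream molecular-weight or back-translation step.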
TransPrise: a novel machine learning approach for eukaryotic promoter prediction
https://peerj.com/preprints/27844
2019-07-10
Stepan Pachganov, Khalimat Murtazalieva, Alexei Zarubin, Dmitry Sokolov, Duane Chartier, Tatiana V Tatarinova
As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is the determination of precise positions of transcription start sites (TSS). In this paper we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. TransPrise offers a significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of TransPrise with the TSSPlant approach on the well-annotated genome of Oryza sativa. Using a computer equipped with a graphics processing unit, the run time of TransPrise is 250 minutes on a 374 Mb genome. We provide the full basis for the comparison and encourage users to freely access our computational tools to facilitate and streamline their own analyses. A ready-to-use Docker image with all necessary packages, models, and code, as well as the source code of the TransPrise algorithm, is available at http://compubioverne.group/. The source code is ready to use and customizable to predict TSS in any eukaryotic organism.
A novel computational approach to the silencing of Sugarcane Bacilliform Guadeloupe A Virus determines potential host-derived MicroRNAs in sugarcane (Saccharum officinarum L.)
https://peerj.com/preprints/27842
2019-07-04
Fakiha Ashraf, Muhammad Aleem Ashraf, Xiaowen Hu, Shuzhen Zhang
Sugarcane Bacilliform Guadeloupe A Virus (SCBGAV, genus Badnavirus, family Caulimoviridae) is an emerging, deleterious pathogen of sugarcane that presents a substantial barrier to high sugarcane yields. The circular, double-stranded (ds) DNA genome of SCBGAV (7.4 kb) is composed of three open reading frames (ORFs) and replicates via a reverse transcriptase. In the current study, we used miRNA target prediction algorithms to identify and comprehensively analyze genome-wide sugarcane (Saccharum officinarum L.)-encoded microRNA (miRNA) targets against SCBGAV. A total of 28 potential mature miRNAs were retrieved from the miRBase database and further analyzed for hybridization to the SCBGAV genome. Multiple computational criteria (miRNA-target seed pairing, multiple target positions, minimum free energy, target site accessibility, maximum complementarity, pattern recognition, and minimum folding energy of attachments) were considered by all algorithms. Only 4 sugarcane miRNAs were selected for SCBGAV silencing. Among those 4, sof-miR396 was identified as the top candidate, capable of targeting the vital ORF3, which encodes the polyprotein of the SCBGAV genome. The miRanda, RNA22, and RNAhybrid algorithms all predicted hybridization of sof-miR396 at the common locus position 3394. A Circos plot was created to visualize the network of sugarcane-encoded miRNAs and SCBGAV genes, providing detailed evidence for ideal targets of SCBGAV ORFs by specific miRNAs. The present study provides a comprehensive report toward the creation of SCBGAV-resistant sugarcane through expression analysis of the identified miRNAs.
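The first criterion listed, miRNA-target seed pairing, can be illustrated as a simple complementarity check. The sequences below are hypothetical, and production tools such as miRanda additionally score G:U wobble pairs, free energy, and site accessibility:

```python
# Minimal seed-match check: the miRNA "seed" (nucleotides 2-8 from its
# 5' end) must pair Watson-Crick with the target site, read antiparallel
# (i.e., from the target's 3' end).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_matches(mirna, target_site):
    """Both sequences given 5'->3'. Returns True if miRNA positions 2-8
    are perfectly complementary to target positions 2-8 counted from
    the target's 3' end."""
    seed = mirna[1:8]  # nucleotides 2-8 (0-based slice)
    expected = "".join(COMPLEMENT[nt] for nt in seed)
    return target_site[::-1][1:8] == expected

# Hypothetical example sequences:
mirna = "UGGAGCUCCAAUCGAUCGAUC"  # seed GGAGCUC
site  = "AAAGAGCUCCA"            # read 3'->5', positions 2-8 are CCUCGAG
```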
T1000: A reduced toxicogenomics gene set for improved decision making
https://peerj.com/preprints/27839
2019-07-03
Othman Soufan, Jessica Ewald, Charles Viau, Doug Crump, Markus Hecker, Niladri Basu, Jianguo Xia
There is growing interest within regulatory agencies and toxicological research communities in developing, testing, and applying new approaches, such as toxicogenomics, to more efficiently evaluate chemical hazards. Given the complexity of analyzing thousands of genes simultaneously, there is a need to identify reduced gene sets. Though several gene sets have been defined for toxicological applications, few of these were purposefully derived using toxicogenomics data. Here, we developed and applied a systematic approach to identify 1000 genes (called Toxicogenomics-1000 or T1000) highly responsive to chemical exposures. First, a co-expression network of 11,210 genes was built by leveraging microarray data from the Open TG-GATEs program. This network was then re-weighted based on prior knowledge of the genes' biological (KEGG, MSigDB) and toxicological (CTD) relevance. Finally, weighted correlation network analysis was applied to identify 258 gene clusters. T1000 was defined by selecting the genes from each cluster that were most associated with outcome measures. For model evaluation, we compared the performance of T1000 to that of other gene sets (L1000, S1500, genes selected by Limma, and a random set) using two external datasets. Additionally, a smaller (T384) and a larger (T1500) version of T1000 were used for dose-response modeling to test the effect of gene set size. Our findings demonstrate that the T1000 gene set is predictive of apical outcomes across a range of conditions (e.g., in vitro and in vivo, dose-response, multiple species, tissues, and chemicals), and generally performs as well as, or better than, the other gene sets available.
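The first step above, building a co-expression network, reduces at its core to computing pairwise expression correlations and keeping strong edges. The toy sketch below uses made-up expression values and a plain Pearson threshold; it is not the TG-GATEs pipeline, which also applies re-weighting and WGCNA-style clustering:

```python
from itertools import combinations
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs) ** 0.5
    vy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (vx * vy)

def coexpression_edges(profiles, threshold=0.9):
    """Return gene pairs whose |Pearson r| exceeds the threshold;
    these pairs form the edges of the co-expression network."""
    return [(g1, g2) for g1, g2 in combinations(sorted(profiles), 2)
            if abs(pearson(profiles[g1], profiles[g2])) > threshold]

# Made-up expression profiles across four exposure conditions:
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 7.9],  # tracks geneA closely
    "geneC": [5.0, 1.0, 4.0, 2.0],  # unrelated
}
edges = coexpression_edges(profiles)  # [("geneA", "geneB")]
```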