PeerJ Computational Science
https://peerj.com/articles/index.atom?journal=peerj&subject=3670
Computational Science articles published in PeerJ

Reproducible research and GIScience: an evaluation using AGILE conference papers
https://peerj.com/articles/5072 (published 2018-07-13)
Daniel Nüst, Carlos Granell, Barbara Hofer, Markus Konkol, Frank O. Ostermann, Rusne Sileryte, Valentina Cerutti
The demand for reproducible research is on the rise in disciplines concerned with data analysis and computational methods. Therefore, we reviewed current recommendations for reproducible research and translated them into criteria for assessing the reproducibility of articles in the field of geographic information science (GIScience). Using these criteria, we assessed a sample of GIScience studies from the Association of Geographic Information Laboratories in Europe (AGILE) conference series, and we collected feedback about the assessment from the study authors. Results from the author feedback indicate that although authors support the concept of performing reproducible research, the incentives for doing so in practice are too weak. Therefore, we propose concrete actions for individual researchers and the GIScience conference series to improve transparency and reproducibility. For example, to support researchers in producing reproducible work, the GIScience conference series could offer awards and paper badges, provide author guidelines for computational research, and publish articles in Open Access formats.
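A criteria-based assessment like the one described can be sketched as a small scoring rubric. The category names and the 0-3 level scale below are assumptions for illustration, not the authors' exact instrument:

```python
# Hypothetical reproducibility rubric: each category is rated on an
# assumed 0 (unavailable) .. 3 (fully available) scale; the overall
# score is the mean level across categories.

CATEGORIES = ["input data", "methods", "computational environment", "results"]

def assess(paper_levels):
    """paper_levels: dict mapping category -> level 0..3; missing
    categories count as 0. Returns the mean level."""
    levels = [paper_levels.get(c, 0) for c in CATEGORIES]
    for lv in levels:
        if not 0 <= lv <= 3:
            raise ValueError("levels must be in 0..3")
    return sum(levels) / len(levels)

# Example: a paper sharing data and results but not code or environment.
score = assess({"input data": 2, "results": 2})  # 1.0
```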
Local and relaxed clocks: the best of both worlds
https://peerj.com/articles/5140 (published 2018-07-03)
Mathieu Fourment, Aaron E. Darling
Time-resolved phylogenetic methods use information about the time of sample collection to estimate the rate of evolution. Originally, the models used to estimate evolutionary rates were quite simple, assuming that all lineages evolve at the same rate, an assumption commonly known as the molecular clock. Richer and more complex models have since been introduced to capture the phenomenon of substitution rate variation among lineages. Two well-known model extensions are the local clock, wherein all lineages in a clade share a common substitution rate, and the uncorrelated relaxed clock, wherein the substitution rate on each lineage is independent of other lineages while being constrained to fit some parametric distribution. We introduce a further model extension, called the flexible local clock (FLC), which provides a flexible framework to combine relaxed clock models with local clock models. We evaluate the flexible local clock on simulated and real datasets and show that it provides substantially improved fit to an influenza dataset. An implementation of the model is available for download from https://www.github.com/4ment/flc.
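The difference between the clock models above can be illustrated with a toy rate assignment. This is not the authors' implementation; the lognormal within-clade distribution and the clade/branch names are illustrative choices only:

```python
import random

# Toy illustration of clock models: branches are grouped into clades.
# A local clock gives every branch in a clade one shared rate; a
# flexible-local-clock-style model relaxes branch rates independently
# around each clade's mean rate.

def local_clock_rates(clades, clade_rate):
    """One shared substitution rate per clade: {branch: rate}."""
    return {b: clade_rate[c] for c, branches in clades.items() for b in branches}

def flexible_local_clock_rates(clades, clade_rate, sd=0.1, seed=42):
    """Branch rates drawn independently around each clade's mean
    (lognormal multiplier is an illustrative choice)."""
    rng = random.Random(seed)
    return {b: clade_rate[c] * rng.lognormvariate(0.0, sd)
            for c, branches in clades.items() for b in branches}

clades = {"A": ["b1", "b2"], "B": ["b3"]}
rates = local_clock_rates(clades, {"A": 1e-3, "B": 5e-3})
```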
Unsupervised segmentation of greenhouse plant images based on modified Latent Dirichlet Allocation
https://peerj.com/articles/5036 (published 2018-06-28)
Yi Wang, Lihong Xu
Agricultural greenhouse plant images with complicated scenes are difficult to label manually with precision. The appearance of leaf disease spots and mosses further increases the difficulty of plant segmentation. To address these problems, this paper proposes a statistical image segmentation algorithm, MSBS-LDA (Mean-shift Bandwidths Searching Latent Dirichlet Allocation), which performs unsupervised segmentation of greenhouse plants. The main idea of the algorithm is to take advantage of the language model LDA (Latent Dirichlet Allocation) for image segmentation through the design of spatial documents. The maximum points of the probability density function in image space are mapped to documents, and Mean-shift is used to perform the word-document assignment. The proportion of the most frequent word in the word-frequency statistics determines the coordinate-space bandwidth, and the spatial LDA segmentation procedure iteratively searches for the optimal color-space bandwidth based on the LUV distances between classes. Given the fruits present in the plant segmentation result and the ever-changing illumination conditions in greenhouses, an improved watershed-based leaf segmentation method is proposed to further segment the leaves. Experimental results show that the proposed methods can segment greenhouse plants and leaves in an unsupervised way, achieving high segmentation accuracy together with effective extraction of the fruit part.
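The mean-shift step above locates density maxima before assigning points to them. A minimal one-dimensional sketch with a flat kernel shows the idea; the paper's MSBS-LDA pipeline operates on images and searches bandwidths, which this toy does not attempt:

```python
# 1-D mean-shift with a flat kernel: repeatedly move a point to the
# mean of its neighbours within the bandwidth until it settles on a
# local density mode.

def mean_shift_mode(points, start, bandwidth, iters=50):
    x = start
    for _ in range(iters):
        window = [p for p in points if abs(p - x) <= bandwidth]
        new_x = sum(window) / len(window)
        if abs(new_x - x) < 1e-9:
            break
        x = new_x
    return x

pts = [0.9, 1.0, 1.1, 4.8, 5.0, 5.2]
# Each point drifts to the mode of its own cluster.
modes = sorted({round(mean_shift_mode(pts, p, bandwidth=1.0), 3) for p in pts})
```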
BCD Beam Search: considering suboptimal partial solutions in Bad Clade Deletion supertrees
https://peerj.com/articles/4987 (published 2018-06-08)
Markus Fleischauer, Sebastian Böcker
Supertree methods enable the reconstruction of large phylogenies. The supertree problem can be formalized in different ways in order to cope with contradictory information in the input. Some supertree methods are based on encoding the input trees in a matrix; other methods try to find minimum cuts in some graph. Recently, we introduced the Bad Clade Deletion (BCD) supertree method, which combines the graph-based computation of minimum cuts with optimization of a global objective function on the matrix representation of the input trees. The BCD supertree method has guaranteed polynomial running time and is very swift in practice. The quality of reconstructed supertrees was superior to matrix representation with parsimony (MRP) and usually on par with SuperFine for simulated data; but particularly for biological data, the quality of BCD supertrees could not keep up with SuperFine supertrees. Here, we present a beam search extension for the BCD algorithm that keeps alive a constant number of partial solutions in each top-down iteration phase. The guaranteed worst-case running time of the new algorithm is still polynomial in the size of the input. We present an exact and a randomized subroutine to generate suboptimal partial solutions. Both beam search approaches consistently improve supertree quality on all evaluated datasets when keeping 25 suboptimal solutions alive. Supertree quality of the BCD Beam Search algorithm is on par with MRP and SuperFine even for biological data. This is the best performance of a polynomial-time supertree algorithm reported so far.
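The core beam-search idea described above, keeping a fixed number of partial solutions alive per iteration instead of only the single best, can be sketched generically. The layered toy problem below is illustrative only and has nothing to do with supertrees:

```python
# Generic beam search over a layered choice problem: at each layer,
# extend every surviving partial solution by every option, then keep
# only the `beam_width` best-scoring partial solutions.

def beam_search(layers, score, beam_width=2):
    """layers: list of lists of choices; returns the best full path found."""
    beam = [[]]
    for options in layers:
        candidates = [path + [o] for path in beam for o in options]
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
    return max(beam, key=score)

# Toy objective: maximise the sum, with a penalty for repeated values.
def score(path):
    return sum(path) - 5 * (len(path) - len(set(path)))

best = beam_search([[3, 1], [3, 2], [4, 1]], score, beam_width=2)
```

Keeping `beam_width` suboptimal candidates guards against greedy dead ends, at the cost of a constant factor in running time, which is why the worst case stays polynomial.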
Scan, extract, wrap, compute—a 3D method to analyse morphological shape differences
https://peerj.com/articles/4861 (published 2018-06-08)
Martin Horstmann, Alexander T. Topham, Petra Stamm, Sebastian Kruppert, John K. Colbourne, Ralph Tollrian, Linda C. Weiss
Quantitative analysis of shape and form is critical in many biological disciplines, as context-dependent morphotypes reflect changes in gene expression and physiology, e.g., in comparisons of environment-dependent phenotypes, forward/reverse genetic assays or shape development during ontogenesis. 3D-shape rendering methods produce models with arbitrarily numbered, and therefore non-comparable, mesh points, which prevents direct comparisons. We introduce a workflow that allows the generation of comparable 3D models based on several specimens. Translocations between points of modelled morphotypes are plotted as heat maps and statistically tested. With this workflow, we are able to detect, model and investigate the significance of shape and form alterations in all spatial dimensions, demonstrated with different morphotypes of the pond-dwelling microcrustacean Daphnia. Furthermore, it allows the detection even of inconspicuous morphological features that can be exported to programs for subsequent analysis, e.g., streamline- or finite-element analysis.
D-GENIES: dot plot large genomes in an interactive, efficient and simple way
https://peerj.com/articles/4958 (published 2018-06-04)
Floréal Cabanettes, Christophe Klopp
Dot plots are widely used to quickly compare sequence sets. They provide a synthetic similarity overview, highlighting repetitions, breaks and inversions. Different tools have been developed to easily generate genomic alignment dot plots, but they are often limited in input sequence size. D-GENIES is a standalone and web application that performs large genome alignments using the minimap2 software package and generates interactive dot plots. It enables users to sort query sequences along the reference, zoom in the plot and download several image, alignment or sequence files. D-GENIES is an easy-to-install, open-source software package (GPL) developed in Python and JavaScript. The source code is available at https://github.com/genotoul-bioinfo/dgenies and it can be tested at http://dgenies.toulouse.inra.fr/.
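The underlying dot-plot idea is simple to sketch: mark every coordinate pair where a k-mer of one sequence matches a k-mer of the other. D-GENIES itself delegates alignment to minimap2 and renders the result interactively; this toy only shows the naive exact-match version:

```python
# Naive k-mer dot plot: index the k-mers of one sequence, then report
# every (i, j) position pair where the two sequences share a k-mer.

def dot_plot(seq_a, seq_b, k=3):
    """Return the set of (i, j) positions with identical k-mers."""
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    return {(i, j)
            for i in range(len(seq_a) - k + 1)
            for j in index.get(seq_a[i:i + k], [])}

points = dot_plot("GATTACA", "TTAC", k=3)  # shared k-mers TTA and TAC
```

Plotted as a scatter of (i, j) points, diagonals correspond to conserved stretches, breaks in a diagonal to rearrangements, and anti-diagonals to inversions.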
Bioheat transfer model of transcutaneous spinal cord stimulation-induced temperature changes
https://peerj.com/articles/4921 (published 2018-06-04)
Luyao Chen, Ang Ke, Peng Zhang, Zhaolong Gao, Xuecheng Zou, Jiping He
Transcutaneous spinal cord stimulation (tSCS) has been extensively studied due to its promising application in motor function restoration. Many previous studies have explored both the essential mechanism of action and the methods for determining optimal stimulation parameters. In contrast, the bioheat transfer analysis of tSCS therapy has not been investigated to the same extent, despite being ubiquitous and of great significance for ensuring stable and thermally safe treatment. In this paper, we concentrated on the thermal effects of tSCS using a finite element-based method. By coupling the electric field and the bioheat field, systematic finite element simulations were performed on a human spinal cord model to survey, for the first time, the influence of anatomical structures, blood perfusion, and stimulation parameters on temperature changes. The results show that the tSCS-induced temperature rise mainly occurs in the skin and fat layers and varies with individual differences. The current density distribution, together with the interactions of multiple biothermal effects, determines the thermal status of the whole spinal cord model. Smaller stimulation electrodes carry a higher risk of thermal damage than larger electrodes. Increasing the stimulation intensity results in more Joule heat accumulation and hence a higher temperature. Among all configurations in this study that simulated clinical tSCS protocols, the temperature rise reached up to 9.4 °C on the skin surface, depending on the stimulation parameters and tissue blood perfusion.
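Electro-thermal models of this kind typically couple Pennes' bioheat equation with a Joule-heating source term derived from the electric field; the paper's exact formulation may differ from this standard form:

```latex
% Pennes' bioheat equation with Joule heating from the stimulation field:
\rho c \frac{\partial T}{\partial t}
  = \nabla \cdot (k \nabla T)
  + \rho_b c_b \omega_b (T_a - T)
  + Q_m + \sigma \lvert \nabla V \rvert^2
```

Here ρ, c and k are the tissue density, specific heat and thermal conductivity; the ρ_b c_b ω_b (T_a − T) term models heat removal by blood perfusion toward the arterial temperature T_a; Q_m is metabolic heat; and σ|∇V|² is the Joule heat deposited by the stimulation current, which is why the temperature rise tracks the current density distribution.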
Why is nonword reading so variable in adult skilled readers?
https://peerj.com/articles/4879 (published 2018-05-24)
Max Coltheart, Anastasia Ulicheva
When the task is reading nonwords aloud, skilled adult readers are very variable in the responses they produce: a nonword can evoke as many as 24 different responses in a group of such readers. Why is nonword reading so variable? We analysed a large database of reading responses to nonwords and found that two factors contribute to this variability. The first factor is variability in graphemic parsing (the parsing of a letter string into its constituent graphemes): the same nonword can be graphemically parsed in different ways by different readers. The second factor is phoneme assignment: even when all subjects produce the same graphemic parsing of a nonword, they vary in what phonemes they assign to the resulting set of graphemes. We consider the implications of these results for the computational modelling of reading, for the assessment of impairments of nonword reading, and for the study of reading aloud in other alphabetically written languages and in nonalphabetic writing systems.
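Graphemic-parsing variability can be made concrete by enumerating every way a letter string segments into graphemes from an inventory. The tiny inventory and nonword below are invented for illustration, not taken from the authors' database:

```python
# Enumerate all segmentations of a letter string into graphemes drawn
# from a (hypothetical, tiny) grapheme inventory, mirroring how
# different readers may parse the same nonword differently.

def parsings(s, inventory):
    """All ways to split `s` into a sequence of inventory graphemes."""
    if not s:
        return [[]]
    out = []
    for g in inventory:
        if s.startswith(g):
            out.extend([g] + rest for rest in parsings(s[len(g):], inventory))
    return out

# "ph" can be treated as one grapheme or as "p" followed by "h".
ways = parsings("phote", {"p", "h", "ph", "o", "t", "e", "ote"})
```

Each distinct parsing then feeds the second source of variability, since readers may assign different phonemes even to an agreed-upon grapheme sequence.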
Sample entropy analysis for the estimating depth of anaesthesia through human EEG signal at different levels of unconsciousness during surgeries
https://peerj.com/articles/4817 (published 2018-05-23)
Quan Liu, Li Ma, Shou-Zen Fan, Maysam F. Abbod, Jiann-Shing Shieh
Estimating the depth of anaesthesia (DoA) during operations has always been challenging due to the underlying complexity of brain mechanisms. Electroencephalogram (EEG) signals are the most widely used signals for measuring DoA. In this paper, a novel EEG-based index is proposed to evaluate DoA for 24 patients receiving general anaesthesia at different levels of unconsciousness. The Sample Entropy (SampEn) algorithm was used to capture the chaotic features of the signals. After calculating SampEn from the EEG signals, Random Forest was used to develop regression models with the Bispectral index (BIS) as the target. Correlation coefficient, mean absolute error, and area under the curve (AUC) were used to verify the perioperative performance of the proposed method. Validation comparisons with typical nonstationary signal analysis methods (i.e., recurrence analysis and permutation entropy) and regression methods (i.e., neural network and support vector machine) were conducted. To further verify the accuracy and validity of the proposed methodology, the data were divided into four unconsciousness-level groups on the basis of BIS levels, and analysis of variance (ANOVA) was applied to the corresponding index (i.e., the regression output). Results indicate that the correlation coefficient improved from an initial 0.51 ± 0.17 to 0.72 ± 0.09 after filtering and to 0.90 ± 0.05 after regression. Similarly, the final mean absolute error declined markedly to 5.22 ± 2.12. In addition, the ultimate AUC increased to 0.98 ± 0.02, and the ANOVA analysis indicates that each of the four groups of anaesthetic levels differed significantly from the nearest levels. Furthermore, the Random Forest output was largely linear with respect to BIS, giving better DoA prediction accuracy.
In conclusion, the proposed method provides a concrete basis for monitoring patients’ anaesthetic level during surgeries.
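Sample Entropy as commonly defined (Richman & Moorman) can be sketched in a few lines; the paper's parameter choices for m and r, and the Random Forest regression step, are not reproduced here:

```python
import math

# SampEn = -ln(A/B), where B counts template matches of length m and A
# counts matches of length m+1 among the same templates, excluding
# self-matches. Lower values indicate a more regular signal.

def sample_entropy(x, m=2, r=0.2):
    n = len(x)
    def matches(length):
        vecs = [x[i:i + length] for i in range(n - m)]  # n-m templates
        return sum(1
                   for i in range(len(vecs))
                   for j in range(i + 1, len(vecs))
                   if max(abs(a - b) for a, b in zip(vecs[i], vecs[j])) <= r)
    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined when no matches are found
    return -math.log(a / b)

# A perfectly regular signal has zero sample entropy.
regular = [1.0, 2.0] * 8
```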
Modulation of transcriptional activity in brain lower grade glioma by alternative splicing
https://peerj.com/articles/4686 (published 2018-05-14)
Jin Li, Yang Wang, Xianglian Meng, Hong Liang
Proteins that modify the activity of transcription factors (TFs) are often called modulators and play a vital role in gene transcriptional regulation. Alternative splicing is a critical step of gene processing, and differentially spliced isoforms may have different functions. Alternative splicing can modulate gene function by adding or removing certain protein domains and thereby influence the activity of a protein. The objective of this study is to investigate the role of alternative splicing in modulating transcriptional regulation in brain lower grade glioma (LGG), focusing on the transcription factor ELK1, which is closely related to various disorders, including Alzheimer’s disease and Down syndrome. The results showed that changes in the exon inclusion ratio of proteins APP and STK16 are associated with changes in the expression correlation between ELK1 and its targets. In addition, the structural features of the two modulators are strongly associated with the pathological impact of exon inclusion. The results of our analysis suggest that alternatively spliced proteins have different functions in modifying transcription factors and can thereby induce the dysregulation of multiple genes.
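The two quantities the analysis relates, an exon inclusion ratio and a TF-target expression correlation, can be sketched directly. The read counts and expression vectors below are invented for illustration:

```python
import math

# Exon inclusion ratio (often called PSI) and Pearson correlation,
# the two quantities whose association the study examines.

def inclusion_ratio(inclusion_reads, exclusion_reads):
    """PSI = inclusion / (inclusion + exclusion)."""
    return inclusion_reads / (inclusion_reads + exclusion_reads)

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

psi = inclusion_ratio(30, 10)                   # 0.75
r_high = pearson([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly correlated
```

Comparing such correlations between sample groups with high and low PSI for a modulator exon is one simple way to quantify a splicing-dependent change in TF-target coupling.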