PeerJ Computer Science Preprints: Algorithms and Analysis of Algorithms
https://peerj.com/preprints/index.atom?journal=cs&subject=8200
Algorithms and Analysis of Algorithms articles published in PeerJ Computer Science Preprints

A novel collaborative filtering algorithm by bit mining frequent itemsets
https://peerj.com/preprints/26444 (2018-01-18)
Loc Nguyen, Minh-Phung T. Do
Collaborative filtering (CF) is a popular technique in recommendation research. Concretely, the items recommended to a user are determined by surveying his/her communities. There are two main CF approaches: memory-based and model-based. I propose a new model-based CF algorithm that mines frequent itemsets from the rating database; items that belong to frequent itemsets are then recommended to the user. The algorithm gives an immediate response because the mining task is performed in an offline process. I also propose the so-called Roller algorithm, which improves the process of mining frequent itemsets. The Roller algorithm is built on the heuristic assumption that the larger the support of an item, the more likely it is that the item occurs in some frequent itemset. It is modeled on the whitewashing task of rolling a roller over a wall in such a way that it picks up frequent itemsets. Moreover, I provide enhanced techniques such as bit representation, bit matching, and bit mining in order to speed up the recommendation process. These techniques take advantage of bitwise operations (AND, NOT) to reduce storage space and make the algorithms run faster.
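The bit-representation idea can be illustrated with a short sketch (hypothetical rating data; this is not the authors' implementation): each item is stored as a bitmask over the users who rated it, and the support of an itemset is the popcount of the bitwise AND of its items' bitmasks.

```python
# Sketch of bit mining: each item is a bitmask over users who rated it;
# the support of an itemset is the number of set bits in the AND of
# the bitmasks of its items.

ratings = {            # hypothetical rating database: item -> set of user ids
    "A": {0, 1, 2, 4},
    "B": {0, 2, 3, 4},
    "C": {1, 2, 4},
}

def to_bitmask(users):
    """Pack a set of user ids into one integer bitmask."""
    mask = 0
    for u in users:
        mask |= 1 << u
    return mask

masks = {item: to_bitmask(users) for item, users in ratings.items()}

def support(itemset):
    """Support of a non-empty itemset via bitwise AND and popcount."""
    mask = ~0                     # all-ones; Python ints are unbounded
    for item in itemset:
        mask &= masks[item]       # users who rated every item so far
    return bin(mask).count("1")   # popcount = number of common users

print(support({"A", "B"}))        # -> 3 (users 0, 2, 4)
```

Because the user sets live in single machine words (or a few of them), the AND-and-popcount step replaces an explicit set intersection, which is where the storage and speed gains come from.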
Linear time-varying Luenberger observer applied to diabetes
https://peerj.com/preprints/3341 (2017-10-12)
Onofre Orozco López, Carlos Eduardo Castañeda Hernández, Agustín Rodríguez Herrero, Gema García Saéz, María Elena Hernando
We present a linear time-varying Luenberger observer (LTVLO) that uses compartmental models to estimate the unmeasurable states in patients with type 1 diabetes. The proposed LTVLO is based on linearization of the virtual patient (VP) at an operating point, which yields a linear time-varying system. The LTVLO gains are obtained by selecting the asymptotic eigenvalues, provided the observability matrix has full rank. The estimation of the unmeasurable variables is carried out using Ackermann's methodology. Additionally, a Lyapunov approach is presented to prove the stability of the time-varying proposal. To evaluate the proposed methodology, we designed three experiments: A) a VP obtained with the Bergman minimal model; B) a VP obtained with the compartmental model presented by Hovorka in 2004; and C) a data set from real patients. For experiments A) and B), a meal plan is applied to the VP, and the dynamic response of each model state is compared to the response of the corresponding variable of the time-varying observer. Once the observer is evaluated in experiment B), the proposal is applied in experiment C) to data extracted from real patients, and the unmeasurable state-space variables are obtained with the LTVLO. The LTVLO methodology has the feature of being updated at each instant of time to estimate the states under a known structure. The results are obtained through simulation with Matlab™ and Simulink™. The LTVLO estimates the unmeasurable states of in silico patients with high accuracy by means of updating the Luenberger gains at each iteration. The accuracy of the estimated state-space variables is validated through a fit parameter.
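As a hedged illustration of the gain-selection step (a generic second-order example, not the glucose model), Ackermann's formula places the observer eigenvalues by computing L = q(A) O^(-1) [0 ... 0 1]^T, where O is the observability matrix and q is the desired characteristic polynomial:

```python
import numpy as np

# Toy linear system (NOT the diabetes model): x' = A x, y = C x.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # open-loop eigenvalues -1 and -2
C = np.array([[1.0, 0.0]])

# Desired observer eigenvalues -5 and -6 -> q(s) = s^2 + 11 s + 30.
qA = A @ A + 11.0 * A + 30.0 * np.eye(2)

# Observability matrix O = [C; C A] must be invertible (observable pair).
O = np.vstack([C, C @ A])

# Ackermann's formula for the observer gain L.
L = qA @ np.linalg.inv(O) @ np.array([[0.0], [1.0]])

print(np.linalg.eigvals(A - L @ C))   # eigenvalues -5 and -6 (in some order)
```

In a time-varying setting such as the one described above, this computation would be repeated at each instant with the current linearized A and C, which is what "updating the Luenberger gains at each iteration" amounts to.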
4, 8, 32, 64 bit Substitution Box generation using Irreducible or Reducible Polynomials over Galois Field GF(p^q)
https://peerj.com/preprints/3300 (2017-09-29)
Sankhanil Dey, Ranjan Ghosh
Substitution Boxes, or S-Boxes, were generated using 4-bit Boolean functions (BFs) for the encryption and decryption algorithms of Lucifer and the Data Encryption Standard (DES) in the late sixties and late seventies, respectively. The S-Box of the Advanced Encryption Standard was likewise generated in the early twenty-first century using an irreducible polynomial over the Galois field GF(2^8), with an additive constant. In this paper, Substitution Boxes are generated from irreducible or reducible polynomials over the Galois field GF(p^q); binary Galois fields are used to generate them. Since the Galois field number, i.e. the number formed from the coefficients of a polynomial over a particular binary Galois field GF(2^q), corresponds to a (log₂ 2^q + 1) = (q + 1)-bit BF, the generation of (q + 1)-bit S-Boxes is possible. If p is a prime or non-prime number, the generation of S-Boxes is then possible using the Galois field GF(p^q), where q = p - 1.
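As a small illustration of S-Box generation over a binary Galois field (a generic inverse-based construction for intuition, not the authors' method), the following builds a 4-bit S-Box from multiplicative inverses in GF(2^4) defined by the irreducible polynomial x^4 + x + 1:

```python
IRRED = 0b10011  # x^4 + x + 1, irreducible over GF(2)

def gf16_mul(a, b):
    """Multiply two elements of GF(2^4) = GF(2)[x] / (x^4 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:       # degree reached 4: reduce by the irreducible
            a ^= IRRED
    return r

def gf16_inv(a):
    """Multiplicative inverse by exhaustive search (the field is tiny)."""
    return next(b for b in range(1, 16) if gf16_mul(a, b) == 1)

# 4-bit S-Box: map 0 to 0 and every other element to its inverse.
sbox = [0] + [gf16_inv(a) for a in range(1, 16)]
print(sbox)   # a permutation of 0..15
```

Any nonlinear bijection of GF(2^q) built this way yields a (q)-bit-to-(q)-bit S-Box; the choice of polynomial determines the field arithmetic and hence the table.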
Stability analysis of MTopGO for module identification in PPI networks
https://peerj.com/preprints/3289 (2017-09-27)
Danila Vella, Allan Tucker, Riccardo Bellazzi
MTopGO is a novel module-identification algorithm for PPI network analysis. It is designed to consider two key aspects of these models: the topological properties of the network and the a priori knowledge about the proteins involved, represented by GO annotations.
MTopGO relies on random components, so the stability of its results across different runs is a critical aspect of the algorithm. Moreover, when evaluating an algorithm specific to PPI networks, an important aspect is its stability in the presence of false positive and false negative edges. In this work, two different stability analyses were carried out to evaluate MTopGO's performance: first, one to evaluate the stability of the result over many runs starting from the same input, in order to assess the range of variability introduced by the random components of the algorithm; second, one to evaluate the robustness of the output clusters when the input is affected by noise and uncertainty.
The results showed that MTopGO was more stable in the case of false negative edges than false positive edges (adding false edges to a PPI network was more damaging than removing existing links).
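The edge-perturbation protocol described above can be sketched generically (hypothetical helper names, not the MTopGO code): randomly drop edges to simulate false negatives, randomly add edges to simulate false positives, re-cluster, and score agreement between the original and perturbed clusterings, e.g. by best-match Jaccard overlap.

```python
import random

def perturb_edges(nodes, edges, p_remove, p_add, rng):
    """Simulate false negatives (drop edges) and false positives (add edges)."""
    kept = {e for e in edges if rng.random() > p_remove}
    candidates = {(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]}
    for e in candidates - set(edges):
        if rng.random() < p_add:
            kept.add(e)
    return kept

def clustering_agreement(clusters_a, clusters_b):
    """Mean best-match Jaccard overlap, averaged over clusters_a
    (not symmetric). 1.0 means clusters_a is fully recovered."""
    def best(c, others):
        return max(len(c & o) / len(c | o) for o in others)
    return sum(best(c, clusters_b) for c in clusters_a) / len(clusters_a)

rng = random.Random(0)
original = [(0, 1), (1, 2), (3, 4), (4, 5)]
noisy = perturb_edges(list(range(6)), original, p_remove=0.2, p_add=0.05, rng=rng)
```

Sweeping `p_remove` and `p_add` separately and plotting the agreement score is one way to reproduce the asymmetry reported above between false negative and false positive edges.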
Crypto-Archaeology: unearthing design methodology of DES s-boxes
https://peerj.com/preprints/3285 (2017-09-26)
Sankhanil Dey, Ranjan Ghosh
The US defence establishment sponsored the DES program in 1974 and released it in 1977. DES remained a well-known and well-accepted block cipher until 1998. The thirty-two 4-bit DES S-Boxes, grouped into eight sets of four, were put in the public domain without any mention of their design methodology. S-Boxes, whether 4-bit, 8-bit, or 32-bit, have found a permanent seat in all subsequent block ciphers. In this paper, while looking into the design methodology of the DES S-Boxes, we find that the S-Boxes contain 128 balanced and non-linear Boolean functions, of which 102 are used once and 13 are used twice, and 92 of the 102 satisfy the Boolean-function-level Strict Avalanche Criterion (SAC). All the S-Boxes satisfy the Bit Independence Criterion (BIC). Their Differential Cryptanalysis (DC) exhibits better results than their Linear Cryptanalysis (LC). However, no S-Box satisfies the S-Box-level SAC analyses. It seems that the designers emphasized satisfying Boolean-function-level SAC and S-Box-level BIC and DC, not S-Box-level LC and SAC.
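The Boolean-function-level checks mentioned above are easy to state in code. A minimal sketch for 4-bit functions given as 16-entry truth tables (illustrative, not the paper's tooling): a function is balanced if it outputs 1 exactly eight times, and it satisfies the SAC if flipping any single input bit changes the output for exactly half of all inputs.

```python
def is_balanced(tt):
    """A 4-bit Boolean function (16-entry truth table) is balanced
    if it outputs 1 for exactly half of the 16 inputs."""
    return sum(tt) == 8

def satisfies_sac(tt):
    """Strict Avalanche Criterion: for every single-bit input flip,
    the output changes for exactly half of all inputs."""
    return all(
        sum(tt[x] ^ tt[x ^ (1 << i)] for x in range(16)) == 8
        for i in range(4)
    )

# Two standard examples: the parity function is balanced but linear
# (it fails SAC); the bent function x0*x1 XOR x2*x3 satisfies SAC
# but is not balanced.
parity = [bin(x).count("1") & 1 for x in range(16)]
bent = [((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))
        for x in range(16)]
```

These two examples show why the paper checks both properties separately: neither balancedness nor SAC implies the other.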
Melanoma expression analysis with Big Data technologies
https://peerj.com/preprints/3260 (2017-09-26)
Alicia Fernandez-Rovira, Rocio Lavado-Valenzuela, Miguel Ángel Berciano Guerrero, Ismael Navas-Delgado, José F Aldana-Montes
Melanoma is a highly immunogenic tumor. Therefore, in recent years physicians have incorporated drugs that alter the immune system into their therapeutic arsenal against this disease, revolutionizing the treatment of patients in an advanced stage of the disease. This has led us to explore and deepen our knowledge of the immunology surrounding melanoma in order to optimize its management. At present, immunotherapy for metastatic melanoma is based on stimulating an individual's own immune system through the use of specific monoclonal antibodies. The use of immunotherapy has meant that many patients with melanoma have survived, and it therefore constitutes a present and future treatment in this field. At the same time, drugs have been developed that target specific mutations, in particular BRAF, resulting in strong tumor-regression responses (set in this clinical study to 18 months) as well as a higher percentage of long-term survivors. The analysis of gene expression changes and their correlation with clinical changes can be carried out using the tools provided by the companies that currently supply gene expression platforms. The gene expression platform used in this clinical study is NanoString, which provides nCounter. However, nCounter has some limitations: the type of analysis is restricted to a predefined set, and the introduction of clinical features is a complex task. This paper presents an approach that collects the clinical information in a structured database, with a Web user interface to enter this information, including the results of the gene expression measurements, going a step further than the nCounter tool. As part of this work, we present an initial analysis of changes in the gene expression of a set of patients before and after targeted therapy.
This analysis has been carried out using Big Data technologies (Apache Spark), with the final goal of scaling up to large numbers of patients, even though this initial study has a limited number of enrolled patients (12 in the first analysis). This is not yet a Big Data problem, but the underlying study aims to target 20 patients per year just in Málaga, and the approach could be extended to analyze the 3,600 patients diagnosed with melanoma per year.
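A minimal sketch of the before/after comparison (hypothetical gene names and counts; in the described pipeline the equivalent aggregation would run as Apache Spark DataFrame operations): compute the per-gene log2 fold change between paired pre- and post-therapy expression values.

```python
import math

# Hypothetical normalized expression counts per gene, before and after
# targeted therapy, averaged over the enrolled patients.
before = {"GENE_A": 120.0, "GENE_B": 45.0, "GENE_C": 300.0}
after = {"GENE_A": 240.0, "GENE_B": 45.0, "GENE_C": 75.0}

def log2_fold_change(pre, post):
    """log2(post / pre): +1 means expression doubled, -1 means halved."""
    return {g: math.log2(post[g] / pre[g]) for g in pre}

print(log2_fold_change(before, after))
# GENE_A up (+1.0), GENE_B unchanged (0.0), GENE_C down (-2.0)
```

At 12 patients this fits in memory; the point of expressing it as DataFrame aggregations is that the same code scales when the cohort grows.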
Assessment of spectral properties of Apollo 12 landing site
https://peerj.com/preprints/2124 (2017-09-05)
Yann H Chemin, Ian A Crawford, Peter Grindrod, Louise Alexander
The geology and mineralogy of the Apollo 12 landing site have been the subject of recent studies that this research attempts to complement from a remote sensing point of view, using data from the Moon Mineralogy Mapper (M3) sensor on board the Chandrayaan-1 lunar orbiter. M3 has a higher spatial and spectral resolution than the Clementine UVVIS sensor and offers the opportunity to study the lunar surface in comparatively more spectral detail.
The M3 signatures show a monotonic, featureless increase with very low reflectance, suggesting a mature regolith. The regolith maturity splits the landing site into a younger northwest and an older southeast. Mineral identification using the lunar sample spectra from the RELAB database found some similarity to a basaltic rock/glass mix. The spectral features of clinopyroxene were found in the Copernican rays and at the landing site. Lateral mixing increases the FeO content away from the central part of the ray. The presence of clinopyroxene in the pigeonite basalt in the stratigraphy of the landing site introduces some complexity in differentiating the Copernican ray's clinopyroxene from the local source, as the spectra are twins except for a vertical shift in reflectance that diminishes away from the central part of the ray.
Spatial variations in mineralogy were not found, mostly because of the pixel size relative to the landing site area. The contribution to stratigraphy is limited to the topmost layer, which is a clinopyroxene-dominated basalt belonging to the most remote tip of a Copernican ray and the resulting local regolith mix.
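The library-matching step described above (comparing M3 pixel spectra to laboratory sample spectra) is commonly done with the spectral angle, which is insensitive to an overall brightness scaling of the spectrum; a minimal sketch with made-up spectra (not RELAB data):

```python
import numpy as np

def spectral_angle(a, b):
    """Angle (radians) between two spectra; 0 means identical shape.
    Insensitive to an overall multiplicative brightness factor."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Made-up 5-band library spectra and a measured pixel spectrum.
library = {
    "clinopyroxene-like": np.array([0.10, 0.12, 0.09, 0.11, 0.14]),
    "glass-like": np.array([0.05, 0.06, 0.07, 0.08, 0.09]),
}
pixel = 0.5 * library["clinopyroxene-like"]   # same shape, darker regolith

best = min(library, key=lambda name: spectral_angle(pixel, library[name]))
print(best)   # -> clinopyroxene-like
```

A purely additive vertical shift, as described for the ray spectra, does change the spectral angle slightly, which is one reason matching is done on spectral shape features rather than absolute reflectance.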
An algorithm for calculating top-dimensional bounding chains
https://peerj.com/preprints/3151 (2017-08-14)
J. Frederico Carvalho, Mikael Vejdemo-Johansson, Danica Kragic, Florian T. Pokorny
We describe the \textsc{Coefficient-Flow} algorithm for calculating the bounding chain of an $(n-1)$--boundary on an $n$--manifold-like simplicial complex $S$. We prove its correctness and show that it has a computational time complexity of $O(|S^{(n-1)}|)$ (where $S^{(n-1)}$ is the set of $(n-1)$--faces of $S$). We estimate the big-$O$ coefficient which depends on the dimension of $S$ and the implementation. We present an implementation, experimentally evaluate the complexity of our algorithm, and compare its performance with that of solving the underlying linear system.
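The baseline that the algorithm is compared against, solving the underlying linear system $\partial_2 c = b$ for a bounding chain $c$, can be sketched on a tiny complex (two consistently oriented triangles sharing an edge; illustrative only):

```python
import numpy as np

# Complex: vertices 0..3, oriented triangles [0,1,2] and [1,3,2],
# edges ordered (0,1), (0,2), (1,2), (1,3), (2,3).
# Column j of D is the boundary of triangle j expressed over the edges.
D = np.array([
    [ 1,  0],   # (0,1): from d[0,1,2] = [1,2] - [0,2] + [0,1]
    [-1,  0],   # (0,2)
    [ 1, -1],   # (1,2): the shared edge cancels when both triangles are used
    [ 0,  1],   # (1,3): from d[1,3,2] = [3,2] - [1,2] + [1,3]
    [ 0, -1],   # (2,3)
])

# b is the outer boundary cycle of the union of the two triangles.
b = np.array([1, -1, 0, 1, -1])

# Solve D c = b in the least-squares sense; b lies in the column space,
# so the solution is exact: both triangles with coefficient 1.
c, *_ = np.linalg.lstsq(D, b, rcond=None)
print(c)   # -> [1. 1.]
```

For large complexes this dense solve is what becomes expensive, which motivates a combinatorial algorithm whose cost is linear in the number of $(n-1)$-faces.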
Integrating active learning and crowdsourcing into large-scale supervised landcover mapping algorithms
https://peerj.com/preprints/3004 (2017-06-06)
Stephanie R Debats, Lyndon D Estes, David R Thompson, Kelly K Caylor
Sub-Saharan Africa and other developing regions of the world are dominated by smallholder farms, which are characterized by small, heterogeneous, and often indistinct field patterns. In previous work, we developed an algorithm for mapping both smallholder and commercial agricultural fields that includes efficient extraction of a vast set of simple, highly correlated, and interdependent features, followed by a random forest classifier. In this paper, we demonstrated how active learning can be incorporated into the algorithm to create smaller, more efficient training data sets, which reduced computational resources, minimized the need for humans to hand-label data, and boosted performance. We designed a patch-based uncertainty metric to drive the active learning framework, based on the regular grid of a crowdsourcing platform, and demonstrated how subject matter experts can be replaced with fleets of crowdsourcing workers. Our active learning algorithm achieved performance similar to that of an algorithm trained with randomly selected data, but with 62% fewer data samples.
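A patch-based uncertainty metric of the kind described can be sketched generically (hypothetical array shapes and helper names, not the paper's implementation): average per-pixel classifier uncertainty over each crowdsourcing grid patch, then send the most uncertain patches to workers first.

```python
import numpy as np

def patch_uncertainty(prob_map, patch):
    """Mean per-pixel uncertainty (1 - max class probability) over a patch."""
    rows, cols = patch
    block = prob_map[rows, cols]          # shape (h, w, n_classes)
    return float(np.mean(1.0 - block.max(axis=-1)))

def most_uncertain_patches(prob_map, patches, k):
    """Indices of the k patches the classifier is least sure about."""
    scores = [patch_uncertainty(prob_map, p) for p in patches]
    return sorted(range(len(patches)), key=lambda i: -scores[i])[:k]

# Toy 4x4 probability map with 2 classes, split into four 2x2 patches.
prob_map = np.full((4, 4, 2), 0.5)        # maximally uncertain everywhere
prob_map[:2, :2] = [0.95, 0.05]           # confident top-left patch
patches = [(slice(0, 2), slice(0, 2)), (slice(0, 2), slice(2, 4)),
           (slice(2, 4), slice(0, 2)), (slice(2, 4), slice(2, 4))]
print(most_uncertain_patches(prob_map, patches, 2))
# -> [1, 2] (the confident patch 0 is excluded)
```

Labeling only the selected patches and retraining is the loop that lets an actively trained model match a randomly trained one with far fewer samples.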
The XL-mHG test for gene set enrichment
https://peerj.com/preprints/1962 (2017-02-12)
Florian Wagner
The nonparametric minimum hypergeometric (mHG) test is a popular alternative to Kolmogorov-Smirnov (KS)-type tests for determining gene set enrichment. However, these approaches have not been compared to each other in a quantitative manner. Here, I first perform a simulation study to show that the mHG test is significantly more powerful than the one-sided KS test for detecting gene set enrichment. I then illustrate a shortcoming of the mHG test, which has motivated a semiparametric generalization of the test, termed the XL-mHG test. I describe an improved quadratic-time algorithm for the efficient calculation of exact XL-mHG p-values, as well as a linear-time algorithm for calculating a tighter upper bound for the p-value. Finally, I demonstrate that the XL-mHG test outperforms the one-sided KS test when applied to a reference gene expression study, and discuss general principles for analyzing gene set enrichment using the XL-mHG test. An efficient open-source Python/Cython implementation of the XL-mHG test is provided in the xlmhg package, available from PyPI and GitHub (https://github.com/flo-compbio/xlmhg) under an OSI-approved license.
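For intuition, the plain mHG statistic (without the X and L parameters of the XL generalization) can be computed directly: given a ranked 0/1 membership vector, take the minimum hypergeometric tail probability over all cutoffs. A sketch using SciPy (illustrative; the xlmhg package provides the efficient exact-p-value algorithms):

```python
from scipy.stats import hypergeom

def mhg_statistic(v):
    """Minimum hypergeometric statistic for a ranked binary list v
    (1 = gene belongs to the set). For each cutoff n, compute the
    probability of seeing at least k ones in the top n by chance,
    and return the minimum over all cutoffs."""
    N, K = len(v), sum(v)
    best, k = 1.0, 0
    for n in range(1, N):
        k += v[n - 1]
        best = min(best, hypergeom.sf(k - 1, N, K, n))  # P(X >= k)
    return best

# Strong enrichment at the top of the ranking: all three set genes first.
print(mhg_statistic([1, 1, 1, 0, 0, 0, 0, 0, 0, 0]))   # -> 1/120 ~ 0.0083
```

Because the statistic is a minimum over many dependent tests, it is not itself a p-value; the point of the (XL-)mHG algorithms is computing the exact p-value of this minimum efficiently.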