PeerJ Computer Science:Algorithms and Analysis of Algorithmshttps://peerj.com/articles/index.atom?journal=cs&subject=8200Algorithms and Analysis of Algorithms articles published in PeerJ Computer ScienceA new non-monotonic infeasible simplex-type algorithm for Linear Programminghttps://peerj.com/articles/cs-2652020-03-302020-03-30Charalampos P. TriantafyllidisNikolaos Samaras
This paper presents a new simplex-type algorithm for Linear Programming with the following two main characteristics: (i) the algorithm computes basic solutions which are neither primal or dual feasible, nor monotonically improving and (ii) the sequence of these basic solutions is connected with a sequence of monotonically improving interior points to construct a feasible direction at each iteration. We compare the proposed algorithm with the state-of-the-art commercial CPLEX and Gurobi Primal-Simplex optimizers on a collection of 93 well known benchmarks. The results are promising, showing that the new algorithm competes versus the state-of-the-art solvers in the total number of iterations required to converge.
This paper presents a new simplex-type algorithm for Linear Programming with the following two main characteristics: (i) the algorithm computes basic solutions which are neither primal or dual feasible, nor monotonically improving and (ii) the sequence of these basic solutions is connected with a sequence of monotonically improving interior points to construct a feasible direction at each iteration. We compare the proposed algorithm with the state-of-the-art commercial CPLEX and Gurobi Primal-Simplex optimizers on a collection of 93 well known benchmarks. The results are promising, showing that the new algorithm competes versus the state-of-the-art solvers in the total number of iterations required to converge.Discrete two dimensional Fourier transform in polar coordinates part II: numerical computation and approximation of the continuous transformhttps://peerj.com/articles/cs-2572020-03-022020-03-02Xueyang YaoNatalie Baddour
The theory of the continuous two-dimensional (2D) Fourier Transform in polar coordinates has been recently developed but no discrete counterpart exists to date. In the first part of this two-paper series, we proposed and evaluated the theory of the 2D Discrete Fourier Transform (DFT) in polar coordinates. The theory of the actual manipulated quantities was shown, including the standard set of shift, modulation, multiplication, and convolution rules. In this second part of the series, we address the computational aspects of the 2D DFT in polar coordinates. Specifically, we demonstrate how the decomposition of the 2D DFT as a DFT, Discrete Hankel Transform and inverse DFT sequence can be exploited for coding. We also demonstrate how the proposed 2D DFT can be used to approximate the continuous forward and inverse Fourier transform in polar coordinates in the same manner that the 1D DFT can be used to approximate its continuous counterpart.
The theory of the continuous two-dimensional (2D) Fourier Transform in polar coordinates has been recently developed but no discrete counterpart exists to date. In the first part of this two-paper series, we proposed and evaluated the theory of the 2D Discrete Fourier Transform (DFT) in polar coordinates. The theory of the actual manipulated quantities was shown, including the standard set of shift, modulation, multiplication, and convolution rules. In this second part of the series, we address the computational aspects of the 2D DFT in polar coordinates. Specifically, we demonstrate how the decomposition of the 2D DFT as a DFT, Discrete Hankel Transform and inverse DFT sequence can be exploited for coding. We also demonstrate how the proposed 2D DFT can be used to approximate the continuous forward and inverse Fourier transform in polar coordinates in the same manner that the 1D DFT can be used to approximate its continuous counterpart.Improving parallel executions by increasing task granularity in task-based runtime systems using acyclic DAG clusteringhttps://peerj.com/articles/cs-2472020-01-132020-01-13Bérenger BramasAlain Ketterlin
The task-based approach is a parallelization paradigm in which an algorithm is transformed into a direct acyclic graph of tasks: the vertices are computational elements extracted from the original algorithm and the edges are dependencies between those. During the execution, the management of the dependencies adds an overhead that can become significant when the computational cost of the tasks is low. A possibility to reduce the makespan is to aggregate the tasks to make them heavier, while having fewer of them, with the objective of mitigating the importance of the overhead. In this paper, we study an existing clustering/partitioning strategy to speed up the parallel execution of a task-based application. We provide two additional heuristics to this algorithm and perform an in-depth study on a large graph set. In addition, we propose a new model to estimate the execution duration and use it to choose the proper granularity. We show that this strategy allows speeding up a real numerical application by a factor of 7 on a multi-core system.
The task-based approach is a parallelization paradigm in which an algorithm is transformed into a direct acyclic graph of tasks: the vertices are computational elements extracted from the original algorithm and the edges are dependencies between those. During the execution, the management of the dependencies adds an overhead that can become significant when the computational cost of the tasks is low. A possibility to reduce the makespan is to aggregate the tasks to make them heavier, while having fewer of them, with the objective of mitigating the importance of the overhead. In this paper, we study an existing clustering/partitioning strategy to speed up the parallel execution of a task-based application. We provide two additional heuristics to this algorithm and perform an in-depth study on a large graph set. In addition, we propose a new model to estimate the execution duration and use it to choose the proper granularity. We show that this strategy allows speeding up a real numerical application by a factor of 7 on a multi-core system.Using demographics toward efficient data classification in citizen science: a Bayesian approachhttps://peerj.com/articles/cs-2392019-11-252019-11-25Pietro De LellisShinnosuke NakayamaMaurizio Porfiri
Public participation in scientific activities, often called citizen science, offers a possibility to collect and analyze an unprecedentedly large amount of data. However, diversity of volunteers poses a challenge to obtain accurate information when these data are aggregated. To overcome this problem, we propose a classification algorithm using Bayesian inference that harnesses diversity of volunteers to improve data accuracy. In the algorithm, each volunteer is grouped into a distinct class based on a survey regarding either their level of education or motivation to citizen science. We obtained the behavior of each class through a training set, which was then used as a prior information to estimate performance of new volunteers. By applying this approach to an existing citizen science dataset to classify images into categories, we demonstrate improvement in data accuracy, compared to the traditional majority voting. Our algorithm offers a simple, yet powerful, way to improve data accuracy under limited effort of volunteers by predicting the behavior of a class of individuals, rather than attempting at a granular description of each of them.
Public participation in scientific activities, often called citizen science, offers a possibility to collect and analyze an unprecedentedly large amount of data. However, diversity of volunteers poses a challenge to obtain accurate information when these data are aggregated. To overcome this problem, we propose a classification algorithm using Bayesian inference that harnesses diversity of volunteers to improve data accuracy. In the algorithm, each volunteer is grouped into a distinct class based on a survey regarding either their level of education or motivation to citizen science. We obtained the behavior of each class through a training set, which was then used as a prior information to estimate performance of new volunteers. By applying this approach to an existing citizen science dataset to classify images into categories, we demonstrate improvement in data accuracy, compared to the traditional majority voting. Our algorithm offers a simple, yet powerful, way to improve data accuracy under limited effort of volunteers by predicting the behavior of a class of individuals, rather than attempting at a granular description of each of them.Component-oriented acausal modeling of the dynamical systems in Python language on the example of the model of the sucker rod stringhttps://peerj.com/articles/cs-2272019-10-282019-10-28Volodymyr B. KopeiOleh R. OnyskoVitalii G. Panchuk
Typically, component-oriented acausal hybrid modeling of complex dynamic systems is implemented by specialized modeling languages. A well-known example is the Modelica language. The specialized nature, complexity of implementation and learning of such languages somewhat limits their development and wide use by developers who know only general-purpose languages. The paper suggests the principle of developing simple to understand and modify Modelica-like system based on the general-purpose programming language Python. The principle consists in: (1) Python classes are used to describe components and their systems, (2) declarative symbolic tools SymPy are used to describe components behavior by difference or differential equations, (3) the solution procedure uses a function initially created using the SymPy lambdify function and computes unknown values in the current step using known values from the previous step, (4) Python imperative constructs are used for simple events handling, (5) external solvers of differential-algebraic equations can optionally be applied via the Assimulo interface, (6) SymPy package allows to arbitrarily manipulate model equations, generate code and solve some equations symbolically. The basic set of mechanical components (1D translational “mass”, “spring-damper” and “force”) is developed. The models of a sucker rods string are developed and simulated using these components. The comparison of results of the sucker rod string simulations with practical dynamometer cards and Modelica results verify the adequacy of the models. The proposed approach simplifies the understanding of the system, its modification and improvement, adaptation for other purposes, makes it available to a much larger community, simplifies integration into third-party software.
Typically, component-oriented acausal hybrid modeling of complex dynamic systems is implemented by specialized modeling languages. A well-known example is the Modelica language. The specialized nature, complexity of implementation and learning of such languages somewhat limits their development and wide use by developers who know only general-purpose languages. The paper suggests the principle of developing simple to understand and modify Modelica-like system based on the general-purpose programming language Python. The principle consists in: (1) Python classes are used to describe components and their systems, (2) declarative symbolic tools SymPy are used to describe components behavior by difference or differential equations, (3) the solution procedure uses a function initially created using the SymPy lambdify function and computes unknown values in the current step using known values from the previous step, (4) Python imperative constructs are used for simple events handling, (5) external solvers of differential-algebraic equations can optionally be applied via the Assimulo interface, (6) SymPy package allows to arbitrarily manipulate model equations, generate code and solve some equations symbolically. The basic set of mechanical components (1D translational “mass”, “spring-damper” and “force”) is developed. The models of a sucker rods string are developed and simulated using these components. The comparison of results of the sucker rod string simulations with practical dynamometer cards and Modelica results verify the adequacy of the models. The proposed approach simplifies the understanding of the system, its modification and improvement, adaptation for other purposes, makes it available to a much larger community, simplifies integration into third-party software.R-DECO: an open-source Matlab based graphical user interface for the detection and correction of R-peakshttps://peerj.com/articles/cs-2262019-10-212019-10-21Jonathan MoeyersonsMatthew AmoniSabine Van HuffelRik WillemsCarolina Varon
Many of the existing electrocardiogram (ECG) toolboxes focus on the derivation of heart rate variability features from RR-intervals. By doing so, they assume correct detection of the QRS-complexes. However, it is highly likely that not all detections are correct. Therefore, it is recommended to visualize the actual R-peak positions in the ECG signal and allow manual adaptations. In this paper we present R-DECO, an easy-to-use graphical user interface (GUI) for the detection and correction of R-peaks. Within R-DECO, the R-peaks are detected by using a detection algorithm which uses an envelope-based procedure. This procedure flattens the ECG and enhances the QRS-complexes. The algorithm obtained an overall sensitivity of 99.60% and positive predictive value of 99.69% on the MIT/BIH arrhythmia database. Additionally, R-DECO includes support for several input data formats for ECG signals, three basic filters, the possibility to load other R-peak locations and intuitive methods to correct ectopic, wrong, or missed heartbeats. All functionalities can be accessed via the GUI and the analysis results can be exported as Matlab or Excel files. The software is publicly available. Through its easy-to-use GUI, R-DECO allows both clinicians and researchers to use all functionalities, without previous knowledge.
Many of the existing electrocardiogram (ECG) toolboxes focus on the derivation of heart rate variability features from RR-intervals. By doing so, they assume correct detection of the QRS-complexes. However, it is highly likely that not all detections are correct. Therefore, it is recommended to visualize the actual R-peak positions in the ECG signal and allow manual adaptations. In this paper we present R-DECO, an easy-to-use graphical user interface (GUI) for the detection and correction of R-peaks. Within R-DECO, the R-peaks are detected by using a detection algorithm which uses an envelope-based procedure. This procedure flattens the ECG and enhances the QRS-complexes. The algorithm obtained an overall sensitivity of 99.60% and positive predictive value of 99.69% on the MIT/BIH arrhythmia database. Additionally, R-DECO includes support for several input data formats for ECG signals, three basic filters, the possibility to load other R-peak locations and intuitive methods to correct ectopic, wrong, or missed heartbeats. All functionalities can be accessed via the GUI and the analysis results can be exported as Matlab or Excel files. The software is publicly available. Through its easy-to-use GUI, R-DECO allows both clinicians and researchers to use all functionalities, without previous knowledge.Steady state particle swarmhttps://peerj.com/articles/cs-2022019-08-262019-08-26Carlos M. FernandesNuno FachadaJuan-Julián MereloAgostinho C. Rosa
This paper investigates the performance and scalability of a new update strategy for the particle swarm optimization (PSO) algorithm. The strategy is inspired by the Bak–Sneppen model of co-evolution between interacting species, which is basically a network of fitness values (representing species) that change over time according to a simple rule: the least fit species and its neighbors are iteratively replaced with random values. Following these guidelines, a steady state and dynamic update strategy for PSO algorithms is proposed: only the least fit particle and its neighbors are updated and evaluated in each time-step; the remaining particles maintain the same position and fitness, unless they meet the update criterion. The steady state PSO was tested on a set of unimodal, multimodal, noisy and rotated benchmark functions, significantly improving the quality of results and convergence speed of the standard PSOs and more sophisticated PSOs with dynamic parameters and neighborhood. A sensitivity analysis of the parameters confirms the performance enhancement with different parameter settings and scalability tests show that the algorithm behavior is consistent throughout a substantial range of solution vector dimensions.
This paper investigates the performance and scalability of a new update strategy for the particle swarm optimization (PSO) algorithm. The strategy is inspired by the Bak–Sneppen model of co-evolution between interacting species, which is basically a network of fitness values (representing species) that change over time according to a simple rule: the least fit species and its neighbors are iteratively replaced with random values. Following these guidelines, a steady state and dynamic update strategy for PSO algorithms is proposed: only the least fit particle and its neighbors are updated and evaluated in each time-step; the remaining particles maintain the same position and fitness, unless they meet the update criterion. The steady state PSO was tested on a set of unimodal, multimodal, noisy and rotated benchmark functions, significantly improving the quality of results and convergence speed of the standard PSOs and more sophisticated PSOs with dynamic parameters and neighborhood. A sensitivity analysis of the parameters confirms the performance enhancement with different parameter settings and scalability tests show that the algorithm behavior is consistent throughout a substantial range of solution vector dimensions.Pay attention and you won’t lose it: a deep learning approach to sequence imputationhttps://peerj.com/articles/cs-2102019-08-122019-08-12Ilia SucholutskyApurva NarayanMatthias SchonlauSebastian Fischmeister
In most areas of machine learning, it is assumed that data quality is fairly consistent between training and inference. Unfortunately, in real systems, data are plagued by noise, loss, and various other quality reducing factors. While a number of deep learning algorithms solve end-stage problems of prediction and classification, very few aim to solve the intermediate problems of data pre-processing, cleaning, and restoration. Long Short-Term Memory (LSTM) networks have previously been proposed as a solution for data restoration, but they suffer from a major bottleneck: a large number of sequential operations. We propose using attention mechanisms to entirely replace the recurrent components of these data-restoration networks. We demonstrate that such an approach leads to reduced model sizes by as many as two orders of magnitude, a 2-fold to 4-fold reduction in training times, and 95% accuracy for automotive data restoration. We also show in a case study that this approach improves the performance of downstream algorithms reliant on clean data.
In most areas of machine learning, it is assumed that data quality is fairly consistent between training and inference. Unfortunately, in real systems, data are plagued by noise, loss, and various other quality reducing factors. While a number of deep learning algorithms solve end-stage problems of prediction and classification, very few aim to solve the intermediate problems of data pre-processing, cleaning, and restoration. Long Short-Term Memory (LSTM) networks have previously been proposed as a solution for data restoration, but they suffer from a major bottleneck: a large number of sequential operations. We propose using attention mechanisms to entirely replace the recurrent components of these data-restoration networks. We demonstrate that such an approach leads to reduced model sizes by as many as two orders of magnitude, a 2-fold to 4-fold reduction in training times, and 95% accuracy for automotive data restoration. We also show in a case study that this approach improves the performance of downstream algorithms reliant on clean data.Comparison of the performance of skip lists and splay trees in classification of internet packetshttps://peerj.com/articles/cs-2042019-07-152019-07-15Navid KhezrianMahdi Abbasi
Due to the increasing number of Internet users and the volume of information exchanged by software applications, Internet packet traffic has increased significantly, which has highlighted the need to accelerate the processing required in network systems. Packet classification is one of the solutions implemented in network systems. The most important issue is to use an approach that can classify packets at the speed of the network and show optimum performance in terms of memory usage. In this study, we evaluated the performance in packet classification of two of the most important data structures used in decision trees, i.e. the skip list and splay tree. Our criteria for performance were the time of packet classification, the number of memory accesses, and memory usage of each event. These criteria were tested by the ACL and IPC rules with different numbers of rules as well as by different packet numbers. The results of the evaluation showed that the performance of skip lists is higher than that of splay trees. By increasing the number of classifying rules, both the difference in the speed of packet classification and the superiority of the performance of the skip list over that of the splay tree become more significant. The skip list also maintains its superiority over the splay tree in lower memory usage. The results of the experiments confirm the scalability of this method in comparison to the splay tree method.
Due to the increasing number of Internet users and the volume of information exchanged by software applications, Internet packet traffic has increased significantly, which has highlighted the need to accelerate the processing required in network systems. Packet classification is one of the solutions implemented in network systems. The most important issue is to use an approach that can classify packets at the speed of the network and show optimum performance in terms of memory usage. In this study, we evaluated the performance in packet classification of two of the most important data structures used in decision trees, i.e. the skip list and splay tree. Our criteria for performance were the time of packet classification, the number of memory accesses, and memory usage of each event. These criteria were tested by the ACL and IPC rules with different numbers of rules as well as by different packet numbers. The results of the evaluation showed that the performance of skip lists is higher than that of splay trees. By increasing the number of classifying rules, both the difference in the speed of packet classification and the superiority of the performance of the skip list over that of the splay tree become more significant. The skip list also maintains its superiority over the splay tree in lower memory usage. The results of the experiments confirm the scalability of this method in comparison to the splay tree method.RobOMP: Robust variants of Orthogonal Matching Pursuit for sparse representationshttps://peerj.com/articles/cs-1922019-05-132019-05-13Carlos A. Loza
Sparse coding aims to find a parsimonious representation of an example given an observation matrix or dictionary. In this regard, Orthogonal Matching Pursuit (OMP) provides an intuitive, simple and fast approximation of the optimal solution. However, its main building block is anchored on the minimization of the Mean Squared Error cost function (MSE). This approach is only optimal if the errors are distributed according to a Gaussian distribution without samples that strongly deviate from the main mode, i.e. outliers. If such assumption is violated, the sparse code will likely be biased and performance will degrade accordingly. In this paper, we introduce five robust variants of OMP (RobOMP) fully based on the theory of M-Estimators under a linear model. The proposed framework exploits efficient Iteratively Reweighted Least Squares (IRLS) techniques to mitigate the effect of outliers and emphasize the samples corresponding to the main mode of the data. This is done adaptively via a learned weight vector that models the distribution of the data in a robust manner. Experiments on synthetic data under several noise distributions and image recognition under different combinations of occlusion and missing pixels thoroughly detail the superiority of RobOMP over MSE-based approaches and similar robust alternatives. We also introduce a denoising framework based on robust, sparse and redundant representations that open the door to potential further applications of the proposed techniques. The five different variants of RobOMP do not require parameter tuning from the user and, hence, constitute principled alternatives to OMP.
Sparse coding aims to find a parsimonious representation of an example given an observation matrix or dictionary. In this regard, Orthogonal Matching Pursuit (OMP) provides an intuitive, simple and fast approximation of the optimal solution. However, its main building block is anchored on the minimization of the Mean Squared Error cost function (MSE). This approach is only optimal if the errors are distributed according to a Gaussian distribution without samples that strongly deviate from the main mode, i.e. outliers. If such assumption is violated, the sparse code will likely be biased and performance will degrade accordingly. In this paper, we introduce five robust variants of OMP (RobOMP) fully based on the theory of M-Estimators under a linear model. The proposed framework exploits efficient Iteratively Reweighted Least Squares (IRLS) techniques to mitigate the effect of outliers and emphasize the samples corresponding to the main mode of the data. This is done adaptively via a learned weight vector that models the distribution of the data in a robust manner. Experiments on synthetic data under several noise distributions and image recognition under different combinations of occlusion and missing pixels thoroughly detail the superiority of RobOMP over MSE-based approaches and similar robust alternatives. We also introduce a denoising framework based on robust, sparse and redundant representations that open the door to potential further applications of the proposed techniques. The five different variants of RobOMP do not require parameter tuning from the user and, hence, constitute principled alternatives to OMP.