PeerJ Computer Science Preprints: Algorithms and Analysis of Algorithms
https://peerj.com/preprints/index.atom?journal=cs&subject=8200
Algorithms and Analysis of Algorithms articles published in PeerJ Computer Science Preprints

Time series event correlation with DTW and Hierarchical Clustering methods
https://peerj.com/preprints/27959
2019-09-12
Srishti Mishra, Zohair Shafi, Santanu Pathak
Data-driven decision making is becoming an increasingly important part of successful business execution. More and more organizations are moving towards informed decisions based on the data they generate. Most of this data is temporal, that is, time series data. Analyzing multiple time series data sets efficiently and quickly is a challenge. The most interesting and valuable part of such analysis is generating insights on correlation and causation across multiple time series data sets. This paper looks at methods that can be used to analyze such data sets and gain useful insights from them, primarily in the form of correlation and causation analysis. It focuses on two methods, the Two Sample Test with Dynamic Time Warping and Hierarchical Clustering, and looks at how the results returned by both can be used to gain a better understanding of the data. Moreover, the methods are meant to work with any data set, regardless of its subject domain and idiosyncrasies; that is, the approach is data agnostic.
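As a rough illustration of the Dynamic Time Warping (DTW) distance named in this abstract, the textbook dynamic-programming formulation can be sketched as follows. This is a generic sketch, not the authors' implementation: the function name `dtw_distance` and the absolute-difference local cost are assumptions, and the Two Sample Test and Hierarchical Clustering steps are not shown.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Uses |a[i] - b[j]| as the local cost (an assumption; other costs are common)
    and runs in O(len(a) * len(b)) time.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because DTW allows one sample to align with several samples of the other series, two series that differ only by a local stretch in time, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, end up at distance zero.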

Securing ad hoc on-demand distance vector routing protocol against the black hole DoS attack in MANETs
https://peerj.com/preprints/27905
2019-08-17
Rohi Tariq, Sheeraz Ahmed, Raees Shah Sani, Zeeshan Najam, Shahryar Shafique
A mobile ad hoc network (MANET) is a collection of nodes without any fixed physical infrastructure such as access points or routers. MANETs are exposed to the same kinds of threats as other wireless mobile communication systems. In an ad hoc network, nodes act both as communication end-points and as routers, which makes ad hoc routing protocols even more prone to security attacks. The black hole attack is a common security issue in MANET routing protocols. In this attack, a malicious node poses as the node with the shortest hop count to the destination node (DS) during packet transmission. It replies positively to all route request (RREQ) messages and thus attracts all the traffic to itself. The source node (SN), unaware of the malicious nature of the black hole node, transmits all its important data to it, and the black hole node discards those data packets. This paper presents a comparatively effective, efficient and easily implemented way of identifying and thereby avoiding black hole attacks in mobile ad hoc networks. The proposed solution is implemented in the Network Simulator (NS-2) and assessed in terms of network routing load, end-to-end delay and packet delivery ratio. The results show a considerable improvement in these performance metrics.

Machine learning approach for automated defense against network intrusions
https://peerj.com/preprints/27777
2019-06-03
Farhaan Noor Hamdani, Farheen Siddiqui
With the advent of the internet, there is major concern about the growing number of attacks in which an attacker can target any computing or network resource remotely. Also, the exponential shift towards smart-end technology devices raises various security concerns, including the detection of anomalous data traffic on the internet. Separating legitimate traffic from malicious traffic is a complex task in itself. Many attacks affect system resources, thereby degrading their computing performance. In this paper we propose a framework of supervised models, implemented using machine learning algorithms, which can enhance or aid existing intrusion detection systems in detecting a variety of attacks. The KDD (Knowledge Discovery and Data Mining) dataset is used as a benchmark. In line with their detection abilities, we also analyze the algorithms' performance, accuracy and alert logs, and compute their overall detection rate.
These machine learning algorithms are validated and tested in terms of accuracy, precision, and true and false positives and negatives. Experimental results show that these methods are effective, generate few false positives, and can be used to build a line of defense against network intrusions. Further, we compare these algorithms in terms of various functional parameters.

Resolve the cell formation problem in a set of three manufacturing cells
https://peerj.com/preprints/27692
2019-04-29
Boris Almonacid
The cell formation problem is an NP-hard problem that consists of organising a group of machines and pieces into several cells. The machines are arranged in a fixed way inside the cells, and each machine performs some manufacturing operation on different pieces or parts. The aim is to minimise the movements the pieces make to reach the machines in the cells. For this problem, a data set has been organised using three manufacturing cells. With this data set, an experiment has been carried out that focuses on obtaining the best solution using a global search method within 6 days for each instance. The experiments obtained the global optimum value for a set of test instances.

Preliminary experiments with the Andean Condor Algorithm to solve problems of Continuous Domains
https://peerj.com/preprints/27678
2019-04-24
Boris L Almonacid
This article carries out a preliminary experiment and describes a set of elements and procedures, integrated into the Andean Condor Algorithm, for solving continuous-domain problems. The Andean Condor Algorithm is a swarm intelligence metaheuristic inspired by the movement pattern of the Andean condor when searching for food. The experiment focuses on De Jong's first function, \(f(x_1, \ldots, x_n) = \sum_{i=1}^n x_i^2,~ -100 \leq x_i \leq 100\). The solutions obtained are close to the global optimum of the problem.
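For reference, the benchmark function in this abstract can be written down directly. The sketch below only evaluates the function and checks the stated bounds; it does not implement the Andean Condor Algorithm, and the names `sphere` and `in_domain` are illustrative assumptions.

```python
def sphere(x):
    """De Jong's first function: the sum of squared coordinates."""
    return sum(xi * xi for xi in x)


def in_domain(x, low=-100.0, high=100.0):
    """Check the box constraint -100 <= x_i <= 100 stated in the abstract."""
    return all(low <= xi <= high for xi in x)
```

The function is zero at the origin and positive everywhere else, so the global optimum a search method should approach is `sphere([0.0] * n) == 0` for any dimension `n`.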

Approximate string searching with fast fourier transforms and simplexes
https://peerj.com/preprints/27615
2019-03-27
Daniel Liu
Previous algorithms for approximate string matching under Hamming distance with wildcard ("don't care") characters have been shown to take \(O(|\Sigma| N \log M)\) time, where \(N\) is the length of the text, \(M\) is the length of the pattern, and \(|\Sigma|\) is the size of the alphabet. They use the Fast Fourier Transform to calculate convolutions efficiently. We describe a novel approach to the problem, which makes use of special encoding schemes that depend on \((|\Sigma| - 1)\)-simplexes in \((|\Sigma| - 1)\)-dimensional space.
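The per-symbol convolution idea behind the earlier algorithms mentioned in this abstract can be sketched as follows. This is an illustrative baseline, not the simplex encoding the paper proposes. It assumes `?` as the wildcard character, all function names are hypothetical, and for simplicity each transform is padded to the full text length, so the sketch does not reproduce the \(\log M\) factor of the stated bound.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two. The inverse is unnormalized."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def correlate(text_vec, pat_vec):
    """Return r[i] = sum_j text_vec[i + j] * pat_vec[j] for every full alignment i."""
    n, m = len(text_vec), len(pat_vec)
    size = 1
    while size < n + m:
        size *= 2
    fa = fft(list(text_vec) + [0] * (size - n))
    # Reversing the pattern turns the convolution into a correlation.
    fb = fft(list(reversed(pat_vec)) + [0] * (size - m))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    conv = [round(v.real / size) for v in prod]
    return conv[m - 1:n]


def hamming_with_wildcards(text, pattern, wildcard="?"):
    """Mismatch count at each alignment; wildcards on either side never count as mismatches."""
    alphabet = (set(text) | set(pattern)) - {wildcard}
    # Number of positions where both characters are non-wildcard.
    compared = correlate([int(c != wildcard) for c in text],
                         [int(c != wildcard) for c in pattern])
    matched = [0] * len(compared)
    for sym in sorted(alphabet):  # one correlation per alphabet symbol
        hits = correlate([int(c == sym) for c in text],
                         [int(c == sym) for c in pattern])
        matched = [a + b for a, b in zip(matched, hits)]
    return [c - k for c, k in zip(compared, matched)]
```

For example, `hamming_with_wildcards("abcabd", "a?d")` returns `[1, 2, 2, 0]`: the pattern matches exactly at offset 3, and the wildcard position is ignored at every offset.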

Component-oriented acausal modeling of the dynamical systems in Python language on the example of the model of the sucker rod string
https://peerj.com/preprints/27612
2019-03-22
Volodymyr B Kopei, Oleh R Onysko, Vitalii G Panchuk
As a rule, specialized languages for acausal modeling of complex dynamical systems have the following limitations: limited applicability, poor interoperability with third-party software packages, a high cost of learning, difficulty implementing hybrid modeling and the modeling of variable-structure systems, and difficulty making modifications and improvements. To solve these problems, it is proposed to develop an easy-to-understand and easy-to-modify component-oriented acausal hybrid modeling system based on: (1) the general-purpose programming language Python, (2) the description of components by Python classes, (3) the description of component behavior by difference equations using the declarative tools of SymPy, (4) event generation using Python imperative constructs, and (5) composing and solving the system of algebraic equations at each discrete time point of the simulation. Classes are developed that allow models to be created in Python without the need to study and apply specialized modeling languages. These classes can also be used to automate the construction of the system of difference equations that describes the behavior of the model in symbolic form. A basic set of mechanical components is developed: the 1D translational components "mass", "spring-damper" and "force". Using these components, models of a sucker rod string are developed and simulated, and the simulation results are compared with results obtained in the Modelica language. Replacing differential equations with difference equations simplifies the implementation of hybrid modeling and reduces the requirements on the modules for symbolic mathematics and equation solving.
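The idea of replacing differential equations with difference equations that are solved at each discrete time point can be illustrated on a single mass attached to a spring-damper and driven by a constant force. This is a minimal sketch under stated assumptions, not the authors' system: it uses backward differences, solves the two resulting linear equations by hand instead of with SymPy, and the function name and parameters are hypothetical.

```python
def simulate_mass_spring_damper(m, c, k, force, dt, steps, x0=0.0, v0=0.0):
    """Simulate m*x'' + c*x' + k*x = force with backward-difference equations.

    At each step the unknowns (x_new, v_new) satisfy the algebraic system
        m * (v_new - v) / dt = force - c * v_new - k * x_new
        (x_new - x) / dt     = v_new
    which is linear and is solved here in closed form.
    """
    x, v = x0, v0
    positions = [x]
    for _ in range(steps):
        # Substitute x_new = x + dt * v_new into the first equation and solve for v_new.
        v = (m * v / dt + force - k * x) / (m / dt + c + k * dt)
        x = x + dt * v
        positions.append(x)
    return positions
```

With damping present, the simulated position settles at the static deflection `force / k`, which gives a quick sanity check on the discretization.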

Pattern recognition techniques for the identification of Activities of Daily Living using mobile device accelerometer
https://peerj.com/preprints/27225
2019-02-12
Ivan Miguel Pires, Nuno M. Garcia, Nuno Pombo, Francisco Flórez-Revuelta, Susanna Spinsante, Maria Canavarro Teixeira, Eftim Zdravevski
This paper focuses on the recognition of Activities of Daily Living (ADL) by applying pattern recognition techniques to data acquired by the accelerometer available in mobile devices. The recognition of ADL is composed of several stages, including data acquisition, data processing, and artificial intelligence methods. The artificial intelligence methods used are related to pattern recognition, and this study focuses on Artificial Neural Networks (ANN). The data processing includes data cleaning and the feature extraction techniques that define the inputs for the ANN. Because of the low processing power and memory of mobile devices, they should mainly be used to acquire the data and to apply a previously trained ANN for the identification of the ADL. The main purpose of this paper is to present a new ANN-based method for identifying a defined set of ADL with reliable accuracy. The paper also compares different types of ANN in order to choose the type used in the final model. The results show that the best accuracies, higher than 80%, are achieved with Deep Neural Networks (DNN). These results are similar to those of other studies, but we compared three types of ANN to find the method that obtains them with the least memory, and verified that, once the model has been generated, the DNN method is also the fastest of those compared while giving better accuracy.

Skill ranking of researchers via hypergraph
https://peerj.com/preprints/27480
2019-01-12
Xiangjie Kong, Lei Liu, Shuo Yu, Andong Yang, Xiaomei Bai, Bo Xu
Researchers use various skills in their work, such as writing, data analysis and experiment design. These research skills greatly influence the quality of their research outputs, as well as their scientific impact. Although many indicators have been proposed to quantify the impact of researchers, studies that evaluate their scientific research skills are very rare. In this paper, we analyze the factors affecting researchers' skill ranking and propose a new model based on hypergraph theory to evaluate scientific research skills. To validate our skill ranking model, we perform experiments on the PLoS One dataset and compare the ranking of researchers' skills with their papers' citation counts and h-index. Finally, we analyze the patterns of how researchers' skill rankings increase over time. Our study also shows the patterns with which researchers move between different skills.

A local search algorithm for the constrained max cut problem on hypergraphs.
https://peerj.com/preprints/27434
2018-12-18
Nasim Samei, Roberto Solis-Oba
In the constrained max k-cut problem on hypergraphs, we are given a weighted hypergraph H=(V, E), an integer k and a set c of constraints. The goal is to divide the set V of vertices into k disjoint parts in such a way that the sum of the weights of the hyperedges having at least two endpoints in different parts is maximized and the parts satisfy all the constraints in c. In this paper we present a local search algorithm for the constrained max k-cut problem on hypergraphs and show that it has approximation ratio 1-1/k for a variety of constraints c, such as the constraints defining the max Steiner k-cut problem, the max multiway cut problem and the max k-cut problem. We also show that our local search algorithm can be used on the max k-cut problem with given sizes of parts and on the capacitated max k-cut problem, where it has approximation ratio 1-|Vmax|/|V|, with |Vmax| the cardinality of the largest part. In addition, we present a local search algorithm for the directed max k-cut problem that has approximation ratio (k-1)/(3k-2).
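To make the objective concrete, a single-vertex-move local search for the unconstrained max k-cut problem on a hypergraph can be sketched as follows. This is a generic illustration, not the authors' algorithm: it ignores the constraint set c, makes no claim about the approximation ratios stated above, and all names and the data layout (hyperedges as `(vertices, weight)` pairs) are assumptions.

```python
def cut_weight(hyperedges, part):
    """Total weight of hyperedges with at least two endpoints in different parts."""
    return sum(w for verts, w in hyperedges if len({part[v] for v in verts}) > 1)


def local_search_max_k_cut(vertices, hyperedges, k):
    """Move one vertex at a time to another part while the cut weight strictly improves."""
    part = {v: 0 for v in vertices}  # arbitrary starting assignment
    best = cut_weight(hyperedges, part)
    improved = True
    while improved:
        improved = False
        for v in vertices:
            best_part = part[v]
            for p in range(k):
                if p == best_part:
                    continue
                part[v] = p
                weight = cut_weight(hyperedges, part)
                if weight > best:
                    best, best_part, improved = weight, p, True
            part[v] = best_part  # keep the best part found for v
    return part, best
```

The loop terminates because every accepted move strictly increases the cut weight and there are only finitely many assignments. On a unit-weight triangle, for instance, the search reaches a cut of weight 2 for k = 2 and 3 for k = 3.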