Optimal sequence for chain matrix multiplication using evolutionary algorithm

The Chain Matrix Multiplication Problem (CMMP) is an optimization problem that seeks the optimal parenthesization for Chain Matrix Multiplication (CMM). It arises in various scientific applications, such as electronics, robotics, mathematical programming, and cryptography. Researchers have proposed various techniques for the CMMP, such as the dynamic programming approach, the arithmetic approach, and sequential multiplication. However, these techniques fall short in terms of computational time and the number of scalar multiplications they require. In this article, we propose a new model based on the group counseling optimizer (GCO) to minimize the number of CMM operations. Our experimental results and their analysis show that the proposed GCO model significantly reduces computation time compared with the sequential chain matrix multiplication approach. The proposed model reduces the number of multiplication operations by 45% to 96% compared with sequential multiplication. Moreover, we compare our results with the best-known dynamic programming and arithmetic multiplication approaches; the comparison shows that the proposed model outperforms them in terms of computational time and space complexity.


INTRODUCTION
Optimization means finding the optimal and diverse solution to a complex problem (Bengio, Lodi & Prouvost, 2020). Many complex problems exist in real life, and they are difficult to solve by guesswork. In these problems the resources are limited, which leads to many constraints. Optimization plays an important role in solving them because it uses the resources efficiently. These complex problems include many scenarios in which an objective can be transformed into an optimization problem. Matrix multiplication (MM) represents the composition of linear maps. Therefore, MM is a basic tool of linear algebra and has several applications in applied mathematics, physics, and engineering. The computation of matrix products is a fundamental operation in all computational applications of linear algebra. MM is a binary operation that produces a new matrix from two matrices (Mishra et al., 2020), whereas CMM concerns a sequence of matrices: the task is to find the most efficient way to multiply the sequence, that is, to decide the order in which the multiplications are performed. We only determine the number of operations needed to multiply the matrices.
Moreover, each matrix has a cost determined by its dimensions, that is, its numbers of rows and columns (p × q). The cost of a matrix multiplication depends entirely on these dimensions. Multiplication is possible if and only if the number of columns of the first matrix equals the number of rows of the second matrix. Chain matrix multiplication is associative: the multiplication order does not affect the final result, but it can affect the total number of operations performed, as shown in Figs. 2 and 3.
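The effect of the multiplication order on the operation count can be illustrated with a small sketch. The three matrix shapes below are hypothetical values chosen for illustration and are not taken from the article's figures; the helper name mult_cost is likewise an assumption.

```python
def mult_cost(a, b):
    """Return (scalar multiplications, result shape) for a (p x q) times (q x r) product."""
    (p, q), (q2, r) = a, b
    if q != q2:
        raise ValueError("columns of the first matrix must equal rows of the second")
    return p * q * r, (p, r)


# Hypothetical shapes (rows, cols), chosen only to make the gap visible.
A, B, C = (10, 100), (100, 5), (5, 50)

# Order 1: (A B) C
cost_ab, AB = mult_cost(A, B)
cost_ab_c, _ = mult_cost(AB, C)
left_first = cost_ab + cost_ab_c      # 5000 + 2500 = 7500

# Order 2: A (B C)
cost_bc, BC = mult_cost(B, C)
cost_a_bc, _ = mult_cost(A, BC)
right_first = cost_bc + cost_a_bc     # 25000 + 50000 = 75000

print(left_first, right_first)
```

Both orders produce the same 10 × 50 result, yet the second needs ten times as many scalar multiplications as the first.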
In this article, we propose an efficient model based on the Group Counseling Optimization (GCO) algorithm. The main contributions of this article are as follows. The proposed model applies the GCO algorithm to CMM problems and finds the optimal sequence for CMM. The proposed model is compared with existing techniques, namely the dynamic programming approach, the arithmetic multiplication approach and the sequential approach, in terms of space complexity, time complexity and number of multiplication operations.
The rest of the article is organized as follows. "Related Work" summarizes the related work and reviews the literature on evolutionary algorithms and techniques used for the CMMP. "Proposed Model" explains the proposed model in detail. "Experimental Design" discusses the experimental design. "Tool and Technology" describes the tools and technologies used. "Results and Discussions" presents the experimental results and the comparison of the proposed model with existing techniques for the CMMP. "Concluding Remarks" concludes this research work.

RELATED WORK
Chain matrix multiplication is associative: the multiplication order does not affect the final result, but it can affect the total number of operations performed. Different architectures and techniques have been proposed to solve this problem, as shown in Fig. 4, and the literature review is summarized in Table 1. Multiple approaches have been used for the CMMP, such as the dynamic programming approach (Ben Charrada, Ezouaoui & Mahjoub, 2011), the sequential approach (Kung, 1980, 1982), the greedy approach (Lakhotia et al., 2015) and the arithmetic approach (Hafeez et al., 2007). According to the literature, the dynamic programming and arithmetic approaches provide optimal results for the CMMP, but they are time consuming and require more space. The literature also states that the greedy approach provides the optimal sequence for CMM in some cases but mostly yields a sequence that performs more multiplication operations, because it gets stuck at local optima. For this reason the greedy approach is used only for small data sets in which the local optimum is the global optimum. Sequential multiplication is a well-known approach for the CMMP, but it fails to provide the optimal sequence for CMM; it is also time consuming and requires more space.
The product of a matrix chain can be obtained with the standard tree method proposed by Zuo & Lastovetsky (2007). For example, the product of A1, A2, A3, …, A8 can be obtained with the binary tree method: the input matrices A1, A2, A3, …, A8 enter at the leaves, and the tree computes the final result A12345678 at the root. Direct communication is enabled between the four servers by transferring output A12 directly from node 1 to node 2, output A56 from node 3 to node 4, and output A1234 from node 2 to node 4. The binary tree is built such that the root is at the bottom level and the leaves are at the top level. Each node corresponds to a matrix product, and each leaf corresponds to the product of two successive matrices of the chain. The root corresponds to the final result for the given sequence of matrices. The main issue with this approach is that the execution time increases for large grid matrix multiplications.
MapReduce is a programming technique and programming model designed for distributed computing (Seo et al., 2010). It consists of two main tasks, Map and Reduce. The Map function takes a set of data and breaks it down into individual elements represented as tuples. The Reduce task takes the output of the Map function as its input and combines those tuples into a smaller set of tuples. For large matrix multiplication, Myung and Lee proposed an implementation based on the MapReduce framework (Myung & Lee, 2012). They extended the binary multiplication problem to n-ary multiplication, which joins several matrix operations, and represented a matrix as a set of records (row, col, val). The main issues of the MapReduce technique are that the matrix may not fit in memory and that the multi-way join operation is difficult to optimize in MapKey if the same number is assigned to several attributes.
A dynamic programming approach, which works on a divide-and-conquer-like principle, was proposed for large, complex problems by Nishida, Ito & Nakano (2011). In dynamic programming, a recursive function is defined to obtain the optimal parenthesization, that is, the one that gives the minimum number of multiplications. In this approach, the original chain A_i … A_j is split into two sub-chains such that the product is (A_i … A_k)(A_{k+1} … A_j). A function w(i, k, j) (Tithi et al., 2015) gives the cost of combining the parenthesized sub-chains (A_i … A_k) and (A_{k+1} … A_j). The algorithm assigns a cost to every product and stores the best solution together with its cost. It evaluates all possible ways of multiplying the matrices, stores the results in a table, and returns the optimal sequence at the end. The main issue of this approach is that the problem size is held fixed.
A Graphics Processing Unit (GPU) based approach was proposed and tested using C++ AMP on NVIDIA GPUs (Shyamala, Kiran & Rajeshwari, 2017).
In this approach, two types of functions are used in C++ AMP: a pre-processing function, which calculates the numbers of multiplications and chooses the order with the minimum number of multiplication operations for GPU computing, and a parallel matrix multiplication function, which is marked with the restrict(amp) keyword so that the code is executed on the GPU. The drawback of this approach is that only a limited number of small matrices (e.g., 3 × 4) with different values can be processed concurrently. The work was implemented with an integrated graphics card.
Using the greedy approach, a solution was proposed to determine the minimum number of multiplication operations (Lakhotia et al., 2015). In this study, the authors combine the greedy approach with a divide-and-conquer approach; the main idea behind this modification is to solve the multiplication problem in a top-down fashion. The input is the dimension array p[0…n], which is divided into n sub-arrays, each containing at least one and at most two elements. The division is done greedily: at each step only the least element among all elements of the array p is selected, so that the cost of multiplication is kept to a minimum at every step. This approach is intended to yield an optimal, minimum-cost result, and its output is a fully parenthesized matrix chain. However, the algorithm does not choose the correct least value when the dimensions of the matrices are the same.
Many algorithms and methods have been proposed to improve performance using implementations of Strassen's algorithm. One such implementation, known as the Dynamic General Fast Matrix Multiplication (DGEFMM) algorithm, can be used for matrices of any size and requires a minimum number of scalar multiplications and minimum storage (Benson & Ballard, 2015). Because matrix multiplication is more expensive than matrix addition, fast algorithms trade multiplications for additions. Strassen's fast algorithm follows the same block structure as recursive multiplication, with seven matrix multiplications and 18 additions. A hardware accelerator with a systolic-like architecture (in which point-to-point multiplication is used between all interconnected processing elements) was proposed for large-scale matrix multiplication (Zuo et al., 2017); it is well suited to hardware design and requires lower bandwidth than a systolic structure. The drawback of this approach is that completing all matrix operations at once is problematic because of limited hardware resources. The matrix must therefore be divided into small blocks, and each block multiplied with the other blocks. The chain multiplier handles block matrix multiplication well, but the limited hardware resources remain the main issue of this approach.
For the CMMP, Mabrouk, Hasni & Mahjoub (2017) proposed a three-phase approach based on dynamic programming. Dynamic programming provides the optimal sequence (parenthesization) for chain matrix multiplication problems, but it is time consuming because its running time grows as n^3, where n is the number of matrices.
Barthels, Copik & Bientinesi (2018) designed a new approach based on expressions that consist of products of vectors and matrices. These expressions are mapped onto a set K of computational kernels. Additionally, the mapping has to minimize a user-selected cost metric (such as the number of flops or the execution time). The output is a sequence of kernel calls that computes the original expression. The main issue with this approach is that the type of pattern matching that the CLAK kernel uses is expensive.
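For reference, the classical dynamic programming recurrence can be sketched as follows. This is the standard textbook formulation, shown only to make the cubic running time concrete; it is not the three-phase method of Mabrouk, Hasni & Mahjoub (2017), and the function names and the dimension-list input format are assumptions made for this illustration.

```python
def matrix_chain_order(p):
    """Classical O(n^3) dynamic program.

    p is a dimension list: matrix i (0-based) has shape p[i] x p[i + 1].
    Returns the cost table and the table of best split points.
    """
    n = len(p) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):            # sub-chain length
        for i in range(n - length + 1):       # sub-chain start
            j = i + length - 1                # sub-chain end
            cost[i][j] = float("inf")
            for k in range(i, j):             # split between k and k + 1
                c = cost[i][k] + cost[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k
    return cost, split


def parenthesize(split, i, j):
    """Rebuild the optimal parenthesization of matrices i..j as a string."""
    if i == j:
        return f"M{i + 1}"
    k = split[i][j]
    return f"({parenthesize(split, i, k)} {parenthesize(split, k + 1, j)})"


# Chain of three matrices: 5 x 10, 10 x 15 and 15 x 20.
dims = [5, 10, 15, 20]
cost, split = matrix_chain_order(dims)
best_cost = cost[0][len(dims) - 2]                    # 2250
best_order = parenthesize(split, 0, len(dims) - 2)    # "((M1 M2) M3)"
print(best_cost, best_order)
```

The three nested loops over sub-chain length, start position and split point are the source of the n^3 growth, and the two n × n tables account for the quadratic space requirement.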

PROPOSED MODEL
The proposed model is based on the Group Counseling Optimizer (GCO) algorithm (Eita & Fahmy, 2014), with which we generate the parenthesization for the CMM so as to minimize the CMM operations (scalar products). The flow chart of the proposed model is shown in Fig. 5. In Fig. 6, Pop is the population and Gen is the number of generations; P and G also denote the population and generations, respectively. The product of the population size and the number of generations gives the fitness evaluation value: for example, if the population is 100 and there are 50 generations, the fitness evaluation value is 5,000.
The model first takes an input file that contains the number of matrices and their rows and columns. It reads the data from the file and stores it in string form. The model then assigns a name to each matrix, such as M1, M2, M3 and so on. After naming the matrices, the model checks the criterion for matrix multiplication. If multiplication is not possible, the model shows an error message; otherwise, it assigns random structuring sequences to the matrices, as shown in Table 2. Table 2 shows a population of four individuals. Each individual in the population is called a chromosome, and a chromosome is a combination of genes, as shown in Fig. 6.
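The compatibility check and the random initialization described above might look like the following minimal sketch. The encoding of a chromosome as a fully parenthesized string, the function names, and the uniform choice of split point are all assumptions made for illustration; the article does not specify these details.

```python
import random


def check_chain(shapes):
    """Raise ValueError if adjacent matrices cannot be multiplied.

    shapes is a list of (rows, cols) pairs in chain order.
    """
    for idx in range(len(shapes) - 1):
        if shapes[idx][1] != shapes[idx + 1][0]:
            raise ValueError(
                f"M{idx + 1} has {shapes[idx][1]} columns but "
                f"M{idx + 2} has {shapes[idx + 1][0]} rows"
            )


def random_parenthesization(lo, hi, rng):
    """Return a random full parenthesization of matrices lo..hi (0-based)."""
    if lo == hi:
        return f"M{lo + 1}"
    k = rng.randint(lo, hi - 1)  # hypothetical choice: uniform split point
    left = random_parenthesization(lo, k, rng)
    right = random_parenthesization(k + 1, hi, rng)
    return f"({left} {right})"


def init_population(shapes, size, seed=0):
    """Validate the chain, then build `size` random chromosomes."""
    check_chain(shapes)
    rng = random.Random(seed)
    last = len(shapes) - 1
    return [random_parenthesization(0, last, rng) for _ in range(size)]


# Hypothetical input: four matrices and a population of four individuals.
shapes = [(5, 10), (10, 15), (15, 20), (20, 25)]
population = init_population(shapes, size=4)
for chromosome in population:
    print(chromosome)
```

Each generated string contains every matrix name exactly once, in chain order, so any chromosome corresponds to a valid multiplication order.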
After initializing the population, the model calculates the fitness value of each chromosome and stores it, as shown in Table 3.
After calculating and storing the fitness values, the model stores the chromosomes at their best positions, as shown in Table 4.
After storing the chromosomes at their best positions, the model starts reproducing new chromosomes. In this process, the model first generates multiple random structures for the matrices, then selects the best of the generated structures on the basis of fitness value (scalar products) and stores it in the column of the corresponding parent chromosome, as shown in Table 5. After reproduction, the model checks which of the parent and child chromosomes is better, selects the better one, and stores it in the generation table in ascending order, as shown in Table 6.
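One possible reading of this reproduction and selection step is sketched below. It follows only the description in the paragraph above (random candidate structures, best child per parent, parent-versus-child comparison, ascending sort) and does not reproduce the actual GCO counseling operators. Chromosomes are encoded here as nested tuples of matrix indices, and the names random_tree, cost, next_generation and the candidates parameter are hypothetical.

```python
import random


def random_tree(lo, hi, rng):
    """Random parenthesization of matrices lo..hi as a nested tuple of indices."""
    if lo == hi:
        return lo
    k = rng.randint(lo, hi - 1)
    return (random_tree(lo, k, rng), random_tree(k + 1, hi, rng))


def cost(tree, shapes):
    """Return (scalar multiplications, result shape); the chain is assumed valid."""
    if isinstance(tree, int):
        return 0, shapes[tree]
    left_cost, (p, q) = cost(tree[0], shapes)
    right_cost, (_, r) = cost(tree[1], shapes)
    return left_cost + right_cost + p * q * r, (p, r)


def next_generation(parents, shapes, rng, candidates=5):
    """Keep, for each parent, the better of the parent and its best random child."""
    def fitness(tree):
        return cost(tree, shapes)[0]

    last = len(shapes) - 1
    survivors = []
    for parent in parents:
        child = min((random_tree(0, last, rng) for _ in range(candidates)), key=fitness)
        survivors.append(min(parent, child, key=fitness))
    return sorted(survivors, key=fitness)  # ascending: best chromosome first


# Hypothetical run: four matrices, four chromosomes, twenty generations.
shapes = [(5, 10), (10, 15), (15, 20), (20, 25)]
rng = random.Random(0)
population = sorted(
    (random_tree(0, len(shapes) - 1, rng) for _ in range(4)),
    key=lambda t: cost(t, shapes)[0],
)
initial_best = cost(population[0], shapes)[0]
for _ in range(20):
    population = next_generation(population, shapes, rng)
final_best = cost(population[0], shapes)[0]
print(initial_best, final_best)
```

Because a parent is replaced only by a child with a lower cost, the best fitness in the population can never get worse from one generation to the next.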
After obtaining the first generation, the model uses it to generate further generations. Generations are produced until the break point is reached. After the last generation, the model selects the best solution from it on the basis of scalar products: the chromosome with the minimum number of scalar products is selected as the optimal solution.
The fitness function decides how fit a solution is among all generated solutions and assigns a score to each individual. The selection probability of an individual is based on its fitness cost. Fitter chromosomes have a high chance of surviving to the next generation, whereas the least fit chromosomes have a low chance. The fitness of an individual is the total number of scalar multiplications required by its parenthesization, where multiplying a p × q matrix by a q × r matrix costs p × q × r scalar multiplications. The fitness function is applied to compute the cost of every individual in the population, and the population is then sorted by fitness score from best to worst. The individual with the minimum score is the best in the population and has a high probability of surviving to the next generation. The cost (fitness) of each matrix string is computed with a stack-based implementation.
For example, consider three matrices with the parenthesization ((M1 M2) M3), where M1 = 5 × 10, M2 = 10 × 15 and M3 = 15 × 20. The fitness of this individual is computed as follows:
(M1 M2): 5 × 10 × 15 = 750 scalar multiplications, giving a 5 × 15 matrix.
((M1 M2) M3): 5 × 15 × 20 = 1,500 scalar multiplications.
Total fitness: 750 + 1,500 = 2,250.
The performance of this work is measured in terms of this cost, and reducing it increases the overall efficiency of chain matrix multiplication. In optimization, cost reduction is a continuous process of obtaining the best results without adversely affecting the system, while ensuring that system satisfaction scores are sustained. In chain matrix multiplication, the goal is to find the most efficient way to multiply the matrices. The multiplication order that minimizes the total number of required operations minimizes the overall cost of CMM.
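The stack-based fitness evaluation mentioned above can be sketched as follows for a fully parenthesized matrix string. The string format, the function name fitness, and the dictionary of shapes are assumptions made for this illustration; the article's MATLAB implementation is not shown in the text.

```python
import re


def fitness(expr, shapes):
    """Count scalar multiplications for a fully parenthesized chain.

    expr   -- string such as "((M1 M2) M3)"
    shapes -- dict mapping matrix names to (rows, cols)
    """
    stack = []
    total = 0
    for token in re.findall(r"\(|\)|M\d+", expr):
        if token == "(":
            continue                      # opening brackets carry no information
        if token == ")":
            q2, r = stack.pop()           # right operand
            p, q = stack.pop()            # left operand
            if q != q2:
                raise ValueError("incompatible dimensions in " + expr)
            total += p * q * r            # cost of this pairwise product
            stack.append((p, r))          # shape of the intermediate result
        else:
            stack.append(shapes[token])   # push the shape of a named matrix
    if len(stack) != 1:
        raise ValueError("expression is not fully parenthesized: " + expr)
    return total


shapes = {"M1": (5, 10), "M2": (10, 15), "M3": (15, 20)}
left_first = fitness("((M1 M2) M3)", shapes)    # 750 + 1500 = 2250
right_first = fitness("(M1 (M2 M3))", shapes)   # 3000 + 1000 = 4000
print(left_first, right_first)
```

For the three matrices of the example, the alternative order (M1 (M2 M3)) costs 4,000 scalar multiplications, so ((M1 M2) M3) with 2,250 is the fitter individual.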

EXPERIMENTAL DESIGN
The proposed model for CMM is evaluated against the existing approaches for CMM: the dynamic programming approach, the arithmetic approach, and the sequential multiplication approach. The results of the proposed model are compared with those of the existing approaches and reported. The behavior of some existing approaches is shown in order to observe how much performance improves and how the desired bandwidth is underutilized. The behavior of existing approaches has been discussed in "Related Work". The data set is collected from the articles published by Ben Charrada, Ezouaoui & Mahjoub (2011), Hafeez et al. (2007) and Kung (1980, 1982). A sensitivity analysis is performed on this data. The data set has the following parameters. Evaluation measures drawn from the literature (Shyamala, Kiran & Rajeshwari, 2017; Mabrouk, Hasni & Mahjoub, 2017; Barthels, Copik & Bientinesi, 2018; Srivastava et al., 2020), together with computational time (Coello, Pulido & Lechuga, 2004) and space complexity (Coello, Pulido & Lechuga, 2004), are also used in this research work.
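A comparison against the sequential baseline could be computed as in the sketch below. It assumes that sequential multiplication means strict left-to-right multiplication and that the reduction is reported relative to the sequential cost; neither definition is stated explicitly in the text. The six-matrix chain is a well-known textbook example, not part of the article's data set, and the function names are hypothetical.

```python
def sequential_cost(dims):
    """Scalar multiplications for strict left-to-right multiplication.

    dims is a dimension list: matrix i (0-based) has shape dims[i] x dims[i + 1].
    """
    rows = dims[0]                      # the running product always keeps dims[0] rows
    total = 0
    for i in range(1, len(dims) - 1):
        total += rows * dims[i] * dims[i + 1]
    return total


def reduction_percent(baseline, optimized):
    """Percentage of operations saved relative to the baseline."""
    return 100.0 * (baseline - optimized) / baseline


# Textbook chain: 30x35, 35x15, 15x5, 5x10, 10x20, 20x25.
dims = [30, 35, 15, 5, 10, 20, 25]
baseline = sequential_cost(dims)              # 40500
optimized = 15125                             # known minimum for this chain
saving = reduction_percent(baseline, optimized)
print(baseline, round(saving, 2))             # 40500 62.65
```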

TOOL AND TECHNOLOGY
The experiments for the proposed computational model were implemented using MATLAB R2013b running on a Microsoft Windows 10 64-bit OS. The PC had 8 GB of Random Access Memory (RAM) and an Intel Core i5 2.30 GHz Central Processing Unit (CPU).

RESULTS AND DISCUSSIONS
The results of the proposed model for the optimal solution of CMM problems (CMMP) are presented in this section. The proposed model is compared with the dynamic programming approach, the sequential multiplication approach and the arithmetic multiplication approach for the CMMP. So far we have demonstrated a GCO-based model that computes the optimal cost for chain matrix multiplications. Table 7 summarizes the main results on the time complexity and space complexity of the different algorithms. In Table 7, "n" is the number of matrices. The results show that the proposed model outperforms the other techniques in terms of time complexity and space complexity.

CONCLUDING REMARKS
This research concludes that the GCO can enhance the power of simple dynamic programming problems by reducing their space and time complexity to a great extent. Moreover, the use of the GCO algorithm also reduces the number of arithmetic multiplication operations for the CMMP. The experimental results show that our enhanced CMM version based on GCO provides good performance and reduces the time for matrix multiplication by 45% to 96% compared with sequential multiplication. Moreover, we compare our results with the best-known dynamic programming and arithmetic multiplication approaches; the comparison shows that the proposed model outperforms them in terms of computational time and space complexity. We have also found that minimizing the number of operations required for CMM increases the number of resources needed and requires higher data throughput bandwidth. Because of the fine-grained nature of the matrix multiplication problem under dynamic programming, the 50-matrix chain product problem was solved on one processor. One of the major drawbacks of the DP approach is that, in parallel computing, it requires a number of processors equal to the number of matrices, which is difficult to fulfill in most cases. The proposed model was compared with the other existing multiplication approaches and was shown to provide a better solution.