Exclusive use and evaluation of inheritance metrics viability in software fault prediction—an experimental study

Software Fault Prediction (SFP) assists in identifying faulty classes, and software metrics provide a mechanism for this purpose. Among others, metrics addressing inheritance in Object-Oriented (OO) software are important, as they measure the depth, hierarchy, width, and overriding complexity of the software. In this paper, we evaluate the exclusive use and viability of inheritance metrics in SFP through experiments. We surveyed inheritance metrics whose data sets are publicly available and collected about 40 data sets containing inheritance metrics. We cleaned and filtered them, capturing nine inheritance metrics. After preprocessing, we divided the selected data sets into all possible combinations of inheritance metrics and then merged similar metrics, forming 67 data sets that contain only inheritance metrics with nominal binary class labels. We then performed model building and validation with a Support Vector Machine (SVM). Results for Cross-Entropy, Accuracy, F-Measure, and AUC support the viability of inheritance metrics in software fault prediction. Furthermore, the ic, noc, and dit metrics reduce the error entropy rate more than the rest of the 67 feature sets.


INTRODUCTION
Object-Oriented software development is a widely used software development technique. It is not possible to produce a completely fault-free software system. Failure rates per thousand lines of code are 15-50 for industrial projects and 10-20 at Microsoft (McConnell, 2004); Windows 2000, which comprises thirty-five million Lines of Code (LOC), recorded sixty-three thousand errors (Foley, 2007). Residual errors may cause failures, as many types of errors have been newly detected (Bilton, 2016; Osborn, 2016; Grice, 2015).
Early detection of faults may save time and costs and decrease software complexity, since complexity is proportional to testing effort. Extensive testing is required to locate all remaining errors, but exhaustive testing is impossible (Jayanthi & Florence, 2018; Kaner, Bach & Pettichord, 2008). This is why the cost of testing at times rises to over 50% of the total software development cost (Majumdar, 2010), and may reach seventy-five percent as reported by IBM (Hailpern & Santhanam, 2002). Software testing is essential and inevitable to produce software without errors, yet thorough testing of all classes with limited manpower is a challenging task. It is more feasible to identify faulty classes and test them to produce software of good quality. It has been observed that faults are not uniformly dispersed throughout a software product: certain classes are more faulty than others, and faults cluster in a limited number of classes (Sherer, 1995). Research shows that errors are limited to 42.04% of the components in a software project (Gyimothy, Ferenc & Siket, 2005). Similarly, the findings of Ostrand, Weyuker & Bell (2004) revealed that only 20% of the software components in a project are faulty.
The fault prediction process typically includes a training phase and a prediction phase. In the training phase, a prediction model is constructed using software metrics at class or method level together with the fault information associated with each module of the software. Later, this model is employed on a newly developed software version for fault prediction.
We use classification methods to label classes as fault-free or faulty based on software metrics and fault information. Software quality is improved by detecting faulty classes within the software using fault prediction models. Model performance depends on the modeling technique (Elish & Elish, 2008a; Catal & Diri, 2009) and the metrics (Briand et al., 2000; Ostrand, Weyuker & Bell, 2005). Several researchers have developed and validated fault prediction models based on statistical techniques or machine learning, using software metrics, data sets, and feature reduction methods to improve model performance.
The software development industry widely uses the Object-Oriented paradigm; however, class-level metrics are less commonly used in fault prediction than other types of metrics (Catal & Diri, 2009). Our literature review shows that researchers mostly employ method-level software metrics with machine learning techniques (Catal, 2011). It is beneficial to use publicly available data sets for SFP since they help to develop verifiable and repeatable models (Catal & Diri, 2009). It is pertinent to mention that machine learning models offer superior precision and accessibility compared with expert-opinion-based approaches or statistical methods.
Among others, inheritance is a prominent feature of Object-Oriented programming. Inheritance is a class's ability to obtain the properties of another class. It is divided into several types, namely single, multiple, multi-level, and so on. Object-Oriented programming supports reuse in three different ways (Karunanithi & Bieman, 1993), with inheritance as the primary technique, which motivates the assessment of inheritance measures. Finally, 'Experiment and Results' explains the threats to validity, concluding remarks, and future directions of the investigation.

Software fault prediction
The process of SFP typically includes two phases: training and prediction. In the training phase, method- or class-level software metrics, together with the fault data associated with each module of the software, are used to construct a prediction model. Later, this model is used to predict fault-prone classes in a newer version of the software. Fault prediction is beneficial for improving software quality and reducing testing costs. Moreover, it enables testing teams to focus testing on fault-prone classes only; in essence, fault prediction aims to identify which parts of the software require concentrated attention. Several SFP methods have been used (Rathore & Kumar, 2017a; Catal, 2011), which share three main ingredients (Beecham et al., 2010): the feature set, the class label, and the model.
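The two-phase process above can be sketched with scikit-learn's SVM, the classifier used later in this study. The metric values and fault labels below are invented for illustration; they are not taken from any real data set.

```python
# Sketch of the two-phase fault-prediction process: train on an old
# version's metrics and fault labels, then predict for a new version.
from sklearn.svm import SVC

# Training phase: class-level metrics (here hypothetically [dit, noc, ic])
# with known fault labels (1 = faulty, 0 = fault-free).
train_metrics = [[1, 0, 2], [4, 3, 0], [2, 1, 1], [5, 4, 2], [1, 0, 0], [3, 2, 3]]
train_labels = [0, 1, 0, 1, 0, 1]

model = SVC(kernel="rbf")
model.fit(train_metrics, train_labels)

# Prediction phase: the trained model labels classes of a new version.
new_version_metrics = [[4, 2, 1], [1, 0, 0]]
predictions = model.predict(new_version_metrics)
print(predictions)  # one 0/1 label per class in the new version
```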
The feature set consists of metrics computed from a software artifact; these are believed to be good class-label predictors. Metrics are classified into product, project, and process metrics, of which product metrics are the most frequently used (Gómez et al., 2006). Product metrics are further grouped into class, method, and file levels. Generally, method-level metrics are applied in 60% of studies, followed by class-level metrics in 24% (Catal & Diri, 2009). Product metrics also comprise design, code, volume, and complexity metrics, and fault prediction model performance depends heavily on them. Scholars have evaluated the usage frequency of metrics (Malhotra, 2015; Catal & Diri, 2009; Beecham et al., 2010; Radjenović et al., 2013; Gondra, 2008; Chappelly et al., 2017; Nair et al., 2018): the most commonly applied software product metrics in software fault prediction are Halstead (Halstead, 1977), McCabe (McCabe, 1976), and LOC in structured programming, and the C&K metrics suite (Chidamber & Kemerer, 1991) in the Object-Oriented paradigm. These metrics have become the de-facto standard metrics in SFP. PROMISE (Boetticher, 2007) and D'Ambros (D'Ambros, Lanza & Robbes, 2010) are the most often used data set repositories containing these metrics; they hold the data sets of about 52% of the studies published after 2005 (Malhotra, 2015). These data sets are used so frequently because they are publicly available and because fault data for industrial software projects is largely absent.
The second significant element in software fault prediction is the class label, which records the actual outcome for each instance. Within the software fault prediction domain, labels are either nominal-binary (faulty or fault-free) or continuous (the total number of faults in an instance). Although continuous labels appear in the literature (Rathore & Kumar, 2017b), nominal class labels dominate in SFP (Catal, 2011; Rathore & Kumar, 2017a).
The third key factor in software fault prediction is model building, which establishes a relationship between the feature set and the class label. It can be performed using Machine Learning (ML) algorithms, statistical methods, or even expert opinion (Catal & Diri, 2009), with ML the most extensively used method for model building (Beecham et al., 2010), as it significantly improves classification accuracy (Han, Pei & Kamber, 2011). In the SFP domain, several ML algorithms are utilized. Malhotra compared the performance of these ML algorithms and concluded that Bayesian networks and Random forest outperform the other ML algorithms (Malhotra, 2015).
The literature review discloses that statistical methods are used in 22% of studies (Grice, 2015) and machine learning in 59%. Several performance measures and machine learning methods that use object-oriented metrics to predict faults have been explored. These studies are cataloged in multiple tables: studies from 1990 to 2003 are shown in Table 1, studies from 2004 to 2007 in Table 2, and studies from 2008 to 2020 in Table 3 (Aziz, Khan & Nadeem, 2020).

Software inheritance
Inheritance makes it possible for newly created objects to use the components of previously defined objects in an object-oriented paradigm. The superclass or base class is the source of inheritance, and a subclass or derived class inherits from a superclass. The terms main class and secondary class can also be used interchangeably for super- and subclass. The sub-class can possess its own components and methods in addition to inheriting visible methods and properties from the main class. Inheritance offers (Aziz, Khan & Nadeem, 2019): 1. Reusability: reuse is a resource provided by inheritance, where the superclass's public methods are used in a subclass without rewriting code. 2. Overriding: defining new behavior for a method that already exists; this happens when the class in question extends another class and creates a method in the subclass with the same signature as the ''parent'' class. 3. Extensibility: extending the logic of the superclass according to the business logic of the sub-class. 4. Maintainability: it is straightforward to walk through the source code once the software program is split into portions. 5. Data hiding: inheritance provides a way to hide data by marking a method as private in the main class so that it cannot be used or altered by the sub-class. In the object-oriented paradigm, the basis of inheritance is an ''IS-A'' bond, which states ''R is a Z type of thing'': blue is a color, a bus is a vehicle. Inheritance is uni-directional: ''a house is a building'', but ''a building is not a house''. Inheritance has other important characteristics (Aziz, Khan & Nadeem, 2019): 1. Generalization: the dissemination of commonalities among several classes is termed generalization (Pason, 1994). 2. Specialization: increasing the functionality of a class is described as specialization (Breesam, 2007). The research shows that inheritance has various forms.
These are described in the subsequent lines (Shivam, 2013): 1. Single Inheritance: when a sub-class inherits from only a single main class, this is denoted single inheritance. 2. Multiple Inheritance: a subclass extends two or more superclasses. 3. Multilevel Inheritance: multilevel inheritance in the object-oriented paradigm refers to the approach where a sub-class extends a derived class, making the derived class the main class of the freshly formed class. 4. Hierarchical Inheritance: in hierarchical inheritance, one superclass is extended by many sub-classes. 5. Hybrid Inheritance: a mixture of multilevel and multiple inheritance, in which a subclass extends two superclasses that are themselves derived classes rather than base classes.
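The inheritance forms listed above can be illustrated in a few lines of Python; the class names (Vehicle, Bus, and so on) are hypothetical examples, not from the studied data sets.

```python
# Minimal sketch of the inheritance forms discussed above.

class Vehicle:                 # base (super) class
    def wheels(self):
        return 4

class Bus(Vehicle):            # single inheritance: one direct superclass
    pass

class SchoolBus(Bus):          # multilevel: Bus is itself a derived class
    pass

class Truck(Vehicle):          # hierarchical: Vehicle has several subclasses
    pass

class Radio:                   # second superclass for multiple inheritance
    def band(self):
        return "FM"

class TourBus(Bus, Radio):     # multiple inheritance: two superclasses
    pass

# An inherited method is reused without rewriting it (the "IS-A" bond):
print(SchoolBus().wheels())  # 4, inherited through two levels
```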

Inheritance metrics
Depth of Inheritance Tree (DIT) (Chidamber & Kemerer, 1994): the DIT metric measures how deep a class lies in the inheritance hierarchy, and hence how significantly ancestor classes may affect it. In the case of multiple inheritance, DIT is the maximum distance from the node to the root of the tree.
DIT = Max inheritance path from the class to the root (1) • The lowest class in the hierarchy inherits a larger number of methods; consequently, its behavior is harder to predict.
• During the design phase, deeper trees create more complexity, since more classes and methods are involved.
• The lower a particular class is in the hierarchy, the higher the possibility of reuse of inherited methods.
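Eq. (1) can be sketched as a short recursion over a parent mapping. The class names and hierarchy below are invented; with multiple inheritance each class maps to a list of parents and DIT takes the maximum over them, as the definition above requires.

```python
# Sketch of DIT: the maximum inheritance path from a class to the root.
parents = {
    "Object": [],            # root of the tree
    "A": ["Object"],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],         # multiple inheritance: two parents
}

def dit(cls):
    """Maximum inheritance path length from cls to the root."""
    ps = parents[cls]
    if not ps:
        return 0
    return 1 + max(dit(p) for p in ps)

print(dit("D"))  # 3: D -> B -> A -> Object
```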

Number of children (NOC) (Chidamber & Kemerer, 1994)
• Increasing NOC increases reuse, since inheritance is a form of reuse.
• The larger the number of sub-classes, the greater the probability of inadequate abstraction of the main class; a class with a large number of subclasses may indicate misuse of sub-classing.
• The number of subclasses gives an impression of a class's potential influence on the design. When a class has a larger number of subclasses, the methods within that class require further testing.

NOC = number of immediate sub-classes of a class (2)
Attribute Inheritance Factor (AIF) (Abreu & Carapuça, 1994): AIF is the ratio of the sum of inherited attributes in all classes of the system to the total number of attributes available in all classes. It is a system-level metric that gauges the extent of attribute inheritance within the system. The equation to calculate AIF is as under:

AIF = sum of inherited attributes in all classes / total number of attributes in all classes (3)

Method Inheritance Factor (MIF) (Abreu & Carapuça, 1994): MIF is the ratio of the sum of inherited methods in all classes of the system to the total number of methods available in all classes. It is a system-level metric, and it is proposed to keep MIF between 0.25 and 0.37. The equation to calculate MIF is as under:

MIF = sum of inherited methods in all classes / total number of methods in all classes (4)

Number Of Methods Inherited (NMI) (Lorenz & Kidd, 1994): the NMI metric counts the total methods inherited by a sub-class.
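The AIF and MIF ratios described above can be sketched as follows; the per-class counts of inherited and defined members are invented purely for illustration.

```python
# Sketch of AIF and MIF: inherited attributes (methods) summed over all
# classes, divided by the total attributes (methods) in all classes.
classes = {
    # name: (inherited_attrs, defined_attrs, inherited_methods, defined_methods)
    "A": (0, 4, 0, 5),
    "B": (4, 2, 5, 3),
    "C": (4, 1, 5, 2),
}

def aif(cs):
    inherited = sum(ia for ia, _, _, _ in cs.values())
    total = sum(ia + da for ia, da, _, _ in cs.values())
    return inherited / total

def mif(cs):
    inherited = sum(im for _, _, im, _ in cs.values())
    total = sum(im + dm for _, _, im, dm in cs.values())
    return inherited / total

print(round(aif(classes), 2), round(mif(classes), 2))
```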
Number of Methods Overridden (NMO): a larger NMO value reveals a design issue, showing that methods were overridden as a last-minute design decision. It is recommended that a sub-class be a specialization of its main classes, which results in brand-new, distinctive names for its methods.

Number of New Methods (NNA): the usual expectation of a sub-class is that it further specializes or adds to the objects of the main class. If no method with the same name exists in any superclass, the method is counted as an added method of the subclass.

Inheritance Coupling (Li & Henry, 1993): an association between classes that facilitates the use of previously defined objects, including variables and methods. Inheritance reduces class complexity by decreasing the number of methods in a single class, but it makes design and maintenance more difficult. Inheritance improves reusability and efficiency when existing objects are used; at the same time, it complicates testing and understanding the software. This implies that inheritance coupling affects several quality attributes such as complexity, reusability, efficiency, maintainability, understandability, and testability.

Number Of Inherited Methods (NIM): a simple metric that describes the extent to which a particular class may be reused. It counts the number of methods a class may access in its main class. The greater the inheritance of methods, the greater the reuse of a class via subclasses. Comparing this metric against the number of superclasses referenced, and against the way methods not specified in the class are referenced, can be revealing, since it shows how much internal reuse occurs between the calling class and its superclass. This may be an internal call to an internal method, even though that is hard to measure. Also, inheriting from large superclasses can be a problem, since only a subset of the behavior may be used or needed in subclasses.
This is a limitation of single inheritance in the object-oriented paradigm.

Fan-In and Fan-Out Metric (Henry & Kafura, 1981):
Henry and Kafura first defined the Fan-In and Fan-Out metrics (Henry & Kafura, 1981). These are ''module-level'' metrics, later extended to the object-oriented paradigm. For a class X, Fan-In is the number of classes that make use of the characteristics of class X; likewise, Fan-Out for a class X is the number of classes utilized by X. Sheetz, Tegarden, and Monarchi originated a set of basic counts. Fundamental, or inter-module, complexity (Card & Glass, 1990) has been recognized as an important part of the complexity of a structured system, and several researchers have used module-defined Fan-In and Fan-Out (Belady & Evangelisti, 1981; Card & Glass, 1990; Monarchi & Puhr, 1992). Extending these ideas to variables in object-oriented systems is appropriate and straightforward: the number of methods using a variable (variable Fan-In) is analogous to the number of modules calling a module (Fan-In), and the number of objects accessed by a variable (variable Fan-Out) is analogous to the number of modules called by a module (Fan-Out). Fan-Down: the number of objects below the object in the hierarchy (subclasses). Fan-Up: the number of objects above it in the hierarchy (superclasses). Class-To-Root Depth: the maximum number of levels in the hierarchy above the class. Class-To-Leaf Depth: the maximum number of levels in the hierarchy below the class.
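Given a "uses" relation between classes, Fan-In and Fan-Out as defined above reduce to simple counts. The relation below is a hypothetical example.

```python
# Sketch of class-level Fan-In and Fan-Out: Fan-Out(X) is the number of
# classes X uses; Fan-In(X) is the number of classes that use X.
uses = {
    "X": {"A", "B"},
    "A": {"B"},
    "B": set(),
    "C": {"X"},
    "D": {"X", "A"},
}

def fan_out(cls):
    return len(uses[cls])

def fan_in(cls):
    return sum(1 for users in uses.values() if cls in users)

print(fan_in("X"), fan_out("X"))  # 2 2: C and D use X; X uses A and B
```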

Measure of Functional Abstraction (MFA):
MFA is the ratio of the number of methods inherited by a class to the total number of methods of the class. Its range is from 0 to 1. IFANIN: the IFANIN metric counts the immediate base classes in the hierarchy.
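The MFA ratio above can be sketched directly from its definition; the method counts passed in are illustrative.

```python
# Sketch of MFA: inherited methods divided by the total methods of the
# class, yielding a value in [0, 1].
def mfa(inherited_methods, defined_methods):
    total = inherited_methods + defined_methods
    return inherited_methods / total if total else 0.0

print(mfa(3, 9))  # 0.25: 3 inherited out of 12 total methods
```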

Inheritance metrics and their usage
Inheritance is a key characteristic of the object-oriented paradigm. It facilitates class-level design and forms the ''IS-A'' relationship among classes, since the basic segment of system development is the design of classes (Rajnish, Choudhary & Agrawal, 2010). The use of inheritance shrinks the costs of testing effort and system maintenance (Chidamber & Kemerer, 1994). Reuse through inheritance thus delivers software that is highly understandable, maintainable, and reliable (Basili, Briand & Melo, 1996). In an experiment, Harrison et al. found software without inheritance easier to modify and grasp than software that makes use of inheritance (Harrison, Counsell & Nithi, 1998). However, Daly's experiments reveal that software with three levels of inheritance may be more easily revised than software without inheritance (Daly et al., 1996).
Inheritance metrics calculate numerous aspects of inheritance, including the breadth and depth of the hierarchy, besides the overriding complexity (Krishna & Joshi, 2010). Similarly, Bhattacherjee and Rajnish carried out studies on inheritance metrics related to classes (Rajnish, Bhattacherjee & Singh, 2007; Rajnish & Bhattacherjee, 2008a; Rajnish & Bhattacherjee, 2007; Rajnish & Bhattacherjee, 2006a; Rajnish & Bhattacherjee, 2006b; Rajnish & Bhattacherjee, 2005). It is agreed that the deeper the inheritance hierarchy, the more reusable the classes, but the more complicated the system's maintainability. To streamline comprehension, software designers strive to keep the inheritance hierarchy shallow and forgo reusability through inheritance (Chidamber & Kemerer, 1994). Hence, it is important to assess the complexity of the inheritance hierarchy to resolve the trade-off between depth and shallowness.
Several inheritance-focused metrics have been defined by researchers. These metrics, with their references, are listed in Table 4 (Aziz, Khan & Nadeem, 2019) and are discussed briefly in 'Inheritance metrics' of this paper. We specifically consider these inheritance metrics in the context of object-oriented software fault prediction.

Data sets in SFP
Many data sets are used in software fault prediction. These data sets are categorized as public, private, partial, and unknown (Catal & Diri, 2009a). Of these, the use of public data sets has increased from 31% to 52% since 2005 (Malhotra, 2015). Fault information is normally unavailable for private projects; however, public data sets with fault information are widely available and can be downloaded for free. There are also many fault repositories, of which the Tera-PROMISE (Boetticher, 2007) and D'Ambros (D'Ambros, Lanza & Robbes, 2010) repositories are the most commonly used for fault prediction.
The publicly available repository known as Tera-PROMISE provides substantial data sets from many projects. Its previous edition was named the NASA repository (Shirabad & Menzies, 2005). The NASA data sets are a vital resource of the Tera-PROMISE repository, being a widely used library for SFP: nearly 60% of articles published between 1991 and 2013 take advantage of this archive (Card & Agresti, 1988). The Tera-PROMISE library provides product and process metrics along with numeric and nominal class labels for building regression and classification models.
The D'Ambros repository retains the data sets of several software systems, including Equinox Framework, Eclipse JDT Core, Eclipse PDE UI, and Mylyn.
The literature review identified that scholars use both private and public data sets as evidence in their studies. In this respect, Table 5 indicates the author's name, publication year, and the public or private data sets applied in the experiments.

LITERATURE REVIEW
In this section, the emphasis is on inheritance metrics, to find out how effective they might be in SFP. This paper does not perform a systematic literature review; rather, it explores how various inheritance metrics are advantageous in fault prediction.

Inheritance in SFP
Object-oriented metrics are employed in the prediction of faults to produce quality software. The attributes that ascertain software quality include fault tolerance, understandability, defect density, maintainability, normalized rework rate, reusability, and many others.
There are several metric levels, comprising class level, method level, file level, process level, component level, and quantitative levels. Method-level metrics are the most extensively applied to the fault prediction problem. The Halstead (1977) and McCabe (1976) metrics were suggested in the 1970s, yet they are still the most predominant method-level metrics. Class-level metrics are applied only in object-oriented programs; the C&K (Chidamber & Kemerer, 1994) metrics suite is still the most predominant class-level suite employed for fault prediction.
The C&K metrics collection, crafted and applied by Chidamber & Kemerer (1994), is the most frequently utilized metrics set for object-oriented software. Briand et al. (2000) analyzed the collection of object-oriented design metrics suggested by Basili, Briand & Melo (1996). Subramanyam validated that the {dit}, {cbo}, and {wmc} metrics are fault predictors at class level (Subramanyam & Krishnan, 2003).
Experimental evaluations of classification algorithms built for fault prediction have been performed by researchers (Kaur & Kaur, 2018). Basili et al. revealed that many C&K metrics are associated with failure propensity (Basili, Briand & Melo, 1996). Tang et al. assessed the C&K metrics suite and discovered that none of these metrics except {rfc} and {wmc} were deemed vital (Tang, Kao & Chen, 1999). Briand et al. examined forty-nine metrics to ascertain which model to apply for the prediction of faults; their conclusions show that, apart from {noc}, all metrics are useful for predicting fault tendency (Briand et al., 2000). Wüst and Briand determined that the {dit} metric is inversely correlated with fault proneness and the {noc} metric is an insignificant predictor of fault tendency (Briand, Wüst & Lounis, 2001). Yu et al. selected eight metrics to explore the relationship between these metrics and fault proneness: they first explored the correlation among the metrics, found four closely associated sets, and then used univariate analysis to observe which set can classify faults (Yu, Systa & Muller, 2002). Malhotra and Jain applied logistic regression methods to examine the correlation between object-oriented metrics and fault tendency; receiver operating characteristic (ROC) analysis was employed to assess the predictive model's performance (Malhotra & Jain, 2012). Yeresime et al. investigated linear regression, logistic regression, and artificial neural network methods for the prediction of software faults using object-oriented metrics (Suresh, Kumar & Rath, 2014). The impact of inheritance metrics in SFP has likewise been examined experimentally using artificial neural networks.

METHODOLOGY
The research approach comprises three interconnected phases, as shown in Fig. 1: selection, preprocessing, and experimentation/evaluation. The first phase comprises the choice of inheritance metrics data sets and the performance measure. The selection of inheritance metrics data sets is based on dual criteria: the data set should be publicly available, and the metrics' correlation should not be ≥ 0.7 or ≤ −0.7. In the second, preprocessing phase, metrics other than inheritance metrics, for example loc, cbo, and wmc, are removed to keep the data set consistent. These data sets are then split into all possible combinations of their feature sets, after which the data sets are cleaned, filtered, and cleared of related anomalies. Finally, in the experimentation/evaluation phase, the final form of the data set is used for the experiment, in which an SVM (Support Vector Machine) is built and cross-validated. Cross-entropy loss, Accuracy, F-Measure, and AUC are calculated; accordingly, a score is computed for each selected inheritance metric to determine the superior one.

Selection phase
(1) Selection of inheritance metrics: from the inheritance metrics mentioned in 'Theoretical Background' of this paper, we choose only those metrics that meet the following criteria.

Data set must be publicly available
This criterion is imposed because fault information for software projects is only rarely accessible. The fundamental issue is that fault information for large enterprise projects is stored digitally and is proprietary, while bug information for small projects is scarce, though publicly available. Thus, labeled data is infrequently accessible. The availability of publicly accessible data sets permits the assessment of inheritance-related metrics in fault prediction. In total, 40 data sets with inheritance metrics were discovered (Jureczko & Madeyski, 2010; Menzies & Di Stefano, 2004; a51, 2009; Niu & Mahmoud, 2012; D'Ambros, Lanza & Robbes, 2010; Wagner, 2010; Abdelmoez, Goseva-Popstojanova & Ammar, 2006; Abdelmoez et al., 2005; Monarchi & Puhr, 1992; Shepperd et al., 2013). A total of nine inheritance metrics are found in these data sets: Depth of Inheritance Tree (dit), Number of Children (noc), Measure of Functional Abstraction (mfa), Inheritance Coupling (ic), Number of Methods Inherited (nomi), Number of Attributes Inherited (noai), Dependent on Child (doc), Number of methods called per class (fanOut), and Number of classes that call class methods (fanIn).
Of the 40 data sets, about 35 are found on the servers of the Tera-PROMISE repository (Boetticher, 2007) and five are located in the D'Ambros repository (D'Ambros, Lanza & Robbes, 2010). In this regard, Table 7 depicts the detailed information on these data sets. The first column indicates the data set name along with the version number, if one exists; the second column shows the total number of records; and the third column shows the percentage of faults for each base data set. Overall, nine distinct inheritance metrics are discovered in the 40 data sets, where ✓ labels the presence of a metric in the associated data set and × labels its absence.
Unluckily, all nine inheritance metrics do not exist in any single data set. However, the set of inheritance metrics comprising {dit, noc, ic, mfa} is found in 30 data sets. Overall, the {dit} and {noc} features exist in all forty data sets; the feature {ic} is found in 31 data sets, {mfa} in 30 data sets, and similarly the other features in multiple data sets, from which the 659 data sets used in the experiment are created, as explained in subsequent sections. The conclusions drawn about {ic} are therefore based upon its predictive ability on 31 data sets, and likewise for the rest of the feature sets.
It is pertinent to mention here that some of these data sets have already been utilized in an experiment comparing inheritance metrics with the C&K metrics (Aziz, Khan & Nadeem, 2019).
Correlation should not be ≥ 0.7 OR ≤ −0.7
Software metrics have a tendency to correlate, since they focus on related characteristics of object-oriented programming, in our case inheritance. A high correlation of ≥ 0.7 or ≤ −0.7 is a form of redundancy, and the redundant metric should be eliminated. The issue is that retaining a redundant metric may have a negative effect, confusing the mining algorithm and yielding patterns of poor quality (Han, Pei & Kamber, 2011). Furthermore, the benefits of removing correlated metrics significantly outweigh the cost (Jiarpakdee, Tantithamthavorn & Hassan, 2018). In the event of a correlation value only approaching ≥ 0.7 or ≤ −0.7, however, rejecting a metric may deprive the data set of significant information.
To apply the second criterion, we compute the Pearson r and Spearman ρ correlation coefficients for the metric pairs discovered in the forty selected public data sets. Correlation between the features of each data set is computed on the unfiltered data set. To illustrate the feature-to-feature correlation analysis, the data set ant-1.7 has four features, {dit}, {noc}, {ic}, and {mfa}, as shown in Table 7.
The strongly correlated features are dropped in their corresponding data sets only. It is important to mention here that correlation is only a precautionary step otherwise it does not manipulate feature set.
The presence of strongly correlated features makes it difficult for the SVM models to converge and to justify the generality of the results. Therefore, strongly correlated features are identified and dropped; the Pearson correlation coefficient is used for this purpose.
Nevertheless, all combinations are positively correlated, and not a single pair reaches ≥ 0.7 or ≤ −0.7; thus the nine inheritance metrics also meet the second criterion.
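The correlation criterion above can be sketched in Python (the study itself used R): compute pairwise Pearson and Spearman coefficients and flag any pair with an absolute value ≥ 0.7. The four-feature frame below stands in for a data set such as ant-1.7; the values are invented.

```python
# Sketch of the correlation-based redundancy check on a feature set.
import pandas as pd
from scipy.stats import pearsonr, spearmanr

df = pd.DataFrame({
    "dit": [1, 2, 3, 4, 2, 5],
    "noc": [0, 3, 1, 2, 0, 4],
    "ic":  [2, 0, 1, 3, 1, 2],
    "mfa": [0.0, 0.4, 0.6, 0.7, 0.3, 0.8],
})

flagged = []
cols = list(df.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        r, _ = pearsonr(df[a], df[b])
        rho, _ = spearmanr(df[a], df[b])
        if abs(r) >= 0.7 or abs(rho) >= 0.7:
            flagged.append((a, b))

print(flagged)  # pairs whose redundant member would be dropped
```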
The tools, programming language, and environment used to compute the Pearson r and Spearman ρ correlation coefficients are the R platform with the ggpubr R package. (2) Selection of performance measure: machine learning classification models are measured by how well they categorize unseen instances. A confusion matrix is a method of showing this capability. Catal et al. computed many performance metrics derived from the confusion matrix (Catal, 2012). Malhotra also provided an overall explanation of many assessment measures used in software fault prediction, finding that the True Positive Rate (TPR) is the most frequently used performance measure in software fault prediction, followed by Precision and AUC (Malhotra, 2015).
Cross entropy measures the overall deviation of the model's predicted probabilities from the actual labels. A key property of cross entropy is its independence of the classification threshold (Hinojosa et al., 2018), and it is effective in both the training and testing phases (Golik, Doetsch & Ney, 2013; Kline & Berardi, 2005). Cross entropy is therefore chosen as the primary performance measure, with Accuracy, F-Measure, and AUC as supporting measures for this experiment.
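As an illustration of the primary measure, binary cross entropy can be computed directly from the predicted fault probabilities; no threshold is involved. This is a minimal Python sketch with made-up labels and probabilities, not the paper's R implementation.

```python
import math

def cross_entropy(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy of predicted fault probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)          # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

# A confident, correct model earns a lower loss than a hesitant one,
# without ever picking a classification threshold.
labels = [1, 0, 1, 0]
confident = cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
hesitant = cross_entropy(labels, [0.6, 0.4, 0.6, 0.4])
print(confident < hesitant)  # True
```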

Preprocessing phase
(1) Remove non-inheritance metrics. The collected data sets contain numerous metrics other than inheritance metrics, including {loc}, {wmc}, {ca}, {cbo}, and several others. Because we intend to assess inheritance metrics in the context of SFP, all non-inheritance metrics are removed. This may affect overall performance, but it allows a cleaner evaluation of the viability of inheritance metrics for software fault prediction.
(2) Uniformity of Labels. Every metric holds continuous numeric values in the associated data sets; discrepancies, however, are located in the class label, namely [bug]. These are settled through the guidelines given below. 3) Splitting. The main objective of this study is to measure the significance of the nine selected inheritance metrics, so after removing the non-inheritance metrics, the forty data sets are separated into all possible sets of features.
The objectives of the splitting process are to visualize the impact of every possible feature set and to determine the most significant feature set among the available data sets.
Each of these forty data sets is separated by splitting and combining its features into all possible unique combinations. To illustrate, the data set ant-1.7 has four features {dit}, {noc}, {ic}, and {mfa}, as shown in Table 7. Splitting this single data set into all possible unique combinations creates 15 sub-data sets, following the formula 2^(number of features) − 1, where 2^4 − 1 = 15 unique sub-data sets.
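The enumeration behind the 2^n − 1 formula can be sketched in a few lines of Python; the ant-1.7 feature names are taken from the text.

```python
from itertools import combinations

def feature_subsets(features):
    """All non-empty combinations of the feature names:
    2**len(features) - 1 sub-data sets in total."""
    return [c for k in range(1, len(features) + 1)
            for c in combinations(features, k)]

# The ant-1.7 example from the text: four inheritance metrics.
subsets = feature_subsets(["dit", "noc", "ic", "mfa"])
print(len(subsets))  # 2**4 - 1 = 15
```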
Applying the same process to all forty publicly acquired data sets yields 659 sub-data sets, covering 67 unique feature sets. The first column of Table 8 lists these 67 unique feature sets under the heading ''features'', the second column gives the number of metrics combined to form each feature set, and the third column gives the number of sub-data sets produced from the forty data sets, 659 in total. Afterwards, all 659 sub-data sets passed through three phases: dropping identical instances, dropping inconsistent instances, and filtration.
(4) Cleaning. The next step is cleaning, where identical instances are eliminated, since such instances are worthless and sometimes confusing for the model. Inconsistent instances are then removed (Henderson-Sellers, 1995); in an inconsistency, two instances have the same values for all metrics but different class labels.
Our objective here is only to resolve this anomaly in the data sets. The problem can be addressed in four ways. The first option is to drop both instances, which loses information. The second is to drop the minority-class instance, which makes the data set more skewed. The third is to drop the majority-class instance, which produces less skewness. The fourth is to keep both instances, whose contradictory labels then cancel each other's effect. In this study the third option is applied, to keep the impact on the data sets minimal. (5) Filtration. Small data sets, where the number of instances is ≤ 100, are dropped in the filtration phase. This filter is applied so that ten-fold cross validation without replacement, the usual situation in model validation, can be employed. Consequently, 293 data sets are eliminated by this filter. (6) Skewness ≤ 9:1. The objective is to identify skewed data sets among the 659 sub-data sets and drop them. Skewness is not addressed further in this study.
By skewness we mean that the faulty and fault-free instances must each constitute between 10% and 90% of a data set. Combined with the ≥ 100-instance filter, this guarantees that at least one record from each class appears in every fold of the stratified ten-fold cross validation without replacement. After applying this filter to all 659 sub-data sets, only one sub-data set was found skewed and hence eliminated.
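Steps (4) to (6) can be summarized in a small sketch. This is an illustrative Python version under assumptions of ours: the (feature_tuple, label) row layout and the toy values are invented, and `min_size` is lowered for the demo (the paper uses 100).

```python
from collections import Counter

def preprocess(rows, min_size=100, max_ratio=9.0):
    """Clean one sub-data set: deduplicate, resolve inconsistencies by
    dropping the majority-class copy, then apply the size and 9:1
    skewness filters. `rows` is a list of (feature_tuple, label) pairs;
    returns the cleaned rows, or None if a filter rejects the set."""
    # (4a) Drop exact duplicates (same features, same label).
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    # (4b) For inconsistent pairs (same features, different labels),
    # keep the minority-class instance, i.e. drop the majority class.
    majority = Counter(y for _, y in unique).most_common(1)[0][0]
    feature_count = Counter(f for f, _ in unique)
    cleaned = [(f, y) for f, y in unique
               if not (feature_count[f] > 1 and y == majority)]
    # (5) Filtration: drop small data sets.
    if len(cleaned) <= min_size:
        return None
    # (6) Skewness: each class must hold between 10% and 90% of instances.
    counts = Counter(y for _, y in cleaned)
    if len(counts) < 2 or max(counts.values()) > max_ratio * min(counts.values()):
        return None
    return cleaned

# Toy sub-data set with one duplicate and one inconsistent pair.
toy = [((1, 0), 0), ((1, 0), 0),   # duplicate: one copy kept
       ((2, 1), 0), ((2, 1), 1),   # inconsistent: minority label 1 kept
       ((3, 1), 1), ((4, 0), 0), ((5, 2), 0)]
print(len(preprocess(toy, min_size=3)))  # 5 instances survive
```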
Although the remaining data sets are still imbalanced, the imbalance is addressed in two ways: the choice of model building algorithm and the choice of performance measure. First, we use SVM, which is a common modeling choice for imbalanced data sets (Xing, Guo & Lyu, 2005; Elish & Elish, 2008b; Singh, Kaur & Malhotra, 2009; Di Martino et al., 2011; Yu, 2012; Malhotra, Kaur & Singh, 2010). Second, for performance evaluation we use cross entropy, which is likewise insensitive to imbalance in the data set.
Lastly, after applying cleaning, filtration, and the skewness filter, the 659 data sets were reduced to 365, as shown in the last column of Table 8.

Experiment setup
Dataset: 365 preprocessed data sets, as shown in Table 8. Tools: R language version 3.4.3 (Rajnish & Bhattacherjee, 2005) in R Studio 1.1.383 (Rajnish & Bhattacherjee, 2006b). Data splitting technique: ten-fold stratified cross-validation without replacement; stratified splitting maintains the ratio of classes in all folds, and we report the averages attained over the 10 folds. Classifier algorithm: SVM is widely appreciated by the SFP community (Xing, Guo & Lyu, 2005; Elish & Elish, 2008b; Singh, Kaur & Malhotra, 2009; Di Martino et al., 2011; Yu, 2012) for its applicability to real-world problems, its coverage of non-linear data, and its good generalization in high-dimensional spaces; it is therefore used for model building. We report the averages over the 10 folds of cross entropy loss, Accuracy, F-Measure, and AUC for all data sets. SVM parameters: the Gaussian kernel, in its standard form K(xi, xj) = exp(−γ ||xi − xj||²), is a general-purpose kernel that requires no specific pattern in the data; SVM models are built and validated with the Gaussian kernel and the best model is retained. The complete working of SVM model building and result collection is shown in Algorithm 1. The focus of this article is the viability of inheritance metrics in software fault prediction. The conventional techniques, despite being quite old, are still used by the SFP community, the obvious reason being their strong performance on the data sets in use. We did not use advanced machine learning techniques explicitly, because they build upon the algorithms discussed here and are therefore implicitly covered. Apart from that, advanced machine learning techniques are most effective on huge data sets with a large number of features.
Unfortunately, such data sets are seriously lacking in the Software engineering domain.
The experiment was conducted with an industrial objective in mind, where a model is deployed with a single threshold. Although a model should be examined at every threshold, it must be deployed with the threshold that gives the best results. A performance measure that reflects the model's behavior at that particular threshold is therefore of interest, which is exactly where error entropy fits well, whereas AUC evaluates the model at every threshold and does not suit this objective.
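To make the splitting concrete, the stratified ten-fold scheme described above can be sketched as follows. This is a Python illustration, not the paper's R code; the random seed and the toy 90/10 label distribution are assumptions for the demo.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Return k folds of instance indices; each fold preserves the class
    ratio and every instance lands in exactly one fold (no replacement)."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):    # deal each class out round-robin
            folds[j % k].append(i)
    return folds

# 90 clean vs 10 faulty classes: every fold receives exactly one faulty one,
# which is why the 9:1 skewness filter guarantees both classes per fold.
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels)
print([sum(labels[i] for i in f) for f in folds])  # one faulty per fold
```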

RESULTS AND DISCUSSION
Several facts show that inheritance metrics differ significantly from other metrics. First, the semantic distinction of inheritance metrics from other metrics needs no further justification. Second, from the data set viewpoint, Aziz, Khan & Nadeem (2019) demonstrated the empirical distinction between inheritance and non-inheritance metrics. Moreover, the inheritance metrics are quite distinct from each other, which is confirmed by examining their tendency to correlate: we computed the Pearson r and Spearman ρ correlation coefficients for the metric pairs in the forty selected public data sets, on the unfiltered data. All combinations are positively correlated, and not a single pair reaches ≥ 0.7 or ≤ −0.7.
The fundamental objective of this work is to assess the exclusive viability of inheritance metrics in SFP; the secondary aim is to obtain the best outcomes from the machine learning algorithms. With these in view, the filtration described above was applied to the data sets and the experiments were planned.

Overall cross entropy loss
In this context, Table 9 displays the outcomes of the experiments: the feature set name in the first column, the number of features in the second, and the total number of data sets in the third. Averages of Cross Entropy, Accuracy, F-Measure, and AUC occupy columns four to seven, and their minimum values occupy columns eight to eleven. Cross entropy loss, Accuracy, F-Measure, and AUC are computed for all 365 data sets comprising the 67 unique feature combinations. Figure 2 plots these combinations, with the number of features increasing from bottom to top, and graphically compares the cross entropy loss obtained by SVM on feature sets of 1 to 6 inheritance metrics. The lower part of Fig. 2 is dense because the feature sets there are small; the loss gradually decreases moving upward as the feature sets grow from 2 to 6 metrics. This supports our claim that adding inheritance metrics reduces the entropy loss, and the finding is corroborated by the Accuracy, F-Measure, and AUC values in Table 9. Table 9 shows that overall {dit, ic, noc, mfa} achieved the least entropy rate of 0.000723, and {fanIn, fanOut, noc, noai, nomi} achieved the least average entropy rate of 0.001707. Figure 3 shows the absence of outliers in the results across all performance measures; it can therefore be safely stated that the averages of the performance measures are not biased.

Feature wise cross entropy loss
Regarding the exclusive assessment of inheritance metrics in the context of SFP, the main goal of this article, adding inheritance metrics reduces the cross entropy loss. The feature-wise least Cross Entropy Loss, Accuracy, F-Measure, and AUC are extracted from Table 9 into Table 10, whose first column shows the feature set, second column the number of features (1 to 6), and third column the least cross entropy loss; Accuracy, F-Measure, and AUC follow in columns four to six. The overall findings are: 1. Table 10 contains two distinct groups of feature sets, numbers 1 to 4 and 5 to 6; the first comprises {mfa, ic, noc, dit} and the second {dit, fanIn, fanOut, noc, noai, nomi}. Table 10 is depicted in Fig. 4. Adding inheritance metrics to {mfa} reduces the cross entropy rate significantly from the 1-feature to the 4-feature sets, and adding {dit} to the fifth set reduces the entropy rate further. These findings are also supported by the Accuracy, F-Measure, and AUC results in Table 10.

Feature wise average rate
Regarding the overall exclusive assessment of inheritance metrics on the 365 data sets comprising 67 unique feature sets, the average is calculated for every unique feature set. Table 11 shows the results: the first column contains the feature set, the second the number of inheritance features, and the third the average Cross Entropy Loss; Accuracy, F-Measure, and AUC follow in columns four to six. The overall findings are: 1. Table 11 also contains two distinct groups of feature sets, numbers 1 to 2 and 3 to 6. Table 11 is depicted in Fig. 5; the first group comprises {mfa, ic} and the second {dit, fanIn, fanOut, noc, noai, nomi}. Adding inheritance metrics to {mfa} reduces the entropy rate significantly from the 1-feature to the 2-feature sets, and adding {noai, noc} and {dit} to {fanIn, fanOut, nomi} reduces the error rate further. These findings are also validated by the Accuracy, F-Measure, and AUC results in Table 11.

Cardinality of features
The cardinality of the feature sets in this paper runs up to 6. Figure 6 shows the average cross entropy rate from single-feature sets up to 6-feature sets, clearly depicting that adding inheritance metrics gradually reduces the entropy rate: the average entropy rate is 83.3 when a single inheritance metric is used, 63.1 with two metrics, and 39.8 with three. These results are extracted from Table 9 and depicted in Table 12, whose first column shows the cardinality of the feature set; columns two to five contain the averages of Cross Entropy, Accuracy, F-Measure, and AUC, and columns six to nine the averages of their minimum values. The graphical representation of the cross entropy loss in Fig. 6 supports our claim that adding inheritance metrics reduces the cross entropy rate, and the Accuracy, F-Measure, and AUC results in Table 12 endorse these findings.
Furthermore, we conducted an experimental study to see how significantly inheritance metrics assist in the prediction of software faults, comparing inheritance metrics with the Chidamber and Kemerer (C&K) metrics suite. The findings demonstrate a substantial impact of inheritance metrics in software fault prediction (Aziz, Khan & Nadeem, 2019).

THREATS, CONCLUSION AND FUTURE WORK Threats to validity
Our study relies on data sets obtained from the tera-PROMISE, NASA, and D'Ambros repositories. These repositories provide insufficient information regarding the faults and do not indicate any particular type of software fault.
Primarily, faults are not labeled with a specific category of software fault, so the prediction might not generalize to all categories of faults. Likewise, the selected data sets encompass a limited number of software products, which differ in team, design, scope, etc. A fault might not be due to the inheritance aspect alone.
Since project-related information is not available in the selected data sets, the factor most associated with faults cannot be identified. Although the experiments do advocate the predictive ability of inheritance metrics, a causal relationship cannot be guaranteed.
We do not claim that the results generalize across algorithms; rather, their applicability suggests a mapping of the independent variables onto the dependent variable.
We selected SVM as the modeling algorithm for its general acceptance by the SFP community. Since the training process depends heavily on the available data, results may vary with the modeling algorithm, and likewise with the kernel function of the SVM.
Lastly, the selected inheritance metrics do not cover every aspect of inheritance in software products, so the generalization of the selected metrics may not capture the effect of every aspect of inheritance.

Conclusion and future work
In this paper, we assessed inheritance software metrics exclusively for their viability in software fault prediction. Experiments on forty distinct data sets support the viability of inheritance metrics for fault prediction. The SVM results revealed that inheritance metrics can achieve the least entropy rate with a set of common characteristics. Overall, {fanIn, fanOut, noc, noai, nomi} and individually {dit, ic, noc, mfa} proved to be the best predictors with the least entropy rate, whereas {dit}, {noc}, and {ic} help to reduce the entropy rate. We conclude that adding inheritance metrics is useful for predicting faults. These findings are also validated through the Accuracy, F-Measure, and AUC performance measures.
Regarding future work, we anticipate that other researchers may replicate our experiments and assess inheritance metrics beyond those employed here. Only nine inheritance metrics are reviewed and assessed in this article, while several others are defined in the literature; owing to the lack of public data sets, these could not be assessed. This motivates the construction of a data set carrying data for the remaining metrics. Classification was used in this paper because continuous labels are scarcely available; data sets with continuous values could be used for regression to further evaluate the performance of inheritance metrics.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
The authors received no funding for this work.