PeerJ Computer Science: Programming Languages
Feed: https://peerj.com/articles/index.atom?journal=cs&subject=11920
Programming Languages articles published in PeerJ Computer Science

A domain-specific language for managing ETL processes
https://peerj.com/articles/cs-1835
Published: 2024-01-26
Aleksandar Popović, Vladimir Ivković, Nikola Trajković, Ivan Luković
Maintenance of Data Warehouse (DW) systems is a critical task because any downtime or data loss can have significant consequences on business applications. Existing DW maintenance solutions mostly rely on concrete technologies and tools that are dependent on: the platform on which the DW system was created; the specific data extraction, transformation, and loading (ETL) tool; and the database language the DW uses. Different languages for different versions of DW systems make organizing DW processes difficult, as minimal changes in the structure require major changes in the application code for managing ETL processes. This article proposes a domain-specific language (DSL) for ETL process management that mitigates these problems by centralizing all program logic and making it independent of any particular platform. This approach would simplify DW system maintenance. The platform-independent language proposed in this article also provides an easier way to create a unified environment to control DW processes, regardless of the language, environment, or ETL tool the DW uses.
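As a loose illustration of what such a platform-independent layer can look like (the command names, syntax, and backend below are invented for this sketch and are not the DSL proposed in the article), an ETL script can be parsed into abstract commands that a pluggable backend translates for a concrete platform:

```python
# Hypothetical mini-DSL sketch: scripts are parsed into abstract ETL
# commands; swapping the backend retargets them to another platform
# without changing the script itself.

def parse(script):
    """Parse 'VERB argument' lines into (verb, arg) command tuples."""
    commands = []
    for line in script.strip().splitlines():
        verb, _, arg = line.strip().partition(" ")
        commands.append((verb.upper(), arg))
    return commands

class SqlBackend:
    """One of possibly many backends; each translates abstract commands
    into platform-specific actions (here, just descriptive strings)."""
    def run(self, verb, arg):
        mapping = {
            "EXTRACT": f"-- pull rows from staging source '{arg}'",
            "TRANSFORM": f"-- apply transformation '{arg}'",
            "LOAD": f"-- insert results into warehouse table '{arg}'",
        }
        return mapping[verb]

script = """
EXTRACT sales_csv
TRANSFORM normalize_dates
LOAD dw.fact_sales
"""

backend = SqlBackend()
plan = [backend.run(verb, arg) for verb, arg in parse(script)]
```

Because the script mentions no concrete tool, a structural change in the warehouse only requires a new backend, not a rewrite of every ETL script.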
Teaching programming using eduScrum methodology
https://peerj.com/articles/cs-1822
Published: 2024-01-23
Patrik Voštinár
There are a large number of professions in the world today. Some professions are disappearing, and new ones are emerging. However, they all have something in common: the need to manage them. Throughout its history, humanity has developed several constantly changing forms of management. For this reason, school graduates must enter the labour market with sufficiently developed skills such as communication, cooperation, teamwork, responsibility, and the ability to plan their work. The article focuses on teaching programming through mobile applications and basic robotics using an innovative form of teaching: eduScrum. The eduScrum methodology is based on the agile software development method Scrum and develops soft skills. The article describes our experience with this form of teaching in computer science classes. We established several hypotheses and evaluated them using descriptive statistics on a sample of 251 students. The main objective of the research is to verify whether teaching computer science in primary and secondary schools using the eduScrum methodology is more suitable than classical frontal teaching of computer science. The research showed that secondary school students preferred the eduScrum methodology over traditional frontal teaching, whereas primary school students preferred traditional frontal teaching.
Predicting the final grade using a machine learning regression model: insights from fifty percent of total course grades in CS1 courses
https://peerj.com/articles/cs-1689
Published: 2023-12-11
Carlos Giovanny Hidalgo Suarez, Jose Llanos, Víctor A. Bucheli
This article introduces a model for accurately predicting students’ final grades in a CS1 course from their grades in the first half of the course. The methodology includes three phases: training, testing, and validation, employing four regression algorithms: AdaBoost, Random Forest, Support Vector Regression (SVR), and XGBoost. Notably, the SVR algorithm outperformed the others, achieving an R-squared (R2) value ranging from 72% to 91%. The discussion focuses on four crucial aspects: the selection of data features, the percentage of course grades used for training, the comparison between predicted and actual values to demonstrate reliability, and the model’s performance compared to models in the existing literature, highlighting its effectiveness.
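The core idea — regress the final grade on partial-course grades — can be sketched with scikit-learn's SVR. The synthetic data and feature choices below are illustrative assumptions, not the authors' dataset:

```python
# Sketch: train a support vector regressor on grades from the first half
# of a course to predict the final grade, then report R-squared.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Hypothetical features: three partial grades (labs, midterm) on a 0-5 scale.
X = rng.uniform(0, 5, size=(200, 3))
# Toy target: final grade tracks the partial-grade average plus noise.
y = X.mean(axis=1) + rng.normal(0, 0.3, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = SVR(kernel="rbf", C=10.0).fit(X_tr, y_tr)
r2 = r2_score(y_te, model.predict(X_te))
```

On real course data the R2 would of course depend on how predictive the early grades actually are, which is what the article quantifies.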
Evaluating the effectiveness of decomposed Halstead Metrics in software fault prediction
https://peerj.com/articles/cs-1647
Published: 2023-11-27
Bilal Khan, Aamer Nadeem
The occurrence of faults in software systems represents an inevitable predicament. Testing is the most common means to detect such faults; however, exhaustive testing is not feasible for any nontrivial system. Software fault prediction (SFP), which identifies software components that are more prone to errors, seeks to supplement the testing process: testing efforts can then be focused on such modules. Various approaches exist for SFP, with machine learning (ML) emerging as the prevailing methodology. ML-based SFP relies on a wide range of metrics, from file-level and class-level to method-level and even line-level metrics. More granular metrics are expected to provide a higher degree of micro-level coverage of the code. The Halstead metric suite offers coverage at the line level and has been extensively employed for the past three decades across diverse domains such as fault prediction, quality assessment, and similarity approximation. In this article, we propose to decompose the Halstead base metrics and evaluate their fault prediction capability. The Halstead base metrics count operators and operands. In the context of the Java language, we partition operators into five distinct categories: assignment operators, arithmetic operators, logical operators, relational operators, and all other types of operators. Similarly, operands are classified into two classes: constants and variables. For the empirical evaluation, two experiments were designed. In the first experiment, the Halstead base metrics were used along with McCabe, lines of code (LoC), and Halstead-derived metrics as predictors. In the second experiment, decomposed Halstead base metrics were used along with McCabe, LoC, and Halstead-derived metrics. Five public datasets were selected for the experiments. The ML classifiers used included logistic regression, naïve Bayes, decision tree, multilayer perceptron, random forest, and support vector machines.
The classifiers’ effectiveness was assessed through metrics such as accuracy, F-measure, and AUC. Accuracy improved from 0.82 to 0.97, while the F-measure improved from 0.81 to 0.99. Correspondingly, the AUC value advanced from 0.79 to 0.99. These findings highlight the superior performance of the decomposed Halstead metrics, as opposed to the original Halstead base metrics, in predicting faults across all datasets.
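The decomposition idea can be shown concretely: instead of one aggregate operator count, tokens are tallied per category. The category lists and the toy tokenization below are illustrative, not the authors' exact definitions (keyword operators, for instance, are ignored here for brevity):

```python
# Sketch of decomposed Halstead operator counting for Java-like tokens.
CATEGORIES = {
    "assignment": ["=", "+=", "-=", "*=", "/="],
    "arithmetic": ["+", "-", "*", "/", "%"],
    "relational": ["==", "!=", "<", ">", "<=", ">="],
    "logical": ["&&", "||", "!"],
}

def decomposed_operator_counts(tokens):
    """Count operator tokens per category; uncategorized ones go to 'other'."""
    counts = {name: 0 for name in CATEGORIES}
    counts["other"] = 0
    for tok in tokens:
        if tok.isidentifier() or tok.isdigit():
            continue  # identifiers/literals are operands; keywords skipped here
        for name, ops in CATEGORIES.items():
            if tok in ops:
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    return counts

# Token stream for a toy Java statement: if (a >= b && a != 0) { a = a + 1; }
tokens = ["if", "(", "a", ">=", "b", "&&", "a", "!=", "0", ")",
          "{", "a", "=", "a", "+", "1", ";", "}"]
counts = decomposed_operator_counts(tokens)
```

Each per-category count then becomes a separate feature for the fault-prediction classifiers, replacing the single lumped operator count of the original Halstead base metrics.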
Early prediction of student performance in CS1 programming courses
https://peerj.com/articles/cs-1655
Published: 2023-10-31
Jose Llanos, Víctor A. Bucheli, Felipe Restrepo-Calle
Programming courses exhibit high failure rates and low academic performance. To address these issues, it is crucial to predict student performance at an early stage. This allows teachers to provide timely support and interventions to help students achieve their learning objectives. The prediction of student performance has gained significant attention, with researchers focusing on machine learning features and algorithms to improve predictions. This article proposes a model for predicting student performance in a 16-week CS1 programming course, specifically in weeks 3, 5, and 7. The model utilizes three key factors: grades, delivery time, and the number of attempts made by students in programming labs and an exam. Eight classification algorithms were employed to train and evaluate the model, with performance assessed using metrics such as accuracy, recall, F1 score, and AUC. In week 3, the gradient boosting classifier (GBC) achieved the best results with an F1 score of 86%, followed closely by the random forest classifier (RFC) with 83%. These findings demonstrate the potential of the proposed model to accurately predict student performance.
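A minimal sketch of this early-warning setup, using the same three kinds of signals (grade, delivery time, attempts) and the best-performing algorithm the abstract names. The data is synthetic and the pass/fail rule is invented for illustration:

```python
# Sketch: classify pass/fail from early-course signals with gradient boosting.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
n = 300
grades = rng.uniform(0, 5, n)       # lab grades up to week 3 (toy scale)
delivery = rng.uniform(0, 72, n)    # hours submitted before the deadline
attempts = rng.integers(1, 10, n)   # submissions per lab
X = np.column_stack([grades, delivery, attempts])
y = (grades > 2.5).astype(int)      # toy pass/fail label for the sketch

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
clf = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
f1 = f1_score(y_te, clf.predict(X_te))
```

In practice the label would be the real end-of-course outcome, and the interesting question — the one the article studies — is how early in the term such features become predictive.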
An ensemble learning based IDS using Voting rule: VEL-IDS
https://peerj.com/articles/cs-1553
Published: 2023-09-29
Sura Emanet, Gozde Karatas Baydogmus, Onder Demir
Intrusion detection systems (IDSs) analyze internet activities and traffic to detect potential attacks, thereby safeguarding computer systems. This study focused on developing an advanced IDS that achieves high accuracy through feature selection and ensemble learning methods, using the CSE-CIC-IDS2018 dataset for training and testing. The study comprised two key stages. In the first stage, the dataset was reduced through strategic feature selection and algorithms were carefully chosen for ensemble learning; this optimizes the IDS’s performance by selecting the most informative features and leveraging the strengths of different classifiers. In the second stage, the ensemble learning approach was implemented, resulting in a model that combines the benefits of multiple algorithms. The results demonstrate improved attack detection and reduced detection time. By applying techniques such as Spearman’s correlation analysis, recursive feature elimination (RFE), and chi-square tests, key features were identified that enhance the IDS’s performance. Furthermore, a comparison of different classifiers showcased the effectiveness of models such as extra trees, decision trees, and logistic regression, which achieved high accuracy rates while keeping execution time practical. Overall, by adopting an ensemble learning approach and carefully selecting features and classifiers, the study produced a model that outperforms individual classifiers, further validating the effectiveness of ensemble learning in enhancing IDS performance.
The findings of this study have the potential to drive future developments in intrusion detection systems and have a tangible impact on ensuring robust computer security in various domains.
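The voting-rule idea — combining the three classifier families the abstract highlights — can be sketched with scikit-learn. The synthetic data stands in for the (feature-selected) CSE-CIC-IDS2018 features; everything else about the real pipeline is simplified away:

```python
# Sketch: a soft-voting ensemble of extra trees, a decision tree, and
# logistic regression on synthetic binary "traffic" data.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# Soft voting averages the members' predicted class probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=2)),
        ("dt", DecisionTreeClassifier(random_state=2)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
accuracy = ensemble.score(X_te, y_te)
```

The study's two-stage design would precede this with Spearman/RFE/chi-square feature selection so that the ensemble trains only on the most informative columns.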
PyMC: a modern, and comprehensive probabilistic programming framework in Python
https://peerj.com/articles/cs-1516
Published: 2023-09-01
Oriol Abril-Pla, Virgile Andreani, Colin Carroll, Larry Dong, Christopher J. Fonnesbeck, Maxim Kochurov, Ravin Kumar, Junpeng Lao, Christian C. Luhmann, Osvaldo A. Martin, Michael Osthege, Ricardo Vieira, Thomas Wiecki, Robert Zinkov
PyMC is a probabilistic programming library for Python that provides tools for constructing and fitting Bayesian models. It offers an intuitive, readable syntax that is close to the natural syntax statisticians use to describe models. PyMC leverages the symbolic computation library PyTensor, allowing it to be compiled into a variety of computational backends, such as C, JAX, and Numba, which in turn offer access to different computational architectures including CPU, GPU, and TPU. Being a general modeling framework, PyMC supports a variety of models including generalized hierarchical linear regression and classification, time series, ordinary differential equations (ODEs), and non-parametric models such as Gaussian processes (GPs). We demonstrate PyMC’s versatility and ease of use with examples spanning a range of common statistical models. Additionally, we discuss the positive role of PyMC in the development of the open-source ecosystem for probabilistic programming.
Benefits, challenges, and usability evaluation of DeloreanJS: a back-in-time debugger for JavaScript
https://peerj.com/articles/cs-1238
Published: 2023-02-24
Paul Leger, Felipe Ruiz, Hiroaki Fukuda, Nicolás Cardozo
JavaScript Web applications are a common product in industry. As with most applications, Web applications can acquire software flaws (known as bugs), whose symptoms are seen during the development stage and, even worse, in production. The use of debuggers is beneficial for detecting bugs. Unfortunately, most JavaScript debuggers (1) only support the “step into/through” feature for walking through an execution to detect a bug, and (2) do not allow developers to go back in time during the application’s execution to take actions to detect the bug accurately. For example, the second limitation prevents developers from modifying the value of a variable to fix a bug while the application is running, or from testing whether the same bug is triggered with other values of that variable. Using concepts such as continuations and static analysis, this article presents a usable debugger for JavaScript, named DeloreanJS, which enables developers to go back in time to different execution points and resume the execution of a Web application to improve the understanding of a bug, or even experiment with hypothetical scenarios around the bug. Using an online and available version, we illustrate the benefits of DeloreanJS through five examples of bugs in JavaScript. Although DeloreanJS is developed for JavaScript, a dynamic prototype-based object model with side effects (mutable variables), we discuss our proposal against the state of the art and practice of debuggers in terms of features. For example, modern browsers like Mozilla Firefox include a debugger in their distribution that only supports the breakpoint feature, whereas DeloreanJS provides a graphical user interface that supports back-in-time features. The aim of this study is to evaluate and compare the usability of DeloreanJS and Mozilla Firefox’s debugger using the system usability scale approach. We asked 30 undergraduate students from two computer science programs to solve five tasks. Among the findings, we highlight two results.
First, we found that 100% (15) of participants recommended DeloreanJS, and only 53% (eight) recommended Firefox’s debugger to complete the tasks. Second, whereas the average score for DeloreanJS is 71.6 (“Good”), the average score for Firefox’s debugger is 55.8 (“Acceptable”).
Modularization in Belief-Desire-Intention agent programming and artifact-based environments
https://peerj.com/articles/cs-1162
Published: 2022-12-01
Gustavo Ortiz-Hernández, Alejandro Guerra-Hernández, Jomi F. Hübner, Wulfrano Arturo Luna-Ramírez
This article proposes an extension of the Agents and Artifacts meta-model to enable modularization. We adopt the Belief-Desire-Intention (BDI) model of agency to represent independent and reusable units of code by means of modules. The key idea behind our proposal is to take advantage of the syntactic notion of a namespace, i.e., a unique symbol identifier that organizes a set of programming elements. On this basis, agents can decide in BDI terms which beliefs, goals, events, percepts and actions will be independently handled by a particular module. The practical feasibility of this approach is demonstrated by developing an auction scenario, whose source code improves its scores on coupling, cohesion and complexity metrics when compared against a non-modular version of the scenario. Our solution addresses the name-collision issue, provides a usage interface for modules that follows the information-hiding principle, and promotes software engineering principles related to modularization such as reusability, extensibility and maintainability. Unlike other approaches, our solution makes it possible to encapsulate environment components into modules, as it remains independent of any particular BDI agent-oriented programming language.
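The namespace idea can be shown in miniature, outside any agent language: beliefs from different modules share one belief base but are keyed by (namespace, name), so two modules can each hold a belief called "price" without colliding. This is a purely illustrative sketch, not the authors' meta-model:

```python
# Sketch: namespaced belief base avoiding name collisions between modules.
class BeliefBase:
    def __init__(self):
        self._beliefs = {}

    def add(self, namespace, name, value):
        """Store a belief under its module's namespace."""
        self._beliefs[(namespace, name)] = value

    def get(self, namespace, name):
        return self._beliefs[(namespace, name)]

bb = BeliefBase()
bb.add("auction_house_a", "price", 100)
bb.add("auction_house_b", "price", 250)  # same belief name, no collision
```

The same keying scheme extends naturally to goals, events, percepts, and actions, which is what lets each module handle its own slice of the agent's BDI state independently.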
FCMpy: a python module for constructing and analyzing fuzzy cognitive maps
https://peerj.com/articles/cs-1078
Published: 2022-09-23
Samvel Mkhitaryan, Philippe Giabbanelli, Maciej K Wozniak, Gonzalo Nápoles, Nanne De Vries, Rik Crutzen
FCMpy is an open-source Python module for building and analyzing Fuzzy Cognitive Maps (FCMs). The module provides tools for end-to-end projects involving FCMs. It can derive fuzzy causal weights from qualitative data or simulate the system behavior. Additionally, it includes machine learning algorithms (e.g., Nonlinear Hebbian Learning, Active Hebbian Learning, Genetic Algorithms, and Deterministic Learning) to adjust the FCM causal weight matrix and to solve classification problems. Finally, users can easily implement scenario analysis by simulating hypothetical interventions (i.e., analyzing what-if scenarios). FCMpy is the first open-source module that contains all the functionalities necessary for FCM-oriented projects. This work aims to enable researchers from different areas, such as psychology, cognitive science, or engineering, to easily and efficiently develop and test their FCM models without the need for extensive programming knowledge.
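The core inference that FCM tools like FCMpy automate can be written out in a few lines of NumPy: concept activations are repeatedly pushed through the weighted causal graph and a squashing function until they stabilize. The concepts and weights below are invented for the sketch:

```python
# Sketch of standard FCM inference: a_{t+1} = f(a_t + W^T a_t), with a
# sigmoid squashing function, iterated until the state stops changing.
import numpy as np

def sigmoid(x, l=1.0):
    return 1.0 / (1.0 + np.exp(-l * x))

def simulate_fcm(weights, state, steps=50, tol=1e-5):
    """Iterate the FCM update rule until convergence or `steps` iterations."""
    for _ in range(steps):
        new_state = sigmoid(state + weights.T @ state)
        if np.max(np.abs(new_state - state)) < tol:
            return new_state
        state = new_state
    return state

# Three toy concepts: stress -> smoking -> health risk (invented weights).
W = np.array([[0.0, 0.7, 0.2],
              [0.0, 0.0, 0.8],
              [0.0, 0.0, 0.0]])
steady = simulate_fcm(W, np.array([0.9, 0.3, 0.1]))
```

A what-if scenario is then just a rerun with one concept's activation clamped or perturbed, which is the scenario-analysis workflow the module packages up alongside weight learning.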