Utilizing grid search cross-validation with adaptive boosting for augmenting performance of machine learning models

PeerJ Computer Science


Introduction

The main contributions of this work are as follows:

  • Comparing the performance of various ML and deep learning (DL) algorithms in predicting students’ performance.

  • Improving the ML algorithms (DT and FFNN) that show inferior results in accuracy, precision, recall, and F1 score.

  • Using techniques such as grid search cross-validation, adaptive boosting, extreme gradient boosting, early stopping, feature engineering, and dropping inactive neurons to improve ML algorithm performance.

  • Using a Decision Tree (DT) ML algorithm to determine the feature weights of students when predicting their performance.

  • Classifying students into four outcome classes (Withdrawn, Fail, Pass, Distinction) according to their performance and study-behavior features.

  • Recommending tailored learning paths to students according to their feature weights.

Literature Review

Data Description and Preprocessing

Processing the raw dataset

Dropping null values
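
A minimal sketch of this step with pandas, assuming one of the OULAD tables (the file name and path are illustrative; the tables that are actually merged are described later):

    import pandas as pd

    # Load one of the OULAD tables (path is illustrative).
    students = pd.read_csv("studentInfo.csv")

    # Count missing values per column, then drop every row containing a null.
    print(students.isnull().sum())
    students = students.dropna()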

Converting object data types features to categorical features
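
Continuing the sketch above, object-typed (string) columns can be converted to pandas’ categorical dtype; the column name gender is taken from the public OULAD schema:

    # Convert every object-typed column (e.g. gender, region, final_result)
    # to the memory-efficient categorical dtype.
    object_cols = students.select_dtypes(include="object").columns
    students[object_cols] = students[object_cols].astype("category")

    # Categorical columns expose integer codes if a model needs numeric input.
    print(students["gender"].cat.codes.head())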

Scaling features

z = (x − μ) / σ

where x is the feature value, μ is the feature mean, and σ is the feature standard deviation. Feature scaling speeds up model training and helps the ML model generate accurate results.
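
A sketch of this standardization with scikit-learn’s StandardScaler, which applies exactly this z-score transform; the numeric column names are an illustrative subset of the OULAD schema:

    from sklearn.preprocessing import StandardScaler

    # Standardize numeric features to zero mean and unit variance,
    # i.e. z = (x - mu) / sigma.
    numeric_cols = ["studied_credits", "num_of_prev_attempts"]  # illustrative subset
    scaler = StandardScaler()
    students[numeric_cols] = scaler.fit_transform(students[numeric_cols])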

Performing one-hot encoding
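
A sketch of one-hot encoding with pandas, assuming the preprocessed table from the previous steps; the variable names features and labels are introduced here and reused in later sketches:

    # Expand each categorical feature into binary indicator columns and
    # separate the target (Withdrawn / Fail / Pass / Distinction).
    features = pd.get_dummies(students.drop(columns=["id_student", "final_result"]))
    labels = students["final_result"]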

Multiclass Classification Models Training and Performance Evaluation

Methodology

Decision tree (DT)
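
A minimal baseline sketch with scikit-learn, assuming the features and labels variables from the preprocessing sketches above; the tree’s feature importances correspond to the feature weights used for recommending learning paths:

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import classification_report

    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=42, stratify=labels
    )

    dt = DecisionTreeClassifier(random_state=42)
    dt.fit(X_train, y_train)

    # Per-class precision, recall, and F1 for the four outcome classes.
    print(classification_report(y_test, dt.predict(X_test)))

    # Feature importances act as feature weights for the recommendations.
    weights = sorted(zip(features.columns, dt.feature_importances_),
                     key=lambda p: p[1], reverse=True)
    print(weights[:10])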

Improving DT model classifier performance using grid search cross-validation
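
A sketch of tuning the DT classifier with scikit-learn’s GridSearchCV; the hyperparameter grid, fold count, and scoring metric are illustrative assumptions, not necessarily the paper’s exact settings:

    from sklearn.model_selection import GridSearchCV

    param_grid = {
        "max_depth": [3, 5, 10, None],
        "min_samples_split": [2, 5, 10],
        "criterion": ["gini", "entropy"],
    }

    grid = GridSearchCV(
        DecisionTreeClassifier(random_state=42),
        param_grid,
        cv=5,                # 5-fold cross-validation
        scoring="f1_macro",  # macro-averaged F1 over the four classes
        n_jobs=-1,
    )
    grid.fit(X_train, y_train)
    print(grid.best_params_, grid.best_score_)
    best_dt = grid.best_estimator_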

Using ensembling to increase DT model classifier performance

Adaptive boosting (AdaBoost)
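
A sketch of AdaBoost over shallow decision trees with scikit-learn; the number of estimators and the learning rate are illustrative:

    from sklearn.ensemble import AdaBoostClassifier

    # Boost many weak (depth-1) trees; on scikit-learn < 1.2 the first
    # argument is named base_estimator instead of estimator.
    ada = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),
        n_estimators=200,
        learning_rate=0.5,
        random_state=42,
    )
    ada.fit(X_train, y_train)
    print(ada.score(X_test, y_test))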

eXtreme gradient boosting (XGBoost)
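
A sketch with the xgboost scikit-learn wrapper; the hyperparameters are illustrative, and the string class labels are integer-encoded first because XGBoost expects numeric targets:

    from xgboost import XGBClassifier
    from sklearn.preprocessing import LabelEncoder

    le = LabelEncoder()
    y_train_enc = le.fit_transform(y_train)
    y_test_enc = le.transform(y_test)

    xgb = XGBClassifier(
        objective="multi:softprob",
        n_estimators=300,
        max_depth=6,
        learning_rate=0.1,
        eval_metric="mlogloss",
    )
    xgb.fit(X_train, y_train_enc)
    print(xgb.score(X_test, y_test_enc))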

Assessing XGBoost model with learning curves
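
A sketch of producing learning curves with scikit-learn’s learning_curve helper; comparing the training and cross-validated scores as the training set grows indicates over- or under-fitting. The train-size grid and scoring metric are assumptions:

    import numpy as np
    from sklearn.model_selection import learning_curve

    sizes, train_scores, val_scores = learning_curve(
        XGBClassifier(objective="multi:softprob", eval_metric="mlogloss"),
        features,
        le.fit_transform(labels),
        train_sizes=np.linspace(0.1, 1.0, 5),
        cv=5,
        scoring="accuracy",
        n_jobs=-1,
    )
    print(train_scores.mean(axis=1))  # training score per training-set size
    print(val_scores.mean(axis=1))    # cross-validated score per size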

Early stopping with XGBoost model training
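
A sketch of early stopping during XGBoost training, monitored on a held-out validation split; the patience of 20 rounds is illustrative, and on XGBoost versions before 1.6 early_stopping_rounds is passed to fit() rather than the constructor:

    # Hold out a validation split to monitor the boosting rounds.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_train, y_train_enc, test_size=0.2, random_state=42
    )

    xgb_es = XGBClassifier(
        objective="multi:softprob",
        n_estimators=1000,          # upper bound; early stopping selects fewer
        learning_rate=0.1,
        eval_metric="mlogloss",
        early_stopping_rounds=20,
    )
    xgb_es.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
    print(xgb_es.best_iteration)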

Feed-forward neural networks (FFNN)
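
A minimal Keras sketch of a deep feed-forward network for the four outcome classes; the layer sizes, optimizer, and epoch count are assumptions, not the paper’s exact architecture:

    import tensorflow as tf

    X_train_np = X_train.astype("float32").to_numpy()
    n_features = X_train_np.shape[1]

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(4, activation="softmax"),  # four outcome classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train_np, y_train_enc, validation_split=0.2,
              epochs=50, batch_size=64, verbose=0)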

DFFNN early stopping during the training process
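
A sketch of early stopping in Keras with the EarlyStopping callback, reusing the model from the previous sketch; the patience value is illustrative:

    # Stop training once the validation loss stops improving and keep the
    # best weights observed so far.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True
    )
    model.fit(X_train_np, y_train_enc, validation_split=0.2,
              epochs=200, batch_size=64, callbacks=[early_stop], verbose=0)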

Increasing DFFNN performance by dropping inactive neurons
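
Read here as dropout regularization, a sketch that inserts Dropout layers so a random fraction of neurons is deactivated at each training step; the 0.3 rate is an assumption:

    from tensorflow.keras import layers

    model_dropout = tf.keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),   # randomly drop 30% of activations while training
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(4, activation="softmax"),
    ])
    model_dropout.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])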

Feature engineering: merging grades to increase DFFNN performance
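
A sketch of the label-merging idea; which grades the paper actually merges is not restated here, so the mapping below (folding Distinction into Pass) is purely an assumption:

    merge_map = {"Distinction": "Pass", "Pass": "Pass",
                 "Fail": "Fail", "Withdrawn": "Withdrawn"}
    labels_merged = labels.map(merge_map)
    print(labels_merged.value_counts())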

Discussion and Limitations

  • The models may suffer from poor transfer learning ability and integration.

  • The models require structured training data and hand-crafted features.

  • Every model may require a special training process and planning.

  • The ML models may suffer from a slow learning process in real-time settings.

Conclusion and Future Work

Supplemental Information

Students’ online activity counts during VLE interaction.

DOI: 10.7717/peerj-cs.803/supp-1

Relationship between assessment scores and the students’ final result.

DOI: 10.7717/peerj-cs.803/supp-2

Gender-wise final result of students with VLE clickstreams.

DOI: 10.7717/peerj-cs.803/supp-3

Confusion matrix for the DT model.

DOI: 10.7717/peerj-cs.803/supp-4

Merging VLE tables to form the ML models’ input vectors.

DOI: 10.7717/peerj-cs.803/supp-5

DFFNN confusion matrix for the Distinction, Fail, Pass, and Withdrawn grades.

DOI: 10.7717/peerj-cs.803/supp-6

Gender-wise relationship between prior education and assessments.

DOI: 10.7717/peerj-cs.803/supp-7

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Muhammad Adnan conceived and designed the experiments, performed the computation work, prepared figures and/or tables, and approved the final draft.

Alaa Abdul Salam Alarood conceived and designed the experiments, performed the experiments, performed the computation work, authored or reviewed drafts of the paper, and approved the final draft.

M. Irfan Uddin conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, and approved the final draft.

Izaz ur Rehman analyzed the data, performed the computation work, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The data is available at https://analyse.kmi.open.ac.uk/open_dataset.

The code of the model is available at Kaggle: https://www.kaggle.com/adnankust/utilizing-grid-search-cross-validation.

Funding

The authors received no funding for this work.

