Profiling waitlisted incoming students for future delinquency with an ensemble of statistical machine learning algorithms
- Subject Areas: Data Mining and Machine Learning
- Keywords: Profiling, Delinquency, Student, Waitlisted
- Copyright: © 2017 Lauron et al.
- Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article: 2017. Profiling waitlisted incoming students for future delinquency with an ensemble of statistical machine learning algorithms. PeerJ Preprints 5:e3312v1 https://doi.org/10.7287/peerj.preprints.3312v1
Abstract
We are given a dataset \(\mathcal{R}=\{R_1, R_2, \dots, R_r\}\) of \(r\)~records of waitlisted incoming freshman students (WIFS). For each \(i=1, 2, \dots, r\), the record \(R_i\) is an \((m+1)\)-tuple \((O_i, P_i^{(1)}, P_i^{(2)}, \dots, P_i^{(m)})\), where \(O_i\) is one of the \(o\)~classes in the set \(\mathcal{O}=\{O_1, O_2, \dots, O_o\}\), and \(P_i^{(1)}, P_i^{(2)}, \dots, P_i^{(m)}\) are \(m\)~potential predictors for~\(O_i\). Our purpose is to find a statistical machine learning algorithm (SMLA) \(\mathbb{A}\) such that \(V_i=\mathbb{A}(P_i^{(1)}, P_i^{(2)}, \dots, P_i^{(m)})\), where \(V_i\) is the class predicted by~\(\mathbb{A}\), and \(\mathbb{A}\) is developed using a suitable number \(n\le m\) of predictors for \(O\in\mathcal{O}\). The best \(\mathbb{A}\) is the one that minimizes the metric \(v^{-1}\sum_{i=1}^v |O_i - V_i|\) over the \(v<r\)~records in the validation set \(\mathcal{V}\subset\mathcal{R}\). Our problem is therefore to find the subset \(\{P_i^{(1)}, P_i^{(2)}, \dots, P_i^{(n)}\}\) and to train \(\mathbb{A}\)~using the \(t<r\) records in the training set \(\mathcal{T}\subset\mathcal{R}\), where \(\mathcal{T}\cap\mathcal{V}=\emptyset\), so that \(\mathbb{A}\)~can predict whether a WIFS applying to an undergraduate program at UPLB will incur at least one ``delinquency'' once the student is accepted into the program. Such an \(\mathbb{A}\)~can be a useful decision-support tool for UPLB deans and college secretaries in deciding whether to accept a WIFS into the program.
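The problem statement above can be illustrated with a minimal Python sketch. It is not the authors' ensemble. It assumes two classes coded as 0 and 1, so that the metric \(v^{-1}\sum_{i=1}^v |O_i - V_i|\) equals the misclassification rate on the validation set. The base learner (a one-predictor threshold rule), the majority vote, the exhaustive subset search, and all function names are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import random
from itertools import combinations

# A record is (O_i, (P_i^(1), ..., P_i^(m))) with O_i assumed to be 0 or 1.

def split_records(records, t, seed=0):
    """Shuffle R and split it into disjoint T (t records) and V (the rest)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:t], shuffled[t:]

def validation_error(algorithm, validation):
    """The metric v^{-1} * sum |O_i - V_i| over the given records."""
    return sum(abs(o - algorithm(p)) for o, p in validation) / len(validation)

def train_stump(training, j):
    """Hypothetical base learner: best threshold rule on predictor j."""
    best = None
    for threshold in sorted({p[j] for _, p in training}):
        for high_class in (0, 1):
            # Predict high_class at or above the threshold, the other class below.
            def rule(p, th=threshold, hc=high_class, col=j):
                return hc if p[col] >= th else 1 - hc
            err = validation_error(rule, training)  # error on T, not V
            if best is None or err < best[0]:
                best = (err, rule)
    return best[1]

def train_ensemble(training, subset):
    """Stand-in ensemble: majority vote of one stump per chosen predictor."""
    stumps = [train_stump(training, j) for j in subset]
    def vote(p):
        # Ties (possible with an even number of stumps) go to class 0.
        return 1 if 2 * sum(s(p) for s in stumps) > len(stumps) else 0
    return vote

def select_predictors(training, validation, m):
    """Try every non-empty predictor subset; keep the lowest validation error."""
    best = None
    for n in range(1, m + 1):
        for subset in combinations(range(m), n):
            model = train_ensemble(training, subset)
            err = validation_error(model, validation)
            if best is None or err < best[0]:
                best = (err, subset, model)
    return best  # (metric value, predictor indices, trained algorithm)
```

Calling `select_predictors(train, valid, m)` on the two halves returned by `split_records` yields the smallest validation metric found, the zero-based indices of the chosen predictors, and the trained classifier. The exhaustive search visits \(2^m - 1\) subsets, so it is practical only for small \(m\).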
Author Comment
Submitted and accepted as a contributed paper to the 18th National Student-Faculty Conference on the Statistical Sciences (SFCon-Stat 2017), SEARCA, Los Banos, Laguna, Philippines, 16 October 2017.