Predicting judging-perceiving of Myers-Briggs Type Indicator (MBTI) in online social forum


Introduction

Methodology

Tools and resources

Dataset

The dataset was selected against three criteria; a loading sketch follows the list.

  1. Publicly available and of substantial size.

  2. Not based on microblogs (microblog data must be handled differently).

  3. Cited at least twice, so that prior results provide a reference for comparison.
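
For orientation, the following is a minimal sketch of loading the Kaggle dataset and deriving the binary judging-perceiving (J/P) label. The 'type' and 'posts' columns, the '|||' post separator, and the file name follow the schema of the dataset as distributed on Kaggle; the code is illustrative rather than the project's own.

    import pandas as pd

    # Load the Kaggle MBTI dataset; 'type' holds the four-letter MBTI code
    # and 'posts' holds '|||'-separated forum posts per author.
    df = pd.read_csv("mbti_1.csv")
    df["posts"] = df["posts"].str.split(r"\|\|\|")  # escaped so '|' is literal
    df["JP"] = df["type"].str[3]  # fourth MBTI letter: 'J' or 'P'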

Data preprocessing

Data cleaning

Tokenization, lemmatization and punctuation removal

Four corpus variants were produced; a preprocessing sketch follows the list.

  • No removal of nouns or stop words

  • Removal of nouns

  • Removal of stop words

  • Removal of nouns and stop words
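
As referenced above, a minimal sketch of producing the four variants, assuming NLTK (with the punkt, wordnet, stopwords, and POS-tagger data downloaded); the helper name and flags are illustrative and not taken from the paper's code.

    import string
    from nltk import pos_tag, word_tokenize
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    LEMMATIZER = WordNetLemmatizer()
    STOP_WORDS = set(stopwords.words("english"))

    def preprocess(text, remove_nouns=False, remove_stop_words=False):
        # Tokenize and drop pure punctuation tokens.
        tokens = [t for t in word_tokenize(text) if t not in string.punctuation]
        if remove_nouns:
            # NLTK labels nouns with POS tags starting with 'NN'.
            tokens = [t for t, tag in pos_tag(tokens) if not tag.startswith("NN")]
        tokens = [t.lower() for t in tokens]
        if remove_stop_words:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        return " ".join(LEMMATIZER.lemmatize(t) for t in tokens)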

Feature extraction and dimensionality reduction

Character-level TF and TF-IDF

Word-level TF and TF-IDF
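
A hedged sketch of the four n-gram extractors using scikit-learn. The 1,500-attribute cap matches the feature sizes listed in the next subsection, but capping max_features is only one plausible reading of the dimensionality-reduction step, and the character n-gram range is an assumption.

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    # n-gram range (2, 3) is an assumption, not taken from the paper.
    char_tf = CountVectorizer(analyzer="char", ngram_range=(2, 3), max_features=1500)
    char_tfidf = TfidfVectorizer(analyzer="char", ngram_range=(2, 3), max_features=1500)
    word_tf = CountVectorizer(analyzer="word", max_features=1500)
    word_tfidf = TfidfVectorizer(analyzer="word", max_features=1500)

    # Each extractor yields a 1,500-column matrix; horizontally stacking all
    # four with the 78 LIWC attributes gives the 6,078-column combo set:
    #   from scipy.sparse import hstack
    #   X_combo = hstack([Xc_tf, Xc_tfidf, Xw_tf, Xw_tfidf, X_liwc])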

Classification and model validation

The following feature sets were evaluated with each classifier; a five-fold evaluation sketch follows the list.

  • Character-level TF (1500 attributes)

  • Character-level TFIDF (1500 attributes)

  • Word-level TF (1500 attributes)

  • Word-level TFIDF (1500 attributes)

  • LIWC (78 attributes)

  • Combo (combination of all of the above: character-level TF, character-level TFIDF, word-level TF, word-level TFIDF, and LIWC) (6,078 attributes)
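
As referenced above, a minimal five-fold evaluation sketch over the five classifiers. All hyperparameters are library defaults rather than the tuned values behind the tables in Appendix A, and LinearSVC stands in for the paper's SVM.

    from lightgbm import LGBMClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_validate
    from sklearn.naive_bayes import ComplementNB
    from sklearn.svm import LinearSVC

    MODELS = {
        "cnb": ComplementNB(),
        "lgb": LGBMClassifier(),
        "lgr": LogisticRegression(max_iter=1000),
        "rf": RandomForestClassifier(),
        "svm": LinearSVC(),
    }
    SCORING = ["accuracy", "f1_macro", "roc_auc", "precision_macro", "recall_macro"]

    def evaluate(X, y):
        """Five-fold CV for one feature set; X is a feature matrix, y the
        binary J/P label. Returns mean and std per metric, as in Tables A1-A4."""
        results = {}
        for name, model in MODELS.items():
            cv = cross_validate(model, X, y, cv=5, scoring=SCORING)
            results[name] = {m: (cv[f"test_{m}"].mean(), cv[f"test_{m}"].std())
                             for m in SCORING}
        return results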

Results & Discussion

Results

Benchmarking with previous research

Conclusion & Future Work

Additional Information and Declarations

Competing Interests

The authors declare there are no competing interests.

Author Contributions

En Jun Choong conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Kasturi Dewi Varathan conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The source code is available at GitHub: https://github.com/EnJunChoong/MBTI_JP_Prediction.

The datasets are available at:

Choong, EnJun (2021): 8k MBTI Dataset From Personality Cafe. figshare. Dataset. https://doi.org/10.6084/m9.figshare.14587572.v1

This data came from Kaggle: https://www.kaggle.com/datasnaek/mbti-type.

The Twitter¹ dataset from Plank & Hovy (2015), a corpus of 1.2M English tweets from 1,500 authors annotated for gender and MBTI, is available at: https://bitbucket.org/bplank/wassa2015/src/master/

The Twitter² dataset from Verhoeven, Plank & Daelemans (2016) (TwiSty, a corpus developed for research in author profiling, containing personality (MBTI) and gender annotations for a total of 18,168 authors spanning six languages) is available at: https://www.uantwerpen.be/en/research-groups/clips/research/datasets/

The Kaggle³ dataset used in our research is available at Kaggle: https://www.kaggle.com/datasnaek/mbti-type/metadata.

The Reddit⁴ dataset (MBTI9k, a dataset of Reddit posts and comments labeled with MBTI personality types) is available upon request at: http://takelab.fer.hr/data/mbti.

Funding

This work was supported by the Impact Oriented Interdisciplinary Research Grant, University of Malaya (Project Code: IIRG001A-19SAH). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Appendix A

 

See Tables A1–A4.

Table A1:
Five-fold cross-validation averages (%) for the Kaggle dataset, with MBTI keywords retained.
Datasets (left to right): Kaggle | Kaggle_noNN | Kaggle_noNNnoSW | Kaggle_noSW
Models (five columns per dataset): cnb lgb lgr rf svm
Rows: feature set × metric; each row lists the 20 dataset-model values in that order.
combo Acc 75.58 81.68 69.19 78.14 80.13 68.60 74.57 63.97 70.33 71.43 68.82 74.86 64.98 70.96 72.37 74.96 81.41 70.01 78.42 79.82
F1-macro 74.57 80.77 68.10 75.04 79.21 67.62 73.34 62.65 63.43 70.22 67.80 73.53 63.82 64.69 71.25 73.89 80.51 68.93 75.47 78.94
AUROC 81.51 88.86 74.68 86.83 86.82 74.08 81.25 67.75 77.82 77.79 74.04 81.53 69.34 78.80 78.26 80.96 88.80 76.03 87.08 86.83
Precision-macro 74.48 80.91 68.00 80.46 79.24 67.49 73.45 62.59 73.81 70.18 67.68 73.76 63.74 74.05 71.18 73.83 80.59 68.81 80.49 78.89
Recall-macro 74.68 80.65 68.31 74.02 79.19 67.93 73.29 62.78 64.05 70.30 68.08 73.37 64.05 65.00 71.37 73.96 80.44 69.13 74.44 79.00
char_tf Acc 67.23 81.66 73.16 76.30 75.20 66.01 74.25 66.53 69.33 69.86 65.57 73.81 66.23 68.77 69.71 66.61 81.16 73.09 75.28 75.01
F1-macro 66.52 80.76 72.31 72.16 74.44 65.22 73.00 65.67 61.63 68.94 64.80 72.59 65.37 60.89 68.83 65.87 80.29 72.26 70.78 74.26
AUROC 72.91 88.50 80.60 85.19 82.57 71.30 80.72 72.47 76.40 75.92 70.71 80.69 71.91 75.44 75.60 72.22 88.44 80.21 85.04 82.34
Precision-macro 66.49 80.88 72.14 79.83 74.23 65.18 73.12 65.60 73.08 68.79 64.78 72.66 65.31 72.08 68.70 65.84 80.32 72.08 78.83 74.07
Recall-macro 67.12 80.68 72.66 71.37 74.85 65.73 72.95 66.12 62.71 69.28 65.34 72.57 65.82 62.11 69.22 66.44 80.27 72.64 70.15 74.71
char_tfidf Acc 67.13 81.22 73.60 76.14 75.58 65.97 74.35 66.61 68.98 69.70 65.75 73.81 66.30 69.12 69.40 66.57 80.95 73.03 76.29 75.19
F1-macro 66.44 80.29 72.76 71.89 74.86 65.20 73.08 65.75 60.92 68.84 65.02 72.48 65.41 61.26 68.54 65.90 80.03 72.20 72.20 74.48
AUROC 73.09 88.43 80.66 85.42 82.84 71.23 80.42 72.64 76.71 75.92 70.74 80.36 71.93 76.70 75.66 72.40 88.26 80.26 85.20 82.63
Precision-macro 66.43 80.42 72.58 79.84 74.65 65.16 73.21 65.68 72.92 68.70 65.02 72.65 65.34 72.79 68.41 65.91 80.12 72.03 79.70 74.30
Recall-macro 67.07 80.19 73.11 71.13 75.33 65.73 73.00 66.19 62.21 69.25 65.60 72.38 65.83 62.44 68.96 66.54 79.96 72.59 71.40 75.00
word_tf Acc 68.94 80.17 69.70 71.18 73.35 63.60 72.61 63.70 63.64 65.38 65.96 73.88 66.02 66.98 69.24 67.87 81.15 70.63 72.91 74.33
F1-macro 68.16 79.20 68.83 63.78 72.33 62.84 71.30 62.77 48.69 64.23 65.12 72.80 65.06 57.16 67.98 66.99 80.31 69.78 66.89 73.36
AUROC 74.63 87.44 75.80 80.56 79.49 68.45 78.27 68.47 69.52 70.52 71.23 80.27 71.69 74.03 74.43 73.76 88.06 76.83 82.91 81.06
Precision-macro 68.07 79.29 68.70 76.91 72.22 62.87 71.37 62.73 68.96 64.13 65.06 72.74 64.97 70.79 67.91 66.88 80.28 69.64 77.55 73.23
Recall-macro 68.69 79.12 69.23 64.56 72.52 63.38 71.27 63.15 54.77 64.45 65.58 72.91 65.41 59.61 68.09 67.40 80.35 70.21 66.95 73.56
word_tfidf Acc 68.80 79.90 69.85 70.75 73.35 63.73 72.31 63.47 63.88 65.08 66.15 74.19 66.09 67.85 69.33 67.86 80.53 70.48 72.66 74.37
F1-macro 68.04 78.94 68.99 63.17 72.37 62.93 70.99 62.54 49.84 64.06 65.31 73.05 65.13 58.67 68.15 66.99 79.65 69.64 66.35 73.43
AUROC 74.59 87.19 75.84 80.87 79.54 68.45 78.36 68.48 68.72 70.62 71.28 80.53 71.70 75.51 74.52 73.82 87.67 76.93 83.25 81.16
Precision-macro 67.97 79.01 68.86 76.19 72.25 62.95 71.04 62.51 67.84 63.97 65.24 73.04 65.04 72.08 68.05 66.88 79.63 69.49 77.82 73.29
Recall-macro 68.60 78.90 69.40 64.09 72.59 63.44 70.96 62.93 55.24 64.36 65.77 73.07 65.49 60.66 68.30 67.42 79.68 70.07 66.52 73.69
LIWC Acc 56.65 57.40 58.16 60.69 57.84 56.28 56.73 57.49 60.48 57.25 55.84 56.74 57.30 60.47 57.24 56.44 57.73 58.27 60.54 58.27
F1-macro 55.98 56.23 57.45 46.74 57.18 55.69 55.71 56.84 46.37 56.75 55.33 55.75 56.63 45.63 56.75 55.95 56.68 57.63 46.53 57.67
AUROC 59.78 59.73 61.38 57.37 61.21 59.01 59.30 60.52 57.47 60.30 58.04 58.66 60.04 56.23 60.24 58.96 59.15 61.18 57.50 61.16
Precision-macro 56.54 56.45 58.01 52.29 57.78 56.34 56.01 57.45 52.04 57.50 56.05 56.07 57.21 51.82 57.55 56.71 56.99 58.25 52.13 58.32
Recall-macro 67.23 81.66 73.16 76.30 75.20 66.01 74.25 66.53 69.33 69.86 65.57 73.81 66.23 68.77 69.71 66.61 81.16 73.09 75.28 75.01
DOI: 10.7717/peerj.11382/table-A1

Notes:

* cnb = Complement Naïve Bayes, lgb = Light GBM, lgr = Logistic Regression, rf = Random Forest, svm = Support Vector Machine.

** combo is the combination of char_tf, char_tfidf, word_tf, word_tfidf and LIWC.

Table A2:
Five-fold cross-validation averages (%) for the Kaggle-Filtered dataset, with MBTI keywords removed.
Datasets (left to right): Kaggle-Filtered | Kaggle-Filtered_noNN | Kaggle-Filtered_noNNnoSW | Kaggle-Filtered_noSW
Models (five columns per dataset): cnb lgb lgr rf svm
Rows: feature set × metric; each row lists the 20 dataset-model values in that order.
combo Acc 61.22 66.26 59.28 62.01 64.04 59.66 61.47 56.66 61.09 60.95 59.59 61.83 57.99 60.81 60.61 60.68 65.89 60.65 61.90 63.78
F1-macro 60.27 63.83 58.04 45.84 62.84 58.87 59.00 55.26 43.35 59.89 58.69 59.59 56.67 42.98 59.67 59.60 63.56 59.38 45.84 62.66
AUROC 64.21 69.19 61.50 62.00 68.07 62.98 63.43 57.49 59.04 63.80 62.67 63.40 59.76 59.83 63.67 63.87 69.01 63.45 62.21 68.15
Precision-macro 60.28 64.45 58.02 62.79 62.77 58.98 59.32 55.27 59.27 59.88 58.76 59.82 56.67 57.76 59.70 59.58 64.06 59.35 62.04 62.58
Recall-macro 60.63 63.64 58.23 52.91 63.04 59.35 58.95 55.39 51.66 60.18 59.09 59.54 56.83 51.38 60.05 59.87 63.39 59.57 52.83 62.91
char_tf Acc 60.02 63.77 60.38 61.16 62.57 58.70 61.44 57.77 60.63 60.10 58.69 61.20 57.69 60.62 60.09 59.53 64.21 60.66 61.43 62.10
F1-macro 59.19 61.38 59.44 42.66 61.57 57.91 59.10 56.81 41.48 59.16 57.93 58.64 56.67 41.69 59.20 58.77 61.74 59.71 43.43 61.14
AUROC 63.21 66.61 64.03 59.07 66.73 62.10 62.53 59.57 57.79 63.43 61.70 62.12 59.49 57.25 63.15 62.99 66.60 63.92 60.61 66.44
Precision-macro 59.29 61.76 59.49 60.16 61.54 58.05 59.39 56.89 57.36 59.20 58.09 58.96 56.74 56.83 59.27 58.91 62.26 59.74 61.84 61.13
Recall-macro 59.65 61.26 59.82 51.55 61.91 58.38 59.10 57.14 50.91 59.54 58.42 58.58 56.98 50.95 59.61 59.28 61.63 60.08 51.92 61.50
char_tfidf Acc 59.82 64.50 60.50 61.36 62.31 58.63 61.26 57.24 60.92 59.81 58.60 61.17 57.32 60.47 59.65 59.55 63.83 60.94 61.43 62.24
F1-macro 59.08 62.01 59.55 43.82 61.47 57.87 59.09 56.29 42.79 59.04 57.86 58.68 56.34 42.41 58.87 58.85 61.06 59.99 43.97 61.37
AUROC 63.18 66.42 64.02 59.09 66.74 62.05 62.92 59.59 57.94 63.33 61.71 62.19 59.53 57.69 63.05 62.98 65.99 63.96 59.50 66.46
Precision-macro 59.24 62.50 59.59 60.80 61.50 58.04 59.30 56.40 58.54 59.18 58.04 59.00 56.42 55.67 59.01 59.03 61.69 60.01 61.22 61.39
Recall-macro 59.62 61.86 59.92 51.96 61.92 58.38 59.07 56.64 51.41 59.56 58.38 58.65 56.66 51.00 59.38 59.42 60.92 60.36 52.05 61.80
word_tf Acc 61.13 65.80 60.30 61.22 61.76 59.25 60.03 56.32 60.29 59.62 60.24 60.65 58.69 60.99 60.70 62.30 65.52 61.21 62.12 63.88
F1-macro 60.33 63.99 59.32 42.70 60.74 58.45 58.60 55.32 40.61 58.57 59.37 59.22 57.74 43.86 59.69 61.32 63.91 60.22 46.13 62.86
AUROC 64.36 69.11 63.30 60.13 65.73 62.45 62.27 58.22 56.91 62.33 63.55 63.93 61.10 59.25 64.10 65.49 69.35 65.44 62.85 67.81
Precision-macro 60.40 64.13 59.36 61.20 60.73 58.58 58.56 55.43 53.95 58.59 59.44 59.23 57.80 58.20 59.70 61.30 63.94 60.24 63.17 62.82
Recall-macro 60.81 63.92 59.68 51.61 61.07 58.93 58.71 55.63 50.47 58.87 59.80 59.37 58.09 51.73 60.03 61.67 63.90 60.58 53.06 63.19
word_tfidf Acc 60.95 64.85 60.02 61.20 61.63 59.23 59.64 56.30 60.76 59.31 60.33 60.76 58.64 61.16 60.79 62.31 65.27 61.25 62.31 63.75
F1-macro 60.13 62.75 59.03 44.09 60.83 58.44 57.93 55.35 42.42 58.46 59.48 59.15 57.71 44.75 59.93 61.36 63.46 60.27 47.16 62.83
AUROC 64.35 68.08 63.30 60.53 65.81 62.47 61.66 58.19 57.30 62.41 63.58 63.19 61.12 60.37 64.19 65.51 69.02 65.45 62.74 67.90
Precision-macro 60.20 63.06 59.07 59.56 60.90 58.57 57.93 55.47 57.61 58.55 59.55 59.14 57.78 58.84 59.99 61.35 63.60 60.28 62.86 62.83
Recall-macro 60.60 62.65 59.38 51.92 61.33 58.92 57.96 55.68 51.21 58.88 59.92 59.20 58.07 52.05 60.38 61.74 63.40 60.63 53.47 63.25
LIWC Acc 56.58 57.08 58.19 60.65 58.02 56.45 57.59 57.65 59.77 57.42 55.55 56.94 57.57 60.29 57.76 56.15 58.24 58.28 60.55 57.95
F1-macro 55.88 55.99 57.51 46.46 57.34 55.89 56.45 57.01 45.62 56.85 55.10 55.65 56.91 45.72 57.20 55.68 56.89 57.63 46.32 57.36
AUROC 59.84 59.37 61.44 58.10 61.39 59.37 59.30 60.93 57.50 60.77 57.96 58.08 60.15 56.81 60.44 58.54 59.74 61.12 57.90 61.22
Precision-macro 56.14 56.04 57.75 56.46 57.60 56.29 56.50 57.29 53.91 57.21 55.64 55.68 57.19 55.14 57.56 56.19 56.90 57.89 56.08 57.70
Recall-macro 56.40 56.25 58.08 52.17 57.92 56.57 56.70 57.61 51.36 57.54 55.90 55.83 57.51 51.73 57.90 56.47 57.06 58.24 52.08 58.04
DOI: 10.7717/peerj.11382/table-A2

Notes:

* cnb = Complement Naïve Bayes, lgb = Light GBM, lgr = Logistic Regression, rf = Random Forest, svm = Support Vector Machine.

** combo is the combination of char_tf, char_tfidf, word_tf, word_tfidf and LIWC.

Table A3:
Five-fold cross-validation standard deviations (percentage points) for the Kaggle dataset, with MBTI keywords retained.
Datasets (left to right): Kaggle | Kaggle_noNN | Kaggle_noNNnoSW | Kaggle_noSW
Models (five columns per dataset): cnb lgb lgr rf svm
Rows: feature set × metric; each row lists the 20 dataset-model values in that order.
combo Acc 0.68 1.09 1.41 0.99 0.96 1.52 1.14 0.59 1.19 0.82 1.31 1.28 0.75 0.62 0.80 1.02 0.88 0.76 1.35 0.89
F1-macro 0.76 1.14 1.34 1.27 1.01 1.55 1.09 0.53 1.42 0.77 1.31 1.23 0.76 0.61 0.88 1.11 0.98 0.76 1.65 0.95
AUROC 0.92 1.22 1.28 1.67 0.73 1.11 1.20 0.80 1.80 0.55 1.10 1.36 0.80 1.63 0.67 1.19 1.35 0.70 1.39 0.85
Precision-macro 0.72 1.17 1.36 1.13 1.01 1.54 1.22 0.53 2.26 0.83 1.29 1.37 0.74 1.58 0.85 1.08 0.89 0.76 1.50 0.92
Recall-macro 0.81 1.13 1.27 1.20 1.02 1.56 1.03 0.52 1.22 0.72 1.29 1.15 0.77 0.55 0.95 1.16 1.06 0.76 1.54 1.00
char_tf Acc 0.80 0.96 1.65 1.43 1.52 1.11 1.25 1.53 1.35 1.47 1.29 1.34 0.92 0.88 1.40 0.54 0.80 0.93 0.76 1.36
F1-macro 0.88 1.04 1.63 1.85 1.52 1.12 1.14 1.46 1.74 1.53 1.30 1.22 0.83 1.08 1.44 0.60 0.90 0.96 1.07 1.39
AUROC 0.93 1.25 1.16 2.01 1.20 0.99 1.46 1.44 1.58 1.66 0.78 1.54 1.18 1.49 1.40 0.77 1.22 0.93 1.51 0.93
Precision-macro 0.91 1.01 1.63 1.68 1.51 1.10 1.31 1.41 2.71 1.51 1.26 1.37 0.79 1.81 1.43 0.63 0.81 0.95 0.79 1.38
Recall-macro 0.99 1.12 1.57 1.65 1.48 1.12 1.03 1.40 1.43 1.57 1.30 1.09 0.75 0.91 1.46 0.68 1.03 0.99 0.94 1.42
char_tfidf Acc 0.81 1.36 1.58 0.89 1.34 1.24 1.00 1.59 1.26 1.47 1.18 1.47 1.27 1.31 1.36 0.83 1.07 1.27 0.66 1.17
F1-macro 0.90 1.42 1.58 1.10 1.31 1.28 1.04 1.51 1.62 1.48 1.23 1.40 1.23 1.84 1.40 0.91 1.17 1.26 0.94 1.13
AUROC 0.88 1.37 1.19 1.45 1.17 0.94 1.44 1.54 1.89 1.56 0.72 1.56 1.49 1.08 1.39 0.72 1.48 1.02 1.03 0.89
Precision-macro 0.92 1.45 1.57 1.21 1.30 1.26 1.06 1.46 2.63 1.46 1.22 1.57 1.19 2.26 1.39 0.93 1.10 1.24 0.60 1.12
Recall-macro 0.99 1.42 1.53 0.98 1.24 1.32 1.06 1.43 1.33 1.47 1.28 1.31 1.19 1.47 1.43 1.00 1.25 1.22 0.84 1.06
word_tf Acc 0.35 1.04 1.21 0.90 0.92 1.31 0.83 0.85 0.46 1.18 1.65 1.09 1.74 0.96 1.47 0.85 2.00 1.12 1.30 1.47
F1-macro 0.38 1.17 1.16 1.41 0.86 1.34 0.80 0.83 1.18 1.26 1.70 1.09 1.76 2.25 1.68 0.95 2.14 1.19 2.17 1.61
AUROC 0.25 1.35 1.17 1.11 1.03 1.21 1.06 1.10 1.05 1.22 1.37 1.47 1.69 1.14 1.52 1.26 1.65 1.24 1.81 1.44
Precision-macro 0.40 1.05 1.13 1.13 0.90 1.32 0.87 0.81 2.04 1.25 1.68 1.12 1.74 0.64 1.61 0.95 2.06 1.17 0.80 1.55
Recall-macro 0.48 1.28 1.11 1.09 0.78 1.37 0.78 0.83 0.61 1.30 1.73 1.07 1.76 1.45 1.77 1.04 2.24 1.26 1.75 1.69
word_tfidf Acc 0.31 1.30 1.40 1.43 0.95 1.37 0.75 1.23 0.82 1.25 1.69 1.61 1.82 0.70 1.43 0.91 1.55 1.24 0.94 1.53
F1-macro 0.39 1.47 1.35 2.33 0.86 1.38 0.83 1.24 1.61 1.29 1.73 1.64 1.84 1.23 1.59 1.01 1.71 1.26 1.31 1.71
AUROC 0.24 1.26 1.21 1.58 0.93 1.11 1.05 1.16 1.20 1.14 1.35 1.34 1.73 2.29 1.43 1.23 1.48 1.26 1.48 1.39
Precision-macro 0.42 1.33 1.31 1.78 0.91 1.37 0.81 1.23 2.07 1.28 1.71 1.68 1.81 1.40 1.55 1.01 1.59 1.23 1.42 1.62
Recall-macro 0.52 1.62 1.28 1.77 0.75 1.41 0.87 1.26 1.01 1.31 1.76 1.61 1.84 0.86 1.67 1.09 1.84 1.27 1.09 1.85
LIWC Acc 1.66 1.15 1.13 0.79 1.06 1.37 1.10 0.63 0.96 0.86 1.48 1.22 1.17 0.41 0.49 0.76 1.14 0.84 0.67 1.66
F1-macro 1.60 1.08 1.05 1.18 1.00 1.27 1.01 0.66 1.28 0.78 1.45 1.07 1.21 0.19 0.54 0.77 1.17 0.86 0.89 1.63
AUROC 1.76 1.43 1.05 0.77 0.95 1.51 1.46 0.78 1.42 0.60 1.66 1.13 0.97 0.94 1.17 1.03 1.31 1.34 1.09 1.42
Precision-macro 1.50 1.04 0.98 2.34 0.93 1.15 0.95 0.68 2.88 0.65 1.38 0.98 1.21 1.34 0.61 0.78 1.19 0.87 1.93 1.58
Recall-macro 1.56 1.06 1.01 0.85 0.97 1.19 0.97 0.71 0.94 0.68 1.44 0.99 1.27 0.31 0.64 0.82 1.23 0.91 0.69 1.64
DOI: 10.7717/peerj.11382/table-A3

Notes:

* cnb = Complement Naïve Bayes, lgb = Light GBM, lgr = Logistic Regression, rf = Random Forest, svm = Support Vector Machine.

** combo is the combination of char_tf, char_tfidf, word_tf, word_tfidf and LIWC.

Table A4:
Five-fold cross-validation standard deviations (percentage points) for the Kaggle-Filtered dataset, with MBTI keywords removed.
Datasets (left to right): Kaggle-Filtered | Kaggle-Filtered_noNN | Kaggle-Filtered_noNNnoSW | Kaggle-Filtered_noSW
Models (five columns per dataset): cnb lgb lgr rf svm
Rows: feature set × metric; each row lists the 20 dataset-model values in that order.
combo Acc 1.00 0.99 1.42 0.35 0.83 1.00 0.78 1.10 0.17 0.85 0.89 1.26 0.97 0.72 0.93 1.38 1.16 0.91 0.28 0.80
F1-macro 0.94 0.88 1.40 0.41 0.84 0.94 0.59 0.99 0.58 0.75 0.87 1.21 0.87 1.17 0.94 1.39 1.23 0.75 0.60 0.81
AUROC 0.93 0.55 1.41 1.27 1.24 0.78 0.77 1.12 1.00 0.89 0.99 0.79 0.88 0.83 1.09 0.82 1.16 1.21 0.62 0.88
Precision-macro 0.90 1.08 1.38 1.81 0.83 0.89 0.62 0.96 1.03 0.71 0.84 1.19 0.85 4.18 0.93 1.36 1.23 0.73 1.21 0.79
Recall-macro 0.91 0.85 1.40 0.35 0.85 0.91 0.56 0.96 0.23 0.69 0.87 1.19 0.83 0.74 0.97 1.40 1.20 0.68 0.33 0.81
char_tf Acc 0.85 0.96 1.79 0.61 1.04 0.83 0.80 0.92 0.48 0.58 1.04 0.92 1.11 0.28 0.66 1.12 0.75 1.70 0.40 0.75
F1-macro 0.85 0.89 1.68 1.33 1.02 0.87 1.14 0.89 0.37 0.64 1.07 0.97 1.01 0.93 0.57 1.14 0.56 1.68 0.71 0.70
AUROC 0.99 0.60 1.38 0.90 0.84 1.32 0.43 1.03 0.73 0.56 1.26 0.95 0.66 1.13 0.56 0.82 0.60 1.45 1.48 0.88
Precision-macro 0.82 0.98 1.59 4.16 0.99 0.88 0.94 0.86 3.81 0.66 1.07 0.97 0.97 2.47 0.52 1.12 0.69 1.63 2.53 0.66
Recall-macro 0.85 0.87 1.61 0.73 1.01 0.92 1.13 0.89 0.40 0.70 1.11 0.95 0.98 0.36 0.52 1.17 0.58 1.67 0.45 0.67
char_tfidf Acc 1.15 0.98 1.63 0.64 0.69 1.11 1.11 1.20 0.53 0.85 1.08 0.73 0.74 0.64 0.71 1.16 0.87 1.72 0.47 0.78
F1-macro 1.18 1.16 1.52 1.05 0.64 1.13 0.57 1.18 1.05 0.92 1.10 0.89 0.71 1.04 0.61 1.17 0.84 1.65 0.70 0.71
AUROC 1.02 1.03 1.44 0.99 0.93 1.33 0.67 1.14 1.60 0.67 1.23 0.75 0.67 0.79 0.64 0.80 1.07 1.34 1.01 0.94
Precision-macro 1.17 1.11 1.44 3.99 0.61 1.12 0.78 1.15 3.14 0.96 1.08 0.73 0.69 3.71 0.54 1.15 0.94 1.59 2.88 0.67
Recall-macro 1.23 1.14 1.45 0.69 0.61 1.16 0.52 1.19 0.59 1.01 1.13 0.86 0.71 0.68 0.54 1.20 0.81 1.62 0.48 0.68
word_tf Acc 0.72 0.73 1.49 0.42 0.83 0.53 0.66 0.73 0.40 1.17 1.04 1.19 0.48 0.25 0.64 0.56 1.74 0.60 0.33 0.80
F1-macro 0.74 0.78 1.44 0.86 0.88 0.47 0.62 0.76 0.86 1.19 1.12 1.00 0.49 1.79 0.78 0.58 1.76 0.71 0.96 0.78
AUROC 0.62 0.73 1.37 1.04 0.40 0.91 0.56 0.67 1.03 0.88 0.83 0.94 0.69 1.51 0.76 0.75 1.01 1.17 1.35 0.73
Precision-macro 0.73 0.77 1.38 3.41 0.89 0.41 0.62 0.77 3.60 1.17 1.13 1.02 0.49 0.68 0.79 0.58 1.78 0.74 1.71 0.77
Recall-macro 0.76 0.80 1.41 0.47 0.95 0.42 0.62 0.81 0.42 1.21 1.19 0.99 0.51 0.62 0.86 0.61 1.75 0.81 0.40 0.81
word_tfidf Acc 0.61 1.16 1.59 0.57 0.82 0.61 0.86 0.81 0.39 0.68 1.01 1.60 0.45 0.54 0.74 0.56 1.24 0.85 0.37 0.81
F1-macro 0.65 0.78 1.58 0.69 0.82 0.53 0.67 0.86 0.69 0.70 1.07 1.58 0.45 0.87 0.76 0.54 1.10 0.95 1.37 0.78
AUROC 0.61 0.43 1.30 1.63 0.49 0.92 0.31 0.66 1.96 0.91 0.82 1.11 0.69 1.20 0.62 0.71 1.15 1.20 0.76 0.68
Precision-macro 0.65 1.11 1.54 3.21 0.83 0.46 0.71 0.88 2.54 0.70 1.07 1.57 0.45 2.38 0.75 0.53 1.21 0.95 0.73 0.76
Recall-macro 0.69 0.71 1.58 0.56 0.87 0.47 0.64 0.92 0.42 0.73 1.12 1.58 0.47 0.55 0.78 0.54 1.07 1.03 0.62 0.80
LIWC Acc 1.50 1.29 1.47 0.62 1.28 1.25 1.34 1.30 0.76 0.89 1.02 1.37 0.90 0.51 0.86 0.52 1.14 0.81 0.09 0.92
F1-macro 1.45 1.34 1.42 0.58 1.22 1.24 1.28 1.33 0.77 0.90 1.06 1.30 0.87 1.28 0.85 0.49 1.37 0.83 0.67 0.93
AUROC 1.75 1.62 1.05 1.26 0.96 1.63 1.01 1.05 2.12 0.83 1.34 1.61 0.67 1.08 1.15 0.81 1.67 1.13 1.44 1.23
Precision-macro 1.36 1.34 1.35 1.95 1.15 1.19 1.26 1.34 2.20 0.89 1.09 1.27 0.82 1.74 0.83 0.47 1.40 0.84 0.30 0.91
Recall-macro 1.41 1.39 1.40 0.54 1.19 1.24 1.28 1.40 0.66 0.93 1.13 1.29 0.86 0.69 0.86 0.49 1.50 0.87 0.24 0.96
DOI: 10.7717/peerj.11382/table-A4

Notes:

* cnb = Complement Naïve Bayes, lgb = Light GBM, lgr = Logistic Regression, rf = Random Forest, svm = Support Vector Machine.

** combo is the combination of char_tf, char_tfidf, word_tf, word_tfidf and LIWC.

References

 