The influence of rater training on inter-rater reliability when using the rat grimace scale
- Published
- Accepted
- Subject Areas
- Animal Behavior, Veterinary Medicine, Anesthesiology and Pain Management
- Keywords
- rat grimace scale, RGS, refinement, welfare, pain assessment, training, 3Rs, scale validation
- Copyright
- © 2018 Zhang et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. The influence of rater training on inter-rater reliability when using the rat grimace scale. PeerJ Preprints 6:e26721v1 https://doi.org/10.7287/peerj.preprints.26721v1
Abstract
Background. Rodent grimace scales facilitate evaluation of the affective component of pain and can identify a range of acute pain levels. Reported rater training in the use of these scales varies considerably and may contribute to observed variability in inter-rater reliability. This study evaluated the effect of training on inter-rater reliability with the Rat Grimace Scale (RGS). Methods. Two training sets, of 42 and 150 images, were prepared from several acute pain models. Four trainee raters, with no previous experience with the RGS, progressed through 2 rounds of training, first scoring 42 images (S1) and then 150 images (S2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were re-scored in a final round (S2b). Inter-rater reliability was evaluated using the intra-class correlation coefficient (ICC), and ICCs were compared with a Feldt test. Results. Inter-rater reliability increased from moderate (ICC 0.58 [95% CI: 0.43-0.72]) to very good (ICC 0.85 [0.81-0.88]) between S1 and S2b (p < 0.01), with a significant increase also observed between S2a and S2b (p < 0.01). The ICCs for the individual action units orbital tightening, ears and nose/cheek also improved from S1 to S2b (p < 0.01). The action units with the highest and lowest ICCs at S2b were orbital tightening (0.84 [0.80-0.87]) and whiskers (0.63 [0.57-0.70]), respectively. In comparison to an experienced rater, the ICCs for all trainees improved, ranging from 0.88 to 0.91 at S2b. Discussion. Training improves inter-rater reliability between trainees, with an associated narrowing of the 95% CI. Additionally, training resulted in improved inter-rater reliability alongside an experienced rater. Training improves the scoring of individual action units, though scoring of whiskers is more difficult than that of other sites. Conclusion. The beneficial effects of training potentially reduce data variability and improve experimental animal welfare.
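To illustrate the reliability statistic named in the Methods, the following is a minimal sketch of how an ICC can be computed from a table of image scores. The abstract does not state which ICC model was used, so the two-way random effects, absolute agreement, single-rater form (often written ICC(2,1)) is an assumption here. The function name `icc_2_1` and the example scores are hypothetical and are not taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one row per image, each row holding one
    score per rater. The choice of ICC(2,1) is an assumption; the abstract
    does not specify the model.
    """
    n = len(scores)      # number of images
    k = len(scores[0])   # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Mean squares from a two-way ANOVA without replication
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Made-up scores for 5 images from 4 raters (illustrative only, not study data)
example_scores = [
    [0.25, 0.50, 0.25, 0.25],
    [1.00, 1.25, 1.00, 0.75],
    [0.50, 0.50, 0.75, 0.50],
    [1.50, 1.75, 1.50, 1.50],
    [0.00, 0.25, 0.00, 0.25],
]
print(round(icc_2_1(example_scores), 2))
```

Under this form, systematic differences between raters (one rater scoring consistently higher than another) lower the ICC, which fits a setting where trainees are expected to converge on the same absolute scores. Confidence intervals and the Feldt comparison between ICCs are not covered by this sketch.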
Author Comment
This is a submission to PeerJ for review.
Supplemental Information
Calculated Intra-class Correlation Coefficients (ICC) for the 150 image set after the second round of scoring (S2b)
ICCs for the full image set (150 images) are displayed alongside those calculated after excluding images for which scores for any action unit differed by 2 points between any 2 raters (122 images). Data are ICC [95% CI].
Bar graph (mean ± SEM) showing RGS scores at baseline (n = 41 images) and 6-9 hours after treatment (n = 29 images: intraplantar Complete Freund’s Adjuvant, n = 19 images; plantar incision, n = 10 images)
Broken horizontal line indicates derived analgesia intervention threshold (Oliver, De Rantere & Ritchie et al., 2014).