The influence of rater training on inter- and intra-rater reliability when using the rat grimace scale

Emily Zhang; Vivian Leung; Daniel SJ Pang

doi:10.7287/peerj.preprints.26721v2

Emily Zhang¹, Vivian Leung², Daniel SJ Pang²

1 Western College of Veterinary Medicine, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

2 Department of Clinical Sciences, Université de Montréal, St-Hyacinthe, QC, Canada

Published: 2018-04-16
Accepted: 2018-04-16

Subject Areas: Animal Behavior, Veterinary Medicine, Anesthesiology and Pain Management
Keywords: rat grimace scale, RGS, refinement, welfare, pain assessment, training, 3Rs, scale validation

Copyright: © 2018 Zhang et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Zhang E, Leung V, Pang DS. 2018. The influence of rater training on inter- and intra-rater reliability when using the rat grimace scale. PeerJ Preprints 6:e26721v2 https://doi.org/10.7287/peerj.preprints.26721v2

Abstract

Rodent grimace scales facilitate assessment of spontaneous pain and can identify a range of acute pain levels. Reported rater training for these scales varies considerably and may contribute to observed variability in inter-rater reliability. This study evaluated the effect of training on inter-rater reliability with the Rat Grimace Scale (RGS). Two training sets, of 42 and 150 images, were prepared from several acute pain models. Four trainee raters progressed through 2 rounds of training, first scoring 42 images (S1) and then 150 images (S2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then re-scored (S2b). Four years after training, all trainees re-scored the 150 images (S2c). Inter- and intra-rater reliability were evaluated using the intra-class correlation coefficient (ICC), and ICCs were compared with a Feldt test. Inter-rater reliability increased from moderate (0.58 [95% CI: 0.43-0.72]) to very good (0.85 [0.81-0.88]) between S1 and S2b (p < 0.01) and also increased between S2a and S2b (p < 0.01). The action units with the highest and lowest ICCs at S2b were orbital tightening (0.84 [0.80-0.87]) and whiskers (0.63 [0.57-0.70]), respectively. When compared with an experienced rater, the ICCs for all trainees improved, ranging from 0.88 to 0.91 at S2b. Four years later, very good inter-rater reliability was retained (0.82 [0.76-0.84]) and intra-rater reliability was good or very good (0.78-0.87). Training improves inter-rater reliability between trainees, with an associated reduction in the width of the 95% CI. Training also improved inter-rater reliability between trainees and an experienced rater. Performance was retained after several years. The beneficial effects of training potentially reduce data variability and improve experimental animal welfare.
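For readers unfamiliar with the ICC, the following is a minimal sketch of how an inter-rater ICC can be computed from an images-by-raters matrix of scores. It assumes the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)); the abstract does not state which ICC form the authors used, so this choice is an assumption. The function name `icc_2_1` and the toy score matrix are hypothetical and are not data from this study.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: 2-D array-like, rows = scored images, columns = raters.
    The ICC form is an assumption; the abstract does not specify one.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape  # n images, k raters

    grand_mean = x.mean()
    row_means = x.mean(axis=1)  # mean score per image
    col_means = x.mean(axis=0)  # mean score per rater

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * np.sum((row_means - grand_mean) ** 2)
    ss_cols = n * np.sum((col_means - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy example: 6 images scored by 4 raters (illustrative values only).
toy_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
toy_icc = icc_2_1(toy_scores)
print(f"ICC(2,1) for toy data: {toy_icc:.2f}")
```

Because this form measures absolute agreement, a rater who scores consistently higher or lower than the others reduces the ICC even when the ranking of images is identical. Confidence intervals and the Feldt test for comparing ICCs are not covered by this sketch.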

Author Comment

This new version includes results from an additional analysis to evaluate intra-rater reliability 4 years after the initial training period.

Supplemental Information

Rat Grimace Scale Training Manual

DOI: 10.7287/peerj.preprints.26721v2/supp-1