Objective: To obtain validity evidence for an evaluation instrument used to assess the performance level of a mastoidectomy. The instrument was developed by a multi-institutional consortium and has been described previously.
Design: Mastoidectomies were performed on a virtual temporal bone system and then rated by experts using a previously described 15-element task-based checklist. Based on the results, a second, similar checklist was created, and a second round of rating was performed.
Setting: Twelve otolaryngology surgical training programs in the United States.
Participants: Sixty-six individuals with varying levels of temporal bone dissection experience, from medical students to attending physicians. Raters were attending surgeons from 12 different institutions.
Results: Intraclass correlation coefficients (ICCs) varied widely across checklist items, ranging from low to high. Percentage agreement was similar to that of previous rating instruments. Necessary Condition Analysis provided strong evidence that a high score on the task-based checklist is necessary, but not sufficient, for a rater to judge a mastoidectomy as performed at the level of an expert.
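To illustrate what "necessary but not sufficient" means for checklist scores, the sketch below computes the ceiling-envelopment (CE-FDH) effect size commonly used in Necessary Condition Analysis, together with a simple threshold check. The abstract does not state which NCA ceiling technique was used, so CE-FDH is an assumption here; the function names and the toy scores are hypothetical and are not data from this study.

```python
from typing import Sequence, Tuple


def ce_fdh_effect_size(x: Sequence[float], y: Sequence[float]) -> float:
    """CE-FDH effect size d for 'X is necessary for Y' (assumed technique).

    d = (empty area above the step ceiling line) / (scope area), where the
    scope is the rectangle spanned by the observed X and Y ranges.
    d = 0: no empty upper-left corner; larger d: stronger necessary condition.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    xmin, xmax = min(x), max(x)
    ymin, ymax = min(y), max(y)
    scope = (xmax - xmin) * (ymax - ymin)
    if scope == 0:
        return 0.0
    xs = sorted(set(x))
    ceiling_zone = 0.0
    running_max = ymin
    for left, right in zip(xs, xs[1:]):
        # Ceiling on [left, right): highest Y observed at any X <= left.
        running_max = max(running_max, max(yi for xi, yi in zip(x, y) if xi == left))
        ceiling_zone += (ymax - running_max) * (right - left)
    return ceiling_zone / scope


def necessary_not_sufficient(scores: Sequence[float], expert: Sequence[int]) -> Tuple[float, float]:
    """Return (lowest score among expert-rated cases,
    share of cases at or above that score NOT rated expert)."""
    expert_scores = [s for s, e in zip(scores, expert) if e]
    threshold = min(expert_scores)
    at_or_above = [e for s, e in zip(scores, expert) if s >= threshold]
    not_expert_share = sum(1 for e in at_or_above if not e) / len(at_or_above)
    return threshold, not_expert_share


# Hypothetical checklist totals (0-15) and binary "expert level?" ratings.
scores = [5, 7, 9, 11, 12, 13, 14, 15, 15, 10]
expert = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(ce_fdh_effect_size(scores, expert))        # → 0.7
print(necessary_not_sufficient(scores, expert))  # → (12, 0.4)
```

In this toy data no case scoring below 12 is rated expert (the score is necessary), yet 40% of cases at or above 12 are still not rated expert (the score is not sufficient). A correlation coefficient alone would not separate these two properties.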
Conclusions: Rewording the instrument items to focus on safety did not increase the reliability of the instrument. The strong result of the Necessary Condition Analysis suggests that analyses going beyond simple correlation measures can provide additional insight into grading results. We also suggest replacing the binary pass/fail question with a multipoint scale combined with descriptive mastery levels.