Keywords

High-Stakes Testing, Interrater Reliability, Intrarater Reliability, Nursing Education, Simulation

Authors

  1. Kardong-Edgren, Suzan
  2. Oermann, Marilyn H.
  3. Rizzolo, Mary Anne
  4. Odom-Maryon, Tamara

Abstract

AIM: This article reports one approach to developing a standardized rater training method for establishing the inter- and intrarater reliability of a group of raters for high-stakes testing.

BACKGROUND: Simulation is increasingly used for high-stakes testing, yet little research has examined how to establish inter- and intrarater reliability among raters.

METHOD: Eleven raters were trained using a standardized methodology. The raters scored 28 student videos over a six-week period and then rescored all of the videos over a two-day period to establish both intra- and interrater reliability.

RESULTS: One rater demonstrated poor intrarater reliability; a second rater failed all students. When these two outlier raters' scores were excluded, kappa statistics improved from the moderate to the substantial agreement range.
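
For context on this result, a kappa statistic corrects raw percent agreement for the agreement that would be expected by chance alone. The abstract does not state which kappa variant was computed; as an illustrative sketch, Cohen's kappa for a single pair of raters is defined as

\[
  \kappa = \frac{p_o - p_e}{1 - p_e},
\]

where $p_o$ is the observed proportion of agreement and $p_e$ is the proportion of agreement expected by chance, derived from the raters' marginal rating distributions. On the widely cited Landis and Koch benchmarks, kappa values of 0.41 to 0.60 denote moderate agreement and 0.61 to 0.80 denote substantial agreement, the two ranges named above.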

CONCLUSION: There may be faculty who, for different reasons, should not be included in high-stakes testing evaluations. All faculty are content experts, but not all are expert evaluators.