Investigations of the reliability of observational gait analysis for the assessment of lameness in horses.
Authors: Hewetson M, Christley R M, Hunt I D, Voute L C
Journal: The Veterinary record
Summary
Editorial Summary: Observational Gait Analysis and Lameness Assessment Reliability
Observational lameness grading remains a cornerstone of equine clinical practice, yet this 2006 investigation raised important questions about the consistency of subjective assessment methods. Hewetson and colleagues had 16 independent observers grade lameness severity in 20 videotaped horses using both a numerical rating scale (NRS) and a verbal rating scale (VRS), then analysed inter-observer and intra-observer agreement, correlation patterns, and systematic bias. Both scales showed high correlation between and within observers, inter-observer agreement of around 56–60 per cent, and negligible systematic bias. When NRS and VRS scores were compared directly, however, the differences between the two scales were too large to be clinically acceptable, despite a significant statistical correlation. These results indicate that subjective gait assessment scales are only moderately reliable and cannot be used interchangeably, a finding with considerable implications for practitioners relying on visual lameness grading for diagnosis, monitoring treatment response, or communicating clinical findings across different settings and observers.
Practical Takeaways
- Do not switch between numerical and verbal lameness scales in your assessments: the two scales produce clinically different results despite being statistically correlated
- When evaluating lameness on video or in clinical practice, standardize on one rating scale within your operation, and be aware that observer agreement is only around 56-60%, so a second opinion is valuable
- These moderate reliability findings suggest that subjective gait analysis alone has limitations; combine visual assessment with additional diagnostic tools (flexion tests, imaging, etc.) for clinical decision-making
Key Findings
- Observer agreement was moderate, at 56% for the numerical rating scale (NRS) and 60% for the verbal rating scale (VRS), with a high Kendall coefficient of concordance
- Both scales showed high correlation between and within observers, with no significant bias among observers' mean scores
- NRS and VRS scores were significantly correlated with each other, but the differences between the scales were clinically unacceptable
- Both rating scales demonstrated only moderate reliability for assessing lameness severity and should not be used interchangeably
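
The distinction between correlation and agreement in the findings above can be illustrated with a minimal Python sketch. It is not the authors' analysis: the grades are invented toy values, the function names (`pairwise_agreement`, `pearson`, `exact_agreement`) are hypothetical, and the two scales are assumed to share a numeric range purely for illustration. It shows how a percentage agreement between observers might be computed, and how two sets of scores can be perfectly correlated while never matching.

```python
from itertools import combinations
from math import sqrt

def pairwise_agreement(scores_by_observer):
    """Fraction of observer-pair comparisons that gave an identical grade."""
    agree = total = 0
    for a, b in combinations(scores_by_observer, 2):
        for x, y in zip(a, b):
            agree += (x == y)
            total += 1
    return agree / total

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def exact_agreement(xs, ys):
    """Fraction of subjects given the same grade by both methods."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)

# Toy data (invented, NOT from the study): 3 observers grading 5 horses.
observers = [
    [0, 1, 2, 3, 4],
    [0, 1, 2, 4, 4],
    [1, 1, 2, 3, 3],
]
print(f"inter-observer agreement: {pairwise_agreement(observers):.0%}")

# Toy data (invented): one scale reads a full grade higher than the other.
scale_a = [0, 1, 2, 3, 4]
scale_b = [1, 2, 3, 4, 5]
print(f"correlation: {pearson(scale_a, scale_b):.2f}")
print(f"exact agreement: {exact_agreement(scale_a, scale_b):.0%}")
```

In this toy case the two scales rank every horse identically, so the correlation is perfect, yet no horse receives the same grade on both. A constant offset is the simplest way to produce that pattern; the study's own comparison of NRS and VRS scores may have differed in other ways.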