Inter-observer variability of two grading systems for equine glandular gastric disease.
Authors: Tallon Rose, Hewetson Michael
Journal: Equine veterinary journal
Summary
Tallon & Hewetson (2021) investigated whether two existing grading approaches for equine glandular gastric disease (EGGD) produce consistent assessments across veterinarians. They recruited 82 respondents (49 diplomates and 33 non-diplomates) to score 20 gastroscopic images using either descriptive terminology (severity, distribution, appearance, shape) or a 0–2 verbal rating scale. When all four descriptive variables were combined, inter-observer agreement was poor (α = 0.19), though individual categories showed fair-to-moderate agreement: severity (α = 0.52), distribution (α = 0.44), appearance (α = 0.38) and shape (α = 0.32). The verbal rating scale performed similarly to severity scoring (α = 0.53). Lesion appearance and shape, but notably not distribution, predicted both treatment decisions and severity classification, and images graded 2/2 on the verbal scale were strongly associated with being described as severe (odds ratio 75.2). Whilst diplomates agreed more consistently than non-diplomates across all descriptive categories, these findings highlight substantial variability in how EGGD lesions are interpreted in practice, indicating that clearer descriptive boundaries are needed before either system can reliably standardise clinical assessment and treatment decisions across the profession.
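The summary reports agreement as α without naming the statistic. Assuming it denotes Krippendorff's alpha for categorical ratings (an assumption, not confirmed by the text above), the following minimal sketch shows how such a coefficient can be computed. The function name and the toy ratings are hypothetical illustrations, not data from the study.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists; each inner list holds the labels that
    different observers assigned to one image (hypothetical input format).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an image rated once carries no agreement information
        # Every ordered pair of ratings within an image, weighted by 1/(m-1)
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label and overall number of pairable ratings
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # undefined when only one label was ever used
    return 1 - observed / expected


# Toy ratings, invented for illustration only
perfect = [["mild", "mild", "mild"], ["severe", "severe", "severe"]]
opposed = [["mild", "severe"], ["mild", "severe"]]
print(krippendorff_alpha_nominal(perfect))  # complete agreement
print(krippendorff_alpha_nominal(opposed))  # systematic disagreement
```

Values near 1 indicate that observers agree far more than chance would predict, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.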
Practical Takeaways
- Current descriptive grading systems for EGGD lack standardisation; clinicians should be aware that lesion assessment varies substantially between observers, even among specialists
- Focus on lesion appearance and shape rather than distribution when assessing severity and deciding treatment, as these two features, unlike distribution, were associated with severity rating and treatment decisions
- A simple verbal rating scale (0–2) showed agreement comparable to descriptive severity scoring (α = 0.53 vs 0.52) and well above the combined four-descriptor system (α = 0.19), potentially simplifying clinical practice
Key Findings
- Inter-observer agreement for the combined four-descriptor system was poor (α = 0.19), while severity and verbal rating scale showed fair to moderate agreement (α = 0.52–0.53)
- Diplomates demonstrated better agreement across all descriptive categories compared to non-diplomates
- Lesion appearance and shape, but not distribution, were significantly associated with severity rating and treatment decisions
- A verbal rating scale score of 2/2 was strongly associated with lesions being described as severe (OR 75.2, 95% CI 51.12–110.48)
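As a reading aid for the odds ratio above, the sketch below checks that the reported interval is roughly symmetric around the point estimate on the log scale, which is how confidence intervals for odds ratios are conventionally constructed. It assumes a standard Wald-type 95% interval (z = 1.96), which the summary does not state; the function name is hypothetical.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Recover the implied point estimate and log-scale standard error
    from a confidence interval, assuming it is symmetric on the log scale."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    point = math.exp((log_lower + log_upper) / 2)  # geometric midpoint
    se_log = (log_upper - log_lower) / (2 * z)     # SE of ln(OR)
    return point, se_log


# Interval limits taken from the finding above
point, se_log = log_scale_summary(51.12, 110.48)
print(f"implied OR = {point:.2f}, SE of ln(OR) = {se_log:.3f}")
```

The geometric midpoint lands close to the reported odds ratio of 75.2, which is consistent with an interval built on the log scale; the interval looks lopsided only when read on the raw scale.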