Interobserver Reliability of the Animal Welfare Indicators Welfare Assessment Protocol for Horses.
Authors: Czycholl Irena, Klingbeil Philipp, Krieter Joachim
Journal: Journal of equine veterinary science
Summary
# Editorial Summary: Interobserver Reliability of the AWIN Welfare Assessment Protocol for Horses

Standardised welfare assessment tools are essential for consistent evaluation across different farms and operators, yet their practical utility depends on whether different observers reliably reach the same conclusions about individual animals. Czycholl and colleagues evaluated the interobserver reliability of the Animal Welfare Indicators (AWIN) protocol by having two trained assessors simultaneously score 18 horses, then quantifying agreement with Cohen's kappa, intraclass correlation coefficients, and limits of agreement at both the individual and farm levels. The majority of welfare indicators demonstrated acceptable to good interobserver reliability at both levels; however, specific facial tension indicators on the Horse Grimace Scale (moderate presence of tension above the eye area and orbital tightening at score 1) and the assessment of swollen joints showed poorer agreement between observers. Whilst the protocol performs well overall, the authors recommend refining the scoring definitions for several parameters, particularly Body Condition Score, to improve consistency, which would strengthen its application as a standardised tool for on-farm welfare monitoring. For equine professionals implementing the AWIN protocol in practice, these findings suggest the system is reasonably reliable when observers are trained, though attention to detailed scoring criteria and calibration of individual assessors would further support robust welfare evaluation.
Practical Takeaways
- The AWIN protocol is sufficiently reliable for on-farm welfare assessment when observers are trained, though some facial expression indicators need clearer scoring guidance
- Interpret swollen joint assessments and body condition scores with caution because observer agreement was lower, and consider refining standardized training before implementation
- Use farm-level summaries rather than individual assessments for more consistent welfare monitoring outcomes
Key Findings
- Most AWIN welfare assessment indicators demonstrated acceptable to good interobserver reliability at individual level (Cohen's kappa ≥0.4)
- Farm-level assessment showed acceptable to good reliability for most indicators using Spearman correlation and ICC (≥0.4 to ≥0.7)
- Horse Grimace Scale indicators (moderate tension above eye area and orbital tightening) and swollen joints showed poor interobserver agreement
- Body Condition Score and some other indicators require more detailed scoring definitions to enhance reliability
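
To make the individual-level statistic concrete, here is a minimal Python sketch of how Cohen's kappa could be computed for two observers scoring the same horses on one categorical indicator. The scores are invented for illustration and are not data from the study, the 0/1/2 scale is an assumption, and the function name `cohens_kappa` is hypothetical. The 0.4 cut-off is the acceptability threshold quoted above.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same animals on a categorical scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of animals")
    n = len(rater_a)

    # Observed agreement: share of animals given the same score by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if p_expected == 1.0:
        # Both raters used one identical category throughout; kappa is undefined,
        # and this sketch treats that case as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical scores for ten horses (assumed scale: 0 = absent, 1 = moderate, 2 = obvious).
observer_1 = [0, 0, 1, 2, 0, 1, 1, 0, 2, 0]
observer_2 = [0, 0, 1, 2, 0, 0, 1, 0, 2, 1]

kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}, acceptable = {kappa >= 0.4}")
```

With these invented scores the observers agree on 8 of 10 horses, but kappa is lower than the raw 80% agreement because it discounts the agreement expected by chance from each observer's score distribution.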