Consistency of assessment is essential to the farm assurance process. This study evaluated the inter-observer reliability of 31 farm assurance assessors, six veterinarians and four researchers for five pig welfare outcome measures proposed for inclusion in UK pig farm assurance schemes. These were (1) tail lesions, (2) body lesions, (3) lameness, (4) pigs requiring hospitalisation and (5) oral behaviour. Inter-observer reliability was tested against a gold standard Trainer using the following methods: a comparison of farm prevalence and of the numbers of affected pigs in each pen identified by observers, Cohen's kappa (κ), Kendall's W, proportional agreement, sensitivity, and specificity. All measures achieved potentially high levels of inter-observer reliability, and it was concluded that none should be excluded from farm assurance at this stage. However, across all the measures, 45% of observers did not record an overall farm prevalence 'close' to that of the gold standard Trainer. With the level of training and testing that took place in this study, there would be a danger of significant bias occurring in a national assessment scheme. The data collected enabled some comparison of the methods used to assess inter-observer reliability. It is suggested that, when the aim is to achieve agreement between observers on the overall farm prevalence, inter-observer reliability testing should focus on the closeness of the overall farm prevalence recorded by observers, but that other types of analysis may be helpful during training.
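As a rough illustration of four of the statistics named above (proportional agreement, Cohen's kappa, sensitivity and specificity), the following minimal sketch compares one observer with the gold standard Trainer on binary per-pig scores (1 = affected, 0 = not affected). The function name `agreement_stats` and the ten scores are invented for illustration; they are not the study's data or analysis code, and Kendall's W is not covered.

```python
import math
from typing import Dict, Sequence


def agreement_stats(trainer: Sequence[int], observer: Sequence[int]) -> Dict[str, float]:
    """Compare binary scores from one observer with the gold standard Trainer."""
    if len(trainer) != len(observer) or len(trainer) == 0:
        raise ValueError("score lists must be non-empty and of equal length")

    pairs = list(zip(trainer, observer))
    tp = sum(1 for t, o in pairs if t == 1 and o == 1)  # both score the pig as affected
    tn = sum(1 for t, o in pairs if t == 0 and o == 0)  # both score the pig as unaffected
    fp = sum(1 for t, o in pairs if t == 0 and o == 1)  # observer only
    fn = sum(1 for t, o in pairs if t == 1 and o == 0)  # Trainer only
    n = tp + tn + fp + fn

    # Proportional (observed) agreement.
    p_o = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals of each rater.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    # Cohen's kappa; undefined when chance agreement is already 1.
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else math.nan

    return {
        "proportional_agreement": p_o,
        "kappa": kappa,
        "sensitivity": tp / (tp + fn) if (tp + fn) else math.nan,
        "specificity": tn / (tn + fp) if (tn + fp) else math.nan,
        "trainer_prevalence": (tp + fn) / n,
        "observer_prevalence": (tp + fp) / n,
    }


# Hypothetical scores for ten pigs (illustration only).
trainer = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
observer = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
stats = agreement_stats(trainer, observer)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In this invented example the two raters record the same overall prevalence while disagreeing on two individual pigs, so a prevalence comparison and a pig-level statistic such as kappa can give different impressions of the same observer.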