Quantifying interrater agreement and reliability between thoracic pathologists: Paradoxical behavior of Cohen's kappa in the presence of a high prevalence of the histopathologic feature in lung cancer Journal Article

Authors: Tan, K. S.; Yeh, Y. C.; Adusumilli, P. S.; Travis, W. D.
Article Title: Quantifying interrater agreement and reliability between thoracic pathologists: Paradoxical behavior of Cohen's kappa in the presence of a high prevalence of the histopathologic feature in lung cancer
Abstract: Introduction: Cohen's kappa is often used to quantify the agreement between two pathologists. Nevertheless, a high prevalence of the feature of interest can lead to seemingly paradoxical results, such as low Cohen's kappa values despite high “observed agreement.” Here, we investigate Cohen's kappa using data from histologic subtyping assessment of lung adenocarcinomas and introduce alternative measures that can overcome this “kappa paradox.” Methods: A total of 50 frozen sections from stage I lung adenocarcinomas less than or equal to 3 cm in size were independently reviewed by two pathologists to determine the absence or presence of five histologic patterns (lepidic, papillary, acinar, micropapillary, solid). For each pattern, observed agreement (proportion of cases with concordant “absent” or “present” ratings) and Cohen's kappa were calculated, along with Gwet's AC1. Results: The prevalence of any amount of the histologic patterns ranged from 42% (solid) to 97% (acinar). On the basis of Cohen's kappa, there was substantial agreement for four of the five patterns (lepidic, 0.65; papillary, 0.67; micropapillary, 0.64; solid, 0.61). Acinar had the lowest Cohen's kappa (0.43, moderate agreement), despite having the highest observed agreement (88%). In contrast, Gwet's AC1 values were close to or higher than Cohen's kappa across patterns (lepidic, 0.64; papillary, 0.69; micropapillary, 0.71; solid, 0.73; acinar, 0.85). The proportion of positive versus negative agreement was 93% versus 50% for acinar. Conclusions: Given the dependence of Cohen's kappa on feature prevalence, interrater agreement studies should include complementary indices such as Gwet's AC1 and proportions of specific agreement, especially in settings with a high prevalence of the feature of interest. © 2024 The Authors
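The "kappa paradox" described in the abstract can be reproduced numerically. The sketch below uses a hypothetical 2x2 rating table for a high-prevalence pattern (the counts are illustrative assumptions, not the paper's actual acinar data) and computes observed agreement, Cohen's kappa, and Gwet's AC1; the two indices differ only in how chance agreement is estimated from the raters' marginals.

```python
# Hypothetical 2x2 agreement table for a high-prevalence feature
# (illustrative counts only, not the study's data):
# a = both raters "present", b = rater 1 only, c = rater 2 only, d = both "absent"
a, b, c, d = 43, 3, 3, 1
n = a + b + c + d

p_o = (a + d) / n   # observed agreement (concordant ratings)
p1 = (a + b) / n    # rater 1's "present" proportion
p2 = (a + c) / n    # rater 2's "present" proportion

# Cohen's kappa: chance agreement from the product of each rater's marginals
p_e_kappa = p1 * p2 + (1 - p1) * (1 - p2)
kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)

# Gwet's AC1: chance agreement from the mean marginal pi = (p1 + p2) / 2
pi = (p1 + p2) / 2
p_e_ac1 = 2 * pi * (1 - pi)
ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)

print(f"observed agreement = {p_o:.2f}")    # 0.88
print(f"Cohen's kappa      = {kappa:.2f}")  # 0.18 (paradoxically low)
print(f"Gwet's AC1         = {ac1:.2f}")    # 0.86
```

With both raters marking the feature "present" in 92% of cases, Cohen's chance-agreement term is inflated (0.85), dragging kappa down despite 88% observed agreement, whereas AC1's chance term stays small (0.15) — the prevalence dependence the paper highlights.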
Keywords: diagnostic accuracy; sensitivity and specificity; reproducibility; interobserver coefficient; performance metrics; predominant histologic subtypes
Journal Title: JTO Clinical and Research Reports
Volume: 5
Issue: 1
ISSN: 2666-3643
Publisher: Elsevier BV
Date Published: 2024-01-01
Start Page: 100618
Language: English
DOI: 10.1016/j.jtocrr.2023.100618
PROVIDER: scopus
PMCID: PMC10820331
PUBMED: 38283651
Notes: Article -- MSK Cancer Center Support Grant (P30 CA008748) acknowledged in PubMed and PDF -- MSK corresponding author is Kay See Tan -- Source: Scopus
MSK Authors: William D. Travis; Kay See Tan