Clinical annotations for prostate cancer research: Defining data elements, creating a reproducible analytical pipeline, and assessing data quality
Journal Article


Authors: Keegan, N. M.; Vasselman, S. E.; Barnett, E. S.; Nweji, B.; Carbone, E. A.; Blum, A.; Morris, M. J.; Rathkopf, D. E.; Slovin, S. F.; Danila, D. C.; Autio, K. A.; Scher, H. I.; Kantoff, P. W.; Abida, W.; Stopsack, K. H.
Article Title: Clinical annotations for prostate cancer research: Defining data elements, creating a reproducible analytical pipeline, and assessing data quality
Abstract: Background: Routine clinical data from clinical charts are indispensable for retrospective and prospective observational studies and clinical trials. Their reproducibility is often not assessed. We developed a prostate cancer-specific database for clinical annotations and evaluated data reproducibility. Methods: For men with prostate cancer who had clinical-grade paired tumor–normal sequencing at a comprehensive cancer center, we performed team-based retrospective data collection from the electronic medical record using a defined source hierarchy. We developed an open-source R package for data processing. With blinded repeat annotation by a reference medical oncologist, we assessed data completeness, reproducibility of team-based annotations, and impact of measurement error on bias in survival analyses. Results: Data elements on demographics, diagnosis and staging, disease state at the time of procuring a genomically characterized sample, and clinical outcomes were piloted and then abstracted for 2261 patients (with 2631 samples). Completeness of data elements was generally high. Comparing to the repeat annotation by a medical oncologist blinded to the database (100 patients/samples), reproducibility of annotations was high; T stage, metastasis date, and presence and date of castration resistance had lower reproducibility. Impact of measurement error on estimates for strong prognostic factors was modest. Conclusions: With a prostate cancer-specific data dictionary and quality control measures, manual clinical annotations by a multidisciplinary team can be scalable and reproducible. The data dictionary and the R package for reproducible data processing are freely available to increase data quality and efficiency in clinical prostate cancer research. © 2022 Wiley Periodicals LLC.
Keywords: cancer survival; retrospective studies; outcome assessment; cancer grading; reproducibility; quality control; reproducibility of results; pathology; retrospective study; cancer research; prostate cancer; prostatic neoplasms; prostate tumor; measurement error; process development; electronic health records; clinical data; cancer prognosis; data processing; open source software; measurement accuracy; humans; human; male; article; electronic health record; multidisciplinary team; data quality; data accuracy; patient history of castration
Journal Title: Prostate
Volume: 82
Issue: 11
ISSN: 0270-4137
Publisher: John Wiley & Sons  
Date Published: 2022-08-01
Start Page: 1107
End Page: 1116
Language: English
DOI: 10.1002/pros.24363
PUBMED: 35538298
PROVIDER: scopus
PMCID: PMC9246896
Notes: Article -- Export Date: 1 August 2022 -- Source: Scopus
MSK Authors
  1. Susan Slovin
    254 Slovin
  2. Michael Morris
    577 Morris
  3. Karen Anne Autio
    119 Autio
  4. Dana Elizabeth Rathkopf
    272 Rathkopf
  5. Howard Scher
    1130 Scher
  6. Daniel C Danila
    154 Danila
  7. Wassim Abida
    154 Abida
  8. Philip Wayne Kantoff
    197 Kantoff
  9. Ethan Sean Barnett
    31 Barnett
  10. Emily Ann Carbone
    27 Carbone
  11. Niamh Marie Keegan
    18 Keegan
  12. Barbara Nweji
    5 Nweji
  13. Alexander Blum
    2 Blum