Linked Entity Attribute Pair (LEAP): A harmonization framework for data pooling (Journal Article)


Authors: Thomas, S.; Lichtenberg, T.; Dang, K.; Fitzsimons, M.; Grossman, R. L.; Kundra, R.; Lavery, J. A.; Lenoue-Newton, M. L.; Panageas, K. S.; Sawyers, C.; Schultz, N. D.; Sirintrapun, S. J.; Topaloglu, U.; Welch, A.; Yu, T.; Zehir, A.; Gardos, S.
Article Title: Linked Entity Attribute Pair (LEAP): A harmonization framework for data pooling
Abstract: PURPOSE: As data-sharing projects become increasingly frequent, so does the need to map data elements between multiple classification systems. A generic, robust, shareable architecture will result in increased efficiency and transparency of the mapping process, while upholding the integrity of the data. MATERIALS AND METHODS: The American Association for Cancer Research's Genomics Evidence Neoplasia Information Exchange (GENIE) collects clinical and genomic data for precision cancer medicine. As part of its commitment to open science, GENIE has partnered with the National Cancer Institute's Genomic Data Commons (GDC) as a secondary repository. After initial efforts to submit data from GENIE to GDC failed, we realized the need for a solution to allow for the iterative mapping of data elements between dynamic classification systems. We developed the Linked Entity Attribute Pair (LEAP) database framework to store and manage the term mappings used to submit data from GENIE to GDC. RESULTS: After creating and populating the LEAP framework, we identified 195 mappings from GENIE to GDC requiring remediation and observed a 28% reduction in effort to resolve these issues, as well as a reduction in inadvertent errors. These results led to a decrease in the time to map between OncoTree, the cancer type ontology used by GENIE, and International Classification of Disease for Oncology, 3rd Edition, used by GDC, from several months to less than 1 week. CONCLUSION: The LEAP framework provides a streamlined mapping process among various classification systems and allows for reusability so that efforts to create or adjust mappings are straightforward. The ability of the framework to track changes over time streamlines the process to map data elements across various dynamic classification systems.
Journal Title: JCO Clinical Cancer Informatics
Volume: 4
ISSN: 2473-4276
Publisher: American Society of Clinical Oncology  
Date Published: 2020-01-01
Start Page: 691
End Page: 699
Language: English
DOI: 10.1200/cci.20.00037
PUBMED: 32755461
PROVIDER: scopus
PMCID: PMC7469618
Notes: Article -- Export Date: 1 September 2020 -- Source: Scopus
MSK Authors
  1. Charles L Sawyers
  2. Katherine S Panageas
  3. Ahmet Zehir
  4. Nikolaus D Schultz
  5. Stuart M Gardos
  6. Stacy Bridget Thomas
  7. Ritika Kundra
  8. Jessica Ann Lavery
  9. Angelica Noel Welch