Bag-of-words technique in natural language processing: A primer for radiologists Review


Authors: Juluru, K.; Shih, H. H.; Keshava Murthy, K. N.; Elnajjar, P.
Review Title: Bag-of-words technique in natural language processing: A primer for radiologists
Abstract: Natural language processing (NLP) is a methodology designed to extract concepts and meaning from human-generated unstructured (free-form) text. It is intended to be implemented by using computer algorithms so that it can be run on a corpus of documents quickly and reliably. To enable machine learning (ML) techniques in NLP, free-form text must be converted to a numerical representation. After several stages of preprocessing including tokenization, removal of stop words, token normalization, and creation of a master dictionary, the bag-of-words (BOW) technique can be used to represent each remaining word as a feature of the document. The preprocessing steps simplify the documents but also potentially degrade meaning. The values of the features in BOW can be modified by using techniques such as term count, term frequency, and term frequency–inverse document frequency. Experience and experimentation will guide decisions on which specific techniques will optimize ML performance. These and other NLP techniques are being applied in radiology. Radiologists’ understanding of the strengths and limitations of these techniques will help in communication with data scientists and in implementation for specific tasks. © RSNA, 2021.
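Illustrative sketch (not from the article): the steps named in the abstract can be tried on a toy corpus with the Python standard library alone. The three report sentences, the stop-word list, the function names, and the choice of lowercasing as the only normalization step are all assumptions made for illustration. The inverse document frequency used here, log(N / df), is one common variant and may differ from the one the authors present.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and stop-word list, chosen only for illustration.
DOCS = [
    "No acute fracture is seen.",
    "There is an acute fracture of the left wrist.",
    "The lungs are clear.",
]
STOP_WORDS = {"is", "an", "of", "the", "there", "are", "no"}


def tokenize(text):
    """Lowercase the text (a simple normalization) and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize, then drop stop words."""
    return [t for t in tokenize(text) if t not in STOP_WORDS]


def build_dictionary(token_lists):
    """Master dictionary: sorted list of every distinct token left in the corpus."""
    return sorted({t for tokens in token_lists for t in tokens})


def term_counts(tokens, dictionary):
    """Number of times each dictionary word occurs in one document."""
    counts = Counter(tokens)
    return [counts[w] for w in dictionary]


def term_frequencies(tokens, dictionary):
    """Term counts divided by the number of tokens kept in the document."""
    total = len(tokens)
    return [c / total if total else 0.0 for c in term_counts(tokens, dictionary)]


def tf_idf(token_lists, dictionary):
    """Term frequency weighted by log(N / df), assumed here as the IDF variant."""
    n_docs = len(token_lists)
    # df: number of documents containing each dictionary word (never zero,
    # because the dictionary is built from this same corpus).
    df = [sum(1 for tokens in token_lists if w in tokens) for w in dictionary]
    idf = [math.log(n_docs / d) for d in df]
    return [
        [tf * weight for tf, weight in zip(term_frequencies(tokens, dictionary), idf)]
        for tokens in token_lists
    ]


TOKENS = [preprocess(doc) for doc in DOCS]
DICTIONARY = build_dictionary(TOKENS)
COUNTS = [term_counts(tokens, DICTIONARY) for tokens in TOKENS]
TFIDF = tf_idf(TOKENS, DICTIONARY)

if __name__ == "__main__":
    print(DICTIONARY)
    for row in COUNTS:
        print(row)
```

In this sketch, dropping the stop word "no" leaves the first two reports sharing the tokens "acute" and "fracture" although they state opposite findings, which is one concrete way preprocessing can degrade meaning.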
Journal Title: RadioGraphics
Volume: 41
Issue: 5
ISSN: 0271-5333
Publisher: Radiological Society of North America, Inc.  
Date Published: 2021-09-01
Start Page: 1420
End Page: 1426
Language: English
DOI: 10.1148/rg.2021210025
PUBMED: 34388050
PROVIDER: scopus
PMCID: PMC8415041
Notes: Article -- Export Date: 1 October 2021 -- Source: Scopus
MSK Authors
  1. Krishna Juluru
  2. Hao-Hsin Shih