Natural language processing (NLP) is a "field of computational linguistics that allows computers to understand human language,"1 offering a means to process and analyze large quantities of free-text data. Natural language processing has historically been used in computer science, customer service, and other industries2; however, in the last several decades, NLP has gained momentum in healthcare with its utility in examining large quantities of unstructured electronic health record (EHR) data. In the United States, 86% of office-based practices have adopted EHRs, and 96% of nonfederal acute care hospitals use certified EHR platforms to document patient information.3 It is projected that all paper-based charting will be replaced by electronic documentation, resulting in an exponential increase in EHR data.4 Thus, the need for NLP will increase to allow for evaluation and processing of large quantities of these EHR data.
Prior studies using NLP with EHR data have primarily examined radiology and pathology reports and medication-associated data and have focused on chronic diseases such as cancer.5,6 However, little is known about applications of NLP using nurse-generated data in any related medical domain.
Nurses constitute the largest sector of healthcare providers internationally. In the United States alone, there are approximately 4 million nurses, nearly three times greater than the number of physicians.7 In most settings, nurses are required to document their care using EHRs, which results in exceptionally large volumes of nursing documentation, including narrative clinical notes. According to several studies, nurses in inpatient and outpatient settings currently spend up to 40% of their time on documentation, which includes reading and reviewing patient clinical notes.8
Even though NLP can be a promising avenue to process substantial amounts of nurse-generated data, little is known about the use of NLP to process nursing data. Nursing documentation differs from documentation by other health providers, including physicians. For example, a recent study found that only 26% of patients' notes in EHRs included synonyms between the physician's and the nurse's clinical notes.9 Today, there is a major gap in our knowledge regarding the extent to which NLP is applicable to nursing notes. In addition, it is unclear which NLP methods are applied to nursing data and how nursing notes add value to evaluating patient outcomes in EHR data.
OBJECTIVE
The purpose of this study is to review and summarize the literature on the use of NLP and text mining (referred to as NLP hereafter) in nursing notes. We aim to describe the following regarding the use of NLP in nursing notes: (1) purpose and data source; (2) target clinical population and setting; and (3) NLP methods, evaluation, and performance, as well as indicators of study quality. We further synthesize and discuss current trends and identify gaps related to NLP in nursing notes to guide future research and applied NLP efforts.
MATERIALS AND METHODS
Our review procedures were guided by the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) recommendations.10 For screening and data extraction, we used Covidence, a systematic review software by Veritas Health Innovation (Melbourne, Australia; available at http://www.covidence.org) that is designed to support collaboration and organization among authors. Our review consisted of three stages: (1) article retrieval, (2) study selection, and (3) data extraction and synthesis.
Article Retrieval
We initially searched PubMed, CINAHL, and EMBASE on June 25, 2019, and repeated our search on October 16, 2021, to identify potentially relevant studies using NLP in nursing notes. Search terms capturing concepts of NLP and nursing notes (Table 1) were derived from keywords and Medical Subject Headings vocabulary11 for the database queries. We limited our search results to the English language; however, we did not apply date constraints. Our search identified 319 records from PubMed, 338 from EMBASE, and 32 from CINAHL, for a total of 689 records (Figure 1). After excluding abstracts that did not meet our inclusion/exclusion criteria, we reviewed 57 full-text articles for eligibility.
Study Selection
Studies were included if they focused on the development or implementation of NLP using data generated by nurses (eg, inpatient or outpatient clinical notes, nursing handoff data). Articles that used NLP for domains potentially relevant to nursing but did not explicitly state that the data used were generated by nurses were excluded. Articles not in English, review articles, and articles without full-text availability were also excluded. Four authors (S.M., M.T., M.H., J.S.) independently reviewed the title and abstract of each retrieved article. Articles were labeled by potential relevancy as "yes," "no," or "maybe" based on eligibility criteria. Disagreements and articles labeled as "maybe" were discussed to reach a consensus. The same authors (S.M., M.T., M.H., J.S.) then independently reviewed the full text of the 57 articles identified as potentially relevant during the title and abstract screening. Articles were included if they met the inclusion criteria. Disagreements among authors were resolved through discussion. Of the 57 reviewed articles, six studies did not use nursing notes, one study did not use NLP, four studies were excluded because they were conference proceedings without study details, two studies were identified as the same study published in different journals, and one study was not in English. After excluding these studies, 43 studies were included in our final review (Figure 1).
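The screening counts reported above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the numbers stated in the text; the variable and function names are illustrative and are not part of the review's tooling. Because the text does not break down why records were dropped before full-text review, that stage is reported only as a difference.

```python
# Record counts as reported in the text; all names here are illustrative.
records_by_database = {"PubMed": 319, "EMBASE": 338, "CINAHL": 32}
full_text_reviewed = 57
full_text_exclusions = {
    "did not use nursing notes": 6,
    "did not use NLP": 1,
    "conference proceedings without study details": 4,
    "same study published in different journals": 2,
    "not in English": 1,
}


def screening_flow(records, full_text, exclusions):
    """Return the record count at each stage of the screening flow."""
    identified = sum(records.values())
    excluded_full_text = sum(exclusions.values())
    return {
        "identified": identified,
        # Difference only; the text does not itemize reasons at this stage.
        "not_advanced_to_full_text": identified - full_text,
        "full_text_reviewed": full_text,
        "excluded_at_full_text": excluded_full_text,
        "included": full_text - excluded_full_text,
    }


flow = screening_flow(records_by_database, full_text_reviewed, full_text_exclusions)
for stage, count in flow.items():
    print(f"{stage}: {count}")
```

Under these inputs the flow reproduces the 689 identified records and the 43 included studies, which requires counting both of the duplicate-publication articles as excluded.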
Data Extraction and Synthesis
Data were manually extracted by four authors (S.M., M.T., M.H., J.S.) from the 43 included studies9,12-53 in our review (Supplemental Digital Content 1, http://links.lww.com/CIN/A208). Two of these coauthors with expertise in health informatics (M.T., J.S.) then reviewed and validated all the extracted data elements. A formal quality assessment was not conducted because formal reporting standards vary for NLP studies. Instead, we developed a data extraction spreadsheet based on relevancy to NLP and on recently reported NLP-focused systematic reviews.57,58 We included information related to the study purpose, data type, quantity, source, clinical setting, target population, NLP (system used, type of NLP method), use of standard terminology, and NLP system performance. We also evaluated the overall quality of each study based on (1) clarity of the study purpose statement, (2) adequacy of the description of the NLP approach, (3) reporting of system evaluation metrics, and (4) description of the limitations of the NLP methods in the study. The following criteria were used to categorize reporting of NLP system evaluation and performance, when methodologically appropriate54,55:
Reported Evaluation Metrics:
* Full reporting of relevant metrics of performance that will allow comprehensive understanding of the performance of an NLP system, including but not limited to F1 score, precision, recall, area under the curve, sensitivity, and/or specificity
* Partial reporting of relevant measures of performance that allow only partial understanding of system performance (often only one of the performance metrics reported and authors of the review were unable to determine comprehensive system performance based on reported metrics)
* Did not report evaluation metrics: relevant performance measures, such as F1 score, precision, recall, area under the curve, sensitivity, and/or specificity, were not reported
* Not applicable: study methods did not require reporting of performance metrics, for example, the goal of the study was to use NLP as a feature extraction technique for further unsupervised machine learning methods
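For readers less familiar with the metrics named in these criteria, the following minimal Python sketch shows how precision, recall (sensitivity), specificity, and F1 score are derived from the four cells of a confusion matrix. The counts in the example are hypothetical and are not taken from any reviewed study. Area under the curve is omitted because it requires ranked prediction scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of NLP outputs compared against a human-annotated gold standard."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(c):
    """Compute the count-based metrics listed in the reporting criteria."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # identical to sensitivity
    specificity = _ratio(c.tn, c.tn + c.fp)
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
counts = ConfusionCounts(tp=80, fp=20, fn=10, tn=90)
metrics = evaluation_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch also illustrates why a single metric gives only a partial picture: precision and recall can diverge substantially for the same system, and F1 summarizes the balance between them.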
RESULTS
Forty-three studies were included in our review. Although years of publication ranged from 2003 to 2021, more than 86% of the studies were published from 2015 to 2021, and more than half of the studies were published between 2019 and 2021.
Study Purpose, Data Source, and Patient Population
Approximately 40% of the studies (n = 17)15,19,24,28,30,33-38,41,43,47,49,52,53 included only nursing notes, whereas more than 53% of the studies (n = 23)9,12,16-18,20,22,23,25-27,29,31,32,39,40,42,44-46,48,50,51,56 included nursing notes and other types of notes written by physicians and other allied health professionals. Natural language processing in nursing notes was primarily conducted in inpatient settings (n = 26),9,15-18,20,24,27,31,32,36-48,50,51,53 followed by home care setting (n = 9).12,13,19,22,25,26,30,35,52 Other settings included emergency departments (n = 5)21,23,28,33,34 and assisted living facilities (n = 1).49 Most of the narrative notes were extracted from either large academic health centers or from freely available data sets, most commonly the Medical Information Mart for Intensive Care data set (n = 8).15,17,24,29,36,38,41,53 Other data sources included Visiting Nurse Service of New York data (n = 8)12,13,19,26,27,35,42,52 and Veterans Health Administration data (n = 2).50,51 No studies used data generated as a result of audio transcription of nurses' dictations.
The most common patient population in the reviewed studies was the general patient population (n = 16),18,20,22,25,26,29,31-34,39,40,43,45,51,52 that is, patients who were not classified by a specific diagnosis or level of medical acuity. Among studies with a specific focus, chronic diseases, such as cardiovascular disease (n = 1)24 and heart failure (n = 5),14,37,38,41,55 as well as patients in critical care units/ICUs (n = 8),15,24,28,29,36,38,41,44,53 were most frequently reported. The number of distinct patients for whom notes were used in the studies varied, ranging from 22 to 1 188 302 patients. In terms of clinical outcomes, NLP was used to assess data related to mortality,30,44,47 hospital readmissions,31,35 patient safety such as falls risk,27,53 infections from indwelling urinary catheters,54 and wound identification and progression.38 Natural language processing was also used to extract patients' sexual orientation and gender identity52 and to identify types of social support55 used by patients.
NLP Methodological Approaches, Evaluation, and Performance
A variety of NLP systems were used across the studies (Supplemental Digital Content 2, http://links.lww.com/CIN/A209), including Moonstone,51 v3NLP,50 MetaMap,49 MedLEE,15 MTERMS,31,32 NimbleMiner,12 EMT-C,23 TextBlob,24 Stanford CoreNLP,18 MySQL,53 LASSO,17 and SAS Text Miner.39,46 More than 23% (n = 10) of the studies did not report whether any specific NLP system was used. NimbleMiner was the only reported free and publicly available NLP system.12 More than 53% (n = 23) of the studies used manually curated rule-based NLP processing methods.15,17,23,24,28,30-33,35-37,40,43-45,47-53 Eighteen studies (42%) used hybrid NLP methodology,9,12-14,16,19-22,25-27,29,33,39,41,42,46 some of which consisted of hierarchical Dirichlet processes,41 latent Dirichlet allocation with principal component analysis,20,34 and hybrid with logistic regression.21 The most common hybrid NLP method was random forest with neural word embeddings.18,19,25,28,31-33,35,48 Only two studies used statistical methods for NLP exclusively.18,38
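To make the idea of a manually curated rule-based method concrete, the following is a minimal Python sketch, not the implementation of any system named above. It matches terms from a small hand-built lexicon and marks a mention as negated when a negation cue precedes it in the same sentence. The lexicon, the negation cues, and the function names are hypothetical and far simpler than what a production system would use.

```python
import re

# Hypothetical, hand-curated lexicon: concept -> surface terms.
SYMPTOM_LEXICON = {
    "pain": ["pain", "ache", "discomfort"],
    "dyspnea": ["shortness of breath", "sob", "dyspnea"],
    "fall": ["fell", "fall", "falls", "found on floor"],
}
# Hypothetical negation cues checked in the text before a matched term.
NEGATION_CUES = ["no", "denies", "without", "negative for"]


def _compile(terms):
    """Build one case-insensitive, whole-word pattern from a list of terms."""
    ordered = sorted(terms, key=len, reverse=True)  # prefer longer phrases
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


CONCEPT_PATTERNS = {name: _compile(terms) for name, terms in SYMPTOM_LEXICON.items()}
NEGATION_PATTERN = _compile(NEGATION_CUES)


def extract_concepts(note):
    """Map each concept found in the note to True (affirmed) or False (negated)."""
    found = {}
    for sentence in re.split(r"[.;\n]+", note):
        for concept, pattern in CONCEPT_PATTERNS.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            negated = bool(NEGATION_PATTERN.search(sentence[:match.start()]))
            # A concept affirmed anywhere in the note stays affirmed.
            found[concept] = found.get(concept, False) or not negated
    return found


example_note = "Pt denies pain. Reports shortness of breath on exertion. No falls this shift."
result = extract_concepts(example_note)
print(result)
```

A hybrid method, as the term is used in this review, would typically combine a curated vocabulary of this kind with a statistical or machine learning component; that part is not sketched here.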
The most common standard terminologies used were the SNOMED-CT (Systematized Nomenclature of Medicine, approximately 40%, n = 17)12-14,16,22,25,27,29,31,32,36,41,42,45,51-53 and the UMLS (Unified Medical Language System, approximately 35%, n = 15).9,12-14,19,22,25,28-30,36,41,42,49,51 Nursing standard terminologies, such as the International Classification for Nursing Practice and the Omaha System, were used in only eight studies.18,25,28,31,33,35-37 Other standard terminologies were also used in some studies, including the International Classification of Diseases,18,25,28,31,33,35,37,38,43,56,59 LOINC (Logical Observation Identifiers Names and Codes),32,52,53 the International Nursing Knowledge Association NANDA terminologies,59 and RxNorm.47 Approximately 47% of the studies (n = 20) did not report using any standard terminology.21,23,24,26,27,29,30,32,39-41,44-46,49,50,52-54,57
The number of notes processed in the reviewed studies ranged from a minimum of 202 notes31 to a maximum of approximately 9.1 million notes, which included nursing notes, in the study by Hatef et al45; however, neither the NLP system nor its performance was reported for that study.45 Among studies that named their NLP system, NimbleMiner42 was used to process the largest number of notes (approximately 5.58 million), followed by MySQL53 with at least 1.1 million nursing notes. Eight studies did not report the number of documents used to perform NLP.21,24,37,40,41,44,47,48
For studies reporting the F score, the highest overall F scores, ranging from 80% to 96%, were reported by Koleck et al,58 who also used NimbleMiner as their NLP system.48 Their NLP system analyzed more than 5 million notes to identify symptom information in clinical notes.48
Indicators for Quality Across Studies
Table 2 summarizes and compares the indicators of quality across studies included in our review. All studies (n = 43) clearly described and defined the purpose of their study. The majority (n = 40) of the studies described their NLP approach adequately. Two studies did not describe their NLP approach adequately.51,57 Seven studies were excluded from the NLP system performance evaluation requirement because they implemented text mining methods that did not require NLP system evaluation.17,24,34,38,41,44,49 Among the remaining eligible studies (n = 36), full performance evaluation metrics, based on the criteria described in Materials and Methods, were not reported for 39% (n = 14)14,20,21,26,27,36,43,45,49,51-53,56,57,59 of the studies. Full NLP evaluation metrics were reported for 61% (n = 22) of the 36 eligible studies.
DISCUSSION
This integrative review on the use of NLP in nursing notes identified 43 relevant studies. Overall, we found that NLP in nursing notes is an emerging research area, with most studies conducted in the last 2 years. Even though there was a significant rise in the number of published NLP studies in recent years, a simple PubMed search in January 2022 for the keywords "natural language processing" or "text mining" yields more than 40 000 studies, whereas the nursing-specific records retrieved by our search (n = 689) constitute a tiny fraction (1.7%) of these articles. Thus, the overall number of studies is limited, and NLP in nursing notes remains an emerging area of study, even though nursing notes add value to EHR data. Many studies in our review were conducted within hospital settings, which limits the applicability of the findings to other settings (eg, nursing homes, skilled nursing facilities, outpatient clinics). Furthermore, all the studies focused on common adult health conditions, leaving a gap in knowledge about the pediatric population. Future studies should include more diverse care settings and be inclusive of all ages across the human life span.
Natural language processing of nursing data was conducted in a variety of domains, such as symptom identification and risk prediction. Our review highlights the fact that NLP can assist in identifying patient characteristics that are rarely captured as structured data, such as sexual orientation and gender identity52 and prediction of symptoms.42 In another study in our review, polarity of the sentiment included in nursing notes was found to be associated with 30-day mortality of patients after hospitalization.24 Yet another study used nursing notes to identify patients at substantial risk for falls.27 Furthermore, applying NLP to nursing notes has allowed researchers to predict which home care patients are at risk of an emergency department visit or hospitalization.25 These examples demonstrate that NLP in nursing notes offers valuable information that can be used to understand patient characteristics such as critical symptoms or to assess patient outcomes.
This review found that Medical Information Mart for Intensive Care, a freely available data set of clinical notes (including nursing notes), was used in nearly one-fifth (n = 8) of the NLP studies. More freely available resources that include nursing data would support more nursing-relevant NLP research. In addition, open-source NLP platforms might have the potential to further spread NLP adoption. However, our review found only one open-source NLP system, NimbleMiner, which was used across 10 studies12-14,19,22,25-27,29,42 and has demonstrated the ability to process large quantities of clinical data with good F1 performance scores. Our review did not find studies describing the application of other freely available NLP systems.
We found that 61% (n = 22) of eligible (n = 36) studies reported comprehensive evaluation of NLP system performance. This finding is concerning because the remainder of the studies do not provide a complete understanding of the accuracy of the different NLP systems used in processing nursing data. We strongly encourage research teams engaging in research with NLP in nursing notes to report their measures of NLP system performance. Comprehensive assessment of different systems' performance will enable better understanding of which systems are applicable and perform better for NLP in nursing data and will ensure that researchers are able to generalize and compare results. In addition, our review found that 47% of the studies did not use standard terminologies, with nursing terminologies being applied in only eight studies (19%). We encourage further NLP projects to use existing standard terminologies, especially nursing-specific standard terminologies such as the International Classification for Nursing Practice or the Omaha System, to enable generalizability of NLP methods.
CONCLUSION
This integrative review identified a growing trend of NLP with nursing data (Figure 2). However, only about half of the included studies reported full NLP system performance metrics. We encourage further NLP studies to use appropriate evaluation measures (eg, F score) when reporting results for comparability with other studies. Furthermore, researchers using NLP in nursing notes are encouraged to use existing standard nursing terminologies and open-source systems to enable future scalability of the methodologies. Finally, more evidence is needed to understand the applicability of NLP beyond the inpatient setting. Future studies should consider applying NLP to a variety of populations, including the pediatric and adult outpatient populations, to expand the knowledge that can be gleaned from using NLP in nursing notes.
References