What is the enthusiasm surrounding big data science in healthcare all about? Big data is a new term and a new paradigm describing the massive amounts of digital data that are captured and collected in multiple data sets, integrated, and analyzed. The sheer complexity of big data is often summarized as the 3 Vs: volume (large data sets), velocity (a continual stream of data in near-real time), and variety (multiple sources of structured and unstructured data).
Big data science presents an opportunity for a healthcare organization to harness and leverage captured digital information for analysis and decision making. It also presents an opportunity for researchers to pool multiple data sources for discovery. Potential data sources include a healthcare organization's operational inpatient and outpatient electronic health records (EHRs), administrative databases (coding, regulatory, quality measures, finance, reimbursement, readmissions), and retrospective data warehouse; patient-generated observations of daily living in the personal health record; research databases; science databases (genomics, epigenomics); and nonhealthcare social networks (Internet, e-mail).
The data scientist is fundamental to analysis of big data. The data scientist is a professional who combines data analysis skills, drawn from fields such as bioinformatics, statistics, epidemiology, and information technology, with data communication skills. In essence, the role is to link, map, clean, and transform data. Skills are needed to integrate structured (traditional databases) and unstructured (text-based documents) data sources, apply analytics beyond traditional data processing (data mining, high-performance analytics, machine learning, artificial neural networks), and communicate the analysis to the clinical audience. Communication with clinicians is often described as "storytelling": data narratives and data visualization that transform the data into practical evidence. Effective data visualization includes graphs, charts, maps, tag clouds, animations, or any image that is engaging, informative, understandable, and actionable.
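As a rough illustration of what "link, map, clean, and transform" can mean in practice, the following minimal Python sketch joins a few structured rows to free-text notes. Everything in it is an assumption made for illustration: the field names (`patient_id`, `age`, `readmitted`), the sample records, the note text, and the helper functions `clean`, `transform`, and `build_dataset` are hypothetical and do not come from any real EHR system or study.

```python
import re

# Hypothetical structured rows, as they might be exported from an EHR table.
# Values are deliberately messy: stray spaces, mixed case, a missing age.
ehr_rows = [
    {"patient_id": "P001", "age": "67", "readmitted": "Y"},
    {"patient_id": "p002 ", "age": "", "readmitted": "n"},
]

# Hypothetical unstructured nursing notes, keyed by patient identifier.
notes = {
    "P001": "Pt reports shortness of breath. Fall risk noted.",
    "P002": "Ambulating independently; no complaints.",
}


def clean(row):
    """Clean and map: normalize the identifier and convert text to typed values."""
    return {
        "patient_id": row["patient_id"].strip().upper(),
        "age": int(row["age"]) if row["age"].strip() else None,
        "readmitted": row["readmitted"].strip().upper() == "Y",
    }


def transform(row, note_text):
    """Transform: derive a simple flag from the free-text note."""
    linked = dict(row)
    linked["fall_risk_mentioned"] = bool(
        re.search(r"\bfall risk\b", note_text, re.IGNORECASE)
    )
    return linked


def build_dataset(rows, notes_by_id):
    """Link: match each cleaned row to its note by patient identifier."""
    cleaned = [clean(r) for r in rows]
    return [transform(r, notes_by_id.get(r["patient_id"], "")) for r in cleaned]


dataset = build_dataset(ehr_rows, notes)
for record in dataset:
    print(record)
```

The keyword search stands in for the far more capable text analytics a data scientist would actually apply; the point is only that structured and unstructured sources must be cleaned and linked on a shared identifier before any analysis can begin.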
A combination of multiple EHR data sets with millions of patient records may be analyzed to answer a particular research question. Alternatively, EHR data sets may be analyzed to develop and test models for prediction of outcomes for particular risks and interventions, prediction of cost for particular interventions, or prediction of readmissions. Some studies have included real-time patient data from the operational database. Although the potential for research with EHR big data is promising, challenges remain: shortcomings in the quality of data capture, in database analysis expertise, and in healthcare organizations' appreciation for the potential value in big data science (Hunt & Chang, 2013). Combinations of research databases have been used to discover models to assess risk and predict outcomes of disease and interventions.
Nurses are already interacting with big data science. Data visualization skills are needed to interpret and communicate the data presented in the growing number of data dashboards in the clinical setting (Skiba, 2014). Nursing research journals are sharing big data resources and workshops and are encouraging nurses to disseminate big data research (Henly, 2014). The National Institute of Nursing Research sponsors an initiative (http://www.youtube.com/watch?v=S00DyTsdFm4) and training in big data science (http://www.ninr.nih.gov/training/trainingopportunitiesintramural/bootcamp#.VFcMT). Nurses may not recognize that the daily documentation of patient assessments, care, and outcomes in the EHR is more than "charting" nursing care; the documentation is data collection, with the potential for analysis that can generate new knowledge and advancements in care. Every clinical nurse who enters patient data into an EHR is contributing to big data science.
References