Someone will recommend a book, or I'll read a good review, and I'll start looking for it. I am a voracious reader and would have filed for bankruptcy years ago if I had to pay for every book I read. I used to go to the library, but in the era of COVID-19 I turned to online apps provided by my local libraries: Libby, Hoopla, and Axis 360. I frequently end up frustrated. My local library's version of Libby, for instance, has 32,000 eBooks and audiobooks, but apparently never the one I want right now.
How many books are there? Put another way, for the online resources to mimic Borges' universal library, how much would its offerings need to expand? Google tells me that the total number of published books since Gutenberg is around 130 million, and over 2 million per year are added to the listings worldwide. My library's holdings are several orders of magnitude off these numbers.
Google, of course, famously wanted to put all books online, or at least all the older ones, before running into the wrath of the publishers. The company is a major proponent of the oft-expressed idea that "data wants to be free" and is expert at monetizing that data. As recently as this week, as I write this, Google was fined $593 million in France for failing to negotiate in good faith with publishers over profit-sharing for the news publishers provide. Previously, the company had been in a long battle over copyright as a result of scanning older books without the permission of publishers or authors.
These battles encapsulate a basic modern tension. It is (at least I believe it is) in the public good to acquire, collate, distribute, and analyze large datasets, including books. Yet these datasets were originally someone's intellectual property, paid for with sweat or money, and the original owners (or original acquirers, not necessarily the same thing) rarely want to give that property away for free so that others earn billions of dollars off their hard work. Nor is every use to which this data is put equally virtuous.
A previous generation would have considered it odd to think of Anna Karenina or the Sunday Times as digital assets to be monetized, yet that is exactly what they are, and we all got comfortable with that fact as the world migrated online. That we still have a New York Times is largely due to this phenomenon, just as the inability of local newspapers to monetize their digital assets has contributed to their demise. And, one might add, to the demise of American democracy.
I used to think of this as falling into that broad category of "other people's problems." Medicine seemed to be relatively recalcitrant to, or at least significantly isolated from, the monetization of digital data. Fortress Healthcare had several ramparts around it: HIPAA regulations, lack of interoperability between health systems' electronic health records, the virtual impenetrability of those records, their sheer volume, and the absolute meaninglessness of much of the data being collected. Even a computer doing machine learning will get bored analyzing CBCs and chem profiles from my clinic, and the best natural language processing programs will be stumped by the grammatical excesses of my colleagues, though not of course by my dictations. And let's not even talk about the inherent messiness of complex biologic systems, both at an individual and collective level.
Early efforts in this field were largely unsuccessful, with IBM pouring billions into a failed attempt to rationalize cancer care through its Watson program, mostly succeeding in irritating the trustees of Memorial and MD Anderson. Watson's computer engineers, like most engineers believing the universe to be an ultimately rational place, were disabused of that peculiar notion by the American health care system in general and the field of oncology in particular. This cast something of a pall over the exercise.
But this is changing rapidly, with an efflorescence of attempts to extract monetizable data from the health care system, and those rapidly increasing in scope as the big money piles in. Let me share an incomplete listing to give you some flavor of what is happening.
Since I've already mentioned Google, a good place to start is Google Cloud's interaction with HCA Healthcare, a Nashville-based giant with, to quote the press release announcing the collaboration, 32 million annual patient encounters, 93,000 nurses, and 47,000 active and affiliated physicians. The partnership will make use of Google's health care analytics, including BigQuery, described as "a planetary scale database," to make sense of this huge dataset. One wonders about the need for planetary scale, since HCA is pretty much an American operation, but I'm sure Google has grand plans for the entire solar system, should HCA ever open a clinic on Mars. Google Cloud has also partnered with Mayo Clinic. The press releases wallow in corporate cliches such as "transforming healthcare," a phrase which always makes me nervous when applied to my clinic, along with (per Google Cloud's CEO) "being an accelerant for innovation." Accelerants, you will recall, are used by arsonists when they want to burn down your house.
Somewhat closer to home, and easier for a poor oncologist to understand, are Roche's purchases of Flatiron Health and Foundation Medicine. Flatiron's goal, according to its website, is to solve the problem caused by real-world clinical data that is "unstructured and stored across thousands of disconnected community clinics, medical centers and hospitals." Its website tells us that Flatiron works with over 280 cancer centers and seven "major" academic medical centers (because why would they work with a minor academic medical center?), as well as more than 20 drug developers, and has access to almost 3 million patient records. And not just access to the records: Flatiron sells OncoEMR, its cancer-specific electronic health records platform, to practices, then mines data from the platform.
If Flatiron analyzes other people's data, Foundation Medicine creates genomic data. I certainly use their services, both for clinical and research purposes, and I am not alone. Foundation has analyzed over a half-million tumor samples from cancer patients, providing useful (or occasionally somewhat useful) genomic data to help guide care. That's what most oncologists consider Foundation's raison d'etre. But its website touts its data analytics and describes how it uses them to partner with more than 50 biopharma companies. My library's Libby app has 32,000 books available, which is quite a bit of collected knowledge. Foundation's library contains the book of life for a half-million souls, which is in its own way also quite a bit of knowledge, or at least the beginnings of knowledge. You can do a lot of number-crunching with a half-million cancer genomes.
Roche paid $2.4 billion for Foundation Medicine and $1.9 billion for Flatiron Health. Never underestimate the Swiss when it comes to buying in the health care space, as anyone who remembers Roche's purchase of Genentech will attest. But there is something fascinating about a Big Pharma company spending over $4 billion to acquire what are, in essence, two premier Big Data companies. My thoughts are that Roche certainly believes it can monetize the data developed by its purchases, as well as using the data to generate novel therapeutic agents for cancer patients. And tying together lab scientists at Genentech with large genomic and clinical databases certainly has its attractions. I don't have a clue what the return on investment for Flatiron and Foundation will amount to, but a single drug development hit would wipe out those sunk costs in a hurry.
Of course, a large health care company can directly purchase a practice network and mine it for data. The classic example of this is McKesson, which bought the US Oncology Network back in 2010. McKesson describes itself as "the oldest and largest healthcare company in the nation" and is a behemoth with $231.1 billion in revenues last year. One of McKesson's divisions is Ontada, launched last year with a press release that (surprise) uses phrases like "transforming the fight against cancer." But it's a data analytics company, using information garnered from patients treated in the US Oncology Network to "help life sciences companies leverage evidence-based data and insights to accelerate innovation." Yet another accelerant.
And then there is Microsoft Cloud for Healthcare, which has partnerships with the Walgreens Boots Alliance, Providence Health System, Humana, and Novartis. Among other things they are going "to transform the exam room by deploying ambient clinical intelligence solutions that capture, with patient consent, interactions between clinicians and patients so that clinical documentation writes itself." OK, that admittedly sounds like an improvement on EPIC documentation, but what else will those ambient clinical intelligence solutions do? Dictations first, but eventually something more intrusive, I'd bet, and definitely something that involves coding for billing purposes. I could go on and on, but I'm getting tired of transforming accelerants.
We mustn't imagine that there is anything inherently nefarious in any of these arrangements. If data analytics allows one to generate new agents that tackle dangerous cancers, prolonging and improving the lives of my patients, that's for the collective benefit of society. It's a better use of Big Data analytics than, say, their role in the 2016 election. Once large, connected medical databases existed, someone inevitably was going to dissect them with the intent of extracting further knowledge and revenue from the health care system. That's the modern digital age, the age of artificial intelligence and machine learning, and there's no escaping it.
But all this does raise issues. The data being mined is coming from patients, and the patients receive no direct benefit from the use of their data. Whether they receive indirect benefits depends on the purpose to which their data is used. But it is a reasonable guess that when a new drug comes on the market based on mining genomic data, there will be no patient discount for having contributed your genome and clinical outcomes. Nor are you ever really asked how your data should be used, or whether you wish to share it with some corporate entity. At most you might be asked to sign some form in clinic giving permission for unnamed others to do whatever they want with the most personal aspects of your being, albeit with appropriate HIPAA protections.
If those protections still mean something, which is open to question, the deidentified data used by the data analysts is supposed to be safe from a privacy standpoint: just subtract a few crucial facts, and no one should be able to identify you. Maybe, maybe not. Luc Rocher and colleagues, in a 2019 Nature Communications paper, demonstrated that "99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes." This was true even for what they term "heavily incomplete" datasets. Privacy is a relative term, ultimately depending on the goodwill and care of data analysts I have never met, many of whom seem to reside in St. Petersburg.
What impresses is how rapidly all this is happening, and happening essentially without oversight or public input. Will tomorrow's oncology clinic be radically different from today's? Will an ambient intelligence (whatever that is) listen in, creating my notes through some synthesis of my interaction with the patient and the existing electronic health record? Will that patient receive a new blockbuster drug resulting from a prior patient's genomic sequencing? Will my clinic be run more efficiently and safely as a result of data analytics? I can imagine all these things, or none of them, but I suspect I'll find out soon. In the meantime, I picked up some good books at the local bookstore today. I think I'll read them now.
GEORGE W. SLEDGE, JR., MD, is Professor of Medicine at Stanford University. He also is Oncology Times' Editorial Board Chair. His OT writing experience has been recognized with an APEX Award for Publication Excellence and a FOLIO: Eddie Honorable Mention Award. Comment on this article and previous postings on his OT blog at bit.ly/OT-Sledge.