Healthcare’s Data Problem

In healthcare, data is a primary driver of decisions. The term “data-driven” has even entered mainstream consciousness as a useful, disciplined approach to making better decisions about clinical and nonclinical matters. The assumption is that collecting more data will generate more information and knowledge, and thus better judgment and outcomes. Data typically comes from:

Electronic health records (EHRs)
Administrative, financial, and operational systems
Disease and patient registries
Claims and pharmacy databases
Clinical trials data from the research literature
Public health and patient surveys

The gold standard for deciding whether to diagnose or treat in a certain way is whether the intervention has been validated by clinical trials and approved by professional peer associations and regulators. This approach has stood healthcare in good stead, allowing the field to move forward cautiously over the past century. For this and other reasons, it takes practitioners about 17 years to adopt a new practice, even after research studies have shown it to be effective and preferable.

But in the last few decades, dark clouds have cast shadows over this traditional knowledge acquisition model. Here are three examples of significant challenges, along with some ideas on the role of intelligent automation.

Data volume and velocity

Healthcare has embraced big data. The groundbreaking Flexner Report of 1910 set the foundation for all medical education and assumed that all scientific knowledge could be learned in a four-year curriculum. Today, that is no longer true. Medical research literature has grown exponentially in the last 100 years. In 1950, medical knowledge was on pace to double every 50 years. By 2020, the doubling time for medical knowledge had fallen to a mere 73 days. A medical student graduating in 2020 went through four doublings of knowledge during their studies, and what they learned represents only 6% of all medical knowledge.
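The arithmetic behind the 6% figure can be checked directly. This is a minimal sketch, assuming each doubling exactly doubles the total body of knowledge; the function name `share_after_doublings` is illustrative and does not come from any cited source.

```python
def share_after_doublings(doublings: int) -> float:
    """Fraction of today's knowledge that the starting body of knowledge
    represents after the total has doubled `doublings` times."""
    if doublings < 0:
        raise ValueError("doublings must be non-negative")
    return 1 / (2 ** doublings)


# Four doublings during medical school, as described above.
share = share_after_doublings(4)
print(f"{share:.2%}")  # 6.25%
```

Four doublings multiply the total by 16, so what was known at the start is one sixteenth of the whole, which rounds to the 6% quoted above.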

Researchers looked at what it would take a primary care physician to keep up to date with relevant literature. They estimated this effort would require 627.5 hours per month (29 hours per weekday) or 3.6 FTEs of physician time. In other words, it is impossible for a physician to keep up today.

So, what does a physician really need to know (and what could be stored elsewhere)? In the past, a medical student had to spend four years memorizing the names of 206 bones, 700 muscles, 78 organs, 30,000 diseases, 20,000 prescription drugs, and much more. In contrast, an attending physician I know said that when he quizzes his students on rounds today, they can find the exact, correct answers within seconds on their smartphones or tablets. Medical education has had to slowly pivot from memorization (‘what’) to methodology (‘how’), although there is still core knowledge all physicians should have.

What can physicians do with all the research literature they have not read? In the past, journals published digests and meta-analyses based on editorial judgment. This was useful but will eventually fall short given the growing scale of the task. Can intelligent automation help?

Helping physicians find relevant information quickly is one task that can be automated. Simplified guided interfaces such as Automation Anywhere Automation Co-Pilot let digital assistants identify, search, and curate findings from multiple sources to extract the desired information.

Several research groups are also exploring the accuracy and reliability of automated text summarization for the biomedical research literature. Inclusion/exclusion criteria for article relevance have been developed based on combinations of natural language processing, machine learning, and statistical methods. These systems have generally achieved good performance and will probably improve over time. Interestingly, some researchers and startups are now using similar AI-powered approaches to analyze patient information stored in EHRs, which could yield useful insights on quality, safety, compliance, and operational effectiveness.

Data diversity

Medical outcomes and healthcare costs are often influenced by nonmedical factors. Providers are familiar with changes in regulatory rules, payer requirements, and business models that often affect the choice of therapy and outcomes of treatment.

Beyond the healthcare system, researchers have identified many real-world factors known as Social Determinants of Health (SDOH). In terms of their relative contribution to an individual’s health status, socioeconomic factors (e.g., education, job, income, family, safety) account for 40%; health behaviors (e.g., smoking, alcohol consumption, diet and exercise, sexual behavior, buying habits) account for 30%; and the physical environment, such as the local availability of fresh produce, accounts for 10%. Traditional healthcare services account for just 20% of one’s health status.

If health status is impacted by many nonmedical factors, a future medical practice must consider a whole host of nontraditional data points such as education, access to care, social isolation, income level, food insecurity, and so on, to make the best clinical decisions and set the right priorities.

Automation will likely play an early role in systematizing the collection of disparate SDOH data from multiple sources and in cleansing, curating, and storing the data for later analysis and research. Predefined rules can raise flags when combinations of factors indicate high risk or predict poor outcomes. SDOH research is fairly nascent, so it is unclear how different factors should be weighted and how to incorporate these data into care delivery decisions. But the consensus is that doing so is necessary to achieve the most effective outcomes.
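The idea of predefined rules raising flags on combinations of SDOH factors can be sketched in a few lines. This is an illustrative example only: the `SdohRecord` fields, the rule names, and the income threshold are hypothetical placeholders, not validated screening criteria.

```python
from dataclasses import dataclass


@dataclass
class SdohRecord:
    """Hypothetical SDOH snapshot for one patient; field names are illustrative."""
    patient_id: str
    food_insecure: bool
    socially_isolated: bool
    has_transportation: bool
    annual_income_usd: float


# Each rule pairs a flag name with a predicate over the record.
# The combinations and the income threshold are assumptions for this sketch.
RULES = {
    "nutrition_and_isolation": lambda r: r.food_insecure and r.socially_isolated,
    "access_barrier": lambda r: (not r.has_transportation)
    and r.annual_income_usd < 20_000,
}


def raise_flags(record: SdohRecord) -> list[str]:
    """Return the names of all rules that the record triggers."""
    return [name for name, rule in RULES.items() if rule(record)]


patient = SdohRecord(
    patient_id="p-001",
    food_insecure=True,
    socially_isolated=True,
    has_transportation=False,
    annual_income_usd=35_000,
)
print(raise_flags(patient))  # ['nutrition_and_isolation']
```

Keeping the rules in a simple table separate from the evaluation logic means thresholds and combinations can be revised as SDOH research matures, without changing the code that applies them.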

Data as evidence

As the volume, velocity, and heterogeneity of data expand, it is clear that clinical guidelines based on traditional gold-standard approaches such as randomized controlled clinical trials (RCTs) will not be able to keep up. Trials are costly and take months to years, and new technologies can make their results obsolete by the time they are published. In addition, recent regulation has mandated the use of “real-world” evidence in drug development. Data from the real world is not amenable to RCT methodology and will have to be analyzed by observational statistical methods, where the selection and assignment of subjects are not under the control of the researcher. A second gold standard will be needed to approve interventions validated not by RCTs, but through observational studies.

Where can automation help? Observational studies can cover a large number of patients whose data need to be collected, verified, processed, and analyzed in a secure and compliant manner. These steps are often manual, repetitive, and error-prone. Automation can speed data gathering while reducing errors. Rules can be applied during data processing to flag exceptions and other phenomena of interest, and real-time dashboards can keep investigators apprised of study status. Automation could also be applied to patient recruiting, covering identification, onboarding, monitoring, and alerting.
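As a rough illustration of rules that flag exceptions during data processing, the sketch below checks study records against plausibility ranges and tallies the results, the kind of count a status dashboard might display. The field names and ranges are hypothetical; real limits would come from the study protocol.

```python
from collections import Counter

# Hypothetical plausibility ranges, assumed for this sketch.
VALID_RANGES = {
    "age_years": (0, 120),
    "systolic_bp_mmhg": (50, 300),
}


def find_exceptions(record: dict) -> list[str]:
    """Return exception labels for one study record."""
    problems = []
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}:missing")
        elif not low <= value <= high:
            problems.append(f"{field}:out_of_range")
    return problems


def summarize(records: list[dict]) -> Counter:
    """Count exceptions by label across all records."""
    counts = Counter()
    for record in records:
        counts.update(find_exceptions(record))
    return counts


records = [
    {"id": "s-01", "age_years": 54, "systolic_bp_mmhg": 128},
    {"id": "s-02", "age_years": 230, "systolic_bp_mmhg": None},
    {"id": "s-03", "systolic_bp_mmhg": 135},
]
summary = summarize(records)
print(dict(summary))
```

In this sample, the first record passes cleanly, the second is flagged for an implausible age and a missing blood pressure reading, and the third is flagged for a missing age.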

Healthcare’s challenge with data raises more issues than just the three above, but this brief discussion does hint at the scope of work needed to move medicine into a future data-driven world where data comes from nearly everywhere.