Data Series: Decentralize the Data

One of the early visionaries for the internet was J.C.R. Licklider, known as “Lick.” Lick was both a computer scientist and a psychologist. He earned a PhD in psychoacoustics from the University of Rochester, then worked as a research fellow at Harvard. As his interest in computer science grew, he moved to MIT as an associate professor. He was known to be outgoing, collaborative, and kind.

Later in his career, he was working late one night at the Pentagon while a female custodian was cleaning his office. She mentioned that she often left his office for last so she could admire the artwork on the wall as she cleaned. He asked her which painting she liked the most. She chose a Cézanne, which was also one of his favorites, and he promptly gave her the painting as a gift. This is the kind of humility and kindness that is essential for applying data science appropriately in healthcare.

Lick and the other early pioneers of the internet recognized that a key to success would be to decentralize the network and allow each node equal authority to switch and route the flow of data. At first, the major universities rejected the idea of sharing their computers and authority with others in the network; MIT and Stanford were the first to dissent. But when threatened with losing funding for new computers, they relented and joined the network. Just as decentralization was essential to the development of the internet, we will need to decentralize healthcare data and give authority to local clinical teams to use data to improve value-based patient outcomes. The value of decentralized data is becoming recognized in other industries, but it’s not the current strategy in healthcare.

Hospitals, academic medical centers, and large physician practices have centralized data into fragmented silos called electronic medical records. The current strategy is to centralize the data even further, into data lakes and data warehouses, but centralization actually limits the ability to learn from the data. Centralized data can generate averages but not insight. Averages lead to one-size-fits-all approaches, but patients are not average. Obtaining insight that improves patient outcomes while minimizing cost, maintaining privacy, and maximizing security is only possible through a decentralized infrastructure. The cybersecurity risks we see on the internet arise from companies that centralize data, not from the internet’s decentralized design.

Imagine if a large medical center decentralized its patient data into each local environment, in the context of each patient problem, so that each clinical team could gain insight. What hacker would want to attack a few dozen patients’ records for a rare disease, or even a few thousand records for a common problem like an inguinal hernia, instead of going after healthcare systems that centralize millions of patients’ records in one place?

Our small hernia team has worked with decentralized data for the past ten years. Decentralized data is flexible and adaptive. As our clinical team learned to interact with the data, we found that a measurement can be changed in real time – outcome measures can be modified as you learn. Centralized data, by contrast, is inflexible and is typically viewed on static data dashboards. It attempts to impose uniform definitions on all data, regardless of the context in which the data were produced. But any clinician knows that you can’t have uniform definitions for different clinical situations – a wound infection in a compound fracture is different from a wound infection after an elective abdominal wall hernia repair with mesh. Forcing generic, static definitions inhibits the ability to learn and improve patient outcomes.
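
As a minimal sketch of what context-scoped, revisable outcome measures might look like (in Python; the field names, thresholds, and clinical criteria below are invented for illustration, not our team’s actual definitions):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class OutcomeMeasure:
    """An outcome definition owned by one clinical team, in one context."""
    name: str
    context: str
    definition: Callable[[dict], bool]  # applied to a local patient record

# A wound-infection definition scoped to one clinical context
# (hypothetical fields and thresholds):
hernia_ssi = OutcomeMeasure(
    name="wound infection",
    context="elective abdominal wall hernia repair with mesh",
    definition=lambda rec: rec["erythema_cm"] > 5 or rec["drainage"] == "purulent",
)

# A different definition for a different context -- no shared schema required:
fracture_ssi = OutcomeMeasure(
    name="wound infection",
    context="compound fracture",
    definition=lambda rec: rec["deep_culture_positive"],
)

# Because the definition is local code owned by the team, it can be
# revised in real time as the team learns, with no system-wide migration:
hernia_ssi.definition = lambda rec: rec["erythema_cm"] > 3 or rec["reoperation"]
```

The structural point is that the definition lives with the team that produces and uses the data, so changing it is a local decision rather than a change request against a centralized data model.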

In addition to lower costs, better data security, and greater availability of data, decentralized data allows the clinical team to apply analytics and visualization appropriately and gain the insight needed to improve outcomes. I’ve met with a large research group that has access to all of the Veterans Administration healthcare data. Analyzing this centralized “big data” requires massive computing capability and can generate very accurate algorithms, but only for the average patient in the total population. For example, the group can predict almost exactly how many veterans will suffer a heart attack in the next year, but it cannot identify which ones. When algorithms generated from centralized data are applied in each local environment, they fall apart and are not accurate at all.
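
A toy simulation makes the gap concrete (Python, with invented site names, event rates, and patient counts): a pooled model can be nearly exact for the total population while being far off at individual sites and silent about individual patients.

```python
import random

random.seed(0)

# Hypothetical local event rates -- in reality these vary by environment.
site_rates = {"site_A": 0.02, "site_B": 0.08, "site_C": 0.15}
n = 10_000  # patients per site

# A "centralized" model that knows only the pooled average rate.
pooled_rate = sum(site_rates.values()) / len(site_rates)

total_observed = 0
for site, rate in site_rates.items():
    observed = sum(random.random() < rate for _ in range(n))
    total_observed += observed
    print(f"{site}: observed {observed}, pooled model predicts {pooled_rate * n:.0f}")

total_predicted = pooled_rate * len(site_rates) * n
print(f"total: observed {total_observed}, pooled model predicts {total_predicted:.0f}")
# The aggregate prediction comes out nearly exact, but sites whose local
# rate differs from the average are off severalfold -- and nothing here
# says which patients will have the event.
```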

Even worse, centralized algorithms may contribute to unintended harm by poorly representing patient populations at the margins of the centralized dataset. In one example, a study of a widely used algorithm found that it was biased against Black patients. The algorithm is intended to identify at-risk patients who would benefit from greater resources. The researchers found that almost half of the Black patients should have been identified to receive additional health resources, but the algorithm identified fewer than 20%. In another recent example, a large centralized dataset was used to evaluate a treatment for COVID-19. Ultimately, so many flaws were recognized in how the data were evaluated that the publications were retracted. (In previous blog posts, I described how to evaluate the appropriate use of a drug for COVID-19 and how to implement a decentralized data and analytics infrastructure to better manage a pandemic.)

Companies can make a lot of money charging for access to a large healthcare dataset, analyzing “big data,” or providing population health algorithms from large centralized datasets, but that isn’t how data science works. Before I learned the principles of data science, I worked with and participated in several healthcare data registries – it seemed natural to look at all of the data together. I didn’t yet realize the impact of local environmental variables, or the need to analyze and interact with data at the local level if the goal is to improve outcomes. Centralized healthcare data registries are great for generating publications, but not for gaining true insight or improving patient outcomes.

To achieve a sustainable healthcare system, the data science principle of decentralized data will be necessary, and this requires a change in leadership thinking. Traditional 20th-century leadership is about command-and-control management, with centralized data used by a privileged few as a scarce resource. The 21st-century leader requires humility and a servant-leader mindset, recognizing the value of the front-line team. Making decentralized data readily available for the clinical team to gain insight, and to use that insight to improve outcomes, requires trust and delegated authority from leadership – humble leadership is not weak leadership. In exchange for the access and authority to manage and analyze patient data, each clinical team will be accountable for measuring and improving value-based outcomes that benefit the patient, the organization, the clinical team, and the system as a whole.
