Data is Messy

“The goal is to turn data into information and information into insight.”

— Carly Fiorina, former Chief Executive Officer, Hewlett Packard

In my last blog post, “Moneyball for Healthcare – Why Hasn’t it Happened?” I described how baseball eventually adopted data science principles while healthcare still hasn’t. There are many reasons healthcare has yet to apply the principles of data science, but one is a lack of understanding of what those principles are.

In my next several blog posts, I’ll describe the principles of data science as they apply specifically to healthcare, and contrast them with how healthcare currently manages and thinks about data. Understanding and applying these principles will help transform our global healthcare system into one that is sustainable. The first principle I’d like to discuss is that data in the real world is messy and complex; even seemingly simple concepts can become quite difficult when you look closely.

In our current healthcare system, data is often captured with simple, one-size-fits-all measurements. Take a pain score, for example. Pain scores are used by every hospital and physician clinic. Most are recorded by nurses or by patients themselves on a scale from 0-10 or 0-100, where 0 is no pain and 10 or 100 is the worst pain imaginable. This scale is called the Visual Analog Scale (VAS). Sounds straightforward, right? Not when we take a closer look.

Several years ago, we were in one of our hernia Clinical Quality Improvement (CQI) meetings, discussing how best to measure pain. We had recently tried to improve patient outcomes by implementing a surgical pain management strategy that included a long-acting local anesthetic given before and/or during surgery. We initially used pain scores to measure the impact of this change, but as we looked at the data, some of the scores didn’t make sense. Some patients had very low scores followed by a single, very high score for no obvious reason. When we saw this pattern in enough patients, we knew we needed to investigate. Our patient care manager and I took our data to the surgical floor to speak with the nurses who were recording the pain scores.

It turned out to be a pretty quick investigation. When we explained our question and showed them the pain data, the nurses immediately solved the mystery: “We make up those scores so the patients can go walking in the hall and do other activities more comfortably.” Apparently, the hospital administration had implemented a rule that only one pain pill could be given if the pain score was five or less. Many patients request two pain pills when they plan to walk in the hall or do other types of physical therapy, but the nurses weren’t allowed to give two pills unless they recorded a score greater than five. So nurses recorded a high pain score to help minimize their patients’ discomfort while walking, even if the patients weren’t in pain before the walk. This also allowed patients to walk more, possibly helping to decrease the chance of blood clots or pneumonia. Nurses were essentially lying to lessen their patients’ suffering and potentially decrease post-operative complications.

Based on this conversation with the nurses, it was clear that pain scores could not serve as an accurate measurement. Instead, we decided to collect the actual amount of pain medication given and the length of hospital stay as better reflections of post-operative pain.

In another example, we were about to see a new patient in the clinic who had an incisional hernia that developed after a prior abdominal operation. On the form the patient had filled out, they rated their pain as ten out of ten, the worst pain imaginable. I was expecting to see a person curled up in the fetal position, sobbing, when we opened the door. Instead, the patient was smiling and greeted us as if we were meeting at a dinner party. The patient was calm and appeared to be in no pain at all. As we talked, the patient began to tear up and apologized, explaining that their mother had died unexpectedly just a few days earlier and they were still in shock. The ten-out-of-ten score clearly reflected mostly the emotional pain of losing their mother, not pain caused by their hernia.

These examples show that what seems, on the surface, to be a simple data point (the numerical value of a person’s pain) is not simple but complex and messy. In healthcare, we try to collect data in a uniform, static, one-size-fits-all way, but in reality it will never be that simple. So, what principles of data science can be used to deal with this messiness? Data science is all about measurement and improvement. Any data point, such as an assessment of pain, should be improved over time, which means the clinical team must be given the authority to change how things are measured. And most importantly, if we know data in healthcare is messy, we should not try to control it; we should aim to learn from it.

Over the next several blog posts, I’ll focus on each of the following concepts:

Decentralizing data into a specific context in each local clinical environment
Diverse small teams with different perspectives for any disease or patient problem should be defined and given authority to apply data science to real patient care
Data that matters the most to outcomes that measure value for any whole patient process should be collected
A variety of data analytics and visualization tools can be used by the clinical team to gain insight into how to improve the measurements and how to improve value-based outcomes
The available analytical tools are used not to prove or disprove a static hypothesis, but to help the clinical team gain insight
Insight is gained by discovering weighted correlations, not causation, and by interacting with the data rather than passively viewing static dashboards; this is the appropriate use of a human-computing symbiosis that can improve the outcome of any complex process
Networking these insights and the algorithms that come from the analytics with other clinical teams caring for similar diseases and patient problems will allow for the ongoing improvement of value, leading to a sustainable healthcare system
