Facts and Stats – Things to Consider When Analyzing Data

by Dr. Jon Hart

We have an unfathomable amount of data available to us today in healthcare. Beyond our clinical data, we have multiple discrete fields of other bits and bytes that we can look at to discover better ways to care for our patients. Sometimes, though, in our zeal to find problems or solutions, we get things a bit wrong in putting the data together into useful, actionable insights.

A couple of years ago I wrote about raising 3-legged chickens or eating them. The point was to know your end goal and be able to measure it. That’s the starting point, but even when we think we know our final goal, we often get misled by poorly aggregated or analyzed data. 

Where data missteps are most public and visible is in clinical trials and the medical literature. A 2015 study showed that a “systematic review and meta-analysis investigating fabrication and falsification of research found that 33.7% of those surveyed admitted to questionable research practices, including modifying results to improve the outcome, questionable interpretation of data, withholding methodological or analytical details, dropping observations or data points from analyses because of a ‘gut feeling that they were inaccurate’ and deceptive or misleading report of design, data or results.”(1) 

An article in Nature(2) this past summer stated that at least one quarter of clinical trials contained problematic or fabricated data. This includes both incorrect data collection and flawed interpretation of the data. Either way, when this occurs, the data doesn’t serve us in our search for meaningful truth.

We also need to guard against these errors when we’re looking at our population’s data, searching for issues and answers. Let’s take a bit of a whimsical look at how we can get confused by “the facts” when drawing conclusions or sharing information based on statistics (and our mistaken interpretations).

Comedian Demetri Martin used to do a bit called “Fascinating Facts,”(3) and I’d like to use some of his factual findings to illustrate common stumbles in the use of statistics and data. The “facts” come from Mr. Martin:

  • Failure to adjust for multiple comparisons. 

    • It’s safer to fly in a plane than it is to fly in a car. True, but we need to ensure we are comparing and analyzing the right processes. Don’t wholesale-compare home-care outcomes with ICU outcomes.

    • You’re more likely to die in a terrible accident than in a wonderful accident. Be clear as to what variables are being measured. In this case, the descriptor of the accident is measured, not the incidence of an accident or death. I see this logic applied to hospital marketing often. ☺

  • Sampling bias. 
    Over 85% of German Shepherds are dogs. (The rest are Germans who herd sheep.) Specificity is important. Is our sample of sheepherders in Deutschland skewed by the inclusion of dogs? If you’re looking for the Medicare Part B spend that the PCP can directly impact, start by filtering out ophthalmologic and oncology drugs or you’ll only see the dogs. 

  • Data dredging (multiple analyses are conducted on a dataset to identify significant associations, without a clear hypothesis or theoretical framework)
    Nearly 45% of all Americans are torsos. Yes, almost half of our bodies are composed of our torsos. True, but not useful. Start with your hypothesis, then measure the right thing, rather than trying to find your hypothesis in the data.

  • Mis-stating the significance of the results.
    By the age of 90, the average person has been dead at least 8 years. How you state your result is important. Contextualize or restate what you’re truly trying to say. While true, this stat just sounds like you’re being mean! Are you expressing the average age of death or just calling someone really old?

  • Effect of confounding variables.
    Experts believe there’s about 25% more camouflage in the world than we realize. Probably the most significant of all the Fascinating Facts: we don’t know what we don’t know. Be willing to admit what’s unknown. 
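The multiple-comparisons and data-dredging pitfalls above have a concrete statistical mechanism: test enough associations at p < 0.05 and some will come up “significant” by chance alone, even when no real effect exists. A minimal sketch in Python (a hypothetical simulation, not from the article; it uses a large-sample z-approximation rather than a formal t-test, and Bonferroni is just one standard correction):

```python
import math
import random

random.seed(42)

def two_sample_p(a, b):
    """Approximate two-sided p-value for a difference in means
    (large-sample z-approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

n_tests = 200
p_values = []
for _ in range(n_tests):
    # Both groups are drawn from the SAME distribution:
    # there is no real effect to find.
    a = [random.gauss(0, 1) for _ in range(50)]
    b = [random.gauss(0, 1) for _ in range(50)]
    p_values.append(two_sample_p(a, b))

# Dredging: declare anything under 0.05 a "finding" -- expect ~5%
# false positives (about 10 of 200) despite zero true effects.
naive_hits = sum(p < 0.05 for p in p_values)

# Bonferroni correction: divide the threshold by the number of tests.
bonferroni_hits = sum(p < 0.05 / n_tests for p in p_values)

print(f"'Significant' at p<0.05:     {naive_hits} of {n_tests}")
print(f"After Bonferroni correction: {bonferroni_hits} of {n_tests}")
```

The point mirrors the bullets above: the “findings” in the first count are pure noise, which is why starting from a hypothesis, and adjusting when you do run many comparisons, matters.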

Data analysis and interpretation are essential in population health and value-based care (VBC). Do it right:

  1. If you’re looking to solve a specific problem, limit the variables in the data (preferably to one). 

  2. Be specific. 

  3. Point toward an actionable answer.

  4. Let the data speak to you. It’s OK to go in with an opinion; that’s your hypothesis. However, let the information guide you, not the other way around.

  5. Clearly state the problem, the data insights, and the proposed solutions.

  6. Admit what you don’t know.


1 Thiese, Matthew S., Zachary C. Arnold, and Skyler D. Walker (2015). “The misuse and abuse of statistics in biomedical research.” Biochem Med (Zagreb) 25(1): 5–11.

2 Van Noorden, Richard (2023). “Medicine is plagued by untrustworthy clinical trials. How many studies are faked or flawed?” Nature 619: 454–458.

3 Martin, Demetri (2017). https://www.youtube.com/watch?v=RYR4k4Q44k8
