Where data comes from for data-driven decision making

Has it ever occurred to you that you could be headed to a bizarre and macabre death caused by getting tangled in your bedsheets, and that controlling our collective cheese consumption could be the thing that saves you from that fate? Per capita cheese consumption correlates with the number of people who died by getting tangled in their bedsheets.

Chart: Per capita cheese consumption correlates with number of people who died by becoming tangled in their bedsheets
“Per capita cheese consumption correlates with Number of people who died by becoming tangled in their bedsheets” by Tyler Vigen is licensed under CC BY 4.0

But that’s not really a thing. It’s a spurious correlation: two factors that vary in a way that makes them appear related, even though they aren’t. And it highlights why being able to get a number or plot a graph might not give you the business insight you think it does.
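To see how easily this happens, here’s a minimal Python sketch with two made-up yearly series that have nothing to do with each other, except that both drift upward over the same ten years. A shared trend like that is one common source of spurious correlations (an assumption for illustration; the sketch does not use Tyler Vigen’s actual data). Comparing year-over-year changes instead of raw levels is one standard sanity check, not a proof either way. The function and variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def diffs(values):
    """Year-over-year changes: values[t] - values[t-1]."""
    return [b - a for a, b in zip(values, values[1:])]


# Two made-up yearly series (hypothetical, not real data). They are unrelated
# except that both happen to trend upward over the same ten years.
years = range(10)
series_a = [10 + 2 * t + (t % 3) for t in years]
series_b = [100 + 5 * t + (7 * t) % 4 for t in years]

level_r = pearson(series_a, series_b)                 # driven by the shared trend
change_r = pearson(diffs(series_a), diffs(series_b))  # trend removed by differencing

print(f"levels: r = {level_r:.2f}")
print(f"year-over-year changes: r = {change_r:.2f}")
```

The raw levels correlate at roughly 0.99, while the year-over-year changes come out essentially uncorrelated (r ≈ 0). A high correlation on its own says nothing about whether one thing drives the other.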

This might sound like something better math could sort out, but the problem is rooted in the very nature of data, and complicated by the very human nature of decision-making.

You can watch the video version of this post, or keep reading below.

Let’s look at how data is created and used in the context of your strategy.

All strategies start with a vision of what you want to do or achieve, or where you want to go, which you then translate into measurable, trackable goals. In other words, you’ve mapped out some pathways to your vision and put some signposting and accountability in place. Supporting this with a data strategy means you’ve identified specific data you can collect that gives you relevant clues about the context of your decisions and your progress along those pathways. It also means you’re recording that data systematically, with the intention of using it to support decision-making: assessing, choosing, and maybe even revising your pathways to your vision. Sounds pretty standard, right? But that description doesn’t capture how this actually plays out in practice.

In determining what to record, you’ve simplified a pretty complex world into a set of easy-to-use variables or clues that make up your raw data.

In the case of our spurious example, we’ve reduced the world to the annual supply of cheese in the US, the population of the US, and the causes of death as classified and recorded by the CDC. Note that even our calculation for per capita consumption is a rough estimate based on the supply divided by the population. That’s not an actual measure of what was consumed.
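The per capita figure is a single division. As a minimal Python sketch (the numbers are hypothetical and chosen only to keep the arithmetic easy, the units are assumed to be pounds, and `per_capita` is an illustrative name), notice that nothing in the formula knows what anyone actually ate:

```python
def per_capita(total_supply_lb, population):
    """Estimated pounds available per person: supply divided by head count.

    Nothing here accounts for waste, spoilage, or who actually ate what,
    so the result is an availability estimate, not measured consumption.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return total_supply_lb / population


# Hypothetical figures, not real USDA or census data.
print(per_capita(1_000_000, 40_000))  # 25.0
```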

Similarly, when you record client data in a CRM, you’re simplifying a complex human being into a set of easy-to-record fields, like name, zip code, gender, and how you acquired that client, and you’re simplifying your client’s history with you into data such as the revenue and date of each purchase. Then you aggregate it all into calculations like Customer Lifetime Value (CLV) to approximate how much money you can expect from a single customer going forward, on the assumption that this simplification adequately captures everything you need to know.
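Here’s a minimal Python sketch of that simplification. The fields mirror the ones listed above, but the `Client` and `Purchase` classes, the sample client, and the `simple_clv` function are all hypothetical. The CLV calculation uses one common back-of-the-envelope form (average order value × orders per year × expected years as a customer); real CRMs and analysts use many variants.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Purchase:
    revenue: float
    purchased_on: date


@dataclass
class Client:
    # A whole person, reduced to a handful of easy-to-record fields.
    name: str
    zip_code: str
    gender: str
    acquisition_channel: str
    purchases: list[Purchase] = field(default_factory=list)


def simple_clv(client: Client, observed_years: float,
               expected_lifespan_years: float) -> float:
    """Naive CLV: average order value x orders per year x expected years.

    Assumes the observed purchase pattern simply continues into the future.
    """
    if not client.purchases:
        return 0.0
    total = sum(p.revenue for p in client.purchases)
    avg_order_value = total / len(client.purchases)
    orders_per_year = len(client.purchases) / observed_years
    return avg_order_value * orders_per_year * expected_lifespan_years


# Hypothetical client and purchases.
client = Client("Pat Example", "00000", "unspecified", "referral",
                [Purchase(100.0, date(2019, 3, 1)),
                 Purchase(200.0, date(2020, 6, 15))])
print(simple_clv(client, observed_years=2, expected_lifespan_years=5))  # 750.0
```

Everything about this client that isn’t one of those fields is invisible to the calculation, and the five-year lifespan is a guess you supply, not something the data tells you.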

So your data is a simplified representation or model of things in the real world. And your calculations, your mathematical models, are approximations.

Now, even though you’re simplifying the world, you’ll still end up with too much data to just look through for insights.

So you classify that data, aggregate that data, and create stories about that data so that it’s more digestible and usable for decision-making.

In other words, you explore and analyze and curate.

If you download the USDA cheese data used for our spurious example, you’ll see that what was charted is the total of the so-called “natural” cheeses. According to the USDA classifications, a lot of processed cheeses are made from natural cheeses, so we can’t simply add processed cheese to the total (that would count some of the same cheese twice) or look at processed cheese alone, because there’s no good way to separate the two, at least not with this data. So the processed cheese figures were left out.
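A curation decision like this one often ends up as a single filter in code. The Python sketch below uses hypothetical rows, categories, and amounts (not the actual USDA figures) to show how keeping only the “natural” category quietly shapes the total, and why summing everything would double-count natural cheese that was turned into processed cheese. `total_for` is an illustrative name.

```python
# Hypothetical supply rows: (category, product, million pounds). Illustrative only.
rows = [
    ("natural", "cheddar", 120.0),
    ("natural", "mozzarella", 150.0),
    ("natural", "other natural", 80.0),
    ("processed", "processed cheese", 60.0),
]


def total_for(rows, category):
    """Sum the supply for one category; every other row is left out by design."""
    return sum(amount for cat, _, amount in rows if cat == category)


# Curation choice: exclude "processed", because some unknown share of it was
# made from cheese already counted under "natural".
natural = total_for(rows, "natural")
naive_total = sum(amount for _, _, amount in rows)  # double-counts the overlap

print(natural)      # 350.0
print(naive_total)  # 410.0
```

Neither number is “the” cheese supply. Each reflects a choice about what to leave out, and a comment like the one above is often the only record of that choice.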

We curate like this all the time: when you define a classification system (which, overall, really is a good thing), when you chart data together to show a trend, or when you place charts side by side on a dashboard on the assumption that each makes more sense, or should be interpreted, in the context of the others.

So the nature of data (a simplified model of the world) is already intertwined with human decision-making (how we simplify and curate). But it doesn’t end there.

In looking at our data as part of strategic decision-making, or in calling ourselves data-driven, data-assisted, or data-informed, we’re not giving credit to all of the things we’re recording only in our memories that also play a part in our decision-making, such as:

  • our perception of things that have and haven’t worked before, or
  • the relative importance of a certain event in the grand scheme of things, or
  • how much I love cheese and am invested in a pathway that lets me keep it.

Chart: Sales dipping from March to May 2020, with the “obvious” COVID explanation existing only in our heads

In the case of the CRM, if all you were looking at was the data we already discussed, how would you explain the dip in sales during the 2020 pandemic? You’d bring in other things you know about the world: things that aren’t in your data, and that you’re probably not recording as part of your decision-making because they seem too obvious to write down.
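As a minimal sketch of that gap (Python, with hypothetical monthly revenue numbers and an arbitrary 80% threshold), code can flag the dip, but an explanation only appears if someone writes down what they already know. The `flag_dips` function and the `annotations` dictionary are illustrative names, not part of any CRM.

```python
from statistics import mean

# Hypothetical monthly revenue totals (illustrative numbers, not real data).
monthly_revenue = {
    "2020-01": 100.0, "2020-02": 104.0, "2020-03": 61.0,
    "2020-04": 48.0, "2020-05": 55.0, "2020-06": 92.0,
}


def flag_dips(series, baseline_months=2, threshold=0.8):
    """Return months whose revenue falls below `threshold` x the average of
    the first `baseline_months` months. Assumes `series` is in date order."""
    months = list(series)
    baseline = mean(series[m] for m in months[:baseline_months])
    return [m for m in months[baseline_months:] if series[m] < threshold * baseline]


# Context that normally lives only in people's heads (hypothetical annotation).
annotations = {"2020-03": "COVID-19 pandemic; known from outside the CRM"}

for month in flag_dips(monthly_revenue):
    print(month, "->", annotations.get(month, "(no explanation recorded in the data)"))
```

Only the month someone bothered to annotate gets an explanation; April and May are flagged with nothing attached, even though everyone “knows” why.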

With all of the simplifying and curating, and all of our memories and biases coming together, it can be easy to get nudged off course, a little at a time, until you’re heading in the wrong direction.

How are you expected to tell the difference between your data helping you achieve your goals, and your data tricking you into giving up cheese and bedsheets unnecessarily? Luckily, there’s a tool you can use that will help you reflect on your big decisions and how you’re simplifying the world and curating and using data, both inside and outside of your head. It works by turning your decision-making process into data you can audit. In my next video/blog, I’ll explain this type of decision auditing and walk through a template you can use.

Have you had an experience with a spurious correlation?

How have you been misled by a spurious correlation, or simplified a complex thing or phenomenon in the real world for easier data capture? I’d love to hear about it in the comments!

Author: Barbara

Barbara is the Managing Member and Primary Consultant of Blou Designs LLC
