Data for everyone and more bad ideas

Rick
October 26, 2021

The Parable

You visit your doctor for a regular health check. Unfortunately, a lot of patients need diagnoses and the doctor does not have time. But he has heard of this new idea called data democratization, and since you are the expert on your own body, he does not make you wait and sends you a link to the following image: "You can find the data there. I use it all the time to check for lung cancer."

Image: CT Scan from Wikimedia [1]

Sounds like a bad idea? How about:

You visit your data team to check the performance of a marketing campaign. Only this time, a lot of colleagues need analytics and the data team does not have time. But they have heard of this new idea called data democratization, and since you are the marketing expert, they do not make you wait and send you a link to the following table: "You can find the data there. We use it all the time to calculate conversion rates."

Table: ANALYTICS.POSTHOG_LANDING.EVENTS_SESSIONS

The Typical Problem - More Questions than Answers

News, information, and data are addictive. Humans want to know more. They get an insight, and it only increases their thirst for more.

There was an election. Who won? Who lost? By how much? In which regions? Compared to the last period? ...

Every data team feels this: for each insight they provide, they get two more requests. Without this thirst, we would still be hunting and gathering our way through the savanna and not caring about data.

That is why data teams will always be under pressure. However, their impact on the organization can vary widely, from "most decisions are backed by data, some are even automated" to "the slide for the board meeting needs to be updated, we found a last-minute data quality problem".

Some (Bad) Ideas to Release the Pressure

1. Give everybody access to all the data, so they can answer their questions in self-service.

This is not computationally kind. [2] Now everybody has to work through all the questions that could have been answered for them in advance: How can the data be accessed? Are there data quality problems? What does this column mean? How does this table relate to that one? How should the data be filtered and aggregated?

The result is exactly what you would expect: people waste a lot of time and everybody does it differently. In the end, the data team cannot help effectively, because they cannot debug all the unique ways people have approached their analysis.

"I have three consecutive vlookups in Excel, the result looks odd, please find the error".
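To make "things that could be prepared" a bit more concrete, here is a minimal Python sketch of a prepared metric. It is only an illustration: the field names (session_id, event, is_internal) and the "signup" goal are hypothetical and not taken from the actual EVENTS_SESSIONS table. The idea is that the filtering and aggregation decisions are made once, in one reviewed place, instead of being re-invented in every spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    session_id: str
    event: str         # hypothetical event name, e.g. "page_view" or "signup"
    is_internal: bool  # hypothetical flag for traffic from employees


def conversion_rate(events, goal="signup"):
    """Share of external sessions with at least one goal event."""
    sessions = set()
    converted = set()
    for e in events:
        if e.is_internal:
            continue  # filtering decision, made once for everybody
        sessions.add(e.session_id)
        if e.event == goal:
            converted.add(e.session_id)
    if not sessions:
        return 0.0  # avoid dividing by zero when there is no traffic
    return len(converted) / len(sessions)


events = [
    Event("s1", "page_view", False),
    Event("s1", "signup", False),
    Event("s2", "page_view", False),
    Event("s3", "signup", True),  # internal test session, excluded
]
print(conversion_rate(events))  # 0.5
```

A marketing colleague who calls conversion_rate gets the same answer as everyone else, and if the number looks odd, the data team has exactly one definition to debug.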

2. Rapidly scale up the size of the data team to answer more questions.

This might actually work, but it will be expensive and slow, with diminishing returns. Newcomers need to learn two things: your business and your technology. Maybe you can skip the second part with some hires, but the first one will take the longest anyway.

3. Serve only one department and ignore the others, until they escalate to the C-Level.

OK, OK, that is not an idea but emergent behavior. I hope it is obvious why this is bad.

4. Buy an AI to answer business questions for you.

AI in 2021 is 100% artificial and 1% intelligent. The 1% can yield amazing results on narrow problems, especially with a lot of good data. But for business questions, context is king and humans have so much more of it.

Caveat: "There are no bad ideas in tech, only bad timing." - Andreessen Horowitz [3]

AI is still too early an idea to rescue your data team from the thirst of the curious.

A Better Way Forward

1. Understand where you are.

Do you have a centrally managed data infrastructure that other teams can use and get onboarding support on? Is data owned and maintained by the data team or is every business unit responsible for its own data?

2. Define where you want to be.

Tip #1: You always want to have a centralized infrastructure. The cost of maintaining multiple stacks with highly overlapping capabilities is never worth it in the long term.

Tip #2: "A complex system that works is invariably found to have evolved from a simple system that worked." - Gall's Law. Going from data mess to data mesh is easier if you first stop at data excellence. One team that owns the infrastructure and the most important data assets of a company is an important nucleus for becoming a data-driven organization.

Tip #3: Depending on the size of the company, it can be perfectly fine to have centralized data ownership. However, consider moving to the data mesh model if (1) your team loses productivity because of constant context switching between domains, or (2) the team gets so big that people cannot collaborate anymore.

Tip #4: Excellence means the team has high standards and delivers excellent insights. It also means that other teams often start begging for attention from "Your Excellency" to get certain analyses.

3. Get there.

That will be another blog post.

Conclusion

Do not just give everybody access. You know your data is not ready for someone who barely knows SQL to get a trustworthy insight. Be computationally kind to your non-technical audience.

First build a strong nucleus with the right infrastructure and an excellent data team. Everybody should make data-driven decisions, but not everybody needs access to all the raw data.

At Snowboard we help to build this strong nucleus, so teams can find, understand and trust their data and metrics.

PS: The image of the CT scan shows a tumor.

[1] https://commons.wikimedia.org/wiki/File:Adenocarcinoma_-_CT_scan_(5499628365).jpg

[2] https://www.goodreads.com/quotes/tag/computational-kindness

[3] https://ritholtz.com/2020/02/no-bad-ideas-only-bad-timing/