4 Questions many Data Scientists can't answer.

Are you a Data Scientist in a commercial organisation? Are you leveraging Advanced Analytics, AI or Machine Learning to super-charge the decision-making powers of your business?

If you are, can you answer the four questions posed in this article? If you answer NO to any of them, then the chances are that your Initiative has failed to create value. Gasp!

Over the last 20 years, I have built, managed and overseen hundreds of Insights initiatives in different organisations. Some have been wildly successful, but some have failed to make an impact and failed to create value. Today, I spend time with many different organisations and Data Scientists, and when I am reviewing initiatives, I always start with five questions. Data Scientists and insight teams are generally able to answer Question 1, but struggle with Questions 2-5.

How many can you answer?

Setting the Scene

I start with the concept that your Analytics Initiative, in its most basic form, will have an Output and an intended Business Outcome. Your Output is what your Initiative will produce; in Machine Learning, think of this as a prediction, a forecast, etc. A Business Outcome is how you are trying to create an impact in commercial terms, i.e. some indicator of improved business performance such as revenue up, cost out, margin increased or risk down.

"A prediction from your Machine Learning model is an Output; it is not a Business Outcome."

Before we even enter into these questions, we make an assumption (for this article) that we have an Initiative that is Strategically aligned and appropriately defined in business and commercial terms.

Question 1: What is the Output of your Initiative?

Almost every Data Scientist can nail Question 1, often with dissertation level detail.

At this point, we usually get lost in long-winded complex discussions on feature engineering, modelling techniques, loss functions and evaluation metrics etc. The response is almost always of the form ... my model predicts this, forecasts that or optimises this and this is how I did it, evaluated it, optimised it etc. Data Scientists in most cases have a deep understanding of how to build models and the right processes to follow in building, evaluating and improving models.

However, the team often start to struggle when the questions go beyond the model's output.

Question 2: What is the intended Business Outcome of your Initiative? In Commercial Terms, please ...

Many Data Scientists struggle to answer this question. I often find myself having to reiterate the difference between an Output and a Business Outcome. The Output is what comes out of your model; the Business Outcome is how that Output is used to create value for the business.

Predicting that a customer has a higher probability of leaving is an Output; enhancing margin through targeted interventions that reduce the number of high-value customers leaving is a Business Outcome.

If you are not clear on the intended Business Outcome in commercial terms, then it is impossible to measure commercial impact (what are you measuring?) and even harder to define the processes and decisions that would lead you to the Outcome.

If you have not been able to answer questions 1 & 2, stop now as there is probably very little chance that your Initiative has been successful in creating value for your business.

Now it is getting a little harder... Question 3 is where it starts to come unstuck.

Question 3: How do you measure that Business Outcome is being achieved?

Many Initiatives get developed with almost no consideration for measuring the impact on a Business Outcome. I want to be very clear that a high-quality evaluation metric on your test and validation data sets is reflective of a robust model, i.e. a reliable Output. However, that is not reflective of a value-creating Business Outcome. We need to differentiate between the two measurements.
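To make the distinction concrete, here is a minimal sketch of measuring the Business Outcome rather than the Output. It assumes customers were randomly split into a group that received the model-driven intervention and a held-out control group that did not. The function name, the margin figures and the normal-approximation interval are all illustrative assumptions, not a prescribed method.

```python
import math
import statistics


def incremental_value(treated, control, z=1.96):
    """Estimate uplift per customer with an approximate 95% interval.

    treated: margin per customer where the model-driven intervention was applied
    control: margin per customer in a randomly held-out group with no intervention
    """
    uplift = statistics.mean(treated) - statistics.mean(control)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    # Normal approximation: rough for samples this small, shown for illustration.
    return uplift, (uplift - z * se, uplift + z * se)


# Hypothetical margin per customer (invented numbers, not real results).
treated = [120.0, 135.0, 128.0, 140.0, 132.0, 125.0]
control = [110.0, 118.0, 115.0, 121.0, 112.0, 119.0]

uplift, (low, high) = incremental_value(treated, control)
print(f"Uplift per customer: {uplift:.2f} (approx. interval {low:.2f} to {high:.2f})")
```

If the interval sits clearly above zero, the uplift can reasonably be attributed to the Initiative, provided the split really was random. Note that nothing in this calculation uses the model's evaluation metric.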

As Data Scientists, we need to be able to provide a scientifically robust measurement of incremental value (in commercial terms) that is directly attributable to the Initiative. If you are not familiar with areas such as Experimental Design or Causal Impact, make them a new focus area.

"If you don't measure the Business Outcome, then it is difficult to understand if your Initiative has had an impact on the performance of the business."

Now it is getting much harder... Question 4 is where it all tends to come unstuck.

Question 4: How does the business get from Output to Business Outcome?

Your model produces an Output (prediction, forecast, scored label, etc.) and you have a clear definition of a Business Outcome. How do you get from Output to measured Business Outcome?

What does that mean?

Who in the business uses the output? How often? What actions do they take based on that insight? What interventions (and how many) need to take place for the Business Outcome to be achieved? As Data Scientists, we need to think beyond the model output and understand that:

"Somebody in the business has to do something with the Outputs of your model for a Business Outcome to be achieved."

Understanding how to move from Output to Business Outcome is what we refer to as "Mapping the Decision-Chain". A Decision-Chain is a detailed mapping of stakeholders, outputs, interventions and actions that move you from an Output to a Business Outcome.
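As a rough illustration, a Decision-Chain can be written down as a simple data structure. The sketch below is one possible representation, not a standard one: the class names, fields and the churn example steps are all hypothetical. It records who receives what and which action they take, and flags any step that nobody owns.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    stakeholder: str  # who acts on what they receive (empty if nobody owns it)
    receives: str     # the Output or upstream result handed to them
    action: str       # the decision or intervention they take


@dataclass
class DecisionChain:
    output: str
    business_outcome: str
    steps: list[Step] = field(default_factory=list)

    def length(self) -> int:
        """Number of decisions or interventions between Output and Outcome."""
        return len(self.steps)

    def unowned_steps(self) -> list[int]:
        """1-based positions of steps with no named stakeholder or no action."""
        return [
            i for i, step in enumerate(self.steps, start=1)
            if not step.stakeholder.strip() or not step.action.strip()
        ]


# Hypothetical chain for a churn model (illustrative only).
chain = DecisionChain(
    output="Weekly churn probability per customer",
    business_outcome="Margin retained from high-value customers who stay",
    steps=[
        Step("Retention analyst", "ranked list of at-risk customers", "select who to contact"),
        Step("Call-centre agent", "contact list with offer guidance", "make the retention offer"),
        Step("", "record of accepted offers", "report retained margin"),
    ],
)

print(f"Decision-chain length: {chain.length()}")
print(f"Steps with no owner or action: {chain.unowned_steps()}")
```

In this example the third step has no stakeholder, which is exactly the kind of gap the mapping exercise is meant to surface.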

If you can't define how you get from Output to Business Outcome, then it is unlikely that your Initiative has been integrated into a decision-making process, and even less likely that it is making an impact on value creation.

"If nobody uses the Output of your Model, then it is improbable that your target Business Outcome is being achieved."

If you have managed to get successfully through Questions 2, 3 & 4, then well done! You are an Insights super-star and probably driving significant value for your business. However, Question 5 looks at balancing how effectively and efficiently you get from Output to Business Outcome.

Question 5: How good does your model need to be, and why?

I ask this question because it directly impacts how long you need to spend getting your model right. We must ensure that our model is fit-for-purpose. To understand fit-for-purpose, we need to understand the risk and impact on the Business Outcome if our predictions are inaccurate.

See post, Fit-for-purpose-model-performance: assessing risk & impact of your model.

There are certainly times when we need high-performing models and long development periods to achieve that (e.g. autonomous cars not running people down in crosswalks). However, in commercial realities, decision-makers often just need something slightly better than they have now to start making an impact.

One of the primary mechanisms I use for determining a required performance level is looking at the length of the decision-chain. The length of the decision-chain refers to how many decision-makers and interventions are involved in getting you from Output to Business Outcome. Far too often, we see a significant amount of human decision-making sitting between Output and Business Outcome. This means that your insight is a directional decision-support tool at best. It also means that the risk and impact on the business from inaccurate predictions is going to be inherently lower. If you find your Output right next to your Outcome with little to no human intervention, then that is a great outcome, and you should be very mindful of the performance of your models (e.g. High-Frequency Algorithmic trading).

As insight professionals, this does not mean that we need to be comfortable with rubbish Outputs; we still need to hold ourselves to high levels of quality. But in these situations, you may need to ask yourself: would a simple solution suffice? Maybe this is a process problem, and not an Analytics problem. Ultimately, we think about how to start making an impact on the Business Outcome as quickly as possible.

If you can't provide a clear definition of how good your model needs to be and why, then there is every chance you have spent way too long developing a solution that is not fit-for-purpose.
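One hedged way to put a number on "how good" is a break-even calculation. The sketch below assumes a deliberately simplified model that is not taken from this article: every intervention has a fixed cost, every intervention triggered by a correct prediction returns a fixed value, and an incorrect prediction returns nothing. All function names and figures are hypothetical.

```python
def break_even_precision(cost_per_intervention, value_per_success):
    """Precision at which an intervention pays for itself on average."""
    if value_per_success <= 0:
        raise ValueError("value_per_success must be positive")
    return cost_per_intervention / value_per_success


def net_value(precision, interventions, cost_per_intervention, value_per_success):
    """Expected net value of acting on a given number of predictions."""
    return interventions * (precision * value_per_success - cost_per_intervention)


# Hypothetical commercial assumptions (illustrative only).
COST = 20.0    # cost of one intervention, e.g. a retention offer
VALUE = 200.0  # margin retained when the intervention succeeds

threshold = break_even_precision(COST, VALUE)
current = net_value(0.12, 1000, COST, VALUE)    # today's targeting
candidate = net_value(0.18, 1000, COST, VALUE)  # first-release model

print(f"Break-even precision: {threshold:.2f}")
print(f"Net value per 1,000 interventions, current:   {current:,.0f}")
print(f"Net value per 1,000 interventions, candidate: {candidate:,.0f}")
```

Under these assumptions, any model that beats the break-even precision and today's targeting already adds value, which is one way to argue that a first release is fit-for-purpose.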

This is not to say that a better model does not result in a better outcome. What this means is that to make an immediate impact, a good model, built in a short cycle, may be fit-for-purpose. Over time you can always make it better... just don't spend a huge amount of time upfront getting to your first release if your risk and impact are low!


In summary, there are some brilliant people doing amazing things in the commercial world to help super-charge decision-making. However, there is also room for improvement, and I believe that a lot of it exists beyond the model's output.

Challenge yourself and ask yourself these questions about your latest Machine Learning Initiative:

  • Can I clearly define the Business Outcome in Commercial Terms?

  • Can I measure the incremental value (Business Outcome) that is directly attributable to my Initiative?

  • Do I know how (in specific detail) the business gets from the Output of the Model to a measured Business Outcome?

  • Can I clearly define how good my model needs to be and why?

Whilst some people do this incredibly well, I have found it to be a significant gap in many of the projects that I have seen over the years.

If you have further questions please contact me: peter.inge@ingeniousinsight.com web: www.ingeniousinsight.com