Over the past few years, as we implemented our analytics platform for businesses of all shapes and sizes, I was often asked whether our system was capable of delivering Predictive Analytics. It occurred to me that management was perhaps searching, in vain, for a compass pointing to the magical pot of gold at the end of the rainbow, and this inspired me to write this post.
Spoiler Alert — Like the mythical treasure, the best one will find at the end of the “forecast rainbow” is dirt, perhaps because the fundamental construct of rainbows is misunderstood.
Admittedly, there is significant hype about Big Data, Analytics and Prediction algorithms. If concepts such as Deep Learning, Machine Learning and Artificial Intelligence were to deliver amazing predictions, shouldn’t IBM Watson be providing explicit “actionable insights” to help its creator boost its revenue (and profitability), given that IBM invested over $1 billion into it? Clearly, its performance on the stock market says otherwise. The chart below depicts the stock price of IBM over the past 5 years. Watson was released as an analytics product in 2014, and the share price dropped considerably until early 2016 (presumably nothing to do with Watson, but due to a bunch of externalities).
Well, in simple terms, this domain is not only nascent but also misrepresented to a degree, creating a perception that it can perform miracles for anyone. Undoubtedly, there is power in the analytical engines – but the engine is not the problem.
So here is the deal: for predictive analytics to work meaningfully, three fundamental concepts need to hold true:
1. Training dataset – Every prediction model needs what’s termed a training dataset. Therefore, if you want a good prediction model for YOUR business, you need good-quality historical data of YOUR business.
2. Data volume – All prediction models are based on statistical techniques, and the field of statistics is strongly grounded in “data quantum” for the results to be “statistically significant”, i.e., you need a large corpus of (good) historical data.
3. Reliability – History should be a reasonably good indicator of the future.
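The second point can be made concrete with a minimal sketch in Python. The sales figures below are entirely invented for illustration; the point is only that the uncertainty around a naive mean forecast shrinks with the square root of the number of observations, which is why a large corpus of history matters.

```python
import random
import statistics

def forecast_interval(history, z=1.96):
    """Naive mean forecast with a ~95% margin; the margin shrinks
    as the amount of (good) historical data grows."""
    mean = statistics.fmean(history)
    stderr = statistics.stdev(history) / len(history) ** 0.5
    return mean, z * stderr  # point forecast, +/- margin

def noisy_month():
    # Hypothetical monthly sales: a true level of 100 plus noise.
    return random.gauss(100.0, 15.0)

random.seed(42)
small = [noisy_month() for _ in range(6)]    # half a year of history
large = [noisy_month() for _ in range(60)]   # five years of history

_, margin_small = forecast_interval(small)
_, margin_large = forecast_interval(large)
# More history yields a tighter, more trustworthy forecast band.
```

Ten times the history shrinks the forecast band by roughly a factor of three, which is the practical meaning of “statistically significant” here.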
Which one of the above three aspects do you think is the primary reason prediction models fail?
Based on our experience, without an ounce of doubt, it’s the quality of the training dataset, i.e., the data that sits in ERP systems, CRM, etc. We haven’t seen a single company that has “clean data” which can be directly fed into prediction models. Why? There are many reasons but they can be essentially distilled into the following:
- A user’s lack of understanding of the fields / forms in (complex) systems
- Fundamentally bad system designs that allow errors during data capture
- Incorrect business rules applied to the data as it flows through the organisation
- Inconsistent data definitions and interpretations across different parts of the business
- Human tendency to cut corners
All of these factors are exacerbated over time by people transitioning through the organisation, acquisitions, new system implementations, and so on. Suffice it to say that data quality can degrade rapidly in an organisation.
So, how do you know if you have good or bad data in your organisation? Here are a few pointers:
- Do you look at your management reports and wonder what half of the numbers mean or wish you got some metrics that would help you better run your business?
- Do you look at the reports and question their accuracy, i.e., do you find yourself saying “that just does not look right to me” and taking measures to “fix it” yourself?
- Do you trust your reports, BUT have a back office team that downloads data from core systems and applies “business rules” to produce the reports?
These are all signs that your data is not ready for prediction models.
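A first pass at surfacing these symptoms can be automated. The sketch below is hypothetical (the CRM extract, field names and rules are mine, not from any real system) and simply flags the kinds of defects listed above: duplicates, missing values, impossible values and inconsistent labels.

```python
# Hypothetical CRM export rows; field names are illustrative only.
rows = [
    {"customer_id": "C001", "region": "North", "revenue": 1200.0},
    {"customer_id": "C002", "region": "north", "revenue": None},    # case drift + missing value
    {"customer_id": "C001", "region": "North", "revenue": 1200.0},  # duplicate record
    {"customer_id": "C003", "region": "Nth", "revenue": -50.0},     # ad-hoc label + negative revenue
]

def audit(rows):
    """Return (row_index, message) pairs for basic data-quality defects."""
    issues = []
    seen = set()
    regions = set()
    for i, r in enumerate(rows):
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key in seen:
            issues.append((i, "duplicate record"))
        seen.add(key)
        if r["revenue"] is None:
            issues.append((i, "missing revenue"))
        elif r["revenue"] < 0:
            issues.append((i, "negative revenue"))
        regions.add(r["region"])
    if len({x.lower() for x in regions}) < len(regions):
        issues.append((None, "inconsistent region labels"))
    return issues

problems = audit(rows)
```

Even a crude audit like this, run before any modelling, usually reveals why the data is not yet ready for a prediction engine.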
This observation now leads us to two fundamental questions:
- Is forecasting a futile exercise for companies?
- Are there no real life applications for big data based predictive analytics?
Forecasting within a business
I think forecasting the sales / profit for the next quarter (month, year, etc.) is the most important task of every business leader. They must understand what lies ahead, chart a plan and navigate the company to deliver the results. However, do not expect an automated model to do this for you. I recommend four simple steps in this journey, based on the “Crawl-Walk-Jog-Run Framework”, which, when translated to analytics, can be broadly explained as below:
CRAWL – Understand your HISTORICAL data through interactive dashboards that really allow you to convert data into information
WALK – Implement tools to help you track CURRENT budget vs actuals such that you can monitor performance in real time and respond as necessary
JOG – Explore predictive models to guide and facilitate decisions about the FUTURE through concepts such as sensitivity analysis and scenario analysis
RUN – Leverage analytics to adapt and learn as you grow on an ONGOING basis
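The JOG step can be illustrated with a deliberately simple example. The profit model and figures below are hypothetical; the sketch performs a one-driver-at-a-time sensitivity analysis, the kind of “what happens if price moves 10%?” question such models are built to answer.

```python
def profit(units, price, unit_cost, fixed_cost):
    # Toy profit model: contribution margin minus fixed costs.
    return units * (price - unit_cost) - fixed_cost

base = dict(units=10_000, price=25.0, unit_cost=15.0, fixed_cost=60_000.0)

def sensitivity(model, base, driver, pct=0.10):
    """Re-run the model with one driver flexed +/- pct, all else held constant."""
    low = model(**{**base, driver: base[driver] * (1 - pct)})
    mid = model(**base)
    high = model(**{**base, driver: base[driver] * (1 + pct)})
    return low, mid, high

low, mid, high = sensitivity(profit, base, "price")
# A 10% price move swings profit by over 60% in this toy model;
# that leverage is exactly what sensitivity analysis surfaces.
```

Scenario analysis is the same idea with several drivers flexed together (e.g., a “recession” scenario with lower units AND lower price).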
As you can now see, getting step 1 sorted is essential for every business. If we now assume you are comfortable with your management reports, the simplest way to forecast is to build a rolling thirteen-month average model in Excel as your guiding star. If you can beat your historical performance, chances are you are headed in the right direction. At this stage, the results from a complex / expensive predictive model may be within +/- 10% of the Excel outcome. Sometimes in jest (and sometimes not), I advise my clients: “Don’t build a computer to achieve what a calculator can do.” You have to mature before you can actually get the license to drive a powerful predictive model.
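That guiding-star model need not even live in Excel; the entire thing fits in a few lines. A minimal sketch in Python, with invented monthly figures:

```python
def rolling_forecast(monthly, window=13):
    """Forecast next month as the mean of the trailing `window` months."""
    if len(monthly) < window:
        raise ValueError("need at least %d months of history" % window)
    return sum(monthly[-window:]) / window

# Thirteen invented months of sales history.
history = [100, 102, 98, 105, 110, 95, 100, 103, 107, 99, 101, 104, 106]
forecast = rolling_forecast(history)
```

If your fancy predictive model cannot meaningfully beat this baseline, the calculator is still doing the computer’s job.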
Role of big-data based predictive analytics
So what does this mean for all the hype around big data and predictive analytics? Let’s go back to IBM Watson and look at an amazing feat it accomplished in Aug 2016, as per the excerpt¹:
The IBM Watson AI super-computer has saved a woman’s life by successfully diagnosing a rare form of leukaemia in minutes, a task which had baffled doctors at the University of Tokyo for months. The disease was identified after Watson spent just 10 minutes cross-referencing the patient’s genetic changes with 20 million cancer research papers.
Why does Watson do so well in one task and not in the other?
Here is the breakdown of the facts behind this use case:
- Good training data – IBM Watson had an impeccable training dataset, with no errors, from which to learn and understand patterns
- Massive volume – Watson churned through a massive quantity of data that would be impossible for a human to process, finding the match in a relatively short period of time
- Reliable history – The problem posed to the machine was clearly a subset of the data it held, i.e., the past was an accurate predictor of the future
It turns out that predictive analytics engines are indeed extremely powerful and useful when the three criteria mentioned above are satisfied. There are a number of real-life, useful applications for big data in addition to the one above; some are outlined below:
- Predicting sales based on data coming out of a point-of-sale system from retail outlets
- Improving commute performance of public transport systems based on data streamed from Internet-of-Things devices (premise for smart cities)
- Increasing fraud detection and prevention based on credit card transaction patterns
In summary, you are best served to understand the stage of evolution of your company and where you believe the organisational Data Quotient lies. Manage this journey one step at a time, for it’s not the pot of gold at the end of the rainbow that matters – it’s where you start and how you enjoy the process of creating the rainbow, which in and of itself is magical to behold.
¹ The Huffington Post