I recently had a fascinating and enlightening conversation with one of the leading figures in predictive analytics and business forecasting, Dr. Barry Keating, Professor of Business Economics & Predictive Analytics at the University of Notre Dame.
He is driving the field forward with his research into advanced analytics and applying that cutting-edge insight to solve real-world forecasting challenges for companies. So I took the opportunity to get his thoughts on how predictive analytics differs from what we’ve been doing so far with time series modeling, and what advanced analytics means for our field. Here are his responses.
What’s the Difference Between Time Series Analysis & Predictive Analytics?
In time series forecasting, the forecaster aims to find patterns like trend, seasonality, and cyclicality, and chooses an algorithm to look for these specific patterns. If the patterns are in the data, the model will find them and project them into the future.
But at some point we as a discipline realized that there were a lot of things outside our own four walls that affected what we were forecasting, and we asked ourselves whether we could somehow include these factors in our models. Now we can go beyond time series analysis by using predictive analytics models like simple regression and multiple regression, using a lot more data.
The difference here compared to time series is that time series looks only for specific patterns whereas predictive analytics lets the data figure out what the patterns are. The result is much improved forecast accuracy.
Does Time Series Forecasting Have a Place in the Age of Advanced Analytics?
Time series algorithms will always be useful because they’re quick and easy to do. Time series is not going away – people will still be using Holt-Winters, Box-Jenkins, time series decomposition, etc. long into the future.
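To illustrate just how approachable these classical methods are, here is a minimal from-scratch sketch of additive Holt-Winters (triple exponential smoothing). The smoothing constants and the toy quarterly series are my own assumptions for illustration, not figures from the interview:

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.1, gamma=0.2, horizon=4):
    """Additive Holt-Winters: smooth a level, a trend, and a seasonal
    pattern of period m, then project them `horizon` steps ahead."""
    # Initialise level and trend from the first two full seasons
    season1 = sum(y[:m]) / m
    season2 = sum(y[m:2 * m]) / m
    level, trend = season1, (season2 - season1) / m
    seasonals = [y[i] - season1 for i in range(m)]  # one slot per period

    for t in range(m, len(y)):
        prev_level = level
        s = seasonals[t % m]
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y[t] - level) + (1 - gamma) * s

    n = len(y)
    return [level + (h + 1) * trend + seasonals[(n + h) % m]
            for h in range(horizon)]

# Toy quarterly series: upward trend plus a repeating seasonal swing
series = [10 + 0.5 * t + [2, -1, -3, 2][t % 4] for t in range(40)]
print(holt_winters_additive(series, m=4))
```

Because the method only ever looks for level, trend, and seasonality, it stays fast and transparent – which is exactly why it isn’t going away.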
What’s the Role of Data in All This?
The problem now isn’t using the models but collecting the data that lies outside our organization. Data these days is different in scale. We used to think that when we had 200 or 300 observations in a regression we had a lot of data – now we might use 2 or 3 million observations.
“We used to think 200 observations was a lot of data – now we might use 2 or 3 million”
Today’s data is different not only in its size but also in its variety. We don’t just have numbers in a spreadsheet – it may be streaming data, and it may not be numbers at all but text, audio, or video. Velocity is also different; in predictive analytics we don’t want to wait for monthly or weekly information, we want information from the last day or hour.
The data is also different in terms of value. Data is much more valuable today than it was in the past. I always tell my students not to throw data away. What you think isn’t valuable probably is.
Given We Are Drowning in Data, How Do We Identify What Data Is Useful?
When the pandemic started, digital purchases were increasing at 1% a year and constituted 18% of all purchases. Then, in the first 6 weeks of the pandemic, they increased 10%. That’s 10 years’ worth of online purchases happening in just weeks. That shift meant we now need more data and we need it much more quickly.
“You don’t need to figure out which data is important; you let the algorithm figure it out”
You don’t need to figure out which data is important; you put it all in and let the algorithm figure it out. As mentioned, if you’re doing time series analysis, you’re telling the algorithm to look for trend, cyclicality and seasonality. With predictive analytics it looks for any and all patterns.
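The “put it all in” idea can be sketched with a toy regression. In the sketch below, only two of ten candidate drivers actually influence the target, and plain least squares isolates them on its own; the data, coefficients, and column choices are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# 2,000 observations of 10 candidate drivers -- only columns 0 and 4
# actually influence the target (plus a little noise)
X = rng.normal(size=(2000, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 4] + rng.normal(scale=0.1, size=2000)

# Throw everything in and let least squares isolate the signal
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
important = [i for i, c in enumerate(coef) if abs(c) > 0.5]
print(important)  # the fit singles out columns 0 and 4
```

The forecaster never told the model which columns mattered; with enough observations, the estimated coefficients on the irrelevant drivers shrink toward zero by themselves.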
“Predictive analytics assumes that you have a lot of data – and I mean a lot”
It’s very difficult for us as humans to take a dataset, identify patterns, and project them forward, but that’s exactly what predictive analytics does. This assumes that you have a lot of data – and I mean a lot – and data different from what we were using in the past.
Do you Need Coding Skills to do This?
Come to an IBF conference or training boot camp and you will learn how to do Holt-Winters, for example. Do we teach people how to do that in R, Python, or Spark? No. You see a lot of advertising for coding for analytics. Do you need to do that to be a forecaster or data scientist? Absolutely not.
There are commercial analytics packages where somebody who is better at coding than you could ever hope to be has already done it for you. I’m talking about IBM SPSS Modeler, SAS Enterprise Miner, or Frontline Systems XLMiner. All of these packages do 99% of what you want to do in analytics.
Now, you have to learn how to use the package and you have to learn enough about the algorithms so you don’t get in trouble, but you don’t have to do coding.
“Do you need to be a coder? Absolutely not”
What about the remaining 1%? That’s where coding comes in handy. It’s great to know coding. If I write a little algorithm in Python to pre-process my data, I can hook it up to any of those packages. And those packages I mentioned can be customized; you can pop in a little bit of Python code. But do you need to be a coder? Again, absolutely not.
Is Knowing Python a Waste of Time Then?
Coding and analytics are two different skills. It’s true that most analytics algorithms are coded in R, Python, and Spark, but these languages are used for a range of different things [i.e., they are not explicitly designed for data science or forecasting], and knowing those languages allows you to do those things. Being a data scientist, however, means knowing how to use the algorithms for a specific purpose. In our case as Demand Planners, it’s about using K-Nearest Neighbor, Vector Models, Neural Networks and the like.
All this looks ‘golly gee whiz’ to a brand-new forecaster, who may assume that coding ability is required, but these techniques can actually be taught in the 6-hour workshops that we teach at the IBF.
What’s the Best way to get Started in Predictive Analytics?
The best way to start is with time series; then, when you’re comfortable, add some more data; then try predictive analytics with some simple algorithms; then get more complicated. When you’re comfortable with all that, go to ensemble models where, instead of using 1 algorithm, you use 2, 3, or 5. The last research project I did at Notre Dame used 13 models at the same time. We took an ‘average’ of their outputs and the results were incredible.
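The ensemble idea can be sketched with three deliberately simple forecasters whose point forecasts are averaged. The three models and the toy history below are my own minimal examples, not the 13 models from the Notre Dame project:

```python
def naive_forecast(y, h):
    # Repeat the last observation
    return [y[-1]] * h

def mean_forecast(y, h):
    # Use the overall historical mean
    m = sum(y) / len(y)
    return [m] * h

def drift_forecast(y, h):
    # Extend the average step between first and last observation
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + (i + 1) * slope for i in range(h)]

def ensemble_forecast(y, h, models):
    """Average the point forecasts of several models."""
    forecasts = [m(y, h) for m in models]
    return [sum(f[i] for f in forecasts) / len(forecasts) for i in range(h)]

history = [10, 12, 13, 15, 16, 18]
print(ensemble_forecast(history, 2,
                        [naive_forecast, mean_forecast, drift_forecast]))
# ≈ [17.2, 17.73] -- a blend of the three individual forecasts
```

Each individual model has an obvious blind spot, but errors that point in different directions partly cancel when averaged – the same intuition that makes larger ensembles work.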
The IBF workshops allow you to start out small with a couple of simple algorithms that can be shown visually – we always start with K-Nearest Neighbor, and for a very good reason. I can draw a picture of it and show you how it works without putting any numbers on the screen. There aren’t even any words on the screen. Then you realize, “Oh, that’s how this works.”
“Your challenge is to pick the right algorithm and understand if it’s done a good job”
It doesn’t matter how it’s coded because you know how it works and you see the power – and downsides – of it. You’re off to the races; you’ve got your first algorithm under your belt, you know the diagnostic statistics you need to look at, and you let the algorithm do the calculation for you. Your challenge is to pick the right algorithm and to understand whether it’s done a good job.
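A bare-bones K-Nearest Neighbor regressor really is only a few lines. The feature choices (price and a promotion flag) and the demand numbers below are hypothetical, just to show the mechanics:

```python
def knn_predict(points, targets, query, k=3):
    """Predict by averaging the targets of the k nearest known points."""
    # Euclidean distance from the query to every historical point
    dists = sorted(
        (sum((p - q) ** 2 for p, q in zip(pt, query)) ** 0.5, t)
        for pt, t in zip(points, targets)
    )
    # Average the targets of the k closest neighbours
    return sum(t for _, t in dists[:k]) / k

# Hypothetical demand history: features are (price, promo flag),
# targets are units sold in that period
history = [(10, 0), (10, 1), (12, 0), (12, 1), (15, 0)]
units =   [100,     140,     90,      125,     70]
print(knn_predict(history, units, (11, 1)))  # ≈ 118.3 units
```

The prediction is simply the average demand of the three most similar past periods – nothing more, which is why the method can be explained with a picture and no equations.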
To add advanced analytics models to your bag of tricks, get your hands on Eric Wilson’s new book Predictive Analytics For Business Forecasting. It is a must-have for the demand planner, forecaster or data scientist looking to employ advanced analytics for improved forecast accuracy and business insight. Get your copy.