On the IBF On Demand podcast, I recently spoke to a real innovator doing great things in the forecasting space. He comes from a disease forecasting background, and many of his predictive analytics techniques are directly applicable to business forecasting. We’re not just talking about what will happen but why, and about identifying changes in consumer behavior before they happen.

His name is John Cordier and he runs a consulting firm called Epistemix. Backed by the National Institutes of Health and the Bill & Melinda Gates Foundation, Epistemix models the spread of infectious diseases, and now applies that expertise to business.

There is much we can learn from him when it comes to maximizing the value of our own data. The following are some highlights of that conversation.

Can you give a breakdown of how you approach epidemiological forecasting? Then we’ll dive into how those approaches can be applied to demand forecasting.

We approach forecasting from the bottom up. Those in our space would recognize agent-based modeling as our underlying technique, but all you need to know about how we forecast is that we represent every single person in the entire country and forecast based on their behaviors.

In the infectious diseases space, our technique has found that within the United States there are 9 “ontological units of epidemicity,” meaning there are 9 regions in the US with distinct seasonal behaviors. The data tells us why COVID-19 cases go up in the South but not in the Northeast in late spring, why that pattern inverts in fall, and so on.

We’re able to generate these seasonal patterns from human behaviors. There are so many behavioral variables you can build into a model. What we end up doing is calibrating the model against the one or two most important pieces of data. With COVID we disregarded cases because case counts were too noisy for us to make any accurate projections, so we use hospitalizations and deaths, which are more stable, though not perfect.


So if I’m understanding this correctly, you’re looking at the seasonalities but you’re mainly looking at the human behavior driving some of those spreads and then instead of just modeling the data, you’re monitoring how the drivers are changing and impacting each other?

Yes. We aim to capture the non-linear relationships. We’re getting these non-linear connections between different actions so we can say which behaviors are driving an epidemic – or indeed the adoption of a product or service – forward. If it’s a disease we can test what interventions will drive those numbers down, or in the case of a consumer product or service, test strategies we think might drive sales up.

This is fascinating because I can see so many use cases for it. I’ve been preaching for years that we need to look externally and develop that ‘outside-in’ type of thinking in predictive analytics for business forecasting. Are we limiting ourselves by only looking at sales data?

If you’re only using the data that exists internally and you’re making your decisions on those assumptions, you’re really saying that the future is going to look like the past. If you’re not looking at how behavior changes or how the environment is changing and how that impacts the drivers of adoption or purchases, you’re going to miss the tipping point of adoption or the point when you hit market saturation.

One example of this is Clubhouse, a social media app that came out in 2020. Everyone was saying it was going to totally take over from Facebook and Instagram. Well, if you looked at the behaviors of adoption you would have seen that the tipping point was going to come pretty early. We forecast that tipping point in early 2021 right after the Consumer Electronics Show. You could have known then that this wasn’t actually going to be the next big thing.

In business predictive analytics we start with data and the different types of standardized models that can be applied to understand the data, but you start by representing the reasons why something’s happening. Why do you start with the ‘why’?

Our goal is not just to predict demand, but to give decision makers the ability to understand how to shape demand given the resources they have available to them. Whether forecasting a health outcome, product sales, distribution of an idea, or the number of votes a given candidate is likely going to win, our synthetic population is the starting point to generate a forecast from the bottom up. 

Our interactive synthetic population includes a representation of every person, household, school, workplace, hospital etc in the country – consider this the ‘clay’ or ‘substrate’ that you’re starting with. Then we enable businesses or subject matter experts to test their assumptions about how their population is changing. These set the baseline assumptions.

You can then generate models that recreate the past data and explore “what-if” questions about how things might look different given changes in behavior or changes in the environment. Once you have a working model you’ve really created a simulation of the most meaningful drivers of behavior and outcomes that you can test demand shaping strategies against.

What is synthetic data?

Synthetic data is a broad description of techniques used to take observed data and create data that, downstream, increases the confidence of a decision somebody has to make. In our world of synthetic data, you can take a real data set and describe how any behaviors in the population might change. Using that as a baseline forecast, you can test assumptions to better understand the outcome you’ll achieve through the different decisions you can make.

Because our synthetic population has both time and geography included, you can generate synthetic data sets complete with geospatial and time series data.

So the output is more of a probabilistic type of forecast?

It’s a stochastic type of forecast, so you’ll get something different every time. Everything is probabilistic. Our users might run a thousand simulations and get a different result every single time. But once you have that set of runs, you have more or less ‘beaten down’ the noise and can put narrower confidence intervals around the most likely outcome.
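To make the idea of running many stochastic simulations concrete, here is a minimal Python sketch. It is not the Epistemix platform or its method: the toy model (each person adopts independently with a fixed probability), the function names, and every parameter value are assumptions made purely for illustration. It shows how a set of runs that each give a different answer can be summarized as a median with a percentile interval.

```python
import random
import statistics


def simulate_adopters(n_people=1000, p_adopt=0.1, rng=None):
    """One stochastic run of a toy model (illustrative assumption):
    each person adopts independently with probability p_adopt."""
    rng = rng or random.Random()
    return sum(1 for _ in range(n_people) if rng.random() < p_adopt)


def run_ensemble(n_runs=1000, seed=42):
    """Repeat the simulation n_runs times; each run gives a different count."""
    rng = random.Random(seed)  # seeded so the whole ensemble is reproducible
    return [simulate_adopters(rng=rng) for _ in range(n_runs)]


def summarize(results, lower_pct=5, upper_pct=95):
    """Summarize an ensemble as a median plus a percentile interval."""
    cuts = statistics.quantiles(results, n=100)  # 99 percentile cut points
    return {
        "median": statistics.median(results),
        "lower": cuts[lower_pct - 1],
        "upper": cuts[upper_pct - 1],
    }


if __name__ == "__main__":
    ensemble = run_ensemble()
    print(summarize(ensemble))
```

In this sketch, adding more runs does not make any single run less random. What it does is make the estimated median and interval more stable from one ensemble to the next.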

It’s a relationship between people, things, and the environment and how all three are going to change in the future, right?

Yes. We’re able to understand the interactions between people in households, at work, in the community, and how people react to things they come in contact with online – all in combination to see the emergent behavior of entire populations. Malcolm Gladwell and others have written about what makes social epidemics take off – it comes down to those three things: what population, what thing, and the environment they exist in.

Whether it’s a product or a disease, the terms adoption and spread are similar. Imagine a scenario where you and your partner are at the dinner table and you ask “have you heard about the thing in the news today?” Maybe they hadn’t, but you’re now the third person that has brought it up. After hearing it from a few people, they look it up and tell another 4 people. Next thing you know, it’s made national news for the next cycle and millions of people are exposed to the information.
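The dinner-table scenario can be sketched as a very small agent-based model in Python. This is a toy illustration, not the Epistemix synthetic population: the random contact network, the rule that a person picks something up once a set number of their contacts already know about it, and all parameter values and names are assumptions chosen for the example.

```python
import random


def build_contacts(n_people, n_contacts, rng):
    """Give each person a few randomly chosen contacts (ties are mutual)."""
    contacts = {i: set() for i in range(n_people)}
    for i in range(n_people):
        for j in rng.sample(range(n_people), n_contacts):
            if j != i:
                contacts[i].add(j)
                contacts[j].add(i)
    return contacts


def spread(n_people=500, n_contacts=4, threshold=3,
           n_seeds=25, n_steps=20, seed=0):
    """Return the number of aware people after each step.

    A person becomes aware once at least `threshold` of their contacts
    are already aware (hypothetical rule: "the third person to bring it up").
    """
    rng = random.Random(seed)
    contacts = build_contacts(n_people, n_contacts, rng)
    aware = set(rng.sample(range(n_people), n_seeds))
    history = [len(aware)]
    for _ in range(n_steps):
        newly_aware = {
            p for p in range(n_people)
            if p not in aware and len(contacts[p] & aware) >= threshold
        }
        if not newly_aware:  # nothing changed, so the spread has stalled
            break
        aware |= newly_aware
        history.append(len(aware))
    return history


if __name__ == "__main__":
    print("needs 3 mentions:", spread(threshold=3))
    print("needs 1 mention: ", spread(threshold=1))
```

Comparing the two printed histories shows how a single behavioral assumption, here the number of mentions a person needs, changes whether something takes off or stalls. That is the kind of "what-if" question a bottom-up model lets you test.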

Information, behavior, and diseases – when using a bottom up approach – are emergent phenomena. At Epistemix, we help companies project emergent phenomena into the future.

Do you have any other business related examples?

We have a couple of customers that are launching consumer apps. What they’re trying to understand is the demand for their app, the network effect they’re able to create, and what actions they can take to most influence adoption.

They use the synthetic population to test marketing strategies designed to influence demand. One of the companies we’re working with, Earbuds – a new music sharing app – is trying to understand the most sustainable path to 150,000 monthly active users by June of 2023 by testing influencer driven campaigns, targeting specific online communities, and other marketing campaigns.

I see parallels with retailers modeling population shifts, for example people moving from New York to Texas and Florida.

As it happens we did a project last year with the Remaking Cities Institute out of Carnegie Mellon University and the question they wanted to ask was how remote work acceleration is impacting where consumer retail is going to be. 

We did a study to see where the workforce of Chicago is going to be in 2024. Because we had to root this in an actual problem that businesses wanted to solve, we framed the question as “where do I get my coffee in 2024?” The whole idea was to base coffee shop site selection off of consumer preferences of where they’re working, the projected density of the population, and existing locations. 

We identified locations within Chicago where the supply will be too high and other locations where there are going to be too few coffee shops. Given the information, generated from the synthetic population as a starting point, coffee shops and developers can understand the future population density and drop in three or four coffee shops in an area to capture new demand.

The cool thing about working with the synthetic population is that you can add information as you learn more about the problem you’re trying to solve. For example, take the question of “what will happen to coastal cities in the US over the next 50 years given the increase in extreme weather events?” The data doesn’t yet exist, but you need to forecast what might happen if… Using the Epistemix platform you can test how the different “what ifs…” influence people’s behavior and test what decisions you can make to shape or be positioned to take advantage of the changes to come.

My Thoughts On These Cutting Edge Approaches

This was a great, high-level conversation. You can watch the full conversation here. You may not be ready for all of it but we can start leveraging some of the key elements. The key is to start thinking ahead of where the consumers may be in the future and how they’ll be behaving. Are you ready to not only look at a forecast but to create an idea of what your market will look like in the future? If we can do that, sales and marketing can act on changing consumer behavior before it happens.

This is a mindset change for a lot of us. This is Predictive Analytics, using different types of modelling – getting into probabilistic and forward-looking projections, and trying to understand the ‘why’.

This level of analytics is very much still emerging. Understanding and adopting these techniques requires continual learning, changing your thinking, and learning new skills and information and bringing them into your forecasting process. If you’re prepared to challenge yourself, you may find that you advance both yourself and your company.



To add advanced analytics models to your bag of tricks, get your hands on Eric Wilson’s new book Predictive Analytics For Business Forecasting. It is a must-have for the demand planner, forecaster or data scientist looking to employ advanced analytics for improved forecast accuracy and business insight. Get your copy.