You’re not going to get advanced modeling like machine learning in Excel. Excel also struggles with large data sets, becoming clunky and error-prone. And when you start feeding it multiple SKUs or many different variables, running all the simulations and computations can weigh even the best machine down.

This is where open source software comes in for analysts who want a little more to work with. Open source software is software whose source code is publicly accessible and whose license grants users the right to use, modify and share it.

To help handle and extract insight from Big Data, people have turned to open source platforms like Hadoop and Apache Spark. Many people in the data science world used commercial software like SAS at college and learned to code in open source languages like R and Python. All of these, as well as some others not mentioned, do an excellent job at the tasks they were designed for. While some of us might be afraid of coding and learning these languages, they are all relatively user-friendly and many elements are simpler than an Excel macro.

What is Hadoop? Hadoop is used frequently with big volumes of previously unmanageable data. It is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs. Hadoop is used by companies with Big Data like Airbnb, Uber, and Netflix.

What is Apache Spark? Apache Spark is another platform used to manage data, and it can work alongside Hadoop. It is an open-source engine developed specifically for large-scale data processing and analytics. Spark can access data in a variety of sources, including the Hadoop Distributed File System (HDFS), OpenStack Swift, Amazon S3 and Apache Cassandra.

What is Python? Python originated as an open source scripting language, and its usage has grown steadily over time. It is an interactive, interpreted, high-level, object-oriented programming language that is easy to learn and understand. It supports many libraries used for data analysis (pandas), scientific computation (NumPy, SciPy), and machine learning (scikit-learn). Python is used by many technology companies, such as Google, Quora and Reddit.
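
To give a feel for what this looks like in practice, here is a minimal sketch of a pandas summary of demand by SKU. The SKU codes, week numbers and unit quantities are made up for illustration and do not come from any real data set.

```python
import pandas as pd

# Hypothetical weekly demand history; SKU codes and quantities are invented.
history = pd.DataFrame({
    "sku": ["A100", "A100", "A100", "B200", "B200", "B200"],
    "week": [1, 2, 3, 1, 2, 3],
    "units": [120, 135, 150, 40, 38, 45],
})

# Total and average weekly demand per SKU, one row per SKU.
summary = history.groupby("sku")["units"].agg(total="sum", average="mean")
print(summary)
```

The same few lines work whether the table holds six rows or several million, which is the kind of task where a spreadsheet starts to struggle.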

What is R? R is a free, open-source programming language and environment for statistical computing. Because it is open source, it is highly extensible, and packages implementing the latest techniques are released quickly. R is strong in visualizations and graphics and offers a wide range of statistical functions. It is not hard to learn to code in R, and once you learn the fundamentals of the logic, the possibilities are endless. You can find many information sources for R on the web. Companies that use R include Facebook, Google and Microsoft.

How Is Open Source Software Different From Specialized Demand Planning Software?

The tools above are all what we refer to as open source software, which sets them apart from a demand planning package that you purchase and install. In general, open source is any program whose source code is made available for use or modification as users or other developers see fit. Additionally, these tools are available for free, with a user community of fellow practitioners creating packages and code that anyone can use.

Open Source Is Highly Flexible

For Big Data and advanced analytics professionals, the flexibility of open source code and its minimal or zero cost are what make platforms like these so attractive. But what makes them even more worthwhile is that, with open source, someone may have already tried to solve your problem and developed the model you need. Basic neural networks, decision trees, logistic regression, and even time-series models have been developed, tested, and are available to copy and adapt. Users are not limited to the methods and configurations of an off-the-shelf package that is part of a legacy system, but can design and develop what they need.
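
As an illustration of how little code a basic time-series model can take, the sketch below hand-codes simple exponential smoothing in plain Python. The function name, the default `alpha` and the choice to seed the forecast with the first observation are assumptions made for this example. A production model would more likely come from an established library.

```python
def exponential_smoothing(demand, alpha=0.3):
    """Simple exponential smoothing (illustrative sketch).

    Returns a list one element longer than `demand`: element t is the
    forecast for period t, and the last element is the forecast for the
    next, not yet observed, period.
    """
    if not demand:
        raise ValueError("demand must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")

    forecast = demand[0]  # assumption: seed with the first observation
    forecasts = [forecast]
    for actual in demand:
        # Blend the latest actual with the previous forecast.
        forecast = alpha * actual + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


# Hypothetical monthly demand for one SKU.
print(exponential_smoothing([100, 110, 120], alpha=0.5))
```

A higher `alpha` makes the forecast react faster to recent demand, while a lower value smooths out noise. Because the code is yours, changing the seeding rule or adding a trend term is a matter of editing a few lines.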

Open source also gives you the capability to code and create something new. Developers can tinker with open source tools, which increases the chances of rapid improvement or experimentation that expands a tool’s usage or features.

Open Source Communities Help Solve Your Problems

People who work with open source machine learning tools also find they have thriving online communities at their disposal, which allow them to tap into collective thinking when they run into unexpected difficulties. R and Python are both open-source programming languages with large communities, and new libraries and tools are added regularly. The community forums already hold answers to many common problems, and as machine learning tools become even more popular, that knowledge base will keep expanding.

Should I Use Open Source Software Then?

None of this comes without risks or problems, though. While many recent college graduates may have cut their teeth on data analytics tools, there are not many people experienced enough to code or create models. While coding is not as scary as it sounds, it still requires time, effort, experience, and working through many potential bugs. Given the need for specific skills and the time and effort required to leverage open source software, investing in specialized demand planning software may be more advisable.

Open source platforms do come with limitations. A good planning system can do a lot more than just model, which justifies its cost. Besides most likely being more user-friendly, most software packages offer the advantages of stability, easier deployment, better support, and governance. The models in these packages have also improved over time, and many packages even offer interfaces that integrate with open source platforms like R. This provides the features of an advanced planning system along with the ability to extend its modeling using R and Python.


Eric will be speaking at IBF’s Predictive Business Analytics, Forecasting & Planning Conference in New Orleans from April 28-20, 2020. Learn more about the tools discussed in this article and how to leverage them as a competitive advantage. Includes special data science workshop.