A Data Science Central Community
The Ultimate Guide for Choosing Algorithms for Predictive Modeling
There are three ways to look at data. The first is analytics: looking at data from the (potentially very recent) past to explore two questions, what happened and why it happened. The second is monitoring: looking at things as they happen, often to find abnormalities. The third is predictive analytics: looking at data in a way that helps you make predictions about what might happen in the future.
Predictive analytics drives the actions that produce changes, and those changes are in turn monitored and analyzed. As you build your predictive model, you can select from various algorithms in the categories of machine learning, data mining, and statistics. Once you know more about your data and what you want to accomplish, making this decision becomes a bit easier.
The algorithms that are right for you depend on what you are trying to accomplish. For example:
- Classification algorithms are great if customer retention is your focus or if you are trying to put together a recommendation system.
- Clustering algorithms work well for segmentation or use with social data.
- Regression algorithms are generally used to predict numeric outcomes from calendar-driven events, such as figures that vary by month or season.
It is good practice to try as many algorithms as you can, as long as they are the types of algorithms that fit your problem. The more results you have to compare and analyze, the better off you will be. Comparing them can surface surprises or interesting bits of information, which can help you solve problems. Perhaps more importantly, it can reveal which information in your data can be used to predict future trends. Let’s begin by going over some of the most popular predictive algorithms and methodologies.
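As a rough illustration of comparing several candidate algorithms on the same data, the sketch below scores two toy predictors against held-out points. Everything here is an assumption made for the example: the predictor names (`predict_mean`, `predict_last`), the walk-forward scoring, and the sample sales figures are hypothetical and stand in for whatever models and data you actually have.

```python
from statistics import mean

def predict_mean(history):
    """Toy model: predict the average of everything seen so far."""
    return mean(history)

def predict_last(history):
    """Toy model: predict that the next value equals the latest one."""
    return history[-1]

def mean_absolute_error(model, series, start):
    """Walk forward from index `start`, predicting each point from its past only."""
    errors = [abs(model(series[:i]) - series[i]) for i in range(start, len(series))]
    return mean(errors)

# Hypothetical monthly sales figures, used only for illustration.
sales = [100, 102, 105, 107, 110, 114, 117, 121]

candidates = {"mean": predict_mean, "last": predict_last}
scores = {name: mean_absolute_error(fn, sales, start=4) for name, fn in candidates.items()}

# The candidate with the lowest error on the held-out points.
best = min(scores, key=scores.get)
print(scores, best)
```

Scoring every candidate with the same function on the same held-out points is what makes the numbers comparable; the choice of error measure is a judgment call that depends on your problem.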
The Ensemble Model
Many people have found that an ensemble model is the best method for successful predictive analytics. An ensemble consists of multiple models that all use the same data set. A mechanism gathers the output from the various models and combines it into a final analysis for the person running the test.
The specifics of each model can vary; decision trees, scenarios, and queries, for example, can all serve as models. To pick the correct models, you have to understand what works best for your data and the problem you are trying to solve. Before proceeding, you will need to clearly define the questions you are trying to answer. For example:
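The gathering mechanism can take many forms. One simple possibility, sketched below, is a majority vote. The three rule-based models, their thresholds, and the customer fields (`months_active`, `support_tickets`, `monthly_spend`) are hypothetical placeholders rather than a recommended design.

```python
from collections import Counter

# Three hypothetical rule-based "models"; each returns "churn" or "stay".
def model_by_tenure(customer):
    return "churn" if customer["months_active"] < 6 else "stay"

def model_by_support(customer):
    return "churn" if customer["support_tickets"] > 3 else "stay"

def model_by_spend(customer):
    return "churn" if customer["monthly_spend"] < 20 else "stay"

MODELS = [model_by_tenure, model_by_support, model_by_spend]

def ensemble_predict(customer, models=MODELS):
    """Collect every model's output and return the majority label."""
    votes = Counter(model(customer) for model in models)
    label, _count = votes.most_common(1)[0]
    return label

# Illustrative customer record (made-up values).
customer = {"months_active": 3, "support_tickets": 5, "monthly_spend": 45}
prediction = ensemble_predict(customer)
print(prediction)
```

An odd number of models with two possible labels avoids ties. Real ensembles often weight the votes or average predicted probabilities instead.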
- Will a new target audience be receptive to our current email marketing efforts?
- Should we create a microsite or a reviews page for a new product line?
- Are customers with poor credit going to default if we offer in-house financing?
- Will consumers buy clothing made with cheaper fabrics if prices are cut?
Unsupervised Clustering Algorithms
These algorithms help you find relationships that may not be clear at first glance. If you are interested in finding similarities between various user personas, clustering algorithms might be the way to go. You can also use them to discover product relationships. If you’ve ever wanted to bundle services or wondered how to get customers to respond to your upselling efforts, these algorithms are worth considering.
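To make the idea concrete, here is a minimal sketch of k-means, one common clustering algorithm (the article does not name a specific one). The customer features, the sample values, and the naive choice of the first `k` points as starting centroids are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    # Naive initialization (an assumption of this sketch): use the first k points.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Hypothetical customers as (orders per year, average order value).
customers = [(2, 15), (40, 120), (3, 18), (38, 110), (1, 12), (42, 130)]
labels, centers = kmeans(customers, k=2)
print(labels, centers)
```

No labels are supplied up front; the groups emerge from the data, which is what makes the approach unsupervised. In practice the features usually need scaling first so that one dimension does not dominate the distance.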
Regression algorithms, by contrast, suit data that you receive on a continuing basis, because they can help you predict future trends from that data. For example, if you purchase raw materials for manufacturing processes, you could use the monthly price data that you gather to predict seasonal fluctuations in those prices.
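The monthly price example could be approached in many ways. The sketch below shows one simple option: fit a straight-line trend by ordinary least squares, then average the leftover residuals for each calendar month to estimate a seasonal offset. The price series is synthetic, generated from a made-up `pattern`, and the function names are hypothetical. This two-stage fit is only approximate, since the trend estimate is slightly affected by the seasonal pattern.

```python
from statistics import mean

def fit_line(ys):
    """Ordinary least squares for y = intercept + slope * t, with t = 0, 1, 2, ..."""
    ts = range(len(ys))
    t_bar, y_bar = mean(ts), mean(ys)
    slope = (
        sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
        / sum((t - t_bar) ** 2 for t in ts)
    )
    return y_bar - slope * t_bar, slope

def seasonal_offsets(ys, intercept, slope, period=12):
    """Average residual from the trend line for each position in the cycle."""
    residuals = [y - (intercept + slope * t) for t, y in enumerate(ys)]
    return [mean(residuals[m::period]) for m in range(period)]

def forecast(t, intercept, slope, offsets):
    """Trend value at time t plus the seasonal offset for that month."""
    return intercept + slope * t + offsets[t % len(offsets)]

# Synthetic data (not real prices): upward trend plus a repeating 12-month pattern.
pattern = [3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1, 2]
prices = [50 + 0.5 * t + pattern[t % 12] for t in range(24)]

intercept, slope = fit_line(prices)
offsets = seasonal_offsets(prices, intercept, slope)
next_month = forecast(24, intercept, slope, offsets)
print(round(next_month, 2))
```

Two years of monthly data is about the minimum for this to work, because each month's offset is averaged over only two observations.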
Not an Exact Science
There is no exact formula for finding the ideal algorithms for predictive modeling. It takes a combination of understanding the types of algorithms available to you, understanding exactly what it is that you need to know, and understanding how to interpret the information that you receive.
Ultimately, the work that goes into selecting algorithms to help to predict future trends and events is worthwhile. It can result in better customer service, improved sales, and better business practices. Each of these things can, of course, result in increased profits or lowered expenses. Both are desirable outcomes. The information above should act as a bit of a primer on the subject for those new to using analytics.