How to Find an Algorithm that Fits
This blog post is authored by Brandon Rohrer, Senior Data Scientist at Microsoft.
Choosing a machine learning algorithm is a lot like shoe shopping. Performance isn’t the only thing you’re looking for. If it were, we’d all be wearing thousand-dollar feather-light track shoes. Instead, we consider how we’ll be using them. Some shoes are good for standing all day, and some are good for climbing cliffs. The price usually matters. And, of course, how they look can trump everything else.
The same goes for algorithms. Prediction performance is important, but it’s not everything. Some algorithms are easy to explain, and some are robust to noisy data and missing values. Run time can be important. And appearance can matter as much with algorithms as it does in footwear. I have had more than one customer tell me that whatever I do for them, it needs to have “neural network” in the name.
Just like with shoes, there isn’t one perfect algorithm for a problem, but there are several that are good enough. Here are some tricks for finding a fit.
1. Define Your Problem & Gather Your Data
If you don’t know what question you are trying to answer, you can’t choose a good algorithm. Understanding your question clearly is the single most important step in this process.
The next step is to make sure your data collection is up to the task. This process goes by names like “quality assurance” or “sanitization”; it’s not glamorous, but it’s very important. Its purpose is to ensure that the data is relevant and accurate, that missing values are handled correctly, and that there is enough data to produce a meaningful answer.
Dive deeper into question and data quality.
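To make the quality check a little more concrete, here is a minimal Python sketch that counts missing values per column and flags whether there are enough rows. It assumes the data arrives as a list of dictionaries, and the `MIN_ROWS` threshold of 100 is a hypothetical placeholder; what counts as “enough” depends entirely on your problem.

```python
from typing import Any

# Hypothetical threshold for "enough data". The right number depends on the
# problem; 100 is only a placeholder.
MIN_ROWS = 100


def data_quality_report(rows: list[dict[str, Any]], min_rows: int = MIN_ROWS) -> dict[str, Any]:
    """Count missing values per column and check whether there are enough rows."""
    columns = sorted({key for row in rows for key in row})
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            # Treat both an absent key and an explicit None as missing.
            if row.get(col) is None:
                missing[col] += 1
    return {
        "n_rows": len(rows),
        "missing_per_column": missing,
        "enough_rows": len(rows) >= min_rows,
    }


if __name__ == "__main__":
    sample = [
        {"age": 34, "income": 52000},
        {"age": None, "income": 61000},
        {"age": 29},
    ]
    print(data_quality_report(sample, min_rows=3))
```

A report like this only tells you where the gaps are; deciding how to handle them (drop the rows, fill in values, or collect more data) is still a judgment call about your question.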
2. Choose Your Algorithm Family
Once you have quality data and a sharp question, you are well on your way. Algorithms come in just a few major families, and the nature of your question dictates which family you’ll be interested in. Find where your question fits in the table below, and you’ll find your algorithm family. If your question doesn’t look like any of these, it can probably be rephrased so that it does. You might have to get creative.
If your question looks like… | Use this algorithm family:
Is this an example of A or B? | Two-class classification
Is this an example of A, B, C, …, or Z? | Multiclass classification
How many? How much? | Regression
What is the structure of this data? | Clustering
Is this example unusual? | Anomaly detection
What action should I take next? | Reinforcement learning
Dive deeper into algorithm family selection and matching algorithm families to questions.
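Sometimes the shape of the answer you want already hints at the family. The sketch below is a rough, hypothetical heuristic (not a rule from Azure Machine Learning) that guesses a family from the column you want to predict. It only covers clustering, two-class, multiclass, and regression; anomaly detection and reinforcement learning depend on the question you’re asking, not on the shape of a label column. The `MAX_CLASSES` cutoff of 20 is an arbitrary assumption.

```python
from numbers import Real
from typing import Optional, Sequence

# Hypothetical cutoff: an all-integer numeric target with more distinct values
# than this is treated as a quantity (regression) rather than as class labels.
MAX_CLASSES = 20


def _is_number(value: object) -> bool:
    # bool is a subclass of int, but True/False are labels, not quantities.
    return isinstance(value, Real) and not isinstance(value, bool)


def suggest_family(target: Optional[Sequence[object]]) -> str:
    """Guess an algorithm family from the column you want to predict."""
    if target is None:
        # No labels at all: the question is about the structure of the data.
        return "Clustering"
    distinct = set(target)
    if len(distinct) < 2:
        raise ValueError("target needs at least two distinct values")
    if all(_is_number(v) for v in distinct):
        has_fractions = any(not float(v).is_integer() for v in distinct)
        if has_fractions or len(distinct) > MAX_CLASSES:
            # "How many? How much?"
            return "Regression"
    if len(distinct) == 2:
        # "Is this an example of A or B?"
        return "Two-class classification"
    # "Is this an example of A, B, C, ..., or Z?"
    return "Multiclass classification"


if __name__ == "__main__":
    print(suggest_family(["spam", "ham", "spam"]))  # Two-class classification
    print(suggest_family([3.2, 4.8, 1.1]))          # Regression
    print(suggest_family(None))                     # Clustering
```

Treat the result as a starting point for thinking about your question, not a verdict; a numeric rating from 1 to 5, for example, could reasonably be framed as either regression or multiclass classification.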
3a. Choose An Algorithm - The Quick Way
Once you’ve found your algorithm family, the final step is to choose one of its members to run your data through. If you’re looking for a pretty good algorithm and aren’t worried about finding the absolute best, you can use this handy cheat sheet to the collection of algorithms in Azure Machine Learning.
3b. Choose An Algorithm - Dive Deeper
Using the cheat sheet is like popping into your neighborhood discount store, grabbing an inexpensive pair of sneakers in your size, and hitting the road. But sometimes you want to do your homework first: you read reviews, compare weight and materials, and try on a few pairs. Algorithms, too, differ widely in their requirements and performance. Dive deeper into how to choose an algorithm that meets your specific requirements.
And of course, when it really has to be a perfect fit, you try on every pair of shoes in the store. Data science works the same way. If you have the time and the passion, the only way to be certain you have the very best algorithm for your problem is to try them all.
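“Try them all” usually means scoring every candidate the same way on data it hasn’t seen during training. Here is a minimal sketch of that idea using k-fold cross-validation in plain Python. The two candidates, `majority_class` and `nearest_neighbor`, are hypothetical stand-ins for real models, and assigning examples to folds by index (`i % k`) without shuffling is a simplification.

```python
from collections import Counter
from typing import Callable, Sequence

Point = Sequence[float]
Predictor = Callable[[Point], object]
# A candidate trains on (X, y) and returns a function that predicts one label.
Candidate = Callable[[list, list], Predictor]


def majority_class(X: list, y: list) -> Predictor:
    """Baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor(X: list, y: list) -> Predictor:
    """1-nearest-neighbor with squared Euclidean distance."""
    def predict(x: Point) -> object:
        best = min(range(len(X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        return y[best]
    return predict


def cross_val_accuracy(candidate: Candidate, X: list, y: list, k: int = 5) -> float:
    """Mean accuracy over k folds; fold f holds out every example with i % k == f."""
    if len(X) < k:
        raise ValueError("need at least k examples")
    scores = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        predict = candidate([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def try_them_all(candidates: dict, X: list, y: list, k: int = 5) -> tuple:
    """Score every candidate the same way and return (best_name, all_scores)."""
    scores = {name: cross_val_accuracy(fit, X, y, k) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5),
         (5, 6), (6, 5), (6, 6), (0, 0.5), (5.5, 5.5)]
    y = ["a", "a", "a", "a", "b", "b", "b", "b", "a", "b"]
    best, scores = try_them_all(
        {"majority_class": majority_class, "nearest_neighbor": nearest_neighbor}, X, y)
    print(best, scores)
```

Including a trivial baseline like `majority_class` is a useful habit: if a fancier algorithm can’t beat it, it isn’t earning its keep.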
Brandon Rohrer