DevOps for Data Science - Automated Testing

I have a series of posts on DevOps for Data Science where I am covering a set of concepts for a DevOps “Maturity Model” – a list of things you can do, in order, that will set you on the path for implementing DevOps in Data Science. In this article, I'll cover the next maturity level you should focus on - Automated Testing.

This might be the most difficult part of implementing DevOps for a Data Science project. Keep in mind that DevOps isn't a team, or a set of tools - it's a mindset of "shifting left", of thinking of the steps that come after what you are working on, and even before it. That means you think about the end-result, and all of the steps that get to the end-result, while you are creating the first design. And key to all of that is the ability to test the solution, as automatically as possible.

There are a lot of types of software testing: Unit Testing (checking to make sure an individual piece of code works), Branch Testing (making sure the code works with all the other software you've changed in your area), Integration Testing (making sure your code works with everyone else's) and Security Testing (making sure your code doesn't introduce security vulnerabilities). In this article, I'll focus on only two types of testing to keep it simple: Unit Testing and Integration Testing.

For most software, this is something that is easy to think about (but not necessarily to implement). If a certain function in the code takes in two numbers and averages them, that can be Unit tested with a function that ensures the result is accurate. You can then check your changes in, and Integration tests can run against the new complete software build with a fabricated set of results to ensure that everything works as expected.
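The averaging case above can be sketched with Python's built-in unittest module. This is only a minimal illustration: the function name `average` and the test names are my own placeholders, not part of any particular project.

```python
import unittest


def average(a, b):
    """Return the arithmetic mean of two numbers (hypothetical example function)."""
    return (a + b) / 2


class AverageTests(unittest.TestCase):
    def test_integers(self):
        self.assertEqual(average(2, 4), 3)

    def test_floats(self):
        # Compare floats with a tolerance rather than exact equality.
        self.assertAlmostEqual(average(0.1, 0.2), 0.15)

    def test_negative_and_positive(self):
        self.assertEqual(average(-10, 10), 0)


# Run the tests programmatically so the snippet is self-contained.
# In a real project you would more likely run "python -m unittest".
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A build server can run the same test class on every check-in and fail the build if any assertion fails.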

But not so with Data Science work - or at least not all the time. There are a lot of situations where the answer is highly dependent on minute changes in the data, parameters, or other transitory conditions, and since many of these results fall within ranges (even between runs) you can't always put in a 10 and expect a 42 to come out. In Data Science you're doing predictive work, which by definition is a guess.

So is it possible to perform software tests against a Data Science solution? Absolutely! Not only can you test your algorithms and parameters, you should. Here's how:

First, make sure you know how to code with error-checking and handling routines in your chosen language. You should also know how to work with the standard "debugging" tools in whatever Integrated Development Environment (IDE) you use. Next, implement a Unit Test framework within your code. Data Scientists most often use Python and/or R in their work, as well as SQL. Unit testing frameworks exist for all of these.
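As one illustration for Python, here is a rough sketch that pairs an error-checking routine with unit tests, using the standard library's unittest module. The function `mean_of_column` and its validation rules are assumptions made up for this example; your own checks will depend on your data.

```python
import unittest


def mean_of_column(values):
    """Average a list of numbers, rejecting empty or non-numeric input.

    Hypothetical helper used only to illustrate error checking.
    """
    if not values:
        raise ValueError("values must not be empty")
    try:
        total = sum(float(v) for v in values)
    except (TypeError, ValueError) as exc:
        # float(None) raises TypeError, float("abc") raises ValueError.
        raise ValueError(f"non-numeric value in input: {exc}") from exc
    return total / len(values)


class MeanOfColumnTests(unittest.TestCase):
    def test_valid_input(self):
        self.assertAlmostEqual(mean_of_column([1, 2, 3, 4]), 2.5)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            mean_of_column([])

    def test_missing_value_is_rejected(self):
        with self.assertRaises(ValueError):
            mean_of_column([1, None, 3])


# Run the tests programmatically so the snippet is self-contained.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanOfColumnTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Testing the failure paths matters as much as testing the happy path: bad or missing data is one of the most common ways a Data Science pipeline breaks.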


After you've done the basics above, it's time to start thinking about the larger testing framework. It's not just that the code runs and integrates correctly, it's that it returns an expected result. In some cases, you can set a deterministic value to test with, and check that value against the run. In that case, you can fully automate the testing within the solution's larger Automated Testing framework, whatever that is in your organization. But odds are (see what I did there) you can't - the values can't be deterministic due to the nature of the algorithm.
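For the deterministic case, one common approach (my assumption here, not the only option) is to fix the random seed so that a randomized routine returns the same value on every run. The sketch below uses a made-up `bootstrap_mean` function and Python's standard `random` module.

```python
import random


def bootstrap_mean(data, n_resamples, seed):
    """Estimate the mean by resampling with replacement (hypothetical example)."""
    rng = random.Random(seed)  # private generator; does not touch global state
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]

# With the same seed, two runs must agree exactly.
first_run = bootstrap_mean(data, n_resamples=200, seed=42)
second_run = bootstrap_mean(data, n_resamples=200, seed=42)
assert first_run == second_run
```

Once the output is repeatable, an automated test can compare it to a stored expected value. Keep in mind that a stored value may need to be refreshed when the language or library version changes.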

In that case, pick the metric you use for the algorithm (p-value, F1-score, or AUC, or whatever is appropriate for the algorithm or family you're using) and store it in text or PNG output. From there, you'll need a "manual step" in the testing regimen of your organization's testing framework. This means that while the build system is running through everyone's tests to create a new build, it stops and sends a message to someone that a manual test has been requested.
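Storing the metric as text can be as simple as the sketch below. It computes an F1-score by hand on a tiny made-up set of labels and writes it to a report file for the reviewer. The function names, the file name, and the sample labels are all placeholders of mine; in practice you would likely use your modeling library's own metric functions.

```python
import os
import tempfile


def f1_score(y_true, y_pred):
    """Compute the F1-score for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing was predicted correctly
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def write_metric_report(path, metric_name, value):
    """Write a one-line, human-readable report for the manual review step."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(f"{metric_name}: {value:.4f}\n")


# Made-up labels purely for illustration.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

score = f1_score(y_true, y_pred)
report_path = os.path.join(tempfile.gettempdir(), "metric_report.txt")
write_metric_report(report_path, "F1-score", score)
```

The testing framework can then attach this file to the notification it sends, so the reviewer sees the number without having to rerun anything.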

No one likes these stops - they slow everything down, and form a bottleneck. But in this case, they are unavoidable, with the alternative being that you just don't test that part of the software, which is unacceptable. So the way to make this as painless as possible is to appoint one of the Data Science team members as the "tester on call", who will watch the notification system (notifications should go to the whole Data Science team alias, not an individual), manually check the results quickly (but thoroughly), and allow the test run to complete. You can often do this in just a few minutes, so after a while it will just be part of the testing routine, allowing a "mostly" automated testing system, which is essential for the Continuous Integration and Continuous Delivery (CI/CD) phases. We'll pick up on Continuous Delivery in the next article.