DevOps for Data Science – DevOps isn’t the Toolchain (But you still have to care about the tech)

In this series on DevOps for Data Science, I’m covering the Team Data Science Process, a definition of DevOps, and why Data Scientists need to think about DevOps. It’s interesting that most definitions of DevOps deal more with what it isn’t than with what it is. Mine, as you recall, is quite simple:

DevOps is including all parties involved in getting an application deployed and maintained, so that each thinks about all the phases that precede and follow their part of the solution.

Now, to do that, there are defined processes, technologies, and professionals involved – something DevOps calls “People, Process and Products”. And there are a LOT of products to choose from for each phase of the software development lifecycle (SDLC). Some folks tend to focus mainly on the technologies, referred to as the DevOps “Toolchain”. Understanding each of these technologies is indeed useful. Here’s one possible list I made, just using Microsoft technologies:


And of course the playing field for Open Source Software (OSS) is even larger, and contains more options and branches.

While knowing a set of technologies is important, it’s not the primary issue. I tend to focus first on what I need to do, then on how I might accomplish it. I let the problem select the technology, and then I go off and learn that technology as well as I need to in order to get my work done. I try not to get too focused on a given technology stack – I grab what I need, whether that’s Microsoft or OSS. I work out the requirements and constraints for my solution, and pick the best fit. Sometimes one of those constraints is that everything needs to work together well, so I may stay in a “family” of technologies for a given area. In any case, what matters is the problem we are trying to solve, not the choice of tech.

That being said, knowing the tech is a very good thing. It will help you “shift left” as you work through the process – even as a Data Scientist. It wouldn’t be a bad idea to work through some of these technologies to learn the process. I have a handy learning plan you can use here: (Careful – working through all the references I have here could take a while – but it’s a good list)

See you in the next installment in the DevOps for Data Science series. (And we’ve just made an interesting new announcement: )