Data Science and Standard Patterns

In Data Science, the word “Pattern” has a specific meaning: the patterns that emerge from the data itself, such as trends, clusters, and associations. Finding these patterns is central to Data Mining and to many of the other techniques a Data Scientist uses.

In IT practices such as systems architecture and software solution design, the word “Pattern” has a different definition: a repeatable process boiled down into a framework you can use to standardize operations. Data Science has often been a small team within an organization, used for very specific tasks, but it is now being called on to do broader work that is integrated with the rest of the IT organization. As this happens, I’m seeing more demand for standardized approaches, frameworks, and processes that Data Science teams can follow to make that integration go smoothly.

I’ve previously written about two things a Data Science team can do: work more effectively with other teams (the Team Data Science Process), and standardize and automate their work as much as possible (DevOps for Data Science). Both apply fairly generically across industries. In your day-to-day work, however, your team will have specific workload requirements, and you can standardize those too.

Software architects often begin with a “Patterns and Practices” approach. There’s no reason to re-create something that already exists, so most solution design revolves around two questions: “Has someone already done this successfully?” and “Are there libraries, code, or other assets that we can re-use?” There are several places to find those patterns, such as the Microsoft Patterns and Practices site, its associated GitHub repositories, and the Azure Architecture Center, to name a few.

Your Data Science team should look into adopting or creating a set of vetted Patterns and Practices for its work. You’ll find that adhering to a standard (where that makes sense) makes your solution design faster, easier, more maintainable, and easier to understand, and it also reduces errors. Check out the Patterns and Practices resources above and see whether they have elements you can re-use. Here is a list of the articles on the Team Data Science Process and DevOps for Data Science:

The Team Data Science Process

DevOps for Data Science

  1. Infrastructure as Code (IaC)
  2. Continuous Integration (CI) and Automated Testing
  3. Continuous Delivery (CD)
  4. Release Management (RM)
  5. Application Performance Monitoring
  6. Load Testing and Auto-Scale