The Keys to Effective Data Science Projects – Part 2: The Question
In a previous post I explained where to start in a Data Science Project. I’ve also given you a Project Plan that shows all the steps you need to help your organization with a Data Science objective.
The first step in that Project Plan is “Business Understanding”, and the first step in that process is defining the problem to solve. I’ve got some basic advice on sussing out the problem you want to solve in this post, but let’s take that information and apply it to an example problem.
The important thing to remember is that the business or organization you are working with won’t always know what they want to know. They are probably familiar with their base data, and they have probably spent years querying it. They may have even created a Business Intelligence solution to explore their data, so when you bring up a Data Science project, they will often conflate it with the Business Intelligence work they already know.
So the first thing you need to do is educate your company on what Data Science, Machine Learning, and AI can actually do. I highly recommend this reference to help you understand how to explain that to them.
Now that you’re familiar with what is possible, we can turn our attention to the meeting with the business or organization leaders. In my classes I teach that you should guide the meeting into creating a 3-5 paragraph statement that contains what you need to know to do the project. Those paragraphs should contain:
- A description of the environment
- A description of the data you have access to, and the data you don’t have access to
- A description of what the company would like to be able to predict or categorize
- What the company will do when they have the answer from the last step
You’ll find within these statements what you need to break down further requirements.
Here’s an example snippet from an exercise like this:
“All of our IT systems have been modernized, and we’re taking in a significant amount of semi-structured data from computing devices – most of it real-time. After talking with our IT leadership, we need a way to determine unusual events within the data streams we get, and have a way to observe the events in a dashboard so that we can respond to outages, threats, and changes quickly.”
The key phrases in that statement, such as “significant amount,” “semi-structured data,” “computing devices,” “real-time,” and “determine unusual events,” can lead to the Data Science questions we want to answer. We can see we are dealing with Big Data from the Internet of Things (IoT), and we want to find anomalies in that data. That is anomaly detection, something we can use Machine Learning for. We can use R, Azure ML, Python, Stream Analytics (a strong contender here), or any number of mechanisms to model the anomaly detection, but the bigger point is that we have turned a business question, “we want a way to find unusual activities in our data,” into a Data Science question, “detect anomalies in this data set.”
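To make that Data Science question a little more concrete, here is a minimal sketch in Python of one simple way to flag unusual readings in a stream: compare each new value to the mean and standard deviation of a recent window (a rolling z-score). This is an illustration under my own assumptions, not the method a real project would necessarily land on. The function name `detect_anomalies`, the `window` and `threshold` values, and the sample data are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window=20, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean.

    Hypothetical helper: `window` and `threshold` are illustrative defaults,
    not tuned values.
    """
    recent = deque(maxlen=window)  # holds only the last `window` readings
    for i, value in enumerate(readings):
        # Only score a reading once a full window of history exists.
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Flag the reading if it sits more than `threshold` standard
            # deviations away from the recent mean.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value
        recent.append(value)


# Made-up device readings: a steady signal with one injected spike.
stream = [10.0 + 0.1 * (i % 5) for i in range(50)]
stream[40] = 25.0

anomalies = list(detect_anomalies(stream))
print(anomalies)
```

In a real project the stream would come from the devices themselves, and the choice of technique (and tool) would follow from the data and from what the business plans to do with each alert.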
Repeat this process and drill in as much as you can. One note from the field in projects I’ve worked on: the business will often have multiple questions. Each one of those gets a project plan, and each one has its own considerations. Trust me on this one – you want a separate project for each. Only pain awaits you if you try to do them all in the same project.
In the next post, I’ll cover the next key step – Finding the data. You can find more Data Science Project Management posts here.