Data Mining in SQL Server 2005 and 2008
Data mining, like every other term in business intelligence, seems to mean different things to different people. For me, the key things that data mining does are:
- Assist in discovering previously unknown relationships in data sets. An example would be gaining insight into how age and gender relate to which mobile phone is bought.
- Allow the creation of models from sample data, which can then be applied to real data to predict what is going to happen. For example, having worked out that iPhones are bought by males in their late twenties, does the live data over the next six months support that? And, by targeting this demographic with appropriate advertising, did sales meet our expectations?
- Have built-in processes (algorithms) that make this happen with minimal guidance from the user, i.e. the user doesn't do all of the work. The litmus test here is that you didn't spend all night writing MDX or SQL to get the answers; you simply pointed the tool at the data, labelled up the relevant attributes, and the answer popped out the other end.
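To make the "build a model on sample data, then check it against live data" idea concrete, here is a minimal Python sketch using Naive Bayes, one of the algorithms SSAS provides. It is only an illustration under stated assumptions, not how SSAS implements the algorithm: the sample and live rows are invented, and the names `train`, `predict` and `accuracy` are hypothetical. The model is just counts taken from the sample, which are then used to score later "live" rows.

```python
from collections import Counter, defaultdict

# Hypothetical sample data, invented for illustration: (age_band, gender, phone)
SAMPLE = [
    ("25-29", "M", "iPhone"),
    ("25-29", "M", "iPhone"),
    ("25-29", "F", "Other"),
    ("40-49", "M", "Other"),
    ("40-49", "F", "Other"),
    ("25-29", "M", "iPhone"),
    ("18-24", "F", "iPhone"),
    ("40-49", "M", "Other"),
]

# Hypothetical "live" rows arriving after the model was built
LIVE = [
    ("25-29", "M", "iPhone"),
    ("25-29", "M", "Other"),
    ("40-49", "F", "Other"),
    ("18-24", "F", "iPhone"),
]


def train(rows):
    """Count how often each phone occurs and how often each attribute value occurs with it."""
    class_counts = Counter(row[-1] for row in rows)
    attr_counts = defaultdict(Counter)  # (attribute index, phone) -> value counts
    values_seen = defaultdict(set)      # attribute index -> distinct values seen
    for *attrs, phone in rows:
        for i, value in enumerate(attrs):
            attr_counts[(i, phone)][value] += 1
            values_seen[i].add(value)
    return class_counts, attr_counts, values_seen


def predict(model, attrs):
    """Return the phone with the highest Naive Bayes score for the given attributes."""
    class_counts, attr_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for phone, n in class_counts.items():
        score = n / total  # prior probability of this phone
        for i, value in enumerate(attrs):
            # Laplace smoothing: add 1 so an unseen value does not zero the score
            score *= (attr_counts[(i, phone)][value] + 1) / (n + len(values_seen[i]))
        scores[phone] = score
    return max(scores, key=scores.get)


def accuracy(model, rows):
    """Fraction of rows whose actual phone matches the model's prediction."""
    hits = sum(predict(model, row[:-1]) == row[-1] for row in rows)
    return hits / len(rows)


model = train(SAMPLE)
print(predict(model, ("25-29", "M")))  # → iPhone
print(accuracy(model, LIVE))           # → 0.75
```

The accuracy figure on the live rows is the "does the live data support that?" check: the model learned from the sample that males aged 25-29 tend to buy iPhones, and three of the four later rows agree with its predictions.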
SQL Server Analysis Services (SSAS) has had some data mining capabilities since SQL Server 2000, but they really took off in 2005:
- The data mining tools can be used while loading a data warehouse to add new attributes to dimensions.
- They can be used by end users through an intuitive interface in Excel (the work is round-tripped to SSAS).
- A core set of industry-standard algorithms to mine the data:
  - decision trees,
  - regression trees,
  - logistic and linear regression,
  - neural networks,
  - Naive Bayes,
  - sequence clustering,
  - time series.
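The first point in the list above, adding new attributes to dimensions during the load, can be sketched in a few lines of Python. This is a hedged illustration, not how SSAS or its loading tools actually do it: it fits a plain least-squares linear regression (one of the listed algorithms) on hypothetical historical customers, age against annual spend, and then adds a hypothetical `predicted_spend` attribute to each customer row as it is loaded. All data, column names and function names here are invented.

```python
# Hypothetical historical customers with a known outcome: (age, annual_spend)
HISTORY = [(22, 300.0), (30, 500.0), (38, 700.0), (46, 900.0)]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def enrich(rows, model):
    """Yield each incoming dimension row with a new, mined attribute added."""
    slope, intercept = model
    for row in rows:
        # "predicted_spend" is a hypothetical new dimension attribute
        yield {**row, "predicted_spend": round(slope * row["age"] + intercept, 2)}


# Hypothetical customer dimension rows arriving in the load
incoming = [{"customer_id": 101, "age": 26}, {"customer_id": 102, "age": 50}]
loaded = list(enrich(incoming, fit_line(HISTORY)))
print(loaded)
# → [{'customer_id': 101, 'age': 26, 'predicted_spend': 400.0}, {'customer_id': 102, 'age': 50, 'predicted_spend': 1000.0}]
```

The model is fitted once, outside the load, and `enrich` only applies it, so the extra attribute costs one multiplication per row as the dimension is populated.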
I will post on what each of these does over the next few days, but in the meantime I have noticed a couple of upcoming TechNet Webcasts on data mining:
- The most important thing in business intelligence is to have good data to work with, and this is doubly true for data mining. Bizarrely, though, this is also an area where data mining can actually help solve the problem. To see how this can be done, have a look at this TechNet Webcast on 17 January, which applies equally to SQL Server 2005 and 2008 Enterprise Edition.
- Predicting the future is the holy grail of business intelligence. SQL Server has a time series algorithm to help with this, which gets a boost in SQL Server 2008, and there is a TechNet Webcast on 24 January dedicated to it, given by Donald Farmer, one of the top BI people at Microsoft.