Database Technologies for Decision Support System
Database technologies can be applied into two types of scenarios:
- Transaction Processing(OLTP)
- Analytic Processing, using statistical method(OLAP) or machine/computational learning method(Data Mining)
which is based on E.F. Codd's relation model, is the traditional (maybe
most popular) application type of DBMS and most people are very
familiar with it. This post tries to summarize related technologies in
analytic processing, which is widely adopted in decision support
Part I - What Data to Analyze?
In Decision Support(or Business Intelligence)
system, data to be analyzed usually comes from operational system,
i.e., OLTP relational database. These relational databases are often
located at different departments/sites, may be using different DBMS
vendor's products, using different data schema and merely contain data
within a relatively short time span.
To make a good business
decisions, it's strongly desired to hold historical data, view them in
a uniformed way and not bothering the daily operational environment.
Thus comes out the Data Warehouse, which is a repository of an organization's electronically stored data and is designed to facilitate reporting and analysis.
Operational data (in OLTP system) is extracted, transformed(also cleaned) and loaded(by ETL subsystem) into
the data warehouse for further analyzing. OLAP and DM systems read
these data, analyze them, produce useful reports and present them to
end(business) users. (See diagram below)
Decision Support System Life Cycle, based on MS SQL Tech Doc
Part II - How to Analyze?
There are two ways to analyze data in data warehouse:
1. Data Mining
Data Mining is the extraction of hidden predictive information from large databases. Someone also defines it as knowledge discovery process(using machine learning algorithms) in database.
- Typical Data Mining enabled information systems process data in terms of Record(or Case).
- Such system also provides some Language Extension to facilitate composing data mining related queries. (for example, MS SQL Server provides DMX - Data Mining Extensions)
The challenging part of data mining is various mining algorithms. Here is a list of data mining algorithms available in MS SQL Server Analysis Service.
2. OnLine Analytical Processing(OLAP)
OLAP is the processing of large scale multidimensional data using statistical based methods. A typical OLAP system provides:
- Multidimensional Model
- Analytical Query Language (for example, MS SQL Server provides MDX - Multidimensional Expression)
- Analyzing Server(or Engine) that executes analytical queries
2.1 Multidimensional Model
model view data as as cubes that generalize spreadsheets to any number
of dimensions. It categorizes data either as numerical values(a.k.a. measures) associated with some facts or textual values(a.k.a. dimensions) that characterize the facts.
Facts represent the subject - the interesting event in a enterprise that need to be analyzed.
Dimensions represent context information for facts, perspectives to view facts.
Measures represent those numeric properties of facts that decision makers want to analyze.
example, in a shoe shop, shoe purchasing events are the facts, the
selling price is a measure attribute and the color, the size, the
manufacture and the brand are all dimension attributes.
More tutorial explanation of this model can be found in .
2.2 OLAP Server Architecture
There are three ways to implement multidimensional model:
- ROLAP (Relational OLAP)
data is stored in relation model based storage system and some special
induces technologies may be adopted. In this architecture, measures are
derived from the records in the fact table and dimensions are derived
from the dimension tables.
from star schema example
of them contains fact table and dimension table, but in star schema,
there is only one table for one dimension, while in snowflake schema,
there are usually multiple tables for one particular dimension.
- MOLAP (Multidimensional OLAP)
data is stored in an optimized multi-dimensional array storage, i.e.,
the server supports the multidimensional model directly.
It's usually regarded to be faster but less scalable than ROLAP.
- HOLAP (Hybrid OLAP)
is a combination of ROLAP and MOLAP. HOLAP allows storing part of the
data in a MOLAP store and another part of the data in a ROLAP store.
Part III - Other technologies
Other related technologies include data visualization, metadata management, analytical query parallelizting etc.
Analytical database technology is very promising and also complicated, more survey paper can be found at.