Lesson 1: Systematically Approaching Design Stages
Estimated lesson time: 20 minutes
A database and an application together provide a comprehensive picture of a real-world system. Data modeling is a structure design process. Often, you can move forward with data modeling in parallel with functional design. You explore your choices by using analysis, abstraction, experience, heuristics, and creativity. There are always many candidate solutions; to find the best one, you have to approach the design systematically. Developing a data model is an iterative process: you develop a model, check it with domain experts, and refine the model. When your model is prepared, it can serve as an effective communication tool between domain experts and developers.
The most important data model is the relational model, which represents data in the form of two-dimensional tables consisting of rows and columns. This model is a simple, yet rigorously defined, concept of how users perceive data. Each table represents a real-world entity (person, place, thing, or event) about which you collect information. In mathematics, the relational model is based on set theory; mathematically, each table represents a set. Edgar F. Codd of IBM contrived the relational model in 1969, explaining a relational database as a collection of tables. The organization of data into relational tables is the logical view of the database. The way the database software physically stores the data on a computer disk system is the internal view of the database; it differs from product to product.
In the planning phase, according to the Microsoft Solutions Framework (MSF) process model, you deal with three distinguishable data-modeling phases. Each phase yields a phase schema. The three phases, schemas, and methods you use in a specific phase are as follows:
- Conceptual phase As a database designer, in this phase, you collect all business requirements and rules in cooperation with business domain experts. You create a conceptual schema by using the object role modeling (ORM) method. An ORM diagram documents a business problem. You can also use the entity relationship (ER) method in this phase, especially the Integration Definition for Information Modeling (IDEF1X) method, although it is a less expressive method than ORM and does not allow the diagramming of all possible business rules. Alternatively, you could use Unified Modeling Language (UML), although this language suits application design better than database design.
- Logical phase During the logical phase, you represent data that is already grouped logically in entities by adding attributes and relationships between entities. ER diagrams document this relational design. The most-used ER models are the Extended ER model and the IDEF1X model, both supported by Microsoft Office Visio.
- Physical phase You implement the relational model physically, in a database on a relational database management system (RDBMS) such as Microsoft SQL Server. No matter which method you use for implementation (automatically via a tool or manually), you have to prepare DDL statements for creating database objects in a language that your RDBMS understands.
In addition to these phases, when you are modeling a database, you can start with another schema, called the external schema, that usually comes from the application design process but that can be a good starting point for data modeling. The external schema shows how your system will be used. A common way to show the external schema is by using UML Use Case diagrams. Table 2-1 shows all four stages in a condensed format.
Table 2-1 Data Modeling Stages
Key Steps and Best Practices for Data Modeling
The first step in developing a data model is identifying requirements. You must communicate with domain experts, key stakeholders, and even end users to gather all the information you need. You can help yourself by using UML Use Case diagrams, if they already exist, and existing reports. You can then create a conceptual model, which documents all business requirements and rules as facts. From the conceptual model, you create a logical model. Alternatively, you can create a logical model directly, without the conceptual model. In either case, check the conceptual or logical model with domain experts and refine it as needed. Iterate through the model at least a couple of times.
From the logical model, you create a physical model. The design process is not finished here, though. Next, you refine the physical model according to security, performance, auditing, availability, and scalability needs. Finally, you build a prototype to test the database design.
For smaller projects, you can skip some of these steps. However, for larger projects, you should implement all of these steps, including creating a conceptual model. Here are additional best practices for data modeling:
- Be especially careful with the scope of the project. Many projects fail because the scope is not well defined. Alternatively, many developers use scope expansion as an excuse for their bad work. Your customers simply cannot know everything they need in advance. Even if they did know everything they needed today, you have to consider that business changes quickly. Therefore, you must be prepared for several iterations during the design process. Three iterations in refining the model is normal; more iterations than that usually means the scope was not well defined, or the customer requested features that should be part of a different project.
- Choose the appropriate methodology. ER diagrams are an informal standard for the database part of documenting a project, so the ER method should be one of your selected methodologies. In addition, ORM, which is more expressive than the ER model, is useful for presenting complex rules.
- Use a professional database-modeling tool. Tools such as Visio for Enterprise Architects enable forward engineering—creating a logical model from a conceptual model and then creating a physical model from the logical model. Make sure you are familiar with the tool before you start your design.
- Include your physical model, the DDL scripts, in a source control system. You can use Visual Studio Team System for source control.
- Start with a strict relational model, but be prepared to make some compromises to satisfy business needs. For example, usually you have to denormalize some data to achieve satisfactory performance. (You will learn about denormalization in Lesson 3, “Optimizing the Database Design by Denormalizing.”) Document such compromises thoroughly.
- Use free models as a starting point. You can find free models in books and on the Web. Check existing models if you upgrade an application. An upgraded application must provide all the features of the old application; otherwise, users will not be satisfied.
Object Role Modeling and the Conceptual Model
In our natural, speaking language, we use statements of fact or, in logic, propositions about entities of interest asserted to be true. For example, it is a fact that Lubor Kollar was employed by Tailspin Toys on March 19, 2004. The idea behind ORM is that you simply write down all the facts. The tool then converts those facts to a conceptual ORM diagram and produces logical and physical diagrams out of that ORM diagram. ORM pictures the world in terms of objects that play roles. Data elements are not combined into tables a priori. Descriptions of data-element relationships serve as input to a table-building algorithm. ORM thereby incorporates normalization into the methodology. (You will learn about normalization in the next lesson.)
ORM verbalizes the relevant data as elementary facts. It uses no attributes; attributes are facts related to entities. In ORM, you distinguish between two types of objects. An entity object is the one you can uniquely identify; however, you cannot write down a concrete value of the object unless you use values of attributes of the object. ORM represents entity objects as named ellipses. The Value object is a scalar attribute, which enables you to write down any concrete value the object can take. In ORM, you represent a value object as a dotted or dashed ellipse (~ attribute). Relationships between objects (that is, roles the objects play) are represented by lines and subdivided boxes that establish connections. For example, in the following paragraph, you have a collection of facts you gathered through an interview with a domain expert.
The system you are developing must support sales. Customers send orders. Each order can contain multiple products, and a single product can appear on multiple orders. A customer orders a product in a quantity with or without a discount. A single customer presumably and hopefully sends multiple orders. Each product has a name, a unit of measurement, and a price. For each customer, you need to collect the customer’s name, address, and tax number. Tax numbers are unique. An order always has a date and a known customer. You must provide a way to identify customers, products, and orders.
Figure 2-1 shows the ORM model for the facts in the preceding paragraph.
Figure 2-1 ORM model that supports the sales application
When you create an ORM model, follow the Conceptual Schema Design Procedure (CSDP). If you follow the procedure exactly, ORM guarantees a fully normalized model. (You will learn about normalization in Lesson 2, “Designing a Normalized Database.”) The seven steps of the CSDP are:
- Transform familiar information examples into elementary facts, and apply quality checks.
- Draw the fact types, and apply a population check (that is, write down examples) for each fact.
- Check for entity types that should be combined, and note any arithmetic derivation.
- Add uniqueness constraints, and check arity of fact types. Arity means how many objects are involved in the proposition. Typical facts are binary; they involve two objects. You can also have unary, ternary, and quaternary facts. You must check whether you can decompose facts of larger arity to more elementary facts.
- Add mandatory role constraints, and check for logical derivations.
- Add value, set comparison, and subtype constraints.
- Add other constraints, and perform final checks.
Entity Relationship and the Logical Model
The ER model is a top-down approach to data modeling, supported by a widely used diagramming convention. An entity is a thing that you can distinctly identify and that is of business interest; an entity in an ER model has three properties:
- You can uniquely identify each representation of an entity.
- Each representation of an entity plays an important role in the system it lives in. (It has to have a reason to be there.)
- You can describe each representation of an entity by using one or more attributes (data elements such as name, age, and quantity).
You classify entities into different entity sets. An entity set is a set of entity instances (occurrences) of the same type. Entity sets, therefore, are not always disjoint. (A company employee can belong to the employees and managers sets.) A database consists of a collection of entity sets; it also includes information about relationships between the entity sets.
Relationships are associations between entities. A relation is a subset of the cross products of the entity sets involved in the relationship. Each entity set in a relationship has a role or a function that the entity plays in the relationship. For example, person has the role of employee in the relationship works for company; company has the role of employer. The degree of a relationship is the number of entities associated in the relationship. Binary associations can be classified according to cardinality (one-to-one, one-to-many, many-to-many) and optionality (mandatory or optional).
Entities are the objects of interest, so you must have some information about them that is of interest. Each entity instance is described by a set of attributes that define its qualities, characteristics, or properties and the values of the attributes. For each attribute, there are a number of legal or permitted values. These sets of legal values are value sets or domains of that attribute. Relationships can also have relevant information about them.
Because you represent entities by the values of their attribute set, the set of attribute values must be different for each entity instance. A group of attributes (or possibly one attribute) used for identifying entities in an entity set is called an entity key. Similarly, each relationship instance in a relationship set needs to be identified. The identification is always based on the primary keys of the entity sets involved in the relationship set. Sometimes you cannot uniquely identify the entities in an entity set by the values of their own attributes; such entities, which require a relationship to be identified, are called weak entities. We call normal entities strong or regular entities.
There is no single graphical representation of an ER model. However, all ER diagrams look similar. Entities are diagrammed as rectangles. Attributes can be diagrammed as ovals attached to the rectangle that they belong to, or they can be listed inside the rectangle or even listed separately. Relationships can be diagrammed as diamonds or shown as connecting lines. They can show cardinality and optionality, either explicitly with numbers or with crow’s feet symbols.
Figure 2-2 shows an IDEF1X ER model, on the left, and an Extended ER model, on the right, for the same sales database in the ORM lesson example.
Figure 2-2 IDEF1X and Extended ER models that support the sales application
When you are doing ER modeling directly (not from an ORM source model), you must study the description of the business problem. You start with entities, identifying nouns. Then you find relationships, identifying verbs. You look for typical verbs such as “is,” which leads to investigation of the hierarchical structure (you will learn about supertypes and subtypes in Lesson 5, “Supertypes and Subtypes”), or “has” and other verbs, which lead to investigation of the relational structure. You then allocate attributes to entities, identifying adjectives. You identify primary and foreign keys, and, for foreign key associations, you identify cardinality and optionality. Your product is an ER model; however, you need to analyze the ER attributes—forgetting the attributes leads to missed entities.
On the Companion Disc
You will find all the models for this chapter on the companion CD in the C:\My Documents\Microsoft Press\TK70-441\Chapter02\Practice folder.
Practice: Opening Models
In this practice, you will examine ORM and ER models.
Exercise: Open and Examine Models
In this exercise, you check whether it is possible to reconstruct statements that describe a business problem from a conceptual and a logical database model.
- On the companion CD, navigate to the C:\My Documents\Microsoft Press\TK70-441 \Chapter02\Practice folder, and open the Ch02_Sales_ORM.bmp file. Check the facts and how they are diagrammed.
- Open the Ch02_Sales_IDEF1XER.vsd and Ch02_Sales_ExtendedER.vsd files. Try to recreate the original propositions from the example.
You have to prepare a database model to support an application for managing projects. You collect the following information: each project has a single customer, each project can have many activities, and each project can have many employees assigned to it. In addition, you want to follow the time spent (in hours) on projects by specific employees and by activity for each day. Using this information, answer the following questions:
- List the entities you can find from the description of the problem.
- What is the cardinality of the relationship between projects and activities?
Quick Check Answers
- You should find the following entities: projects, customers, activities, employees as strong entities, and project details as a weak entity.
- The relationship between projects and activities is many-to-many.
© Microsoft. All Rights Reserved.