SQL Server Modeling Services Repository Lifecycle
[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]
Summary: This white paper is an overview of the design of the SQL Server Modeling services repository database for supporting schema and data versioning.
For the SQL Server Modeling Services repository to serve the needs of its customers, it must address several versioning issues. These issues fall into the following categories:
- Application lifecycle management (ALM): Facilitating team development of software in a continuous cycle of steps (that is, definition, design, development, testing, deployment, and management). Traditionally, Source Code Control Systems (SCCS) and build systems, like Visual Studio Team Foundation Server, automate ALM.
- Schema evolution: Changing schemas over time in such a way that applications continue to run unchanged or with minimal changes.
- Product versioning: Upgrading a customer's repository, from version n to version n+1, in such a way that the data is preserved.
- Application versioning: The ability to upgrade applications, stored in the repository, to new versions. The upgrade process has the option to retain the data from older versions of the application.
- Operational versioning: The ability to understand operational/management changes to a deployed application.
This white paper describes current plans for how the repository's design addresses these issues.
Application lifecycle management (ALM)
Modern software is developed using source code control systems and automated build. The repository is built using codename "M", a textual language for data that is well suited to both of these environments. "M" allows all aspects of data definition to be captured in SCCS-friendly text files that can be version-controlled by any SCCS. Additionally, support for extracting information from running repositories as "M" source text (forthcoming) will allow traditional diff-ing tools to determine how data has evolved from its original definition.
The repository works with automated build systems. Repository models that are defined in "M" can be statically type checked and validated as part of a normal build process without requiring a running database. This includes the ability to statically compile new models against the predefined models that ship with the SQL Server Modeling CTP.
As technologies evolve, so must the corresponding repository schemas. The repository must support the easy and robust migration of SQL Server Modeling Services applications to new schema versions. Applications that use the repository must continue to work with little or no change. However, evolving arbitrary database schemas and associated applications' queries and data access layers requires careful attention from schema developers. While the repository does not eliminate the need for developer attention, it at least provides proven mechanisms for dealing with the problem of schema versioning.
The strategy used to solve the schema evolution problem is to expose data through updatable views. As schemas evolve, updatable views remain in place but map to new structures. Existing code that accesses the repository based on previous schemas can continue to work as schemas evolve.
If these legacy views remained in place forever, they would make innovation difficult since schemas could evolve only in limited ways. Thus, legacy views will be deprecated according to a defined schedule. Figure 1 illustrates how this works for a simple case.
Figure 1. A simple example of how a schema change works in practice. In version x+1, support for international addresses is added (shown in green) and support for ZIP codes is deprecated (shown in red); then, in version x+3, support for ZIP codes is removed. In versions x+1 and x+2, ZipCode now is mapped to PostCode and old applications continue to work. In particular, a program developed against version X can continue to work as before.
Developers who use an object-relational mapping framework (ORM) that includes schema mapping facilities (such as the ADO.NET Entity Framework) may also use their ORM's mapping layer to alter how their static class structure projects down to the target database schema.
This approach has the following advantages:
- It mitigates most schema evolution issues.
- It allows schemas to evolve.
The principal disadvantages are:
- The implementation is more complex.
- Some SQL Server features do not work as well with views as they do with base tables.
There are essentially two approaches to upgrading the repository database as a whole from one version to another: in-place update and data transfer (that is, Extract, Transform, and Load, or ETL). In-place update is suitable for patches and other updates where changes are minor. Because of the risk of service and data loss, a repository database should not be in-place upgraded to a new version. Rather, a new repository should be created and the data from the original repository should be migrated using ETL and applications migrated as appropriate.
Repository upgrades will come with an SQL Server Integration Services (SSIS) package to transfer the data from the previous version of the repository to a new version of the repository (see Figure 2).
This approach has several advantages:
- It matches how IT organizations usually upgrade databases and other servers.
- Applications can be migrated from one version of the repository to the new version in an incremental and disciplined way.
- The risk of data and service loss is minimized because an old version of the repository remains running until applications can be migrated to the new version.
- By customizing the SSIS package, an organization can also transfer its custom data along with the other repository data. Since most organizations that have custom repository schemas will have significant in-house expertise in developing SSIS packages, this should minimize the customer's development costs.
Figure 2. An example of transferring data from version x of the repository to version x+1 of the repository. An SSIS package is used to transfer the data in an old version of the repository to the new version. Application1 has not yet been migrated to the new version of the repository. Application2 has been migrated to the new version of the repository.
The data in the repository is often logically grouped together based on version information. For example, the repository models for storing information about .NET Assemblies needs to record information about assemblies from multiple versions of the framework. Performing fine-grained versioning of individual rows would significantly reduce performance and complicate the data model. Moreover, the need to logically group data is so common an idiom that the repository provides a common mechanism for organizing data. That mechanism is called a repository folder.
Repository folders balance between the competing demands of ease of use, performance, and functionality. SQL Server Modeling Services content is organized in terms of a hierarchy of folders, which themselves are modeled in the database. Essentially, folders segment the repository's data horizontally with each row belonging to a single leaf folder. Typically, folders will be structured analogously to how an organization thinks about its assets (for example, /Finance/Applications/Pay Roll/2.5). After deployment, the data in a folder can be updated in place. This is analogous to the way a patch can be applied to an application without conceptually updating the version number.
The relationship between different versions of items in separate folders (for example, the .NET Framework 1.0's System.Object and the .NET Framework 2.0's System.Object) is computed based on domain-specific resolution rules (for example, CLR linking rules). This is natural in that it matches whatever mechanism the domain uses to resolve references across versions and it minimizes the performance impact of supporting the relatively rare operation of comparing repository items across major versions. Schemas will usually supply SQL functions to make this name resolution easier. For example, the CLR schema could provide a function that takes a method's ID and finds the best matching method in another folder.
Since the relationship between folders is not prescribed, users are free to structure folders in arbitrary ways. However, folders will usually have a tree structure that follows the enterprise's taxonomy of IT assets.
Structuring the repository in this way has a number of advantages over previous repository approaches:
- It is easily understood and is a good match for how software artifacts are thought about in enterprises.
- It can be efficiently implemented.
- It supports efficient and secure scale-out and scale-up strategies.
- It has a clean relational mapping that does not hamper the ability to query the repository.
- It provides a natural granularity for securing repository contents (that is, users are granted or denied access to an entire folder).
- It provides a natural basis for data partitioning.
The repository supports storing multiple versions of an application in different repository folders (for example, .NET/2.0 and .NET/3.0). However, once an application is deployed, it might need to be changed for operational reasons (for example, replacing soon-to-expire certificates and installing security patches). In the process of maintaining the application, operations staff or automated systems like SMS might make incorrect changes that might not be discovered for some time (for example, installing a certificate with too short a lifetime).
When a system fails, operations staff need to quickly diagnose and repair the issue. To do this, operations staff need to understand which changes have been made and which components the changes affect. Once incorrect changes are identified, operations staff need the ability to perform corrective compensating actions. Rolling back all changes since an incorrect change would often remove many correct changes; thus, operations staff need the ability to perform individual compensating actions, not just roll back all changes after a point in time.
To help with the operational scenarios described above, the repository can maintain a historical log of changes. A historical log enables operations staff to explore and search the history of operational changes and, for " repository-based runtimes, issue appropriate compensating changes (for example, restore an old Access Control List or ACL).
There are several options for collecting repository change events. Enterprise versions of the repository use SQL Server 2008's Change Data Capture (CDC) to track changes to repository tables. The repository provides table-valued functions (named with the pattern [Schema].[ViewNameChanges]) over CDC to abstract away how views map to base tables. The approach has the advantages of aligning with the Microsoft Sync Framework and providing a simple mechanism for repository clients to query for changes. The principal disadvantage is that the approach reduces throughput since changes to the repository are recorded twice. However, because CDC is based on a very mature SQL log reader implementation, CDC is also very efficient and robust.
All versions of the SQL Server Modeling Services repository also support tracking changes with SQL Server 2008's Change Tracking support. The disadvantage of this Change Tracking over CDC is that it records minimal information about the change (for example, it would record that an assembly with ID 10 was deleted, but would not record other values like the name of the assembly that was deleted).
The repository implementation wraps some CDC and change-tracking functions so that security policies can be enforced. For example, if a user is not allowed to see the contents of a folder, then the user should also not be able to see changes related to the folder.
The term "versioning" encompasses many aspects of the evolution of a repository, ranging from tracking changes to the data itself through to deploying new versions of the repository. Each aspect imposes its own demands and needs careful treatment if the repository is to remain useful. This document details the pragmatic strategies that will allow the repository to meet these competing demands and significantly improve the software lifecycle.
About the Author
Anthony is a principal software architect working on the SQL Server Modeling Services repository. Anthony was previously an architect and development manager on Visual Studio Team System, the director of Visio's software and database group and a research fellow.