Identifier Design Patterns

Article
09/22/2010

[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]

There are several design choices for identifiers, including the data type and the method of populating identifiers during inserts. In the SQL Server Modeling Services database tables, the identifier column is the primary key column, Id.

Tip

This topic explains how to use the Modeling Services pattern for identifiers directly in SQL Server. If you are creating models with Microsoft code name “M”, refer to the topic on using this Modeling Services pattern in “M”. For more information, see Choosing an Identity Pattern (Modeling Services).

Guidelines for Using Identifiers

The following list provides the SQL guidelines for using identifiers in Modeling Services database tables.

Select an Appropriate Data Type for the Id Column.
Apply the IDENTITY Property to the Id Column for Tables with Low Projected Insert Volumes.
Use Sequence Objects for Tables with High Projected Insert Volumes.
Use Sequence Object Aliases for Identifiers that must be Unique across Tables.

Note

Database administrative privileges are required for performing many of these manual design tasks in the Modeling Services database.

Select an Appropriate Data Type for the Id Column

Use a bigint or uniqueidentifier SQL data type for the Id column to support scalability scenarios. Limit the use of int primary key columns to tables that are relatively small and local to a single Modeling Services database instance. One scale-out scenario for the Modeling Services database involves partitioning groups of identifiers for the same table in different Modeling Services database servers. The bigint data type has a much higher range of available identifiers than the int data type, so it is better suited for this partitioning. The uniqueidentifier data type supports unique references to Modeling Services instances across database servers without the need to partition identifiers.

Note

Verify that your scenario requires the uniqueidentifier data type before choosing this over the bigint data type for your primary key. Primary keys that use the uniqueidentifier data type are much larger and have a poor locality of reference; this hurts performance. Smaller numeric primary keys require less disk I/O and result in more efficient queries. Whether this effect is significant depends on many factors, including the volume of activity on the table and total table size. In some Modeling Services scenarios, testing has demonstrated performance improvements by a factor of 10 when using bigint keys instead of uniqueidentifier keys.

Apply the IDENTITY Property to the Id Column for Tables with Low Projected Insert Volumes

SQL Server allows columns to use the IDENTITY property to automatically increment the identity value on each insert. This is very convenient, because insert statements do not need to obtain or specify a unique identity value. Instead, this value is generated automatically. This works well for relatively small tables with low insert volumes. However, for tables with frequent or large inserts, there is a performance penalty for using this feature. In those scenarios, avoid the IDENTITY property on the Id column. Instead, use the Modeling Services feature for supplying groups of sequential identifiers manually. This is discussed in the next section.

Example

The following T-SQL example creates an AddressesTable table in a Contact schema. It applies the IDENTITY property to the Id column with a seed value of 1 and an increment value of 1. All future insert statements into this table will not specify a value for Id; the identifier will automatically generate for each insert, starting at one and incrementing by one each time.

use Repository
go
create schema [Contact] authorization [RepositoryOwner]
go
create table [Contact].[AddressesTable](
   [Id] bigint not null identity (1, 1),
   [Folder] [Repository.Item].[FolderId] not null,
   [Street] nvarchar(100) not null,
   [City] nvarchar(50) not null,
   [State] nvarchar(20) not null,
   [ZipCode] int not null,
   constraint [PK_Addresses] primary key clustered ( [Id] ) 
) 
go

Use Sequence Objects for Tables with High Projected Insert Volumes

Tables with frequent or large inserts benefit from using a manual approach to generating identifier values. Modeling Services provides sequence objects for tracking identifier values and obtaining new intervals of identifiers for batch inserts.

Note

Note that tests on some Modeling Services scenarios have demonstrated a performance improvement by a factor of seven when using Modeling Services sequence objects for insertions.

The first step is to create a Modeling Services table with a primary key, Id, which does not apply the IDENTITY property. The following T-SQL example shows a simplified version of an address table, AddressesTable, in a schema named Contact.

use Repository
go
create schema [Contact] authorization [RepositoryOwner]
go
create table [Contact].[AddressesTable](
   [Id] bigint not null,
   [Folder] [Repository.Item].[FolderId] not null,
   [ZipCode] int not null,
   constraint [PK_Addresses] primary key clustered 
   (
      [Id] asc
   ) on [PRIMARY]
) on [PRIMARY]
go

To use the Modeling Services feature for supplying identifiers, create a new identifier sequence with the [Repository].[CreateIdSequence] stored procedure.

exec [Repository].[CreateIdSequence] 
   @schema=[Contact], @sequence=[Addresses], @initialValue = 1

The [Repository].[CreateIdSequence] stored procedure takes an owning schema name, a sequence name, and a start value. Use the base table name for the sequence name. In this example, the table name is AddressesTable and the base table name is Addresses, which is the name used for the identifier sequence.

Before inserting new rows, the caller must first obtain unique identifiers for each inserted row. This is done with the [Repository].[NewIdInterval] stored procedure. This stored procedure takes a bigint output parameter, @firstValue. This output parameter is set to the next available identifier. If more than one identifier is requested, then the caller passes in the size of the interval with the @intervalSize parameter. The procedure returns a set of consecutive identifiers that begin at @firstValue and contain @intervalSize values. The following T-SQL example uses the AddressesTable table and Addresses identifier sequence from the previous examples. It requests two identifiers by specifying a @intervalSize of 2.

declare @startValue as bigint
exec [Repository].[NewIdInterval] 
   @schema=N'Contact', @sequence=N'Addresses', @intervalSize=2, 
   @firstValue=@startValue OUTPUT

insert [Contact].[AddressesTable] (Id, Folder, ZipCode)
   values (@startValue, 1, 28173)
insert [Contact].[AddressesTable] (Id, Folder, ZipCode)
   values (@startValue+1, 1, 28174)

Use Sequence Object Aliases for Identifiers that Must Be Unique across Tables

There are scenarios where it is useful to have an incrementing sequence of identifiers that spans several related tables. Modeling Services supports these scenarios through sequence object aliases. A sequence object alias is a name for a sequence object that can be applied to multiple tables. The [Repository].[NewIdInterval] stored procedure searches the registered aliases. When an alias is found, it points to the sequence object to use for the identifier request. The [Repository].[CreateIdSequenceAlias] stored procedure registers a new sequence object alias.

Example

Consider a scenario with two tables named AddressesOneTable and AddressesTwoTable. These tables reside in the Contact schema. There could be a requirement that the Id columns of both tables contain unique values across both tables. A sequence object alias accomplishes this. First, create a new identifier sequence that both tables will use:

exec [Repository].[CreateIdSequence] 
   @schema=[Contact], @sequence=[AddressesSequence], @initialValue = 1

Then create Sequence Aliases for both tables:

exec [Repository].[CreateIdSequenceAlias] @schemaAlias=Contact, @sequenceAlias=AddressesOne, 
   @schema=Contact, @sequence = AddressesSequence
exec [Repository].[CreateIdSequenceAlias] @schemaAlias=Contact, @sequenceAlias=AddressesTwo, 
   @schema=Contact, @sequence = AddressesSequence

Now calls to [Repository].[NewIdInterval] for the AddressesOne alias and the AddressesTwo alias will both return the next identifiers in the AddressesSequence sequence object.

Note

Note that the [Repository].[CreateIdSequenceAlias] takes a parameter named @schemaAlias. This enables using the same identifier sequence for tables in different schemas, regardless of that identifier sequence’s registered schema.

Identifier Design Patterns

Guidelines for Using Identifiers

Select an Appropriate Data Type for the Id Column

Apply the IDENTITY Property to the Id Column for Tables with Low Projected Insert Volumes

Example

Use Sequence Objects for Tables with High Projected Insert Volumes

Use Sequence Object Aliases for Identifiers that Must Be Unique across Tables

Example

See Also

Concepts

Additional resources