Scale out databases with the shard map manager

Article
11/04/2022

To easily scale out databases on Azure SQL Database, use a shard map manager. The shard map manager is a special database that maintains global mapping information about all shards (databases) in a shard set. The metadata allows an application to connect to the correct database based upon the value of the sharding key. In addition, every shard in the set contains maps that track the local shard data (known as shardlets).

Shard map management

Understanding how these maps are constructed is essential to shard map management. This is done using the ShardMapManager class (Java, .NET), found in the Elastic Database client library to manage shard maps.

Shard maps and shard mappings

For each shard, you must select the type of shard map to create. The choice depends on the database architecture:

Single tenant per database
Multiple tenants per database (two types):
1. List mapping
2. Range mapping

For a single-tenant model, create a list-mapping shard map. The single-tenant model assigns one database per tenant. This is an effective model for SaaS developers as it simplifies shard map management.

List mapping

The multi-tenant model assigns several tenants to an individual database (and you can distribute groups of tenants across multiple databases). Use this model when you expect each tenant to have small data needs. In this model, assign a range of tenants to a database using range mapping.

Range mapping

Or you can implement a multi-tenant database model using a list mapping to assign multiple tenants to an individual database. For example, DB1 is used to store information about tenant ID 1 and 5, and DB2 stores data for tenant 7 and tenant 10.

Multiple tenants on single DB

Supported types for sharding keys

Elastic Scale support the following types as sharding keys:

.NET	Java
integer	integer
long	long
guid	uuid
byte[]	byte[]
datetime	timestamp
timespan	duration
datetimeoffset	offsetdatetime

List and range shard maps

Shard maps can be constructed using lists of individual sharding key values, or they can be constructed using ranges of sharding key values.

List shard maps

Shards contain shardlets and the mapping of shardlets to shards is maintained by a shard map. A list shard map is an association between the individual key values that identify the shardlets and the databases that serve as shards. List mappings are explicit and different key values can be mapped to the same database. For example, key value 1 maps to Database A, and key values 3 and 6 both maps to Database B.

Key	Shard Location
1	Database_A
3	Database_B
4	Database_C
6	Database_B
...	...

Range shard maps

In a range shard map, the key range is described by a pair [Low Value, High Value) where the Low Value is the minimum key in the range, and the High Value is the first value higher than the range.

For example, [0, 100) includes all integers greater than or equal 0 and less than 100. Note that multiple ranges can point to the same database, and disjoint ranges are supported (for example, [100,200) and [400,600) both point to Database C in the following example.)

Key	Shard Location
[1,50)	Database_A
[50,100)	Database_B
[100,200)	Database_C
[400,600)	Database_C
...	...

Each of the tables shown above is a conceptual example of a ShardMap object. Each row is a simplified example of an individual PointMapping (for the list shard map) or RangeMapping (for the range shard map) object.

Shard map manager

In the client library, the shard map manager is a collection of shard maps. The data managed by a ShardMapManager instance is kept in three places:

Global Shard Map (GSM): You specify a database to serve as the repository for all of its shard maps and mappings. Special tables and stored procedures are automatically created to manage the information. This is typically a small database and lightly accessed, and it should not be used for other needs of the application. The tables are in a special schema named __ShardManagement.
Local Shard Map (LSM): Every database that you specify to be a shard is modified to contain several small tables and special stored procedures that contain and manage shard map information specific to that shard. This information is redundant with the information in the GSM, and it allows the application to validate cached shard map information without placing any load on the GSM; the application uses the LSM to determine if a cached mapping is still valid. The tables corresponding to the LSM on each shard are also in the schema __ShardManagement.
Application cache: Each application instance accessing a ShardMapManager object maintains a local in-memory cache of its mappings. It stores routing information that has recently been retrieved.

Constructing a ShardMapManager

A ShardMapManager object is constructed using a factory (Java, .NET) pattern. The ShardMapManagerFactory.GetSqlShardMapManager (Java, .NET) method takes credentials (including the server name and database name holding the GSM) in the form of a ConnectionString and returns an instance of a ShardMapManager.

Please Note: The ShardMapManager should be instantiated only once per app domain, within the initialization code for an application. Creation of additional instances of ShardMapManager in the same app domain results in increased memory and CPU utilization of the application. A ShardMapManager can contain any number of shard maps. While a single shard map may be sufficient for many applications, there are times when different sets of databases are used for different schema or for unique purposes; in those cases multiple shard maps may be preferable.

In this code, an application tries to open an existing ShardMapManager with the TryGetSqlShardMapManager (Java, .NET method. If objects representing a Global ShardMapManager (GSM) do not yet exist inside the database, the client library creates them using the CreateSqlShardMapManager (Java, .NET) method.

// Try to get a reference to the Shard Map Manager in the shardMapManager database.
// If it doesn't already exist, then create it.
ShardMapManager shardMapManager = null;
boolean shardMapManagerExists = ShardMapManagerFactory.tryGetSqlShardMapManager(shardMapManagerConnectionString,ShardMapManagerLoadPolicy.Lazy, refShardMapManager);
shardMapManager = refShardMapManager.argValue;

if (shardMapManagerExists) {
    ConsoleUtils.writeInfo("Shard Map %s already exists", shardMapManager);
}
else {
    // The Shard Map Manager does not exist, so create it
    shardMapManager = ShardMapManagerFactory.createSqlShardMapManager(shardMapManagerConnectionString);
    ConsoleUtils.writeInfo("Created Shard Map %s", shardMapManager);
}

// Try to get a reference to the Shard Map Manager via the Shard Map Manager database.  
// If it doesn't already exist, then create it.
ShardMapManager shardMapManager;
bool shardMapManagerExists = ShardMapManagerFactory.TryGetSqlShardMapManager(
                                        connectionString,
                                        ShardMapManagerLoadPolicy.Lazy,
                                        out shardMapManager);

if (shardMapManagerExists)
{
    Console.WriteLine("Shard Map Manager already exists");
}
else
{
    // Create the Shard Map Manager.
    ShardMapManagerFactory.CreateSqlShardMapManager(connectionString);
    Console.WriteLine("Created SqlShardMapManager");

    shardMapManager = ShardMapManagerFactory.GetSqlShardMapManager(
            connectionString,
            ShardMapManagerLoadPolicy.Lazy);

// The connectionString contains server name, database name, and admin credentials for privileges on both the GSM and the shards themselves.
}

For the .NET version, you can use PowerShell to create a new Shard Map Manager.

Get a RangeShardMap or ListShardMap

After creating a shard map manager, you can get the RangeShardMap (Java, .NET) or ListShardMap (Java, .NET) using the TryGetRangeShardMap (Java, .NET), the TryGetListShardMap (Java, .NET), or the GetShardMap (Java, .NET) method.

// Creates a new Range Shard Map with the specified name, or gets the Range Shard Map if it already exists.
static <T> RangeShardMap<T> createOrGetRangeShardMap(ShardMapManager shardMapManager,
            String shardMapName,
            ShardKeyType keyType) {
    // Try to get a reference to the Shard Map.
    ReferenceObjectHelper<RangeShardMap<T>> refRangeShardMap = new ReferenceObjectHelper<>(null);
    boolean isGetSuccess = shardMapManager.tryGetRangeShardMap(shardMapName, keyType, refRangeShardMap);
    RangeShardMap<T> shardMap = refRangeShardMap.argValue;

    if (isGetSuccess && shardMap != null) {
        ConsoleUtils.writeInfo("Shard Map %1$s already exists", shardMap.getName());
    }
    else {
        // The Shard Map does not exist, so create it
        try {
            shardMap = shardMapManager.createRangeShardMap(shardMapName, keyType);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
        ConsoleUtils.writeInfo("Created Shard Map %1$s", shardMap.getName());
    }

    return shardMap;
}

// Creates a new Range Shard Map with the specified name, or gets the Range Shard Map if it already exists.
public static RangeShardMap<T> CreateOrGetRangeShardMap<T>(ShardMapManager shardMapManager, string shardMapName)
{
    // Try to get a reference to the Shard Map.
    RangeShardMap<T> shardMap;
    bool shardMapExists = shardMapManager.TryGetRangeShardMap(shardMapName, out shardMap);

    if (shardMapExists)
    {
        ConsoleUtils.WriteInfo("Shard Map {0} already exists", shardMap.Name);
    }
    else
    {
        // The Shard Map does not exist, so create it
        shardMap = shardMapManager.CreateRangeShardMap<T>(shardMapName);
        ConsoleUtils.WriteInfo("Created Shard Map {0}", shardMap.Name);
    }

    return shardMap;
}

Shard map administration credentials

Applications that administer and manipulate shard maps are different from those that use the shard maps to route connections.

To administer shard maps (add or change shards, shard maps, shard mappings, etc.) you must instantiate the ShardMapManager using credentials that have read/write privileges on both the GSM database and on each database that serves as a shard. The credentials must allow for writes against the tables in both the GSM and LSM as shard map information is entered or changed, as well as for creating LSM tables on new shards.

See Credentials used to access the Elastic Database client library.

Only metadata affected

Methods used for populating or changing the ShardMapManager data do not alter the user data stored in the shards themselves. For example, methods such as CreateShard, DeleteShard, UpdateMapping, etc. affect the shard map metadata only. They do not remove, add, or alter user data contained in the shards. Instead, these methods are designed to be used in conjunction with separate operations you perform to create or remove actual databases, or that move rows from one shard to another to rebalance a sharded environment. (The split-merge tool included with elastic database tools makes use of these APIs along with orchestrating actual data movement between shards.) See Scaling using the Elastic Database split-merge tool.

Data dependent routing

The shard map manager is used in applications that require database connections to perform the app-specific data operations. Those connections must be associated with the correct database. This is known as Data Dependent Routing. For these applications, instantiate a shard map manager object from the factory using credentials that have read-only access on the GSM database. Individual requests for later connections supply credentials necessary for connecting to the appropriate shard database.

Note that these applications (using ShardMapManager opened with read-only credentials) cannot make changes to the maps or mappings. For those needs, create administrative-specific applications or PowerShell scripts that supply higher-privileged credentials as discussed earlier. See Credentials used to access the Elastic Database client library.

For more information, see Data dependent routing.

Modifying a shard map

A shard map can be changed in different ways. All of the following methods modify the metadata describing the shards and their mappings, but they do not physically modify data within the shards, nor do they create or delete the actual databases. Some of the operations on the shard map described below may need to be coordinated with administrative actions that physically move data or that add and remove databases serving as shards.

These methods work together as the building blocks available for modifying the overall distribution of data in your sharded database environment.

To add or remove shards: use CreateShard (Java, .NET) and DeleteShard (Java, .NET) of the shardmap (Java, .NET) class.

The server and database representing the target shard must already exist for these operations to execute. These methods do not have any impact on the databases themselves, only on metadata in the shard map.
To create or remove points or ranges that are mapped to the shards: use CreateRangeMapping (Java, .NET), DeleteMapping (Java, .NET) of the RangeShardMapping (Java, .NET) class, and CreatePointMapping (Java, .NET) of the ListShardMap (Java, .NET) class.

Many different points or ranges can be mapped to the same shard. These methods only affect metadata - they do not affect any data that may already be present in shards. If data needs to be removed from the database in order to be consistent with DeleteMapping operations, you perform those operations separately but in conjunction with using these methods.
To split existing ranges into two, or merge adjacent ranges into one: use SplitMapping (Java, .NET) and MergeMappings (Java, .NET).

Note that split and merge operations do not change the shard to which key values are mapped. A split breaks an existing range into two parts, but leaves both as mapped to the same shard. A merge operates on two adjacent ranges that are already mapped to the same shard, coalescing them into a single range. The movement of points or ranges themselves between shards needs to be coordinated by using UpdateMapping in conjunction with actual data movement. You can use the Split/Merge service that is part of elastic database tools to coordinate shard map changes with data movement, when movement is needed.
To re-map (or move) individual points or ranges to different shards: use UpdateMapping (Java, .NET).

Since data may need to be moved from one shard to another in order to be consistent with UpdateMapping operations, you need to perform that movement separately but in conjunction with using these methods.
To take mappings online and offline: use MarkMappingOffline (Java, .NET) and MarkMappingOnline (Java, .NET) to control the online state of a mapping.

Certain operations on shard mappings are only allowed when a mapping is in an offline state, including UpdateMapping and DeleteMapping. When a mapping is offline, a data-dependent request based on a key included in that mapping returns an error. In addition, when a range is first taken offline, all connections to the affected shard are automatically killed in order to prevent inconsistent or incomplete results for queries directed against ranges being changed.

Mappings are immutable objects in .NET. All of the methods above that change mappings also invalidate any references to them in your code. To make it easier to perform sequences of operations that change a mapping's state, all of the methods that change a mapping return a new mapping reference, so operations can be chained. For example, to delete an existing mapping in shardmap sm that contains the key 25, you can execute the following:

    sm.DeleteMapping(sm.MarkMappingOffline(sm.GetMappingForKey(25)));

Adding a shard

Applications often need to add new shards to handle data that is expected from new keys or key ranges, for a shard map that already exists. For example, an application sharded by Tenant ID may need to provision a new shard for a new tenant, or data sharded monthly may need a new shard provisioned before the start of each new month.

If the new range of key values is not already part of an existing mapping and no data movement is necessary, it is simple to add the new shard and associate the new key or range to that shard. For details on adding new shards, see Adding a new shard.

For scenarios that require data movement, however, the split-merge tool is needed to orchestrate the data movement between shards in combination with the necessary shard map updates. For details on using the split-merge tool, see Overview of split-merge

Additional resources

Not using elastic database tools yet? Check out our Getting Started Guide. For questions, contact us on the Microsoft Q&A question page for SQL Database and for feature requests, add new ideas or vote for existing ideas in the SQL Database feedback forum.

Scale out databases with the shard map manager

Shard maps and shard mappings

Supported types for sharding keys

List and range shard maps

List shard maps

Range shard maps

Shard map manager

Constructing a ShardMapManager

Get a RangeShardMap or ListShardMap

Shard map administration credentials

Only metadata affected

Data dependent routing

Modifying a shard map

Adding a shard

Additional resources

Feedback

Feedback

Additional resources