Partitioning policy

The partitioning policy defines if and how extents (data shards) should be partitioned for a specific table or a materialized view.

By default, extents are partitioned by their ingestion time. In most cases, there's no need to apply another partitioning policy.

The main purpose of the partitioning policy is to improve performance of queries in specific scenarios.

Caution

  • There are no hard-coded limits set on the number of tables with the partitioning policy defined.
  • However, every additional table adds overhead to the background data partitioning process that runs on the cluster's nodes. Adding tables may result in more cluster resources being used.
  • For more information, see monitoring and capacity.
  • Before applying a partitioning policy on a materialized view, review the recommendations for materialized views partitioning policy.

Common scenarios

The following are common scenarios that can be addressed by setting a data partitioning policy:

  • Low cardinality partition key: For example, multi-tenant solutions, or a metrics table where most or all queries filter on the partition key column of type string such as the TenantId or the MetricId.
    • Low cardinality is defined as less than 10M distinct values. In the examples above, the cardinality is likely to be much lower than that.
    • Set the hash partition key to be the ID column, and set the PartitionAssignmentMode property to Uniform (see the example command after this list).
  • High cardinality partition key: For example, IoT information from many different sensors, or academic records of many different students.
    • High cardinality is defined as more than 10M distinct values where the distribution of values in the column is approximately even.
    • In this case, set the hash partition key to be the column that queries group by or join on, and set the PartitionAssignmentMode property to Default.
  • Unordered data ingestion: Data ingested into a table might not be ordered and partitioned into extents (shards) according to a specific datetime column that represents the data creation time and is commonly used to filter data. This could be due to a backfill from heterogeneous source files that include datetime values over a large time span.
    • In this case, set the Uniform range datetime partition key to be the datetime column.
    • If you need retention and caching policies to align with the datetime values in the column, instead of aligning with the time of ingestion, set the OverrideCreationTime property to true.
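
For example, for the low cardinality scenario above, a policy with a single hash partition key and Uniform assignment might look like the following. The table name Metrics and column MetricId are hypothetical placeholders; adjust them to your schema.

.alter table Metrics policy partitioning '{"PartitionKeys":[{"ColumnName":"MetricId","Kind":"Hash","Properties":{"Function":"XxHash64","MaxPartitionCount":128,"Seed":1,"PartitionAssignmentMode":"Uniform"}}]}'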

Partition keys

The following kinds of partition keys are supported.

  • Hash: applies to a column of type string. Partition properties: Function, MaxPartitionCount, Seed, PartitionAssignmentMode. Partition value: Function(ColumnName, MaxPartitionCount, Seed).
  • Uniform range: applies to a column of type datetime. Partition properties: RangeSize, Reference, OverrideCreationTime. Partition value: bin_at(ColumnName, RangeSize, Reference).

Hash partition key

Note

The data partitioning operation adds significant processing load. We recommend applying a hash partition key on a string-typed column in a table only under the following conditions:

  • The majority of queries use equality filters (==, in()).
  • The majority of queries aggregate or join on a specific string-typed column of large dimension (cardinality of 10M or higher), such as a device_ID or user_ID.
  • The usage pattern of the partitioned tables is a high-concurrency query load, such as in monitoring or dashboarding applications.

When a hash partition key is set:

  • A hash-modulo function is used to partition the data.
  • Data in homogeneous (partitioned) extents is ordered by the hash partition key.
    • You don't need to include the hash partition key in the row order policy, if one is defined on the table.
  • Queries that use the shuffle strategy, and in which the shuffle key used in join, summarize, or make-series is the table's hash partition key, are expected to perform better because the amount of data required to move across the cluster's nodes is reduced (see the example query after this list).
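
For example, assuming a hypothetical table Events whose hash partition key is tenant_id (and which has a Timestamp column), a shuffled aggregation on that key could look like this:

// tenant_id is assumed to be the table's hash partition key
Events
| summarize hint.shufflekey = tenant_id count() by tenant_id, bin(Timestamp, 1h)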

Partition properties

Property Description Supported value(s) Recommended value
Function The name of a hash-modulo function to use. XxHash64
MaxPartitionCount The maximum number of partitions to create (the modulo argument to the hash-modulo function) per time period. In the range (1,2048].
Larger than five times the number of nodes in the cluster, and smaller than the cardinality of the column.
Higher values lead to greater overhead of the data partitioning process on the cluster's nodes, and a higher number of extents for each time period. For clusters with fewer than 20 nodes, start with 128. For clusters with fewer than 50 nodes, start with 256. Adjust the value based on these considerations, or based on the benefit in query performance vs. the overhead of partitioning the data post-ingestion.
Seed Use for randomizing the hash value. A positive integer. 1, which is also the default value.
PartitionAssignmentMode The mode used for assigning partitions to nodes in the cluster. Default: All homogeneous (partitioned) extents that belong to the same partition are assigned to the same node.
Uniform: An extents' partition values are disregarded. Extents are assigned uniformly to the cluster's nodes.
If queries don't join or aggregate on the hash partition key, use Uniform. Otherwise, use Default.
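
To help choose MaxPartitionCount, you can estimate the column's cardinality before setting the policy. MyTable and tenant_id below are hypothetical placeholders:

MyTable
| summarize dcount(tenant_id)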

Hash partition key example

The following snippet shows a hash partition key over a string-typed column named tenant_id. It uses the XxHash64 hash function, with a MaxPartitionCount of 128 and the default Seed of 1.

{
  "ColumnName": "tenant_id",
  "Kind": "Hash",
  "Properties": {
    "Function": "XxHash64",
    "MaxPartitionCount": 128,
    "Seed": 1,
    "PartitionAssignmentMode": "Default"
  }
}

Uniform range datetime partition key

Note

Only apply a uniform range datetime partition key on a datetime-typed column in a table when data ingested into the table is unlikely to be ordered according to this column.

In these cases, you can reshuffle the data between extents so that each extent includes records from a limited time range. This process results in filters on the datetime column being more effective at query time.

The partition function used is bin_at() and isn't customizable.
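
For example, the following query shows the partition value that bin_at() would assign to a sample record, assuming a RangeSize of one day and the default Reference point:

print bin_at(datetime(2021-06-15 13:20:00), 1d, datetime(1970-01-01))
// returns 2021-06-15 00:00:00.0000000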

Partition properties

The following properties can be set for a uniform range datetime partition key:

  • RangeSize: A timespan scalar constant that indicates the size of each datetime partition. Recommended value: start with 1.00:00:00 (one day). Don't set a shorter value, because it may result in the table having a large number of small extents that can't be merged.
  • Reference: A datetime scalar constant that indicates a fixed point in time, according to which datetime partitions are aligned. Recommended value: start with 1970-01-01 00:00:00. If there are records in which the datetime partition key has null values, their partition value is set to the value of Reference.
  • OverrideCreationTime: A bool indicating whether or not the result extent's minimum and maximum creation times should be overridden by the range of the values in the partition key. Defaults to false. Set to true if data isn't ingested in order of its time of arrival (for example, a single source file may include datetime values that are far apart), or if you want retention and caching to be based on the datetime values rather than on the time of ingestion.

Caution

When OverrideCreationTime is set to true, make sure the Lookback property in the table's effective Extents merge policy is aligned with the datetime values in your data.
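
To review the Lookback setting, you can check the table's merge policy (the database-level policy applies if the table-level policy is null). MyTable is a hypothetical table name:

.show table MyTable policy merge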

Uniform range datetime partition example

The following snippet shows a uniform datetime range partition key over a datetime-typed column named timestamp. It uses datetime(1970-01-01) as its reference point, with a size of 1d for each partition, and doesn't override the extents' creation times.

{
  "ColumnName": "timestamp",
  "Kind": "UniformRange",
  "Properties": {
    "Reference": "1970-01-01T00:00:00",
    "RangeSize": "1.00:00:00",
    "OverrideCreationTime": false
  }
}
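
After the policy takes effect, you can verify that partitioned extents span narrow creation-time ranges by inspecting the extents of a hypothetical table MyTable. The projected column names are those commonly returned by the command; verify them in your environment:

.show table MyTable extents
| project ExtentId, MinCreatedOn, MaxCreatedOn, RowCount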

The policy object

By default, a table's data partitioning policy is null, in which case data in the table won't be partitioned.

The data partitioning policy has the following main properties:

  • PartitionKeys:

    • A collection of partition keys that define how to partition the data in the table.
    • A table can include up to two partition keys: at most one hash partition key and at most one uniform range datetime partition key.

  • EffectiveDateTime:

    • The UTC datetime from which the policy is effective.
    • This property is optional. If it isn't specified, the policy will take effect for data ingested after the policy was applied.
    • Non-homogeneous (non-partitioned) extents whose creation time precedes 90% of the table's effective soft-delete period are ignored by the partitioning process, because they're likely to soon be dropped by retention.

Caution

  • You can set a datetime value in the past and partition already-ingested data. However, this practice may significantly increase resources used in the partitioning process.
  • In most cases, we recommend partitioning only newly ingested data, and avoiding the partitioning of large amounts of historical data.
  • If you choose to partition historical data, consider doing so gradually, by setting the EffectiveDateTime to a previous datetime in steps of up to a few days each time you alter the policy (see the example command after this list).
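
For example, the following command moves the EffectiveDateTime back without redefining the partition keys. MyTable and the datetime value are hypothetical placeholders, and the example assumes the alter-merge variant of the partitioning policy command is available; otherwise, use .alter with the full policy object:

.alter-merge table MyTable policy partitioning '{"EffectiveDateTime":"2023-12-28T00:00:00"}'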

Data partitioning example

The following is a data partitioning policy object with two partition keys:

  1. A hash partition key over a string-typed column named tenant_id.
    • It uses the XxHash64 hash function, with a MaxPartitionCount of 128, and the default Seed of 1.
  2. A uniform datetime range partition key over a datetime-typed column named timestamp.
    • It uses datetime(1970-01-01) as its reference point, with a size of 1d for each partition.
{
  "PartitionKeys": [
    {
      "ColumnName": "tenant_id",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 128,
        "Seed": 1,
        "PartitionAssignmentMode": "Default"
      }
    },
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "1970-01-01T00:00:00",
        "RangeSize": "1.00:00:00",
        "OverrideCreationTime": false
      }
    }
  ]
}
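
To apply this policy to a table, pass the object to the partitioning policy command. MyTable is a hypothetical table name:

.alter table MyTable policy partitioning '{"PartitionKeys":[{"ColumnName":"tenant_id","Kind":"Hash","Properties":{"Function":"XxHash64","MaxPartitionCount":128,"Seed":1,"PartitionAssignmentMode":"Default"}},{"ColumnName":"timestamp","Kind":"UniformRange","Properties":{"Reference":"1970-01-01T00:00:00","RangeSize":"1.00:00:00","OverrideCreationTime":false}}]}'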

Additional properties

The following properties can be defined as part of the policy. These properties are optional and we recommend not changing them.

  • MinRowCountPerOperation: Minimum target for the sum of the row count of the source extents of a single data partitioning operation. Default value: 0.
  • MaxRowCountPerOperation: Maximum target for the sum of the row count of the source extents of a single data partitioning operation. Default value: 0, with a default target of 5,000,000 records. Set a value lower than 5M if you see that the partitioning operations consume a large amount of memory or CPU per operation. For more information, see monitoring.
  • MaxOriginalSizePerOperation: Maximum target for the sum of the original size (in bytes) of the source extents of a single data partitioning operation. Default value: 0, with a default target of 5,368,709,120 bytes (5 GB). Set a value lower than 5 GB if the partitioning operations consume a large amount of memory or CPU per operation. For more information, see monitoring.

The data partitioning process

  • Data partitioning runs as a post-ingestion background process in the cluster.
    • A table that is continuously ingested into is expected to always have a "tail" of data that is yet to be partitioned (non-homogeneous extents).
  • Data partitioning runs only on hot extents, regardless of the value of the EffectiveDateTime property in the policy.
    • If partitioning cold extents is required, temporarily adjust the caching policy so that the relevant extents are included in the hot cache (see the example command after this list).
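
For example, a hypothetical table MyTable whose cold extents need to be partitioned could temporarily have its hot cache period extended; revert the value once partitioning completes:

.alter table MyTable policy caching hot = 90d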

Monitor partitioning

Use the .show diagnostics command to monitor the progress or state of partitioning in a cluster.

.show diagnostics
| project MinPartitioningPercentageInSingleTable, TableWithMinPartitioningPercentage

The output includes:

  • MinPartitioningPercentageInSingleTable: The minimal percentage of partitioned data across all tables that have a data partitioning policy in the cluster.
    • If this percentage remains constantly under 90%, then evaluate the cluster's partitioning capacity.
  • TableWithMinPartitioningPercentage: The fully qualified name of the table whose partitioning percentage is shown above.

Use .show commands to monitor the partitioning commands and their resource use. For example:

.show commands 
| where StartedOn > ago(1d)
| where CommandType == "ExtentsPartition"
| parse Text with ".partition async table " TableName " extents" *
| summarize count(), sum(TotalCpu), avg(tolong(ResourcesUtilization.MemoryPeak)) by TableName, bin(StartedOn, 15m)
| render timechart with(ysplit = panels)

Partitioning capacity

  • The data partitioning process results in the creation of more extents. The cluster may gradually increase its extents merge capacity, so that the process of merging extents can keep up.
  • If there's a high ingestion throughput, or a large enough number of tables that have a partitioning policy defined, then the cluster may gradually increase its Extents partition capacity, so that the process of partitioning extents can keep up.
  • To avoid consuming too many resources, these dynamic increases are capped. If the capped capacities are fully utilized, you may need to gradually and linearly increase them beyond the cap (see the example query after this list).
    • If increasing the capacities causes a significant increase in the use of the cluster's resources, you can scale the cluster up/out, either manually, or by enabling autoscale.
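
You can check the current partitioning capacity and how much of it is consumed. The resource name ExtentsPartition is assumed here; verify it against the output of .show capacity on your cluster:

.show capacity
| where Resource == "ExtentsPartition"
| project Resource, Total, Consumed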

Limitations

  • Attempts to partition data in a database that already has more than 5,000,000 extents will be throttled.
    • In such cases, we recommend that you temporarily disable partitioning and re-evaluate your configuration and policies. For example, you can set the EffectiveDateTime to a future date until the extent count stabilizes on a lower value.

Outliers in partitioned columns

  • The following situations can contribute to an imbalanced distribution of data across the cluster's nodes, and degrade query performance:
    • If a hash partition key includes values that are much more prevalent than others, for example, an empty string or a generic value (such as null or N/A).
    • If the values represent an entity (such as a specific tenant_id) that is much more prevalent in the data set.
  • If a uniform range datetime partition key has a large enough percentage of values that are "far" from the majority of the values in the column, the overhead of the data partitioning process is increased and may lead to many small extents that the cluster will need to keep track of. An example of such a situation is datetime values from the distant past or future.

In both of these cases, either "fix" the data, or filter out any irrelevant records in the data before or at ingestion time, to reduce the overhead of the data partitioning on the cluster. For example, use an update policy.
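
The following is a minimal sketch of such an update policy, assuming a hypothetical source table RawEvents and a partitioned target table Events with the same schema; the filter drops records with an empty tenant_id or an implausibly old timestamp:

.alter table Events policy update '[{"IsEnabled": true, "Source": "RawEvents", "Query": "RawEvents | where isnotempty(tenant_id) and timestamp > datetime(2015-01-01)", "IsTransactional": false}]'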

Next steps

Use the partitioning policy control commands to manage data partitioning policies for tables.