Data partitioning policy

The partitioning policy defines whether and how extents (data shards) are partitioned for a specific table.

The main purpose of the policy is to improve the performance of queries that are known to narrow the set of values in the partitioned columns, or that aggregate or join on a high-cardinality string column. The policy may also result in better compression of the data.

Caution

There are no hard-coded limits set on the number of tables that can have the policy defined on them. However, every additional table adds overhead to the background data partitioning process that runs on the cluster's nodes. It may result in more cluster resources being used. For more information, see Monitoring and Capacity.

Partition keys

The following kinds of partition keys are supported.

Kind           Column Type  Partition properties               Partition value
Hash           string       Function, MaxPartitionCount, Seed  Function(ColumnName, MaxPartitionCount, Seed)
Uniform range  datetime     RangeSize, Reference               bin_at(ColumnName, RangeSize, Reference)

Hash partition key

Note

Apply a hash partition key on a string-typed column in a table only in the following instances:

  • The majority of queries use equality filters (==, in()) on the column.
  • The majority of queries aggregate or join on a specific string-typed column of large dimension (cardinality of 10M or higher), such as an application_ID, a tenant_ID, or a user_ID.

When a hash partition key is defined on a table:

  • A hash-modulo function is used to partition the data.
  • All homogeneous (partitioned) extents that belong to the same partition are assigned to the same data node.
  • Data in homogeneous (partitioned) extents is ordered by the hash partition key.
    • You don't need to include the hash partition key in the row order policy, if one is defined on the table.
  • Queries that use the shuffle strategy, and in which the shuffle key used in join, summarize, or make-series is the table's hash partition key, are expected to perform better, because the amount of data that needs to move across the cluster's nodes is significantly reduced (see the sketch after this list).
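
The following queries are a sketch of the patterns that benefit from a hash partition key; the table names T and OtherTable, and the literal "contoso", are placeholders and not part of the policy itself.

// Equality filter on the hash partition key
T
| where tenant_id == "contoso"

// Shuffle join keyed on the hash partition key
T
| join hint.strategy = shuffle (OtherTable) on tenant_id

// Shuffle summarize keyed on the hash partition key
T
| summarize hint.shufflekey = tenant_id count() by tenant_id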

Partition properties

  • Function is the name of a hash-modulo function to use.
    • Supported value: XxHash64.
  • MaxPartitionCount is the maximum number of partitions to create (the modulo argument to the hash-modulo function) per time period.
    • Supported values are in the range (1,1024].
      • The value is expected to be:
        • Larger than the number of nodes in the cluster
        • Smaller than the cardinality of the column.
      • The higher the value is, the greater the overhead of the data partitioning process on the cluster's nodes, and the higher the number of extents for each time period.
      • We recommend that you start with a value of 256.
        • Adjust the value as needed, based on the above considerations, or based on the benefit in query performance vs. the overhead of partitioning the data post-ingestion.
  • Seed is the value to use for randomizing the hash value.
    • The value should be a positive integer.
    • The recommended value is 1, which is the default, if unspecified.

Example

The code snippet shows a hash partition key over a string-typed column named tenant_id. It uses the XxHash64 hash function, with a MaxPartitionCount of 256, and the default Seed of 1.

{
  "ColumnName": "tenant_id",
  "Kind": "Hash",
  "Properties": {
    "Function": "XxHash64",
    "MaxPartitionCount": 256,
    "Seed": 1
  }
}

Uniform range datetime partition key

Note

Only apply a uniform range datetime partition key on a datetime-typed column in a table when data ingested into the table is unlikely to be ordered according to this column.

In such cases, it can be helpful to reshuffle the data between extents so that each extent ends up including records from a limited time range. This results in filters on that datetime column being more effective at query time.

The partition function used is bin_at() and isn't customizable.
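
For example, with a RangeSize of one day and a Reference of 1970-01-01, a record's partition value is its datetime value rounded down to the day boundary. The following query illustrates the calculation; the literal values are only an example.

print partition_value = bin_at(datetime(2024-05-17 10:30:00), 1d, datetime(1970-01-01))
// Returns 2024-05-17 00:00:00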

Partition properties

  • RangeSize is a timespan scalar constant that indicates the size of each datetime partition. We recommend that you:
    • Start with the value 1.00:00:00 (one day).
    • Don't set a value shorter than that, because it may result in the table having a large number of small extents that can't be merged.
  • Reference is a datetime scalar constant that indicates a fixed point in time, according to which datetime partitions are aligned.
    • We recommend you start with 1970-01-01 00:00:00.
    • If there are records in which the datetime partition key has null values, their partition value is set to the value of Reference.

Example

The code snippet shows a uniform range datetime partition key over a datetime-typed column named timestamp. It uses datetime(1970-01-01) as its reference point, with a size of 1d for each partition.

{
  "ColumnName": "timestamp",
  "Kind": "UniformRange",
  "Properties": {
    "Reference": "1970-01-01T00:00:00",
    "RangeSize": "1.00:00:00"
  }
}

The policy object

By default, a table's data partitioning policy is null, in which case data in the table won't be partitioned.

The data partitioning policy has the following main properties:

  • PartitionKeys:

    • A collection of partition keys that define how to partition the data in the table.
    • A table can have up to two partition keys: at most one hash partition key and at most one uniform range datetime partition key.

  • EffectiveDateTime:

    • The UTC datetime from which the policy is effective.
    • This property is optional. If it isn't specified, the policy takes effect for data ingested after the policy was applied.
    • Non-homogeneous (non-partitioned) extents whose creation time precedes 90% of the table's effective soft-delete period are ignored by the partitioning process, because such extents are likely to be dropped by retention anyway.

Example

The following data partitioning policy object has two partition keys.

  1. A hash partition key over a string-typed column named tenant_id.
    • It uses the XxHash64 hash function, with a MaxPartitionCount of 256, and the default Seed of 1.
  2. A uniform range datetime partition key over a datetime-typed column named timestamp.
    • It uses datetime(1970-01-01) as its reference point, with a size of 1d for each partition.
{
  "PartitionKeys": [
    {
      "ColumnName": "tenant_id",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 256,
        "Seed": 1
      }
    },
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "1970-01-01T00:00:00",
        "RangeSize": "1.00:00:00"
      }
    }
  ]
}
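
A policy object like this is typically set on a table with the partitioning policy control command (see Next steps). The following command is a sketch, assuming a table named MyTable:

.alter table MyTable policy partitioning ```{
  "PartitionKeys": [
    {
      "ColumnName": "tenant_id",
      "Kind": "Hash",
      "Properties": { "Function": "XxHash64", "MaxPartitionCount": 256, "Seed": 1 }
    },
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": { "Reference": "1970-01-01T00:00:00", "RangeSize": "1.00:00:00" }
    }
  ]
}```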

Additional properties

The following properties can be defined as part of the policy. They're optional, and we recommend that you don't change them.

  • MinRowCountPerOperation:

    • The minimum target for the sum of the row counts of the source extents of a single data partitioning operation.
    • This property is optional. Its default value is 0.
  • MaxRowCountPerOperation:

    • The maximum target for the sum of the row counts of the source extents of a single data partitioning operation.
    • This property is optional. Its default value is 0, with a default target of 5,000,000 records.
      • You can set a value lower than 5M if the partitioning operations consume a very large amount of memory or CPU per operation. For more information, see Monitoring.
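
For instance, a policy that keeps the same partition keys but lowers the per-operation target could look like the following sketch; the value 3,000,000 is only an illustration, not a recommendation.

{
  "PartitionKeys": [ ... ],
  "MaxRowCountPerOperation": 3000000
}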

Notes

The data partitioning process

  • Data partitioning runs as a post-ingestion background process in the cluster.
    • A table that is continuously ingested into is expected to always have a "tail" of data that is yet to be partitioned (non-homogeneous extents).
  • Data partitioning runs only on hot extents, regardless of the value of the EffectiveDateTime property in the policy.
    • If partitioning cold extents is required, you will need to temporarily adjust the caching policy.
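
For example, to have cold extents partitioned, you could temporarily extend the table's hot cache period and revert it once partitioning completes. This is a sketch assuming a table named MyTable and a 90-day hot period:

.alter table MyTable policy caching hot = 90d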

Monitoring

Use the .show diagnostics command to monitor the progress or state of partitioning in a cluster.

.show diagnostics
| project MinPartitioningPercentageInSingleTable, TableWithMinPartitioningPercentage

The output includes:

  • MinPartitioningPercentageInSingleTable: The minimum percentage of partitioned data across all tables in the cluster that have a data partitioning policy defined.
    • If this percentage remains constantly under 90%, then evaluate the cluster's partitioning capacity.
  • TableWithMinPartitioningPercentage: The fully qualified name of the table whose partitioning percentage is shown above.

Use .show commands to monitor the partitioning commands and their resource utilization. For example:

.show commands 
| where StartedOn > ago(1d)
| where CommandType == "ExtentsPartition"
| parse Text with ".partition async table " TableName " extents" *
| summarize count(), sum(TotalCpu), avg(tolong(ResourcesUtilization.MemoryPeak)) by TableName, bin(StartedOn, 15m)
| render timechart with(ysplit = panels)

Capacity

  • The data partitioning process results in the creation of more extents. The cluster may gradually increase its Extents merge capacity, so that the process of merging extents can keep up.
  • If there's a high ingestion throughput, or a large enough number of tables that have a partitioning policy defined, then the cluster may gradually increase its Extents partition capacity, so that the process of partitioning extents can keep up.
  • To avoid consuming too many resources, these dynamic increases are capped. If the capacities are entirely used up, you may need to gradually and linearly increase them beyond the cap.
    • If increasing the capacities causes a significant increase in the use of the cluster's resources, you can scale the cluster up/out, either manually, or by enabling autoscale.
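
You can check how much of the partitioning capacity is currently consumed with the .show capacity command, for example (assuming the partitioning resource appears as ExtentsPartition in the output):

.show capacity
| where Resource == "ExtentsPartition"
| project Resource, Total, Consumed, Remaining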

Outliers in partitioned columns

  • If a hash partition key includes values that are much more prevalent than others, for example, an empty string, a generic value (such as null or N/A), or a value that represents an entity (such as a tenant_id) that is overrepresented in the data set, the data could be distributed unevenly across the cluster's nodes, which degrades query performance.
  • If a uniform range datetime partition key has a large enough percentage of values that are "far" from the majority of the values in the column, for example, datetime values from the distant past or future, the overhead of the data partitioning process increases, and the cluster may end up having to track many small extents.

In both of these cases, you should either "fix" the data, or filter out any irrelevant records in the data before or at ingestion time, to reduce the overhead of the data partitioning on the cluster. For example, use an update policy.
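
For example, the query below is a sketch of the kind of transformation an update policy function could apply at ingestion time: it drops records with far-off datetime values and normalizes generic tenant_id values. The table name, column names, and thresholds are illustrative assumptions.

// Illustrative cleanup query for use in an update policy (names and thresholds are assumptions)
SourceTable
| where timestamp between (datetime(2015-01-01) .. now(30d))   // drop records from the distant past or future
| extend tenant_id = iff(isempty(tenant_id) or tenant_id in ("null", "N/A"), "unknown", tenant_id)   // normalize generic values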

Next steps

Use the partitioning policy control commands to manage data partitioning policies for tables.
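
For reference, the control commands for viewing and removing a table's data partitioning policy follow this general shape, where MyTable is a placeholder; see the control commands documentation for the full syntax.

.show table MyTable policy partitioning

.delete table MyTable policy partitioning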