Data persistence with SQL Server big data cluster on Kubernetes

THIS TOPIC APPLIES TO: SQL Server (starting with 2019). It does not apply to Azure SQL Database, Azure SQL Data Warehouse, or Parallel Data Warehouse.

Persistent Volumes provide a plugin model for storage in Kubernetes. How storage is provided is abstracted from how it is consumed. Therefore, you can bring your own highly available storage and plug it into the SQL Server big data cluster. This gives you full control over the type of storage, availability, and performance that you require. Kubernetes supports various kinds of storage solutions including Azure disks/files, NFS, local storage, and more.

Configure persistent volumes

A SQL Server big data cluster consumes these persistent volumes by using storage classes. You can create different storage classes for different kinds of storage and specify them at big data cluster deployment time. At the pool level, you can configure which storage class to use and the size of the persistent volume claims for each purpose. A SQL Server big data cluster creates persistent volume claims with the specified storage class name for each component that requires persistent volumes. It then mounts the corresponding persistent volume(s) in the pod.
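
For illustration, a claim generated by the cluster would resemble the following sketch. The claim name and namespace here are hypothetical; the storage class, access mode, and size correspond to the storage settings described in the next section.

    # Hypothetical example of a persistent volume claim as the cluster
    # would generate it; the name and namespace are illustrative only.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-storage-pool-0
      namespace: mssql-cluster
    spec:
      accessModes:
        - ReadWriteOnce          # the only access mode supported in CTP 3.1
      storageClassName: default  # the className from the storage settings
      resources:
        requests:
          storage: 15Gi          # the size from the storage settings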

Configure big data cluster storage settings

Similar to other customizations, you can specify storage settings in the cluster configuration files at deployment time for each pool and the control plane. If there are no storage configuration settings in the pool specifications, then the control plane storage settings will be used. This is a sample of the storage configuration section that you can include in the spec:

    "storage": 
    {
      "data": {
        "className": "default",
        "accessMode": "ReadWriteOnce",
        "size": "15Gi"
      },
      "logs": {
        "className": "default",
        "accessMode": "ReadWriteOnce",
        "size": "10Gi"
    }

Deployment of a big data cluster uses persistent storage to store data, metadata, and logs for various components. You can customize the size of the persistent volume claims created as part of the deployment. As a best practice, we recommend using storage classes with a Retain reclaim policy.
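
You can inspect the reclaim policy of the storage classes available in your cluster before you deploy. The storage class name below is the AKS default; substitute the one you plan to use:

kubectl get storageclass
kubectl describe storageclass default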

Note

In CTP 3.1, you can't modify storage configuration settings post-deployment. Also, only the ReadWriteOnce access mode is supported for the whole cluster.

Warning

Running without persistent storage can work in a test environment, but it could result in a non-functional cluster. Upon pod restarts, cluster metadata and/or user data will be lost permanently. We do not recommend running in this configuration.

The Configure storage section provides more examples of how to configure storage settings for your SQL Server big data cluster deployment.

AKS storage classes

AKS comes with two built-in storage classes, default and managed-premium, along with dynamic provisioners for them. You can specify either of those, or create your own storage class, for deploying a big data cluster with persistent storage enabled. By default, the built-in cluster configuration file for AKS, aks-dev-test, comes with persistent storage configurations that use the default storage class.

Warning

Persistent volumes created with the built-in storage classes default and managed-premium have a reclaim policy of Delete. So, at the time you delete the SQL Server big data cluster, the persistent volume claims get deleted, and then the persistent volumes as well. You can create custom storage classes using the azure-disk provisioner with a Retain reclaim policy, as shown in this article.
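
As a sketch, a custom storage class that retains the underlying Azure disks might look like the following; the class name and disk parameters are illustrative, not prescribed by the deployment:

    # Hypothetical custom storage class for AKS; the name and parameters
    # are illustrative. Retain keeps the underlying Azure disks when the
    # cluster (and its persistent volume claims) are deleted.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: managed-retain
    provisioner: kubernetes.io/azure-disk
    reclaimPolicy: Retain
    parameters:
      storageaccounttype: Premium_LRS
      kind: Managed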

Minikube storage class

Minikube comes with a built-in storage class called standard, along with a dynamic provisioner for it. The built-in configuration file for minikube, minikube-dev-test, has the storage configuration settings in the control plane spec. The same settings are applied to all pool specs. You can also customize a copy of this file and use it for your big data cluster deployment on minikube. You can manually edit the custom file and change the size of the persistent volume claims for specific pools to accommodate the workloads you want to run. Or, see the Configure storage section for examples of how to make these edits using mssqlctl commands.

Kubeadm storage classes

Kubeadm does not come with a built-in storage class. You must create your own storage classes and persistent volumes using local storage or your preferred provisioner, such as Rook. In that case, you would set the className to the storage class you configured.

Note

In the built-in deployment configuration file for kubeadm, kubeadm-dev-test, there is no storage class name specified for the data and log storage. Before deployment, you must customize the configuration file and set the value for className; otherwise, the pre-deployment validations will fail. Deployment also has a validation step that checks for the existence of the storage class, but not for the necessary persistent volumes. You must ensure that you create enough volumes for the scale of your cluster. In CTP 3.1, for the default cluster size, you must create at least 23 volumes. Here is an example of how to create persistent volumes using the local provisioner.
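
For a rough idea of what such volumes look like, the following sketch defines a no-provisioner storage class and one local persistent volume. The class name, node name, and host path are assumptions you would replace with your own, and you would repeat the volume definition for as many volumes as your cluster needs:

    # Hypothetical local storage class and persistent volume for kubeadm;
    # the names, node, and path are illustrative.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-storage
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: local-pv-1
    spec:
      capacity:
        storage: 15Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-storage
      local:
        path: /mnt/disks/vol1        # pre-created directory on the node
      nodeAffinity:                  # local volumes must be pinned to a node
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                    - worker-node-1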

Customize storage configurations for each pool

For all customizations, you must first create a copy of the built-in configuration file you want to use. For example, the following command creates a copy of the aks-dev-test deployment configuration file in a subdirectory named custom:

mssqlctl bdc config init --source aks-dev-test --target custom

Then, you can customize your configuration file either by editing it manually or by using the mssqlctl bdc config section set command. The set command uses a combination of jsonpath and jsonpatch libraries to provide ways to edit your configuration file.

Configure size

By default, the size of the persistent volume claims provisioned for each of the pods in the cluster is 10 GB. You can update this value in a custom configuration file before cluster deployment to accommodate the workloads you are running.

The following example updates only the size of the persistent volume claims for data stored in the storage pool, setting it to 100Gi. Note that the storage section must exist in the configuration file for the storage pool before you run this command:

mssqlctl bdc config section set --config-profile custom -j "$.spec.pools[?(@.spec.type == ""Storage"")].spec.storage.data.size=100Gi"

The following example updates the size of the persistent volume claims for all pools to 32Gi by setting the value at the control plane level. Pools that don't specify their own storage settings inherit the control plane settings:

mssqlctl bdc config section set --config-profile custom -j "$.spec.controlPlane.spec.storage.data.size=32Gi"

Configure storage class

The following example shows how to modify the storage class for the control plane:

mssqlctl bdc config section set --config-profile custom -j "$.spec.controlPlane.spec.storage.data.className=<yourStorageClassName>"

Another option is to edit the custom configuration file manually, or to use jsonpatch as in the following example, which changes the storage class for the storage pool. Create a patch.json file with this content:

{
  "patch": [
    {
      "op": "add",
      "path": "$.spec.pools[?(@.spec.type == 'Storage')].spec.storage",
      "value": {
          "data": {
            "className": "default",
            "accessMode": "ReadWriteOnce",
            "size": "100Gi"
          },
          "logs": {
            "className": "default",
            "accessMode": "ReadWriteOnce",
            "size": "32Gi"
          }
        }
      }
  ]
}

Apply the patch file by using the mssqlctl bdc config section set command. The following example applies the patch.json file to the custom deployment configuration profile created earlier.

mssqlctl bdc config section set --config-profile custom -p ./patch.json
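
If you want to double-check the result, you can pretty-print the patched profile and confirm the storage section; the file name inside the custom directory is an assumption here and may differ by release:

python -m json.tool custom/cluster.json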

Next steps

For complete documentation about volumes in Kubernetes, see the Kubernetes documentation on Volumes.

For more information about deploying a SQL Server big data cluster, see How to deploy SQL Server big data cluster on Kubernetes.