Quickstart: Deploy SQL Server big data cluster on Azure Kubernetes Service (AKS)

In this quickstart, you will deploy a SQL Server 2019 big data cluster (preview) on AKS in a default configuration suitable for dev/test environments.

Note

AKS is just one location to host Kubernetes. Big data clusters can be deployed to Kubernetes regardless of the underlying infrastructure. For more information, see How to deploy SQL Server big data clusters on Kubernetes.

In addition to the SQL Server master instance, the cluster includes one compute pool instance, one data pool instance, and two storage pool instances. Data is persisted using Kubernetes persistent volumes that use the AKS default storage classes. To further customize your configuration, see the environment variables in the deployment guidance.
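
For example, if you want to confirm which storage classes are available in your AKS cluster before deploying, you can list them with kubectl; the class marked (default) in the output is the one the persistent volumes are expected to use:

kubectl get storageclass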

If you would prefer to run a script to create your AKS cluster and install a big data cluster at the same time, see Deploy a SQL Server big data cluster on Azure Kubernetes Service (AKS).

Note

SQL Server big data clusters are first available as a limited public preview through the SQL Server 2019 Early Adoption Program. To request access, register here, and specify your interest in trying SQL Server big data clusters. Microsoft will triage all requests and respond as soon as possible.

Prerequisites

This quickstart requires that you have already configured an AKS cluster running Kubernetes version 1.10 or later. For more information, see the deploy on AKS guide.

On the computer you are using to run the commands that install the SQL Server big data cluster, install kubectl. SQL Server big data clusters require Kubernetes version 1.10 or later for both the server and the client (kubectl). To install kubectl, see Install kubectl.
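
To confirm that both the client and the server meet this minimum version, you can run the following command and check the reported client and server versions:

kubectl version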

To install the mssqlctl CLI tool that manages the SQL Server big data cluster on your client machine, you must first install Python (version 3.0 or later) and pip3. pip is already installed if you are using Python 3.4 or later downloaded from python.org.
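
You can verify the Python and pip prerequisites with the following commands (the executable names may differ on your system, for example python instead of python3 on some Windows installations):

python3 --version
pip3 --version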

Verify AKS configuration

Once you have the AKS cluster deployed, you can run the following kubectl command to view the cluster configuration. Ensure that kubectl is pointed to the correct cluster context.

kubectl config view
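
If kubectl is not yet pointed at your AKS cluster, you can download the credentials and set the current context with the Azure CLI; the resource group and cluster names below are placeholders for the values you used when creating the AKS cluster:

az aks get-credentials --resource-group <your-resource-group> --name <your-aks-cluster>
kubectl config current-context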

Install mssqlctl CLI management tool

Run the following command to install the mssqlctl tool on your client machine. The command works from both a Windows and a Linux client, but on Windows make sure you run it from a cmd window with administrative privileges, and on Linux prefix it with sudo:

pip3 install --extra-index-url https://private-repo.microsoft.com/python/ctp-2.2 mssqlctl  

Important

If you installed a previous release, you must delete the cluster before upgrading mssqlctl and installing the new release. For more information, see Upgrading to a new release.

Tip

If mssqlctl does not install correctly, review the prerequisite steps in the article Install mssqlctl.
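
To verify that the package installed successfully, you can ask pip for its details; the version reported should correspond to the CTP 2.2 release:

pip3 show mssqlctl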

Define environment variables

Setting the environment variables required for deploying a big data cluster differs slightly depending on whether you are using a Windows or a Linux/macOS client. Choose the steps below based on the operating system you are using.

Before continuing, note the following important guidelines:

  • In a Windows command window, quotes become part of the environment variable's value. If you use quotes to wrap a password, the quotes are included in the password.
  • In bash, quotes are not included in the variable value. The examples below use double quotes (").
  • You can set the password environment variables to whatever you like, but make sure they are sufficiently complex and don't use the !, &, or ' characters.
  • For the CTP 2.2 release, do not change the default ports.
  • The sa account is a system administrator on the SQL Server Master instance that gets created during setup. After creating your SQL Server container, the MSSQL_SA_PASSWORD environment variable you specified is discoverable by running echo $MSSQL_SA_PASSWORD in the container. For security purposes, change your sa password as per best practices documented here.

Initialize the following environment variables. They are required for deploying a big data cluster:

Windows

Using a Command Window (not PowerShell), configure the following environment variables:

SET ACCEPT_EULA=Y
SET CLUSTER_PLATFORM=aks

SET CONTROLLER_USERNAME=<controller_admin_name - can be anything>
SET CONTROLLER_PASSWORD=<controller_admin_password - can be anything, password complexity compliant>
SET KNOX_PASSWORD=<knox_password - can be anything, password complexity compliant>
SET MSSQL_SA_PASSWORD=<sa_password_of_master_sql_instance, password complexity compliant>

SET DOCKER_REGISTRY=private-repo.microsoft.com
SET DOCKER_REPOSITORY=mssql-private-preview
SET DOCKER_USERNAME=<your username, credentials provided by Microsoft>
SET DOCKER_PASSWORD=<your password, credentials provided by Microsoft>
SET DOCKER_EMAIL=<your Docker email, use the username provided by Microsoft>
SET DOCKER_PRIVATE_REGISTRY="1"

Linux/macOS

Initialize the following environment variables:

export ACCEPT_EULA="Y"
export CLUSTER_PLATFORM="aks"

export CONTROLLER_USERNAME="<controller_admin_name - can be anything>"
export CONTROLLER_PASSWORD="<controller_admin_password - can be anything, password complexity compliant>"
export KNOX_PASSWORD="<knox_password - can be anything, password complexity compliant>"
export MSSQL_SA_PASSWORD="<sa_password_of_master_sql_instance, password complexity compliant>"

export DOCKER_REGISTRY="private-repo.microsoft.com"
export DOCKER_REPOSITORY="mssql-private-preview"
export DOCKER_USERNAME="<your username, credentials provided by Microsoft>"
export DOCKER_PASSWORD="<your password, credentials provided by Microsoft>"
export DOCKER_EMAIL="<your Docker email, use the username provided by Microsoft>"
export DOCKER_PRIVATE_REGISTRY="1"
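
On Linux/macOS, you can spot-check that the non-secret variables are set in your current shell before starting the deployment; this simply prints the variables whose names match the given patterns:

env | grep -E 'CLUSTER_PLATFORM|DOCKER_REGISTRY|DOCKER_REPOSITORY'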

Note

During the limited public preview, Docker credentials to download the SQL Server big data cluster images are provided to each customer by Microsoft. To request access, register here, and specify your interest in trying SQL Server big data clusters.

Deploy a big data cluster

To deploy a SQL Server 2019 CTP 2.2 big data cluster on your Kubernetes cluster, run the following command:

mssqlctl create cluster <your-cluster-name>

Note

The cluster name must contain only lowercase alphanumeric characters, with no spaces. All Kubernetes artifacts for the big data cluster are created in a namespace with the same name as the cluster name specified.
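
For example, with a placeholder cluster name of sqlbigdata, the deployment command would be the following, and the Kubernetes artifacts would be created in a namespace also named sqlbigdata:

mssqlctl create cluster sqlbigdata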

The command window or shell returns the deployment status. You can also check the deployment status by running these commands in a different cmd window:

kubectl get all -n <your-cluster-name>
kubectl get pods -n <your-cluster-name>
kubectl get svc -n <your-cluster-name>

You can see a more granular status and configuration for each pod by running:

kubectl describe pod <pod name> -n <your-cluster-name>
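
If a pod does not reach the Running state, its container logs can also help. The following kubectl command retrieves the logs for a specific pod (add -c <container name> if the pod runs more than one container):

kubectl logs <pod name> -n <your-cluster-name>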

Tip

For more details on how to monitor and troubleshoot a deployment, see the deployment troubleshooting section of the deployment guidance article.

Open the Cluster Administration Portal

Once the Controller pod is running, you can use the Cluster Administration Portal to monitor the deployment. You can access the portal using the external IP address and port number of the service-proxy-lb service (for example: https://<ip-address>:30777/portal). Credentials for accessing the admin portal are the values of the CONTROLLER_USERNAME and CONTROLLER_PASSWORD environment variables you provided above.

You can get the IP address of the service-proxy-lb service by running this command in a bash or cmd window:

kubectl get svc service-proxy-lb -n <your-cluster-name>
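
If you only want the external IP address, kubectl can extract it with a jsonpath output format; this is a minimal example that assumes the load balancer has already been assigned an address:

kubectl get svc service-proxy-lb -n <your-cluster-name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'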

Note

You will see a security warning when accessing the web page because we are using auto-generated SSL certificates. In future releases, we will provide the ability to supply your own signed certificates.

Connect to the big data cluster

After the deployment script has completed successfully, you can obtain the IP address of the SQL Server master instance and the Spark/HDFS endpoints using the steps outlined below. All cluster endpoints are also displayed in the Service Endpoints section of the Cluster Administration Portal for easy reference.

Azure provides the Azure LoadBalancer service to AKS. Run the following commands in a cmd or bash window:

kubectl get svc endpoint-master-pool -n <your-cluster-name>
kubectl get svc service-security-lb -n <your-cluster-name>

Look for the External-IP value assigned to each service. Connect to the SQL Server master instance using the IP address of endpoint-master-pool at port 31433 (for example: <ip-address>,31433), and to the SQL Server big data cluster endpoint using the External-IP of the service-security-lb service. The big data cluster endpoint is where you can interact with HDFS and submit Spark jobs through Knox.
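
As a quick connectivity check, you can connect to the SQL Server master instance with sqlcmd, assuming the sqlcmd utility is installed on your client machine. Replace <ip-address> with the External-IP of endpoint-master-pool and supply the sa password you set in MSSQL_SA_PASSWORD:

sqlcmd -S <ip-address>,31433 -U sa -P "<your MSSQL_SA_PASSWORD>" -Q "SELECT @@VERSION"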

Next steps

Now that the SQL Server big data cluster is deployed, try out some of the new capabilities: