Quickstart: Deploy SQL Server big data cluster on Azure Kubernetes Service (AKS)
In this quickstart, you will deploy a SQL Server 2019 big data cluster (preview) on AKS in a default configuration suitable for dev/test environments.
AKS is just one location to host Kubernetes. Big data clusters can be deployed to Kubernetes regardless of the underlying infrastructure. For more information, see How to deploy SQL Server big data clusters on Kubernetes.
In addition to a SQL Master instance, the cluster includes one compute pool instance, one data pool instance, and two storage pool instances. Data is persisted using Kubernetes persistent volumes that use AKS default storage classes. To further customize your configuration, see the environment variables in the deployment guidance.
If you would prefer to run a script to create your AKS cluster and install a big data cluster at the same time, see Deploy a SQL Server big data cluster on Azure Kubernetes Service (AKS).
SQL Server big data clusters is first available as a limited public preview through the SQL Server 2019 Early Adoption Program. To request access, register here, and specify your interest to try SQL Server big data clusters. Microsoft will triage all requests and respond as soon as possible.
This quickstart requires that you have already configured an AKS cluster with a minimum version of v1.10. For more information, see the deploy on AKS guide.
On the computer you are using to run the commands to install the SQL Server big data cluster, install kubectl. SQL Server big data cluster requires a minimum 1.10 version for Kubernetes, for both server and client (kubectl). To install kubectl, see Install kubectl.
To install the mssqlctl CLI tool to manage the SQL Server big data cluster on your client machine, you must first install Python minimum version v3.0 and pip3.
pip is already installed if you are using a Python version of at least 3.4 downloaded from python.org.
Verify AKS configuration
Once you have the AKS cluster deployed, you can execute the below kubectl command to view the cluster configuration. Ensure that kubectl is pointed to the correct cluster context.
kubectl config view
Install mssqlctl CLI management tool
Run below command to install mssqlctl tool on your client machine. The command works from both a Windows and a Linux client, but make sure you are running it from a cmd window that runs with administrative privileges on Windows or prefix it with
sudo on Linux:
pip3 install --extra-index-url https://private-repo.microsoft.com/python/ctp-2.2 mssqlctl
If you installed a previous release, you must delete the cluster before upgrading mssqlctl and installing the new release. For more information, see Upgrading to a new release.
If mssqlctl does not install correctly, review the prerequisite steps in the article Install mssqlctl.
Define environment variables
Setting the environment variables required for deploying big data cluster slightly differs depending on whether you are using Windows or Linux/macOS client. Choose the steps below depending on which operating system you are using.
Before continuing, note the following important guidelines:
- In the Command Window, quotes are included in the environment variables. If you use quotes to wrap a password, the quotes are included in the password.
- In bash, quotes are not included in the variable. Our examples use double quotes
- You can set the password environment variables to whatever you like, but make sure they are sufficiently complex and don't use the
- For the CTP 2.2 release, do not change the default ports.
saaccount is a system administrator on the SQL Server Master instance that gets created during setup. After creating your SQL Server container, the
MSSQL_SA_PASSWORDenvironment variable you specified is discoverable by running
echo $MSSQL_SA_PASSWORDin the container. For security purposes, change your
sapassword as per best practices documented here.
Initialize the following environment variables. They are required for deploying a big data cluster:
Using a Command Window (not PowerShell), configure the following environment variables:
SET ACCEPT_EULA=Y SET CLUSTER_PLATFORM=aks SET CONTROLLER_USERNAME=<controller_admin_name - can be anything> SET CONTROLLER_PASSWORD=<controller_admin_password - can be anything, password complexity compliant> SET KNOX_PASSWORD=<knox_password - can be anything, password complexity compliant> SET MSSQL_SA_PASSWORD=<sa_password_of_master_sql_instance, password complexity compliant> SET DOCKER_REGISTRY=private-repo.microsoft.com SET DOCKER_REPOSITORY=mssql-private-preview SET DOCKER_USERNAME=<your username, credentials provided by Microsoft> SET DOCKER_PASSWORD=<your password, credentials provided by Microsoft> SET DOCKER_EMAIL=<your Docker email, use the username provided by Microsoft> SET DOCKER_PRIVATE_REGISTRY="1"
Initialize the following environment variables:
export ACCEPT_EULA="Y" export CLUSTER_PLATFORM="aks" export CONTROLLER_USERNAME="<controller_admin_name - can be anything>" export CONTROLLER_PASSWORD="<controller_admin_password - can be anything, password complexity compliant>" export KNOX_PASSWORD="<knox_password - can be anything, password complexity compliant>" export MSSQL_SA_PASSWORD="<sa_password_of_master_sql_instance, password complexity compliant>" export DOCKER_REGISTRY="private-repo.microsoft.com" export DOCKER_REPOSITORY="mssql-private-preview" export DOCKER_USERNAME="<your username, credentials provided by Microsoft>" export DOCKER_PASSWORD="<your password, credentials provided by Microsoft>" export DOCKER_EMAIL="<your Docker email, use the username provided by Microsoft>" export DOCKER_PRIVATE_REGISTRY="1"
During the limited public preview, Docker credentials to download the SQL Server big data cluster images are provided to each customer by Microsoft. To request access, register here, and specify your interest to try SQL Server big data clusters.
Deploy a big data cluster
To deploy a SQL Server 2019 CTP 2.2 big data cluster on your Kubernetes cluster, run the following command:
mssqlctl create cluster <your-cluster-name>
The name of your cluster needs to be only lower case alpha-numeric characters, no spaces. All Kubernetes artifacts for the big data cluster will be created in a namespace with same name as the cluster name specified.
The command window or shell returns the deployment status. You can also check the deployment status by running these commands in a different cmd window:
kubectl get all -n <your-cluster-name> kubectl get pods -n <your-cluster-name> kubectl get svc -n <your-cluster-name>
You can see a more granular status and configuration for each pod by running:
kubectl describe pod <pod name> -n <your-cluster-name>
For more details on how to monitor and troubleshoot a deployment, see the deployment troubleshooting section of the deployment guidance article.
Open the Cluster Administration Portal
Once the Controller pod is running, you can use the Cluster Administration Portal to monitor the deployment. You can access the portal using the external IP address and port number for the
service-proxy-lb (for example: https://<ip-address>:30777/portal). Credentials for accessing the admin portal are the values of
CONTROLLER_PASSWORD environment variables provided above.
You can get the IP address of the service-proxy-lb service by running this command in a bash or cmd window:
kubectl get svc service-proxy-lb -n <your-cluster-name>
You will see a security warning when accessing the web page since we are using auto-generated SSL certificates. In future releases, we will provide the capability to provide your own signed certificates.
Connect to the big data cluster
After the deployment script has completed successfully, you can obtain the IP address of the SQL Server master instance and the Spark/HDFS end points using the steps outlined below. All cluster endpoints are displayed in the Service Endpoints section in the Cluster Administration Portal as well for easy reference.
Azure provides the Azure LoadBalancer service to AKS. Run following command in a cmd or bash window:
kubectl get svc endpoint-master-pool -n <your-cluster-name> kubectl get svc service-security-lb -n <your-cluster-name>
Look for the External-IP value that is assigned to the services. Connect to the SQL Server master instance using the IP address for the
endpoint-master-pool at port 31433 (Ex: <ip-address>,31433) and to the SQL Server big data cluster endpoint using the external-IP for the
service-security-lb service. That big data cluster end point is where you can interact with HDFS and submit Spark jobs through Knox.
Now that the SQL Server big data cluster is deployed, try out some of the new capabilities: