Quickstart: Configure a hybrid cluster with Azure Managed Instance for Apache Cassandra
Azure Managed Instance for Apache Cassandra provides automated deployment and scaling operations for managed open-source Apache Cassandra datacenters. This service helps you accelerate hybrid scenarios and reduce ongoing maintenance.
This quickstart demonstrates how to use the Azure CLI commands to configure a hybrid cluster. If you have existing datacenters in an on-premise or self-hosted environment, you can use Azure Managed Instance for Apache Cassandra to add other datacenters to that cluster and maintain them.
Prerequisites
Use the Bash environment in Azure Cloud Shell.
If you prefer, install the Azure CLI to run CLI reference commands.
If you're using a local installation, sign in to the Azure CLI by using the az login command. To finish the authentication process, follow the steps displayed in your terminal. For additional sign-in options, see Sign in with the Azure CLI.
When you're prompted, install Azure CLI extensions on first use. For more information about extensions, see Use extensions with the Azure CLI.
Run az version to find the version and dependent libraries that are installed. To upgrade to the latest version, run az upgrade.
This article requires the Azure CLI version 2.30.0 or higher. If you are using Azure Cloud Shell, the latest version is already installed.
Azure Virtual Network with connectivity to your self-hosted or on-premise environment. For more information on connecting on premises environments to Azure, see the Connect an on-premises network to Azure article.
Configure a hybrid cluster
Sign in to the Azure portal and navigate to your Virtual Network resource.
Open the Subnets tab and create a new subnet. To learn more about the fields in the Add subnet form, see the Virtual Network article:
Note
The Deployment of a Azure Managed Instance for Apache Cassandra requires internet access. Deployment fails in environments where internet access is restricted. Make sure you aren't blocking access within your VNet to the following vital Azure services that are necessary for Managed Cassandra to work properly. You can also find an extensive list of IP address and port dependencies here.
- Azure Storage
- Azure KeyVault
- Azure Virtual Machine Scale Sets
- Azure Monitoring
- Azure Active Directory
- Azure Security
Now we will apply some special permissions to the VNet and subnet which Cassandra Managed Instance requires, using Azure CLI. Use the
az role assignment createcommand, replacing<subscriptionID>,<resourceGroupName>, and<vnetName>with the appropriate values:az role assignment create \ --assignee a232010e-820c-4083-83bb-3ace5fc29d0b \ --role 4d97b98b-1d4f-4787-a291-c67834d212e7 \ --scope /subscriptions/<subscriptionID>/resourceGroups/<resourceGroupName>/providers/Microsoft.Network/virtualNetworks/<vnetName>Note
The
assigneeandrolevalues in the previous command are fixed service principle and role identifiers respectively.Next, we will configure resources for our hybrid cluster. Since you already have a cluster, the cluster name here will only be a logical resource to identify the name of your existing cluster. Make sure to use the name of your existing cluster when defining
clusterNameandclusterNameOverridevariables in the following script.You also need, at minimum, the seed nodes from your existing datacenter, and the gossip certificates required for node-to-node encryption. Azure Managed Instance for Apache Cassandra requires node-to-node encryption for communication between datacenters. If you do not have node-to-node encryption implemented in your existing cluster, you would need to implement it - see documentation here. You should supply the path to the location of the certificates. Each certificate should be in PEM format, e.g.
-----BEGIN CERTIFICATE-----\n...PEM format 1...\n-----END CERTIFICATE-----. In general, there are two ways of implementing certificates:Self signed certs. This means a private and public (no CA) certificate for each node - in this case we need all public certificates.
Certs signed by a CA. This can be a self-signed CA or even a public one. In this case we need the root CA certificate (refer to instructions on preparing SSL certificates for production), and all intermediaries (if applicable).
Optionally, if you have also implemented client-to-node certificates (see here), you also need to provide them in the same format when creating the hybrid cluster. See sample below.
Note
The value of the
delegatedManagementSubnetIdvariable you will supply below is exactly the same as the value of--scopethat you supplied in the command above:resourceGroupName='MyResourceGroup' clusterName='cassandra-hybrid-cluster-legal-name' clusterNameOverride='cassandra-hybrid-cluster-illegal-name' location='eastus2' delegatedManagementSubnetId='/subscriptions/<subscriptionID>/resourceGroups/<resourceGroupName>/providers/Microsoft.Network/virtualNetworks/<vnetName>/subnets/<subnetName>' # You can override the cluster name if the original name is not legal for an Azure resource: # overrideClusterName='ClusterNameIllegalForAzureResource' # the default cassandra version will be v3.11 az managed-cassandra cluster create \ --cluster-name $clusterName \ --resource-group $resourceGroupName \ --location $location \ --delegated-management-subnet-id $delegatedManagementSubnetId \ --external-seed-nodes 10.52.221.2 10.52.221.3 10.52.221.4 \ --external-gossip-certificates /usr/csuser/clouddrive/rootCa.pem /usr/csuser/clouddrive/gossipKeyStore.crt_signed # optional - add your existing datacenter's client-to-node certificates (if implemented): # --client-certificates /usr/csuser/clouddrive/rootCa.pem /usr/csuser/clouddrive/nodeKeyStore.crt_signedNote
If your cluster already has node-to-node and client-to-node encryption, you should know where your existing client and/or gossip SSL certificates are kept. If you are uncertain, you should be able to run
keytool -list -keystore <keystore-path> -rfc -storepass <password>to print the certs.After the cluster resource is created, run the following command to get the cluster setup details:
resourceGroupName='MyResourceGroup' clusterName='cassandra-hybrid-cluster' az managed-cassandra cluster show \ --cluster-name $clusterName \ --resource-group $resourceGroupName \The previous command returns information about the managed instance environment. You'll need the gossip certificates so that you can install them on the trust store for nodes in your existing datacenter. The following screenshot shows the output of the previous command and the format of certificates:
Note
The certificates returned from the above command contain line breaks represented as text, for example
\r\n. You should copy each certificate to a file and format it before attempting to import it into your existing datacenter's trust store.Tip
Copy the
gossipCertificatesarray value shown in the above screen shot into a file, and use the following bash script (you would need to download and install jq for your platform) to format the certs and create separate pem files for each.readarray -t cert_array < <(jq -c '.[]' gossipCertificates.txt) # iterate through the certs array, format each cert, write to a numbered file. num=0 filename="" for item in "${cert_array[@]}"; do let num=num+1 filename="cert$num.pem" cert=$(jq '.pem' <<< $item) echo -e $cert >> $filename sed -e '1d' -e '$d' -i $filename doneNext, create a new datacenter in the hybrid cluster. Make sure to replace the variable values with your cluster details:
resourceGroupName='MyResourceGroup' clusterName='cassandra-hybrid-cluster' dataCenterName='dc1' dataCenterLocation='eastus2' virtualMachineSKU='Standard_D8s_v4' noOfDisksPerNode=4 az managed-cassandra datacenter create \ --resource-group $resourceGroupName \ --cluster-name $clusterName \ --data-center-name $dataCenterName \ --data-center-location $dataCenterLocation \ --delegated-subnet-id $delegatedManagementSubnetId \ --node-count 9 --sku $virtualMachineSKU \ --disk-capacity $noOfDisksPerNode \ --availability-zone falseNote
The value for
--skucan be chosen from the following available SKUs:- Standard_E8s_v4
- Standard_E16s_v4
- Standard_E20s_v4
- Standard_E32s_v4
- Standard_DS13_v2
- Standard_DS14_v2
- Standard_D8s_v4
- Standard_D16s_v4
- Standard_D32s_v4
Note also that
--availability-zoneis set tofalse. To enable availability zones, set this totrue. Availability zones increase the availability SLA of the service. For more details, review the full SLA details here.Warning
Availability zones are not supported in all regions. Deployments will fail if you select a region where Availability zones are not supported. See here for supported regions. The successful deployment of availability zones is also subject to the availability of compute resources in all of the zones in the given region. Deployments may fail if the SKU you have selected, or capacity, is not available across all zones.
Now that the new datacenter is created, run the show datacenter command to view its details:
resourceGroupName='MyResourceGroup' clusterName='cassandra-hybrid-cluster' dataCenterName='dc1' az managed-cassandra datacenter show \ --resource-group $resourceGroupName \ --cluster-name $clusterName \ --data-center-name $dataCenterNameThe previous command outputs the new datacenter's seed nodes:
Now add the new datacenter's seed nodes to your existing datacenter's seed node configuration within the cassandra.yaml file. And install the managed instance gossip certificates that you collected earlier to the trust store for each node in your existing cluster, using
keytoolcommand for each cert:keytool -importcert -keystore generic-server-truststore.jks -alias CassandraMI -file cert1.pem -noprompt -keypass myPass -storepass truststorePassNote
If you want to add more datacenters, you can repeat the above steps, but you only need the seed nodes.
Finally, use the following CQL query to update the replication strategy in each keyspace to include all datacenters across the cluster:
ALTER KEYSPACE "ks" WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'on-premise-dc': 3, 'managed-instance-dc': 3};You also need to update the password tables:
ALTER KEYSPACE "system_auth" WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'on-premise-dc': 3, 'managed-instance-dc': 3}
Troubleshooting
If you encounter an error when applying permissions to your Virtual Network using Azure CLI, such as Cannot find user or service principal in graph database for 'e5007d2c-4b13-4a74-9b6a-605d99f03501', you can apply the same permission manually from the Azure portal. Learn how to do this here.
Note
The Azure Cosmos DB role assignment is used for deployment purposes only. Azure Managed Instanced for Apache Cassandra has no backend dependencies on Azure Cosmos DB.
Clean up resources
If you're not going to continue to use this managed instance cluster, delete it with the following steps:
- From the left-hand menu of Azure portal, select Resource groups.
- From the list, select the resource group you created for this quickstart.
- On the resource group Overview pane, select Delete resource group.
- In the next window, enter the name of the resource group to delete, and then select Delete.
Next steps
In this quickstart, you learned how to create a hybrid cluster using Azure CLI and Azure Managed Instance for Apache Cassandra. You can now start working with the cluster.
