SQL Server Big Data Clusters release notes
Gain near real-time insights from all your data with SQL Server 2019 Big Data Clusters, which provide a complete environment for working with large sets of data, including machine learning and AI capabilities.
This article lists the updates and known issues for the most recent releases of SQL Server Big Data Clusters (BDC).
SQL Server 2019
SQL Server 2019 (15.x) introduces SQL Server Big Data Clusters.
Use SQL Server Big Data Clusters to:
- Deploy scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes.
- Read, write, and process big data from Transact-SQL or Spark.
- Easily combine and analyze high-value relational data with high-volume big data.
- Query external data sources.
- Store big data in HDFS managed by SQL Server.
- Query data from multiple external data sources through the cluster.
- Use the data for AI, machine learning, and other analysis tasks.
- Deploy and run applications in Big Data Clusters.
- Virtualize data with PolyBase. Query data from external SQL Server, Oracle, Teradata, MongoDB, and ODBC data sources with external tables.
- Provide high availability for the SQL Server master instance and all databases by using Always On availability group technology.
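As a sketch of the PolyBase data virtualization flow described above, an external table over an Oracle source might look like the following. All names here (the credential, data source, address, and table) are illustrative assumptions, not values from this article, and the database is assumed to already have a master key:

```sql
-- Illustrative only: credential, data source, and table names are assumptions.
CREATE DATABASE SCOPED CREDENTIAL OracleCredential
    WITH IDENTITY = 'oracle_user', SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE OracleSource
    WITH (LOCATION = 'oracle://145.145.145.145:1521',
          CREDENTIAL = OracleCredential);

CREATE EXTERNAL TABLE dbo.inventory (
    inv_date DATE,
    inv_item INT,
    inv_quantity INT
) WITH (DATA_SOURCE = OracleSource, LOCATION = '[DB1].[DBO].[INVENTORY]');

-- The external table can now be queried, and joined to local tables,
-- with ordinary Transact-SQL.
SELECT TOP 10 * FROM dbo.inventory;
```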
SQL Server version
The current version of SQL Server is
The image tag for this release is
SQL Server 2019 servicing updates
For current information about SQL Server servicing updates, see https://support.microsoft.com/help/4517790.
Supported platforms
This section lists the platforms that are supported for SQL Server Big Data Clusters (BDC).
|Platform|Supported versions|
|---|---|
|Kubernetes|BDC requires Kubernetes version 1.13 or later. See Kubernetes version and version skew support policy for the Kubernetes version support policy.|
|Azure Kubernetes Service (AKS)|BDC requires AKS version 1.13 or later. See Supported Kubernetes versions in AKS for the version support policy.|
Host OS for Kubernetes
|Operating system|Supported versions|
|---|---|
|Red Hat Enterprise Linux|7.3, 7.4, 7.5, 7.6|
|Tool|Requirement|
|---|---|
| |Must be the same minor version as the server (the same as the SQL Server master instance).|
|Azure Data Studio|Get the latest build of Azure Data Studio.|
SQL Server editions
The Big Data Cluster edition is determined by the edition of the SQL Server master instance. At deployment time, Developer edition is deployed by default. You can change the edition after deployment. See Configure SQL Server master instance.
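As a quick check after deployment, you can confirm the edition and version of the master instance from any session connected to it, for example:

```sql
-- Returns the edition (e.g. Developer, Enterprise) and product version
-- of the SQL Server master instance.
SELECT SERVERPROPERTY('Edition') AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion;
```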
Livy job submission from Azure Data Studio (ADS) or curl fails with a 500 error
Issue and customer impact: In an HA configuration, the Spark shared resources (sparkhead) are configured with multiple replicas. In this case, Livy job submissions from Azure Data Studio (ADS) or `curl` might fail. To verify, a `curl` request to any sparkhead pod results in a refused connection. For example, `curl https://sparkhead-0:8998/` or `curl https://sparkhead-1:8998` returns a 500 error.
This happens in the following scenarios:
- The Zookeeper pods, or the Zookeeper process for each Zookeeper instance, restarted a few times.
- Network connectivity between the sparkhead pods and the Zookeeper pods is unreliable.
Workaround: Restart both Livy servers:

```bash
kubectl -n <clustername> exec sparkhead-0 -c hadoop-livy-sparkhistory -- supervisorctl restart livy
kubectl -n <clustername> exec sparkhead-1 -c hadoop-livy-sparkhistory -- supervisorctl restart livy
```
Create a memory-optimized table when the master instance is in an availability group
Issue and customer impact: You cannot use the primary endpoint exposed for connecting to the availability group databases (the listener) to create memory-optimized tables.
Workaround: To create memory-optimized tables when the SQL Server master instance is in an availability group configuration, connect to the SQL Server instance through a separately exposed endpoint, connect to the SQL Server database in that session, and create the memory-optimized tables in the session created with the new connection.
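A minimal sketch of the final step, assuming the session is connected directly to the instance (not through the listener) and the database already has a memory-optimized filegroup; the table and column names are illustrative:

```sql
-- Run in a session connected directly to the SQL Server instance,
-- not through the availability group listener.
CREATE TABLE dbo.SessionCache (
    SessionId INT NOT NULL PRIMARY KEY NONCLUSTERED,
    Payload   NVARCHAR(200)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```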
Insert into external tables fails in Active Directory authentication mode
Issue and customer impact: When the SQL Server master instance is in Active Directory authentication mode, a query that selects only from external tables, where at least one of the external tables is in a storage pool, and inserts the result into another external table returns:
```
Msg 7320, Level 16, State 102, Line 1
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "SQLNCLI11". Only domain logins can be used to query Kerberized storage pool.
```
Workaround: Modify the query in one of the following ways: either join the storage pool table to a local table, or insert into a local table first, and then read from the local table to insert into the data pool.
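A sketch of the second workaround, staging through a local table; all table and column names here are illustrative assumptions, not objects from this article:

```sql
-- Instead of INSERT ... SELECT directly between external tables:
-- 1. Stage the storage pool data in a local table.
SELECT col1, col2
INTO dbo.LocalStaging
FROM dbo.StoragePoolExternalTable;

-- 2. Insert into the data pool external table from the local table.
INSERT INTO dbo.DataPoolExternalTable (col1, col2)
SELECT col1, col2
FROM dbo.LocalStaging;
```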
For more information about SQL Server Big Data Clusters, see What are SQL Server 2019 Big Data Clusters?