您现在访问的是微软AZURE全球版技术文档网站,若需要访问由世纪互联运营的MICROSOFT AZURE中国区技术文档网站,请访问 https://docs.azure.cn.

Azure Synapse Analytics 中的无服务器 SQL 池Serverless SQL pool in Azure Synapse Analytics

每个 Azure Synapse Analytics 工作区都随附了可用于在 Lake 中查询数据的无服务器 SQL 池终结点。Every Azure Synapse Analytics workspace comes with serverless SQL pool endpoints that you can use to query data in the lake.

无服务器 SQL 池是针对数据湖中的数据运行的查询服务。Serverless SQL pool is a query service over the data in your data lake. 它可让你通过以下功能访问数据:It enables you to access your data through the following functionalities:

  • 熟悉的 T-SQL 语法,可以就地查询数据,而无需将数据复制或加载到专用存储。A familiar T-SQL syntax to query data in place without the need to copy or load data into a specialized store.
  • 通过 T-SQL 接口建立的集成连接,提供各种商业智能和即席查询工具(包括最流行的驱动程序)。Integrated connectivity via the T-SQL interface that offers a wide range of business intelligence and ad-hoc querying tools, including the most popular drivers.

无服务器 SQL 池是专为大规模数据和计算功能构建的分布式数据处理系统。Serverless SQL pool is a distributed data processing system, built for large-scale data and computational functions. 使用无服务器 SQL 池可在几秒到几分钟内分析完大数据,具体所需时间取决于工作负荷。Serverless SQL pool enables you to analyze your Big Data in seconds to minutes, depending on the workload. 得益于内置的查询执行容错机制,即使是对于涉及大型数据集的长时间运行的查询,该系统也能够提供很高的可靠性和成功率。Thanks to built-in query execution fault-tolerance, the system provides high reliability and success rates even for long-running queries involving large data sets.

无服务器 SQL 池是无服务器服务,因此不需要设置基础结构,也不需要维护群集。Serverless SQL pool is serverless, hence there's no infrastructure to setup or clusters to maintain. 此服务的默认终结点将在每个 Azure Synapse 工作区内提供,因此在创建工作区后,可以立即开始查询数据。A default endpoint for this service is provided within every Azure Synapse workspace, so you can start querying data as soon as the workspace is created.

保留的资源不产生任何费用,你只需为运行的查询所处理的数据付费。因此,此模型是真正的按用量付费模型。There is no charge for resources reserved, you are only being charged for the data processed by queries you run, hence this model is a true pay-per-use model.

如果在数据管道中使用 Apache Spark for Azure Synapse 进行数据准备、清理或扩充,可以直接从无服务器 SQL 池查询已在流程中创建的 Spark 外部表If you use Apache Spark for Azure Synapse in your data pipeline, for data preparation, cleansing or enrichment, you can query external Spark tables you've created in the process, directly from serverless SQL pool. 使用专用链接可将无服务器 SQL 池终结点整合到托管工作区 VNet 中。Use Private Link to bring your serverless SQL pool endpoint into your managed workspace VNet.

无服务器 SQL 池权益Serverless SQL pool benefits

如果需要探索 Data Lake 中的数据、从这些数据获取见解,或者要优化现有的数据转换管道,则可从使用无服务器 SQL 池获益。If you need to explore data in the data lake, gain insights from it or optimize your existing data transformation pipeline, you can benefit from using serverless SQL pool. 它适用于以下方案:It is suitable for the following scenarios:

  • 基本发现和探索 - 快速推理 Data Lake 中各种格式(Parquet、CSV、JSON)的数据,以便你可以规划好如何从这些数据提取见解。Basic discovery and exploration - Quickly reason about the data in various formats (Parquet, CSV, JSON) in your data lake, so you can plan how to extract insights from it.
  • 逻辑数据仓库 – 基于原始数据或异类数据提供关系抽象,而无需重新放置和转换数据,使你看到的数据始终是最新数据。Logical data warehouse – Provide a relational abstraction on top of raw or disparate data without relocating and transforming data, allowing always up-to-date view of your data.
  • 数据转换 - 使用 T-SQL 以简单、可缩放且高效的方式转换 Lake 中的数据,以便可将数据馈送到 BI 和其他工具,或者载入到关系数据存储(Synapse SQL 数据库、Azure SQL 数据库等)。Data transformation - Simple, scalable, and performant way to transform data in the lake using T-SQL, so it can be fed to BI and other tools, or loaded into a relational data store (Synapse SQL databases, Azure SQL Database, etc.).

不同的职业角色可以从无服务器 SQL 池获益:Different professional roles can benefit from serverless SQL pool:

  • 数据工程师可以使用此服务探索 Lake、转换和准备数据,以及简化数据转换管道。Data Engineers can explore the lake, transform and prepare data using this service, and simplify their data transformation pipelines. 有关详细信息,请查看此教程For more information, check this tutorial.
  • 得益于 OPENROWSET 和自动架构推理等功能,数据科学家可以快速推理 Lake 中数据的内容和结构。Data Scientists can quickly reason about the contents and structure of the data in the lake, thanks to features such as OPENROWSET and automatic schema inference.
  • 数据分析师可以使用熟悉的 T-SQL 语言或他们偏好的、可连接到无服务器 SQL 池的工具,探索数据科学家或数据工程师创建的数据和 Spark 外部表Data Analysts can explore data and Spark external tables created by Data Scientists or Data Engineers using familiar T-SQL language or their favorite tools, which can connect to serverless SQL pool.
  • BI 专业人员可以快速基于 Lake 中的数据创建 Power BI 报表和 Spark 表。BI Professionals can quickly create Power BI reports on top of data in the lake and Spark tables.

如何开始使用无服务器 SQL 池How to start using serverless SQL pool

每个 Azure Synapse 工作区中都会提供无服务器 SQL 池终结点。Serverless SQL pool endpoint is provided within every Azure Synapse workspace. 可以创建一个工作区,然后使用熟悉的工具立即开始查询数据。You can create a workspace and start querying data instantly using tools you are familiar with.

客户端工具Client tools

使用无服务器 SQL 池,可以将现有的 SQL 即席查询和商业智能工具连接到 Data Lake 中。Serverless SQL pool enables existing SQL ad-hoc querying and business intelligence tools to tap into the data lake. 由于它提供用户熟悉的 T-SQL 语法,任何能够建立 TDS 连接 SQL 产品/服务的工具都可以连接和查询 Synapse SQL 按需版本As it provides familiar T-SQL syntax, any tool capable to establish TDS connection SQL offerings can connect to and query Synapse SQL on-demand. 可以与 Azure Data Studio 进行连接并运行即席查询,或者与 Power BI 进行连接以快速获取见解。You can connect with Azure Data Studio and run ad-hoc queries or connect with Power BI to gain insights in a matter of minutes.

T-SQL 支持T-SQL support

无服务器 SQL 池提供 T-SQL 查询外围应用,该外围应用在某些方面略有增强/扩展,以便改善半结构化和非结构化数据的查询体验。Serverless SQL pool offers T-SQL querying surface area, which is slightly enhanced/extended in some aspects to accommodate for experiences around querying semi-structured and unstructured data. 此外,由于无服务器 SQL 池的设计,T-SQL 语言的某些方面不受支持,例如,目前不支持 DML 功能。Furthermore, some aspects of the T-SQL language aren't supported due to the design of serverless SQL pool, as an example, DML functionality is currently not supported.

  • 可以使用熟悉的概念来组织工作负荷:Workload can be organized using familiar concepts:
  • 数据库 - 无服务器 SQL 池终结点可以包含多个数据库。Databases - serverless SQL pool endpoint can have multiple databases.
  • 架构 - 数据库中可以包含一个或多个称作“架构”的对象所有权组。Schemas - Within a database, there can be one or many object ownership groups called schemas.
  • 视图Views
  • 外部资源 – 数据源、文件格式和表External resources – data sources, file formats, and tables

可通过以下方式强制实施安全性:Security can be enforced using:

  • 登录名和用户Logins and users
  • 用于控制对存储帐户的访问的凭据Credentials to control access to storage accounts
  • 在每个对象级别授予、拒绝和撤销权限Grant, deny, and revoke permissions per object level
  • Azure Active Directory 集成Azure Active Directory integration

支持的 T-SQL:Supported T-SQL:

  • 支持完整的 SELECT 外围应用,包括大部分 SQL 函数Full SELECT surface area is supported, including a majority of SQL functions
  • CETAS - CREATE EXTERNAL TABLE AS SELECTCETAS - CREATE EXTERNAL TABLE AS SELECT
  • 仅与视图和安全性相关的 DDL 语句DDL statements related to views and security only

无服务器 SQL 池没有本地存储,只有元数据对象才存储在数据库中。Serverless SQL pool has no local storage, only metadata objects are stored in databases. 因此,不支持与以下概念相关的 T-SQL:Therefore, T-SQL related to the following concepts isn't supported:

  • Tables
  • 触发器Triggers
  • 具体化视图Materialized views
  • 不与视图和安全性相关的 DDL 语句DDL statements other than ones related to views and security
  • DML 语句DML statements

扩展Extensions

为了使就地查询 Data Lake 文件中驻留的数据的体验顺畅,无服务器 SQL 池添加了以下功能,以此扩展了现有的 OPENROWSET 函数:In order to enable smooth experience for in place querying of data residing in files in data lake, serverless SQL pool extends the existing OPENROWSET function by adding following capabilities:

查询多个文件或文件夹Query multiple files or folders

PARQUET 文件格式PARQUET file format

用于处理分隔文本的其他选项(字段终止符、行终止符、转义字符)Additional options for working with delimited text (field terminator, row terminator, escape char)

读取选定的列子集Read a chosen subset of columns

架构推理Schema inference

filename 函数filename function

filepath 函数filepath function

处理复杂类型以及嵌套或重复的数据结构Work with complex types and nested or repeated data structures

安全性Security

无服务器 SQL 池提供了一些机制来保护对数据的访问。Serverless SQL pool offers mechanisms to secure access to your data.

Azure Active Directory 集成和多重身份验证Azure Active Directory integration and multi-factor authentication

借助无服务器 SQL 池,可以使用 Azure Active Directory 集成集中管理数据库用户和其他 Microsoft 服务的标识。Serverless SQL pool enables you to centrally manage identities of database user and other Microsoft services with Azure Active Directory integration. 此功能简化了权限管理,增强了安全性。This capability simplifies permission management and enhances security. Azure Active Directory (Azure AD) 支持多重身份验证 (MFA),以便在支持单一登录过程的同时提高数据和应用程序安全性。Azure Active Directory (Azure AD) supports multi-factor authentication (MFA) to increase data and application security while supporting a single sign-on process.

身份验证Authentication

无服务器 SQL 池身份验证指的是用户连接到终结点时如何证明其身份。Serverless SQL pool authentication refers to how users prove their identity when connecting to the endpoint. 支持两种类型的身份验证:Two types of authentication are supported:

  • SQL 身份验证SQL Authentication

    此身份验证方法使用用户名和密码。This authentication method uses a username and password.

  • Azure Active Directory 身份验证Azure Active Directory Authentication:

    此身份验证方法使用 Azure Active Directory 管理的标识。This authentication method uses identities managed by Azure Active Directory. 对于 Azure AD 用户,可以启用多重身份验证。For Azure AD users, multi-factor authentication can be enabled. 尽可能使用 Active Directory 身份验证(集成安全性)。Use Active Directory authentication (integrated security) whenever possible.

授权Authorization

授权指的是用户可以在无服务器 SQL 池数据库中执行哪些操作,并且授权由用户帐户的数据库角色成员身份和对象级权限控制。Authorization refers to what a user can do within a serverless SQL pool database, and is controlled by your user account's database role memberships and object-level permissions.

如果使用 SQL 身份验证,则 SQL 用户仅存在于无服务器 SQL 池中,而权限范围将限定为无服务器 SQL 池中的对象。If SQL Authentication is used, the SQL user exists only in serverless SQL pool and permissions are scoped to the objects in serverless SQL pool. 不能直接向 SQL 用户授予访问其他服务(例如 Azure 存储)中安全对象的权限,因为该用户仅存在于无服务器 SQL 池的范围内。Access to securable objects in other services (such as Azure Storage) can't be granted to SQL user directly since it only exists in scope of serverless SQL pool. SQL 用户需使用支持的授权类型之一来访问文件。The SQL user needs to use one of the supported authorization types to access the files.

如果使用 Azure AD 身份验证,则用户可以登录到无服务器 SQL 池和其他服务(例如 Azure 存储),并可以向 Azure AD 用户授予权限。If Azure AD authentication is used, a user can sign in to serverless SQL pool and other services, like Azure Storage, and can grant permissions to the Azure AD user.

访问存储帐户Access to storage accounts

登录到无服务器 SQL 池服务的用户必须获得访问和查询 Azure 存储中的文件的授权。A user that is logged into the serverless SQL pool service must be authorized to access and query the files in Azure Storage. 无服务器 SQL 池支持以下授权类型:serverless SQL pool supports the following authorization types:

  • 共享访问签名 (SAS) 提供对存储帐户中的资源的委托访问权限。Shared access signature (SAS) provides delegated access to resources in storage account. 通过 SAS,可以授予客户端对存储帐户中资源的访问权限,而无需共享帐户密钥。With a SAS, you can grant clients access to resources in storage account, without sharing account keys. 通过 SAS 可以精细控制向具有 SAS 的客户端授予的访问权限类型:有效期间隔、授予的权限、可接受的 IP 地址范围、可接受的协议 (https/http)。A SAS gives you granular control over the type of access you grant to clients who have the SAS: validity interval, granted permissions, acceptable IP address range, acceptable protocol (https/http).

  • 用户标识(也称为“直通”)是一种授权类型。在使用这种授权类型的情况下,登录到无服务器 SQL 池的 Azure AD 用户的标识会用来授予对数据的访问权限。User Identity (also known as "pass-through") is an authorization type where the identity of the Azure AD user that logged into serverless SQL pool is used to authorize access to the data. 在访问数据之前,Azure 存储管理员必须向访问数据的 Azure AD 用户授予权限。Before accessing the data, Azure Storage administrator must grant permissions to Azure AD user for accessing the data. 此授权类型使用登录到无服务器 SQL 池的 Azure AD 用户,因此不受 SQL 用户类型的支持。This authorization type uses the Azure AD user that logged into serverless SQL pool, therefore it's not supported for SQL user types.

后续步骤Next steps

可在以下文章中找到有关终结点连接和查询文件的更多信息:Additional information on endpoint connection and querying files can be found in the following articles: