您现在访问的是微软AZURE全球版技术文档网站,若需要访问由世纪互联运营的MICROSOFT AZURE中国区技术文档网站,请访问 https://docs.azure.cn.

智能见解:使用 AI 监视数据库性能并对其进行故障排除Intelligent Insights using AI to monitor and troubleshoot database performance

使用 Azure SQL 数据库智能见解可以了解 SQL 数据库和托管实例数据库的性能情况。Azure SQL Database Intelligent Insights lets you know what is happening with your SQL Database and Managed Instance database performance.

Intelligent Insights 使用内置智能,通过人工智能持续监视数据库使用情况,并检测导致性能不佳的干扰性事件。Intelligent Insights uses built-in intelligence to continuously monitor database usage through artificial intelligence and detect disruptive events that cause poor performance. 检测后,将执行详细分析,并生成包含对问题的智能评估的诊断日志。Once detected, a detailed analysis is performed that generates a diagnostics log with an intelligent assessment of the issue. 此评估包含对数据库性能问题的根本原因分析,以及为性能改进而提供的可行性建议。This assessment consists of a root cause analysis of the database performance issue and, where possible, recommendations for performance improvements.

智能见解能为你做什么?What can Intelligent Insights do for you

Intelligent Insights 是 Azure 内置智能的一项独特功能,提供以下功能价值:Intelligent Insights is a unique capability of Azure built-in intelligence that provides the following value:

  • 主动监视Proactive monitoring
  • 定制的性能见解Tailored performance insights
  • 数据库性能下降的早期检测Early detection of database performance degradation
  • 已检测出问题的根本原因分析Root cause analysis of issues detected
  • 性能改进建议Performance improvement recommendations
  • 数以十万计的数据库上的横向扩展功能Scale out capability on hundreds of thousands of databases
  • 对 DevOps 资源和总拥有成本的积极影响Positive impact to DevOps resources and the total cost of ownership

智能见解的工作原理How does Intelligent Insights work

Intelligent Insights 可分析数据库性能,方法是比较前一个小时的数据库工作负荷和前七天的基线工作负荷。Intelligent Insights analyzes database performance by comparing the database workload from the last hour with the past seven-day baseline workload. 数据库工作负荷由确定为对数据库性能最为重要的查询(例如重复最多和最大的查询)组成。Database workload is composed of queries determined to be the most significant to the database performance, such as the most repeated and largest queries. 由于每个数据库基于其结构、数据、使用情况和应用程序都是唯一的,因此每个生成的工作负荷基线对于单个实例而言都具有特定性和唯一性。Because each database is unique based on its structure, data, usage, and application, each workload baseline that is generated is specific and unique to an individual instance. Intelligent Insights 因为独立于工作负荷基线,还可监视绝对操作阈值,并检测过长的等待时间问题、关键异常和查询参数化问题,这些问题可能会对性能造成影响。Intelligent Insights, independent of the workload baseline, also monitors absolute operational thresholds and detects issues with excessive wait times, critical exceptions, and issues with query parameterizations that might affect performance.

在使用人工智能根据多个观察的指标检测到性能降低问题后,将执行分析。After a performance degradation issue is detected from multiple observed metrics by using artificial intelligence, analysis is performed. 此外还会生成包含数据库状况的智能见解的诊断日志。A diagnostics log is generated with an intelligent insight on what is happening with your database. Intelligent Insights 可对数据库性能问题从其首次出现到解决全程轻松展开跟踪。Intelligent Insights makes it easy to track the database performance issue from its first appearance until resolution. 从初始问题检测和性能改进验证到完成,跟踪每个检测到的问题的整个生命周期。Each detected issue is tracked through its lifecycle from initial issue detection and verification of performance improvement to its completion.

数据库性能分析工作流

用于衡量和检测数据库性能问题的指标基于查询持续时间、超时请求、过长的等待时间和出错的请求制定。The metrics used to measure and detect database performance issues are based on query duration, timeout requests, excessive wait times, and errored requests. 有关指标的详细信息,请参阅本文档的检测指标部分。For more information on metrics, see the Detection metrics section of this document.

识别的 SQL 数据库性能降低问题记录在诊断日志中,包含由以下属性组成的智能项:Identified SQL Database performance degradations are recorded in the diagnostics log with intelligent entries that consist of the following properties:

属性Property 详细信息Details
数据库信息Database information 关于在其上检测到见解的数据库的元数据,例如资源 URI。Metadata about a database on which an insight was detected, such as a resource URI.
观察的时间范围Observed time range 检测到的见解时段的开始和结束时间。Start and end time for the period of the detected insight.
受影响的指标Impacted metrics 导致生成某个见解的指标:Metrics that caused an insight to be generated:
  • 查询持续时间增加 [秒]。Query duration increase [seconds].
  • 过长等待 [秒]。Excessive waiting [seconds].
  • 超时的请求 [百分比]。Timed-out requests [percentage].
  • 出错的请求 [百分比]。Errored-out requests [percentage].
影响值Impact value 某个指标测量出的值。Value of a metric measured.
受影响的查询和错误代码Impacted queries and error codes 查询哈希或错误代码。Query hash or error code. 这些属性可用于轻松关联到受影响的查询。These can be used to easily correlate to affected queries. 提供包括查询持续时间增加、等待时间、超时计数或错误代码的指标。Metrics that consist of either query duration increase, waiting time, timeout counts, or error codes are provided.
检测Detections 在数据库中发生事件时确定的检测。Detection identified at the database during the time of an event. 有 15 种检测模式。There are 15 detection patterns. 有关详细信息,请参阅使用 Intelligent Insights 排查数据库性能问题For more information, see Troubleshoot database performance issues with Intelligent Insights.
根本原因分析Root cause analysis 对已识别问题的根本原因分析采用人类可读的格式。Root cause analysis of the issue identified in a human-readable format. 一些见解可能包含可行的性能改进建议。Some insights might contain a performance improvement recommendation where possible.

有关将 Intelligent Insights 与 Azure SQL Analytics 配合使用的实践概述和典型使用方案,请观看嵌入视频:For a hands-on overview on using Intelligent Insights with Azure SQL Analytics and for typical usage scenarios, see the embedded video:

Intelligent Insights 在发现和排查 SQL 数据库性能问题方面出类拔萃。Intelligent Insights shines in discovering and troubleshooting SQL Database performance issues. 若要使用智能见解排查 SQL 数据库和托管实例数据库的性能问题,请参阅使用智能见解解决 Azure SQL 数据库性能问题In order to use Intelligent Insights to troubleshoot SQL Database and Managed Instance database performance issues, see Troubleshoot Azure SQL Database performance issues with Intelligent Insights.

智能见解选项Intelligent Insights options

可在 Azure SQL 数据库中使用智能见解选项包括:Intelligent Insights options available in Azure SQL Database are:

智能见解选项Intelligent Insights option 单一数据库和共用数据库支持Single database and pooled database support 实例数据库支持Instance database support
配置智能见解-配置数据库智能见解分析。Configure Intelligent Insights - Configure Intelligent Insights analysis for your databases. Yes Yes
将见解流式处理到 Azure SQL Analytics --流式处理深入了解 Azure SQL 数据库的 Azure SQL Analytics 监视解决方案。Stream insights to Azure SQL Analytics -- Stream insights to Azure SQL Analytics monitoring solution for Azure SQL Database. Yes Yes
将见解流式传输到事件中心-将见解流式处理到事件中心,以便进行进一步的自定义集成。Stream insights to Event Hub - Stream insights to Event Hubs for further custom integrations. Yes Yes
将见解流式传输到 Azure 存储-将见解流式传输到 azure 存储,以便进行进一步分析和长期存档。Stream insights to Azure Storage - Stream insights to Azure Storage for further analysis and long term archival. Yes Yes

配置智能见解Configure Intelligent Insights

智能见解的输出是一种智能性能诊断日志。Output of the Intelligent Insights is a smart performance diagnostics log. 此日志可以通过多种方式使用 - 可以流式传输到 Azure SQL Analytics、Azure 事件中心和 Azure 存储,或者第三方产品。This log can be consumed in several ways - through streaming it to Azure SQL Analytics, Azure Event Hubs and Azure storage, or a third party product.

  • 将此产品与 Azure SQL Analytics 配合使用,即可通过 Azure 门户的用户界面来查看见解。Use the product with Azure SQL Analytics to view insights through the user interface of the Azure portal. 这是集成式 Azure 解决方案,是用于查看见解的最典型方式。This is the integrated Azure solution, and the most typical way to view insights.
  • 将此产品与 Azure 事件中心配合使用,可以开发自定义监视和警报方案Use the product with Azure Event Hubs for development of custom monitoring and alerting scenarios
  • 将此产品与 Azure 存储配合使用,可以进行自定义应用程序开发,例如自定义报告、长期数据存档,等等。Use the product with Azure storage for custom application development, such are for example custom reporting, long-term data archival and so forth.

Intelligent Insights 与其他产品(Azure SQL Analytics、Azure 事件中心、Azure 存储或第三方消费产品)的集成是这样执行的:先在数据库的“诊断设置”边栏选项卡中启用 Intelligent Insights 日志记录(“SQLInsights”日志),然后配置要流式传输到这些产品之一的 Intelligent Insights 日志数据。Integration of Intelligent Insights with other products Azure SQL Analytics, Azure Event Hub, Azure storage, or third party products for consumption is performed through first enabling Intelligent Insights logging (the "SQLInsights" log) in the Diagnostic settings blade of a database, and then configuring Intelligent Insights log data to be streamed into one of these products.

有关如何启用 Intelligent Insights 日志记录以及如何配置要流式传输到消费产品的日志数据的详细信息,请参阅 Azure SQL 数据库指标和诊断日志记录For more information on how to enable Intelligent Insights logging and to configure log data to be streamed to a consuming product, see Azure SQL Database metrics and diagnostics logging.

通过 Azure SQL Analytics 进行设置Set up with Azure SQL Analytics

Azure SQL Analytics 解决方案提供图形用户界面、有关数据库性能的报告和警报功能,以及智能见解诊断日志数据。Azure SQL Analytics solution provides graphical user interface, reporting and alerting capabilities on database performance, along with the Intelligent Insights diagnostics log data.

提示

快速入门:若要完成智能见解的入门,最容易的方式是将它与 Azure SQL Analytics 配合使用,后者提供一个图形用户界面来显示数据库性能问题。Quick getting started: The easiest way to get off the ground with using Intelligent Insights is to use it along with Azure SQL Analytics which will provide a graphical user interface to database performance issues. 从市场添加 Azure SQL Analytics 解决方案,在此解决方案中创建一个工作区,然后在数据库的“诊断设置”边栏选项卡中针对每个要在其上启用智能见解的数据库配置“SQLInsights”日志的流式传输,以便将其传输到 Azure SQL Analytics 的工作区。Add Azure SQL Analytics solution from the marketplace, create a workspace inside this solution, and then for each database you wish to enable Intelligent Insights on, configure streaming of "SQLInsights" log in the Diagnostics settings blade of a database to the workspace of Azure SQL Analytics.

先决条件是将 Azure SQL Analytics 从市场添加到 Azure 门户仪表板,并创建一个工作区,详见配置 Azure SQL AnalyticsPre-requirement is to have Azure SQL Analytics added to your Azure portal dashboard from the marketplace and to create a workspace, see configure Azure SQL Analytics

若要将智能见解与 Azure SQL Analytics 配合使用,请配置要流式传输到 Azure SQL Analytics 工作区(已在上一步创建)的智能见解日志数据,详见 Azure SQL 数据库指标和诊断日志记录To use Intelligent Insights with Azure SQL Analytics, configure Intelligent Insights log data to be streamed to Azure SQL Analytics workspace you've created in the previous step, see Azure SQL Database metrics and diagnostics logging.

以下示例显示了通过 Azure SQL Analytics 查看的智能见解:The following example shows an Intelligent Insights viewed through Azure SQL Analytics:

智能见解报告

通过事件中心进行设置Set up with Event Hubs

若要将智能见解与事件中心配合使用,配置要流式传输到事件中心的智能见解日志数据,请参阅将 Azure 诊断日志流式传输到事件中心To use Intelligent Insights with Event Hubs, configure Intelligent Insights log data to be streamed to Event Hubs, see Stream Azure diagnostics logs to Event Hubs.

若要使用事件中心设置自定义监视和警报,请参阅如何在事件中心处理指标和诊断日志To use Event Hubs to setup custom monitoring and alerting, see What to do with metrics and diagnostics logs in Event Hubs.

通过 Azure 存储进行设置Set up with Azure Storage

若要将 Intelligent Insights 与存储配合使用,配置要流式传输到存储的 Intelligent Insights 日志数据,请参阅流式传输到 Azure 存储To use Intelligent Insights with Storage, configure Intelligent Insights log data to be streamed to Storage, see Stream into Azure Storage.

Intelligent Insights 日志的自定义集成Custom integrations of Intelligent Insights log

若要将 Intelligent Insights 与第三方工具配合使用或用于自定义警报和监视开发,请参阅使用 Intelligent Insights 数据库性能诊断日志To use Intelligent Insights with third party tools, or for custom alerting and monitoring development, see Use the Intelligent Insights database performance diagnostics log.

检测指标Detection metrics

用于生成 Intelligent Insights 的检测模型的指标基于监视以下内容得出:Metrics used for detection models that generate Intelligent Insights are based on monitoring:

  • 查询持续时间Query duration
  • 超时请求Timeout requests
  • 过长的等待时间Excessive wait time
  • 出错的请求Errored out requests

在检测数据库工作负载性能问题时,查询持续时间和超时请求将用作主要模型。Query duration and timeout requests are used as primary models in detecting issues with database workload performance. 这是因为它们直接测量工作负荷发生的情况。They're used because they directly measure what is happening with the workload. 为检测工作负荷性能降低的所有可能情况,过长的等待时间和出错的请求将用作附加模型,以指出影响工作负荷性能的问题。To detect all possible cases of workload performance degradation, excessive wait time and errored-out requests are used as additional models to indicate issues that affect the workload performance.

系统会自动将工作负荷更改和对数据库所做的查询请求数更改纳入考虑,以动态决定正常和异常的数据库性能阈值。The system automatically considers changes to the workload and changes in the number of query requests made to the database to dynamically determine normal and out-of-the-ordinary database performance thresholds.

所有指标都将通过科学派生的数据模型在各种关系中纳入考虑,并且这种模型会对每个检测到的性能问题进行归类。All of the metrics are considered together in various relationships through a scientifically derived data model that categorizes each performance issue detected. 通过智能见解提供的信息包括:Information provided through an intelligent insight includes:

  • 检测到的性能问题的详细信息。Details of the performance issue detected.
  • 检测到的问题的根本原因分析。A root cause analysis of the issue detected.
  • 在可能的情况下如何改进所监视 SQL 数据库性能的建议。Recommendations on how to improve the performance of the monitored SQL database, where possible.

查询持续时间Query duration

查询持续时间下降模型分析单独的查询,并检测与性能基线相比,编译和执行查询增加的时间。The query duration degradation model analyzes individual queries and detects the increase in the time it takes to compile and execute a query compared to the performance baseline.

如果 SQL 数据库内置智能检测到查询编译或查询执行时间显著增加并影响工作负荷性能,这些查询将标记为出现查询持续时间性能降低问题。If SQL Database built-in intelligence detects a significant increase in query compile or query execution time that affects workload performance, these queries are flagged as query duration performance degradation issues.

Intelligent Insights 诊断日志会输出性能降低的查询的查询哈希。The Intelligent Insights diagnostics log outputs the query hash of the query degraded in performance. 查询哈希指出性能降低是否与查询编译或执行时间增加(使查询持续时间增加)有关。The query hash indicates whether the performance degradation was related to query compile or execution time increase, which increased query duration time.

超时请求Timeout requests

超时请求下降模型分析单独的查询,并在查询执行级别检测任何超时增加情况,以及与性能基线持续时间相比,数据库级别的总体请求超时。The timeout requests degradation model analyzes individual queries and detects any increase in timeouts at the query execution level and the overall request timeouts at the database level compared to the performance baseline period.

某些查询甚至在到达执行阶段之前就可能超时。Some of the queries might time out even before they reach the execution stage. 通过比较中止的辅助角色数和所提请求数的方式,SQL 数据库内置智能还测量和分析到达数据库的所有查询,而不管它们有没有到达执行阶段。Through the means of aborted workers vs. requests made, SQL Database built-in intelligence measures and analyzes all queries that reached the database whether they got to the execution stage or not.

在执行查询超时数或中止的请求辅助角色数超出系统管理阈值后,使用智能见解填充诊断日志。After the number of timeouts for executed queries or the number of aborted request workers crosses the system-managed threshold, a diagnostics log is populated with intelligent insights.

生成的见解包含超时请求数和超时查询数。The insights generated contain the number of timed-out requests and the number of timed-out queries. 指示性能降低是否与执行阶段的超时增加有关,或提供总体数据库级别。Indication of the performance degradation is related to timeout increase at the execution stage, or the overall database level is provided. 当系统认为超时增加对数据库性能至关重要时,这些查询将标记为出现超时性能降低问题。When the increase in timeouts is deemed significant to database performance, these queries are flagged as timeout performance degradation issues.

过长的等待时间Excessive wait times

过长的等待时间模型监视单个数据库查询。The excessive wait time model monitors individual database queries. 它会检测高出系统管理的绝对阈值的异常查询等待统计信息。It detects unusually high query wait stats that crossed the system-managed absolute thresholds. 以下查询过长等待时间指标使用新的 SQL Server 功能(“查询存储等待统计信息”(sys.query_store_wait_stats))进行观测:The following query excessive wait-time metrics are observed by using the new SQL Server feature, Query Store Wait Stats (sys.query_store_wait_stats):

  • 达到资源限制Reaching resource limits
  • 达到弹性池资源上限Reaching elastic pool resource limits
  • 过多的辅助角色或会话线程数Excessive number of worker or session threads
  • 过多的数据库锁定Excessive database locking
  • 内存压力Memory pressure
  • 其他等待统计信息Other wait stats

达到资源限制或弹性池资源限制表示某一订阅或弹性池中的可用资源消耗已超出绝对阈值。Reaching resource limits or elastic pool resource limits denote that consumption of available resources on a subscription or in the elastic pool crossed absolute thresholds. 这些统计信息指示工作负荷性能降低。These stats indicate workload performance degradation. 过多的辅助角色或会话线程数表示如下情况:启动的工作线程数或会话数超出绝对阈值。An excessive number of worker or session threads denotes a condition in which the number of worker threads or sessions initiated crossed absolute thresholds. 这些统计信息指示工作负荷性能降低。These stats indicate workload performance degradation.

过多的数据库锁定表示如下情况:数据库锁定计数超出绝对阈值。Excessive database locking denotes a condition in which the count of locks on a database has crossed absolute thresholds. 此状态指示工作负荷性能降低。This stat indicates a workload performance degradation. 内存压力表示如下情况:请求内存授予的线程数超出绝对阈值。Memory pressure is a condition in which the number of threads requesting memory grants crossed an absolute threshold. 此状态指示工作负荷性能降低。This stat indicates a workload performance degradation.

其他等待统计信息指示如下情况:通过查询存储等待统计信息测量的其他指标超出绝对阈值。Other wait stats detection indicates a condition in which miscellaneous metrics measured through the Query Store Wait Stats crossed an absolute threshold. 这些统计信息指示工作负荷性能降低。These stats indicate workload performance degradation.

检测到过长的等待时间后,根据可用数据的情况,Intelligent Insights 诊断日志将输出性能降低正在影响和已经影响的查询的哈希,以及导致查询在执行和测量等待时间中等待的指标的详细信息。After excessive wait times are detected, depending on the data available, the Intelligent Insights diagnostics log outputs hashes of the affecting and affected queries degraded in performance, details of the metrics that cause queries to wait in execution, and measured wait time.

出错的请求Errored requests

出错的请求下降模型监视单独的查询,并检测与基线持续时间相比,增加的出错查询数。The errored requests degradation model monitors individual queries and detects an increase in the number of queries that errored out compared to the baseline period. 此模型还监视超出由 SQL 数据库内置智能管理的绝对阈值的关键异常。This model also monitors critical exceptions that crossed absolute thresholds managed by SQL Database built-in intelligence. 系统会自动考虑向数据库提出的查询请求数,并在受监视时期说明任何工作负荷更改。The system automatically considers the number of query requests made to the database and accounts for any workload changes in the monitored period.

当系统认为测量出的出错请求数增加(与提出的总请求数有关)对工作负荷性能至关重要时,受影响的查询将标记为具有出错请求的性能降低问题。When the measured increase in errored requests relative to the overall number of requests made is deemed significant to workload performance, affected queries are flagged as errored requests performance degradation issues.

Intelligent Insights 日志输出出错请求计数。The Intelligent Insights log outputs the count of errored requests. 它会指示性能降低是否与出错请求数增加或超出受监视的关键异常阈值有关,以及性能降低的测量时间。It indicates whether the performance degradation was related to an increase in errored requests or to crossing a monitored critical exception threshold and measured time of the performance degradation.

在任何受监视的关键异常超出系统管理的绝对阈值的情况下,会使用关键异常详细信息生成智能见解。If any of the monitored critical exceptions cross the absolute thresholds managed by the system, an intelligent insight is generated with critical exception details.

后续步骤Next steps