高级搜寻查询最佳做法Advanced hunting query best practices

重要

改进的 Microsoft 365 安全中心现在可用。The improved Microsoft 365 security center is now available. 此新体验将 Defender for Endpoint、Defender for Office、365 Microsoft 365 Defender 等引入了 Microsoft 365 安全中心。This new experience brings Defender for Endpoint, Defender for Office 365, Microsoft 365 Defender, and more into the Microsoft 365 security center. 了解新增功能Learn what's new.

适用于:Applies to:

  • Microsoft 365 DefenderMicrosoft 365 Defender

应用这些建议可更快地获取结果,并避免在运行复杂查询时出现超时。Apply these recommendations to get results faster and avoid timeouts while running complex queries. 有关提高查询性能的更多指导,请参阅 Kusto 查询最佳做法For more guidance on improving query performance, read Kusto query best practices.

了解 CPU 资源配额Understand CPU resource quotas

根据大小,每个租户都有权访问分配给运行高级搜寻查询的一组 CPU 资源。Depending on its size, each tenant has access to a set amount of CPU resources allocated for running advanced hunting queries. 有关各种服务限制的详细信息, 请阅读有关高级搜寻配额和使用参数的信息For detailed information about various service limits, read about advanced hunting quotas and usage parameters.

定期运行多个查询的客户应跟踪使用情况,并应用本文中的优化指南,以最大限度地减少因超出配额或使用参数而导致的中断。Customers who run multiple queries regularly should track consumption and apply the optimization guidance in this article to minimize disruption resulting from exceeding quotas or usage parameters.

常规优化提示General optimization tips

  • 调整新查询的大小— 如果您怀疑查询将返回大型查询结果集,请首先使用 count 运算符 来评估它Size new queries—If you suspect that a query will return a large result set, assess it first using the count operator. 使用 limit 或它的同义词 take 以避免大型结果集。Use limit or its synonym take to avoid large result sets.

  • 提前 应用筛选器 — 应用时间筛选器和其他筛选器以减少数据集,尤其是在使用转换和分析函数(如子字符串 () 、替换 () 、trim () 、toupper ()parse_json () )之前Apply filters early—Apply time filters and other filters to reduce the data set, especially before using transformation and parsing functions, such as substring(), replace(), trim(), toupper(), or parse_json(). 在下面的示例中,在筛选运算符减少记录数 () 使用分析函数 extractjson 方法。In the example below, the parsing function extractjson() is used after filtering operators have reduced the number of records.

    DeviceEvents
    | where Timestamp > ago(1d)
    | where ActionType == "UsbDriveMount" 
    | where DeviceName == "user-desktop.domain.com"
    | extend DriveLetter = extractjson("$.DriveLetter", AdditionalFields)
    
  • Has beats contains-To avoid searching substrings within words unnecessarily, use the has operator instead of contains .Has beats contains—To avoid searching substrings within words unnecessarily, use the has operator instead of contains. 了解字符串运算符Learn about string operators

  • 查看特定列— 查看特定列,而不是在所有列中运行全文搜索。Look in specific columns—Look in a specific column rather than running full text searches across all columns. 请勿使用 * 检查所有列。Don't use * to check all columns.

  • 对于速度,区分 大小写 — 区分大小写的搜索更为具体,通常性能也更高。Case-sensitive for speed—Case-sensitive searches are more specific and generally more performant. 区分大小写的字符串 运算符的名称,例如 和 has_cs contains_cs ,通常以 结尾 _csNames of case-sensitive string operators, such as has_cs and contains_cs, generally end with _cs. 您还可以使用区分大小写的等于运算符 == ,而不是 =~You can also use the case-sensitive equals operator == instead of =~.

  • 分析,不要提取— 尽可能使用 分析 运算符或分析函数( 如 parse_json ())。Parse, don't extract—Whenever possible, use the parse operator or a parsing function like parse_json(). 避免 matches regex 使用字符串运算符或 extract () 函数,两者均使用正则表达式。Avoid the matches regex string operator or the extract() function, both of which use regular expression. 保留对更复杂的方案的正则表达式的使用。Reserve the use of regular expression for more complex scenarios. 阅读有关分析函数的更多信息Read more about parsing functions

  • 筛选表不是表达式-如果可以筛选表列,则不要筛选计算列。Filter tables not expressions—Don't filter on a calculated column if you can filter on a table column.

  • 无三字符术语- 避免使用带三个字符或更少字符的术语进行比较或筛选。No three-character terms—Avoid comparing or filtering using terms with three characters or fewer. 这些术语未编制索引,匹配它们将需要更多资源。These terms are not indexed and matching them will require more resources.

  • 选择性地 项目 — 仅预测所需的列,使结果更易于理解。Project selectively—Make your results easier to understand by projecting only the columns you need. 在运行联接或类似 操作之前规划 特定列还有助于提高性能。Projecting specific columns prior to running join or similar operations also helps improve performance.

优化 join 运算符Optimize the join operator

联接 运算符 通过匹配指定列中的值来合并两个表中的行。The join operator merges rows from two tables by matching values in specified columns. 应用这些提示以优化使用此运算符的查询。Apply these tips to optimize queries that use this operator.

  • 左侧较小的表- 运算符将联接语句左侧的表中的记录与右侧 join 记录相匹配。Smaller table to your left—The join operator matches records in the table on the left side of your join statement to records on the right. 通过使较小的表位于左侧,需要匹配的记录就更少,从而加快了查询速度。By having the smaller table on the left, fewer records will need to be matched, thus speeding up the query.

    在下表中,在通过帐户 SID 加入之前,我们将左表缩小为仅覆盖三台特定 DeviceLogonEvents IdentityLogonEvents 设备。In the table below, we reduce the left table DeviceLogonEvents to cover only three specific devices before joining it with IdentityLogonEvents by account SIDs.

    DeviceLogonEvents 
    | where DeviceName in ("device-1.domain.com", "device-2.domain.com", "device-3.domain.com")
    | where ActionType == "LogonFailed"
    | join
        (IdentityLogonEvents
        | where ActionType == "LogonFailed"
        | where Protocol == "Kerberos")
    on AccountSid
    
  • 使用 inner-join 风格-默认联接风格或 innerunique-join删除左表中的行,然后通过联接键将每一个匹配项的行返回到右表。Use the inner-join flavor—The default join flavor or the innerunique-join deduplicates rows in the left table by the join key before returning a row for each match to the right table. 如果左表有多个行,这些行的键值相同,则这些行将进行重复数据删除,以保留每个唯一值的一 join 个随机行。If the left table has multiple rows with the same value for the join key, those rows will be deduplicated to leave a single random row for each unique value.

    此默认行为可能会从左侧表中排除可提供有用见解的重要信息。This default behavior can leave out important information from the left table that can provide useful insight. 例如,下面的查询将只显示一封包含特定附件的电子邮件,即使该同一附件是使用多个电子邮件发送的:For example, the query below will only show one email containing a particular attachment, even if that same attachment was sent using multiple emails messages:

    EmailAttachmentInfo
    | where Timestamp > ago(1h)
    | where Subject == "Document Attachment" and FileName == "Document.pdf"
    | join (DeviceFileEvents | where Timestamp > ago(1h)) on SHA256 
    

    为了应对此限制,我们应用 了内部 联接风格,具体操作是指定在左侧表中显示所有行,其右侧具有 kind=inner 匹配值:To address this limitation, we apply the inner-join flavor by specifying kind=inner to show all rows in the left table with matching values in the right:

    EmailAttachmentInfo
    | where Timestamp > ago(1h)
    | where Subject == "Document Attachment" and FileName == "Document.pdf"
    | join kind=inner (DeviceFileEvents | where Timestamp > ago(1h)) on SHA256 
    
  • 从时间窗口 加入记录 — 调查安全事件时,分析员会查找大约在同一时间段发生的相关事件。Join records from a time window—When investigating security events, analysts look for related events that occur around the same time period. 在使用时应用相同方法还通过减少要检查的记录数 join 来提高性能。Applying the same approach when using join also benefits performance by reducing the number of records to check.

    下面的查询在接收恶意文件后 30 分钟内检查登录事件:The query below checks for logon events within 30 minutes of receiving a malicious file:

    EmailEvents
    | where Timestamp > ago(7d)
    | where ThreatTypes has "Malware"
    | project EmailReceivedTime = Timestamp, Subject, SenderFromAddress, AccountName = tostring(split(RecipientEmailAddress, "@")[0])
    | join (
    DeviceLogonEvents 
    | where Timestamp > ago(7d)
    | project LogonTime = Timestamp, AccountName, DeviceName
    ) on AccountName 
    | where (LogonTime - EmailReceivedTime) between (0min .. 30min)
    
  • 对双方应用 时间筛选器 - 即使您未调查特定时间窗口,对左侧和右侧表应用时间筛选器可以减少要检查的记录数并 join 提高性能。Apply time filters on both sides—Even if you're not investigating a specific time window, applying time filters on both the left and right tables can reduce the number of records to check and improve join performance. 下面的查询适用于 Timestamp > ago(1h) 这两个表,以便它仅联接过去一小时内的记录:The query below applies Timestamp > ago(1h) to both tables so that it joins only records from the past hour:

    EmailAttachmentInfo
    | where Timestamp > ago(1h)
    | where Subject == "Document Attachment" and FileName == "Document.pdf"
    | join kind=inner (DeviceFileEvents | where Timestamp > ago(1h)) on SHA256 
    
  • 使用性能提示— 将提示与 运算符一同使用,以指示后端在运行资源 join 密集型操作时分配负载。Use hints for performance—Use hints with the join operator to instruct the backend to distribute load when running resource-intensive operations. 详细了解加入提示Learn more about join hints

    例如,收缩提示 有助于在使用基数较高的键(具有许多唯一值的键)加入表时提高查询性能,如下面的 AccountObjectId 查询:For example, the shuffle hint helps improve query performance when joining tables using a key with high cardinality—a key with many unique values—such as the AccountObjectId in the query below:

    IdentityInfo
    | where JobTitle == "CONSULTANT"
    | join hint.shufflekey = AccountObjectId 
    (IdentityDirectoryEvents
        | where Application == "Active Directory"
        | where ActionType == "Private data retrieval")
    on AccountObjectId 
    

    左表 较小且最多包含 100,000 (且右表非常大时,广播) 会有所帮助。The broadcast hint helps when the left table is small (up to 100,000 records) and the right table is extremely large. 例如,下面的查询尝试将具有特定主题的一些电子邮件与表中包含链接的所有邮件 EmailUrlInfo 加入:For example, the query below is trying to join a few emails that have specific subjects with all messages containing links in the EmailUrlInfo table:

    EmailEvents 
    | where Subject in ("Warning: Update your credentials now", "Action required: Update your credentials now")
    | join hint.strategy = broadcast EmailUrlInfo on NetworkMessageId 
    

优化 summarize 运算符Optimize the summarize operator

summarize 运算符 聚合表的内容。The summarize operator aggregates the contents of a table. 应用这些提示以优化使用此运算符的查询。Apply these tips to optimize queries that use this operator.

  • 查找不同的 值 - 通常,使用 summarize 查找可能重复的不同的值。Find distinct values—In general, use summarize to find distinct values that can be repetitive. 使用它来聚合没有重复值的列是不必要的。It can be unnecessary to use it to aggregate columns that don't have repetitive values.

    虽然单个电子邮件可以是多个事件的一部分,但以下示例并非有效使用,因为单个电子邮件的网络邮件 summarize ID 始终带有唯一的发件人地址。While a single email can be part of multiple events, the example below is not an efficient use of summarize because a network message ID for an individual email always comes with a unique sender address.

    EmailEvents  
    | where Timestamp > ago(1h)
    | summarize by NetworkMessageId, SenderFromAddress   
    

    运算符可以轻松替换为 ,从而在消耗更少 summarize project 资源时产生可能相同的结果:The summarize operator can be easily replaced with project, yielding potentially the same results while consuming fewer resources:

    EmailEvents  
    | where Timestamp > ago(1h)
    | project NetworkMessageId, SenderFromAddress   
    

    以下示例使用效率更高,因为发件人地址向同一收件人地址发送电子邮件可能有多个 summarize 不同实例。The following example is a more efficient use of summarize because there can be multiple distinct instances of a sender address sending email to the same recipient address. 此类组合不同,并且可能具有重复项。Such combinations are less distinct and are likely to have duplicates.

    EmailEvents  
    | where Timestamp > ago(1h)
    | summarize by SenderFromAddress, RecipientEmailAddress   
    
  • 收缩查询— 虽然最好在具有重复值的列中使用,但相同的列也可以具有高基数或大量 summarize 唯一值。 Shuffle the query—While summarize is best used in columns with repetitive values, the same columns can also have high cardinality or large numbers of unique values. 与 运算符一样,还可以将收缩提示与 一起应用,以分配处理负载,并可能改进在高基数列 join summarize 上操作时的性能。Like the join operator, you can also apply the shuffle hint with summarize to distribute processing load and potentially improve performance when operating on columns with high cardinality.

    下面的查询用于计算不同的收件人电子邮件地址,这些地址可以在数十万大型 summarize 组织中运行。The query below uses summarize to count distinct recipient email address, which can run in the hundreds of thousands in large organizations. 为了提高性能,它包含了 hint.shufflekeyTo improve performance, it incorporates hint.shufflekey:

    EmailEvents  
    | where Timestamp > ago(1h)
    | summarize hint.shufflekey = RecipientEmailAddress count() by Subject, RecipientEmailAddress
    

查询方案Query scenarios

使用进程标识唯一进程标识Identify unique processes with process IDs

进程 ID (PID) 在 Windows 中回收,并重新用于新进程。Process IDs (PIDs) are recycled in Windows and reused for new processes. 它们本身不能用作特定进程的唯一标识符。On their own, they can't serve as unique identifiers for specific processes.

若要获得特定计算机上进程的唯一标识符,请使用进程 ID 以及进程创建时间。To get a unique identifier for a process on a specific machine, use the process ID together with the process creation time. 在进程之间联接或汇总数据时,请加入以下列:计算机标识符(DeviceIdDeviceName)、进程 ID(ProcessIdInitiatingProcessId)以及进程创建时间(ProcessCreationTimeInitiatingProcessCreationTimeWhen you join or summarize data around processes, include columns for the machine identifier (either DeviceId or DeviceName), the process ID (ProcessId or InitiatingProcessId), and the process creation time (ProcessCreationTime or InitiatingProcessCreationTime)

以下示例查询将查找通过端口 445 (SMB) 访问 10 个以上 IP 地址(可能扫描文件共享)的进程。The following example query finds processes that access more than 10 IP addresses over port 445 (SMB), possibly scanning for file shares.

示例查询:Example query:

DeviceNetworkEvents
| where RemotePort == 445 and Timestamp > ago(12h) and InitiatingProcessId !in (0, 4)
| summarize RemoteIPCount=dcount(RemoteIP) by DeviceName, InitiatingProcessId
InitiatingProcessCreationTime, InitiatingProcessFileName
| where RemoteIPCount > 10

该查询按 InitiatingProcessIdInitiatingProcessCreationTime 进行汇总,以便查看单个进程,而不会混用具有同一进程 ID 的多个进程。The query summarizes by both InitiatingProcessId and InitiatingProcessCreationTime so that it looks at a single process, without mixing multiple processes with the same process ID.

查询命令行Query command lines

有很多方法可以构造命令行来完成任务。There are numerous ways to construct a command line to accomplish a task. 例如,攻击者可能会引用没有路径、没有文件扩展名、使用环境变量或带引号的图像文件。For example, an attacker could reference an image file without a path, without a file extension, using environment variables, or with quotes. 攻击者还可以更改参数的顺序或添加多个引号和空格。The attacker could also change the order of parameters or add multiple quotes and spaces.

若要围绕命令行创建更持久的查询,请应用以下做法:To create more durable queries around command lines, apply the following practices:

  • 通过匹配 (字段(而不是筛选命令行本身)psexec.exe) 标识已知进程,例如net.exe或) 。 Identify the known processes (such as net.exe or psexec.exe) by matching on the file name fields, instead of filtering on the command-line itself.
  • 使用 parse_command_line () 函数分析命令行部分Parse command-line sections using the parse_command_line() function
  • 查询命令行参数时,请勿按特定顺序查找多个不相关参数的完全匹配。When querying for command-line arguments, don't look for an exact match on multiple unrelated arguments in a certain order. 而是使用正则表达式或使用多个单独的 Contains 运算符。Instead, use regular expressions or use multiple separate contains operators.
  • 使用不区分大小写的匹配项。Use case insensitive matches. 例如,使用 =~in~contains ,而不是 、 和 == in contains_csFor example, use =~, in~, and contains instead of ==, in, and contains_cs.
  • 若要缓解命令行混淆技术,请考虑删除引号、将逗号替换为空格,以及将多个连续空格替换为单个空格。To mitigate command-line obfuscation techniques, consider removing quotes, replacing commas with spaces, and replacing multiple consecutive spaces with a single space. 有一些更复杂的模糊处理技术需要其他方法,但这些调整可帮助解决常见问题。There are more complex obfuscation techniques that require other approaches, but these tweaks can help address common ones.

以下示例显示了构建查询的各种方法,该查询查找用于停止net.exe"MpsSvc"的文件: The following examples show various ways to construct a query that looks for the file net.exe to stop the firewall service "MpsSvc":

// Non-durable query - do not use
DeviceProcessEvents
| where ProcessCommandLine == "net stop MpsSvc"
| limit 10

// Better query - filters on file name, does case-insensitive matches
DeviceProcessEvents
| where Timestamp > ago(7d) and FileName in~ ("net.exe", "net1.exe") and ProcessCommandLine contains "stop" and ProcessCommandLine contains "MpsSvc" 

// Best query also ignores quotes
DeviceProcessEvents
| where Timestamp > ago(7d) and FileName in~ ("net.exe", "net1.exe")
| extend CanonicalCommandLine=replace("\"", "", ProcessCommandLine)
| where CanonicalCommandLine contains "stop" and CanonicalCommandLine contains "MpsSvc" 

从外部源中输入数据Ingest data from external sources

若要将长列表或大型表合并到查询中,请使用 externaldata 运算符从指定的 URI 引入数据。To incorporate long lists or large tables into your query, use the externaldata operator to ingest data from a specified URI. 可以从 TXT、CSV、JSON 或其他格式 的文件获取数据You can get data from files in TXT, CSV, JSON, or other formats. 以下示例演示如何利用 MalwareBazaar 文件提供的恶意软件 SHA-256 哈希的广泛列表 (abuse.ch) 检查电子邮件上的附件:The example below shows how you can utilize the extensive list of malware SHA-256 hashes provided by MalwareBazaar (abuse.ch) to check attachments on emails:

let abuse_sha256 = (externaldata(sha256_hash: string )
[@"https://bazaar.abuse.ch/export/txt/sha256/recent/"]
with (format="txt"))
| where sha256_hash !startswith "#"
| project sha256_hash;
abuse_sha256
| join (EmailAttachmentInfo 
| where Timestamp > ago(1d) 
) on $left.sha256_hash == $right.SHA256
| project Timestamp,SenderFromAddress,RecipientEmailAddress,FileName,FileType,
SHA256,ThreatTypes,DetectionMethods

分析字符串Parse strings

可以使用多种函数高效处理需要分析或转换的字符串。There are various functions you can use to efficiently handle strings that need parsing or conversion.

StringString 函数Function 用法示例Usage example
命令行Command-lines parse_command_line () parse_command_line() 提取命令和所有参数。Extract the command and all arguments.
PathsPaths parse_path () parse_path() 提取文件或文件夹路径的各个部分。Extract the sections of a file or folder path.
版本号Version numbers parse_version () parse_version() 解构一个版本号,每个节最多包含四个部分,最多八个字符。Deconstruct a version number with up to four sections and up to eight characters per section. 使用分析的数据比较版本使用期。Use the parsed data to compare version age.
IPv4 地址IPv4 addresses parse_ipv4 () parse_ipv4() 将 IPv4 地址转换为长整型。Convert an IPv4 address to a long integer. 若要比较 IPv4 地址而不转换它们,请使用 ipv4_compare () To compare IPv4 addresses without converting them, use ipv4_compare().
IPv6 地址IPv6 addresses parse_ipv6 () parse_ipv6() 将 IPv4 或 IPv6 地址转换为规范 IPv6 表示法。Convert an IPv4 or IPv6 address to the canonical IPv6 notation. 若要比较 IPv6 地址,请使用 ipv6_compare () To compare IPv6 addresses, use ipv6_compare().

若要了解所有受支持的分析函数,请阅读 Kusto 字符串函数To learn about all supported parsing functions, read about Kusto string functions.

备注

本文中的某些表在 Microsoft Defender for Endpoint 中可能不可用。Some tables in this article might not be available in Microsoft Defender for Endpoint. 打开 Microsoft 365 Defender 以使用更多数据源搜寻威胁。Turn on Microsoft 365 Defender to hunt for threats using more data sources. 你可以按照从 Microsoft Defender for Endpoint 迁移高级搜寻查询中的步骤将高级搜寻工作流从 Microsoft Defender for Endpoint移动到 Microsoft 365 Defender。You can move your advanced hunting workflows from Microsoft Defender for Endpoint to Microsoft 365 Defender by following the steps in Migrate advanced hunting queries from Microsoft Defender for Endpoint.