APPROX_COUNT_DISTINCT (Transact-SQL)APPROX_COUNT_DISTINCT (Transact-SQL)

适用对象: noSQL ServeryesAzure SQL 数据库yesAzure SQL 数据仓库yes并行数据仓库APPLIES TO: noSQL Server yesAzure SQL Database yesAzure SQL Data Warehouse yesParallel Data Warehouse

此函数返回组中唯一非空值的近似数。This function returns the approximate number of unique non-null values in a group.

主题链接图标 TRANSACT-SQL 语法约定Topic link icon Transact-SQL Syntax Conventions

备注

APPROX_COUNT_DISTINCT 是公共预览版功能。APPROX_COUNT_DISTINCT is a public preview feature.

语法Syntax

-- Syntax for Azure SQL Database, Azure SQL Data Warehouse and Parallel Data Warehouse  

APPROX_COUNT_DISTINCT ( expression )   

参数Arguments

expressionexpression
任意类型(“image” 、“sql_variant” 、“ntext” 或“text” 除外)的 表达式An expression of any type, except image, sql_variant, ntext, or text.

返回类型Return types

bigintbigint

RemarksRemarks

APPROX_COUNT_DISTINCT( expression ) 计算组中每行的表达式,并返回组中唯一非空值的近似数。APPROX_COUNT_DISTINCT( expression ) evaluates an expression for each row in a group, and returns the approximate number of unique non-null values in a group. 此函数旨在跨响应速度比绝对精度更为关键的大型数据集进行聚合。This function is designed to provide aggregations across large data sets where responsiveness is more critical than absolute precision.

APPROX_COUNT_DISTINCT 专用于大数据方案,更适合以下情形:APPROX_COUNT_DISTINCT is designed for use in big data scenarios and is optimized for the following conditions:

  • 访问包含数百万行或更多行的数据集,且 Access of data sets that are millions of rows or higher and
  • 聚合包含多个非重复值的一个或多个列Aggregation of a column or columns that have many distinct values

此函数实现可保证最多 2% 的错误率,概率在 97% 内。The function implementation guarantees up to a 2% error rate within a 97% probability.

与详尽 COUNT DISTINCT 操作相比,APPROX_COUNT_DISTINCT 需要的内存更少。APPROX_COUNT_DISTINCT requires less memory than an exhaustive COUNT DISTINCT operation. 与精确 COUNT DISTINCT 操作相比,鉴于 APPROX_COUNT_DISTINCT 占用的内存更少,因此它不太可能会将内存溢出到磁盘。Given the smaller memory footprint, APPROX_COUNT_DISTINCT is less likely to spill memory to disk compared to a precise COUNT DISTINCT operation. 若要详细了解用于实现此目的的算法,请参阅 HyperLogLogTo learn more about the algorithm used to achieve this, see HyperLogLog.

备注

使用排序规则敏感字符串,APPROX_COUNT_DISTINCT 的公共预览版使用二进制匹配,生成的结果与在有 BIN 排序规则(而不是 BIN2)的情况下一致。With collation sensitive strings, the Public Preview version of APPROX_COUNT_DISTINCT uses a binary match and provides results that would have been generated in the presence of BIN collations and not BIN2.

示例Examples

A.A. 使用 APPROX_COUNT_DISTINCTUsing APPROX_COUNT_DISTINCT

此示例返回订单表中不同订单键的近似数。This example returns the approximate number of different order keys from the orders table.

SELECT APPROX_COUNT_DISTINCT(O_OrderKey) AS Approx_Distinct_OrderKey
FROM dbo.Orders;

下面是结果集:Here is the result set.

Approx_Distinct_OrderKey
------------------------
15164704

B.B. 结合使用 APPROX_COUNT_DISTINCT 和 GROUP BYUsing APPROX_COUNT_DISTINCT with GROUP BY

此示例按订单状态返回订单表中不同订单键的近似数。This example returns the approximate number of different order keys by order status from the orders table.

SELECT O_OrderStatus, APPROX_COUNT_DISTINCT(O_OrderKey) AS Approx_Distinct_OrderKey
FROM dbo.Orders
GROUP BY O_OrderStatus
ORDER BY O_OrderStatus; 

下面是结果集:Here is the result set.

O_OrderStatus                                                    Approx_Distinct_OrderKey
---------------------------------------------------------------- ------------------------
F                                                                7397838
O                                                                7387803
P                                                                388036

另请参阅See also

聚合函数 (Transact-SQL)Aggregate Functions (Transact-SQL)
COUNT (Transact-SQL)COUNT (Transact-SQL)