Determine table and relation size in Azure Cosmos DB for PostgreSQL

APPLIES TO: Azure Cosmos DB for PostgreSQL (powered by the Citus database extension to PostgreSQL)

The usual way to find table sizes in PostgreSQL, pg_total_relation_size, drastically under-reports the size of distributed tables on Azure Cosmos DB for PostgreSQL. All this function does on a cluster is to reveal the size of tables on the coordinator node. In reality, the data in distributed tables lives on the worker nodes (in shards), not on the coordinator. A true measure of distributed table size is obtained as a sum of shard sizes. Azure Cosmos DB for PostgreSQL provides helper functions to query this information.

Function Returns
citus_relation_size(relation_name)
  • Size of actual data in table (the "main fork").
  • A relation can be the name of a table or an index.
citus_table_size(relation_name)
citus_total_relation_size(relation_name)
  • citus_table_size plus:

    • size of indices

These functions are analogous to three of the standard PostgreSQL object size functions, except if they can't connect to a node, they error out.

Example

Here's how to list the sizes of all distributed tables:

SELECT logicalrelid AS name,
       pg_size_pretty(citus_table_size(logicalrelid)) AS size
  FROM pg_dist_partition;

Output:

┌───────────────┬───────┐
│     name      │ size  │
├───────────────┼───────┤
│ github_users  │ 39 MB │
│ github_events │ 37 MB │
└───────────────┴───────┘

Next steps