教學課程:使用適用於 PostgreSQL 的 Azure 資料庫 – 超大規模 (Citus) (預覽) 設計多租用戶資料庫Tutorial: design a multi-tenant database by using Azure Database for PostgreSQL – Hyperscale (Citus) (preview)

在本教學課程中,您將使用適用於 PostgreSQL 的 Azure 資料庫 – 超大規模 (Citus) (預覽) 了解如何:In this tutorial, you use Azure Database for PostgreSQL - Hyperscale (Citus) (preview) to learn how to:

  • 建立 Hyperscale (Citus) 伺服器群組Create a Hyperscale (Citus) server group
  • 使用 psql 公用程式建立結構描述Use psql utility to create a schema
  • 進行跨節點的資料表分區Shard tables across nodes
  • 內嵌範例資料Ingest sample data
  • 查詢租用戶資料Query tenant data
  • 在租用戶之間共用資料Share data between tenants
  • 自訂每一租用戶的結構描述Customize the schema per-tenant

必要條件Prerequisites

如果您沒有 Azure 訂用帳戶,請在開始前建立免費帳戶If you don't have an Azure subscription, create a free account before you begin.

登入 Azure 入口網站Sign in to the Azure portal

登入 Azure 入口網站Sign in to the Azure portal.

建立適用於 PostgreSQL 的 Azure 資料庫 – Hyperscale (Citus)Create an Azure Database for PostgreSQL - Hyperscale (Citus)

請依照下列步驟來建立「適用於 PostgreSQL 的 Azure 資料庫」伺服器:Follow these steps to create an Azure Database for PostgreSQL server:

  1. 按一下 Azure 入口網站左上角的 [建立資源] 。Click Create a resource in the upper left-hand corner of the Azure portal.

  2. 從 [新增] 頁面中選取 [資料庫] ,然後從 [資料庫] 頁面中選取 [適用於 PostgreSQL 的 Azure 資料庫] 。Select Databases from the New page, and select Azure Database for PostgreSQL from the Databases page.

  3. 針對部署選項,按一下 [Hyperscale (Citus) 伺服器群組 - 預覽] 下方的 [建立] 按鈕。For the deployment option, click the Create button under Hyperscale (Citus) server group - PREVIEW.

  4. 在新伺服器詳細資料表單中填寫下列資訊︰Fill out the new server details form with the following information:

    • 資源群組:按一下此欄位的文字方塊下方的 [新建] 連結。Resource group: click the Create new link below the text box for this field. 輸入名稱,例如 myresourcegroupEnter a name such as myresourcegroup.
    • 伺服器群組名稱:輸入新伺服器群組的唯一名稱,該名稱也會用於伺服器子網域。Server group name: enter a unique name for the new server group, which will also be used for a server subdomain.
    • 系統管理員使用者名稱:其值目前必須是 citus,且無法變更。Admin username: currently required to be the value citus, and can't be changed.
    • 密碼:長度必須至少有八個字元,且包含來自下列三種類別的字元:英文大寫字母、英文小寫字母、數字 (0-9) 和非英數字元 (!、$、#、% 等等)。Password: must be at least eight characters long and contain characters from three of the following categories – English uppercase letters, English lowercase letters, numbers (0-9), and non-alphanumeric characters (!, $, #, %, and so on.)
    • 位置:使用最接近使用者的位置,讓他們能以最快速度存取資料。Location: use the location that is closest to your users to give them the fastest access to the data.

    重要

    必須有您在此處指定的伺服器系統管理員密碼,才能登入伺服器和其資料庫。The server admin password that you specify here is required to log in to the server and its databases. 請記住或記錄此資訊,以供稍後使用。Remember or record this information for later use.

  5. 按一下 [設定伺服器群組] 。Click Configure server group. 讓該區段中的設定維持不變,然後按一下 [儲存] 。Leave the settings in that section unchanged and click Save.

  6. 按一下 [檢閱 + 建立] ,然後按一下 [建立] 以佈建伺服器。Click Review + create and then Create to provision the server. 佈建需要幾分鐘的時間。Provisioning takes a few minutes.

  7. 此頁面會重新導向以監視部署。The page will redirect to monitor deployment. 當即時狀態從 [您的部署正在進行中] 變更為 [您的部署已完成] 時,按一下頁面左側的 [輸出] 功能表項目。When the live status changes from Your deployment is underway to Your deployment is complete, click the Outputs menu item on the left of the page.

  8. [輸出] 頁面將包含協調器主機名稱,且旁邊會有按鈕可將此值複製到剪貼簿。The outputs page will contain a coordinator hostname with a button next to it to copy the value to the clipboard. 請記錄此資訊,以供後續使用。Record this information for later use.

設定伺服器層級防火牆規則Configure a server-level firewall rule

「適用於 PostgreSQL 的 Azure 資料庫 – Hyperscale (Citus) (預覽)」服務會使用伺服器層級的防火牆。The Azure Database for PostgreSQL – Hyperscale (Citus) (preview) service uses a firewall at the server-level. 根據預設,防火牆會防止所有外部應用程式和工具連線到協調器節點及其中的任何資料庫。By default, the firewall prevents all external applications and tools from connecting to the coordinator node and any databases inside. 我們必須新增規則,以針對特定 IP 位址範圍開啟防火牆。We must add a rule to open the firewall for a specific IP address range.

  1. 在您先前複製協調器節點主機名稱的 [輸出] 區段中,按一下 [上一頁] 返回 [概觀] 功能表項目。From the Outputs section where you previously copied the coordinator node hostname, click back into the Overview menu item.

  2. 尋找部署的伺服器群組名稱,然後對其按一下。Find the name of your deployment's server group and click it. (伺服器群組名稱將「不會」 有尾碼。(The server group name will not have a suffix. 舉例來說,名稱結尾為 "-c"、"-w0" 或 "-w1" 的項目不是伺服器群組)。Items with names ending in, for example, "-c", "-w0", or "-w1" are not the server group.)

  3. 按一下左側功能表中 [安全性] 下方的 [防火牆] 。Click Firewall under Security in the left-hand menu.

  4. 按一下[+ 為目前的用戶端 IP 位址新增防火牆規則] 連結。Click the link + Add firewall rule for current client IP address.

  5. 最後,按一下 [儲存] 按鈕。Finally, click the Save button.

    注意

    Azure PostgreSQL 伺服器會透過連接埠 5432 進行通訊。Azure PostgreSQL server communicates over port 5432. 如果您嘗試從公司網路內進行連線,您網路的防火牆可能不允許透過連接埠 5432 的輸出流量。If you are trying to connect from within a corporate network, outbound traffic over port 5432 may not be allowed by your network's firewall. 若情況如此,除非 IT 部門開啟連接埠 5432,否則您無法連線至 Azure SQL Database 伺服器。If so, you cannot connect to your Azure SQL Database server unless your IT department opens port 5432.

使用 psql 連線到資料庫Connect to the database using psql

當您建立 Azure Database for PostgreSQL 伺服器時,系統會建立名為 citus 的預設資料庫。When you create your Azure Database for PostgreSQL server, a default database named citus is created. 若要連線到資料庫伺服器,您需要連接字串和系統管理員密碼。To connect to your database server, you need a connection string and the admin password.

  1. 取得連接字串。Obtain the connection string. 在 [伺服器群組] 頁面中,按一下 [連接字串] 功能表項目。In the server group page click the Connection strings menu item. (其位於 [設定] 底下)。尋找標示為 C++ (libpq) 的字串。(It's under Settings.) Find the string marked C++ (libpq). 其格式將會是:It will be of the form:

    host=hostname.postgres.database.azure.com port=5432 dbname=citus user=citus password={your_password} sslmode=require
    

    複製字串。Copy the string. 您必須將 "{your_password}" 取代為您先前選擇的系統管理密碼。You will need to replace "{your_password}" with the administrative password you chose earlier. 系統不會儲存您的純文字密碼,因此無法在連接字串中對您顯示。The system doesn't store your plaintext password and so can't display it for you in the connection string.

  2. 在本機電腦上開啟終端機視窗。Open a terminal window on your local computer.

  3. 在提示字元處,使用 psql 公用程式來連線到「適用於 PostgreSQL 的 Azure 資料庫」伺服器。At the prompt, connect to your Azure Database for PostgreSQL server with the psql utility. 以引號括住來傳遞連接字串,並確定其包含密碼:Pass your connection string in quotes, being sure it contains your password:

    psql "{connection_string}"
    

    例如,下列命令會連線至伺服器群組 mydemoserver 的協調器節點:For example, the following command connects to the coordinator node of the server group mydemoserver:

    psql "host=mydemoserver-c.postgres.database.azure.com port=5432 dbname=citus user=citus password={your_password} sslmode=require"
    

使用 psql 公用程式建立結構描述Use psql utility to create a schema

使用 psql 連線到適用於 PostgreSQL 的 Azure 資料庫 - 超大規模 (Citus) (預覽) 後,您可以完成一些基本工作。Once connected to the Azure Database for PostgreSQL - Hyperscale (Citus) (preview) using psql, you can complete some basic tasks. 本教學課程會引導您建立可讓廣告商追蹤其行銷活動的 Web 應用程式。This tutorial walks you through creating a web app that allows advertisers to track their campaigns.

此應用程式可供多家公司使用,因此我們將建立一個資料表來保存公司,並且為其行銷活動建立另一個資料表。Multiple companies can use the app, so let's create a table to hold companies and another for their campaigns. 在 psql 主控台中,執行下列命令:In the psql console, run these commands:

CREATE TABLE companies (
  id bigserial PRIMARY KEY,
  name text NOT NULL,
  image_url text,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL
);

CREATE TABLE campaigns (
  id bigserial,
  company_id bigint REFERENCES companies (id),
  name text NOT NULL,
  cost_model text NOT NULL,
  state text NOT NULL,
  monthly_budget bigint,
  blacklisted_site_urls text[],
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,

  PRIMARY KEY (company_id, id)
);

每個行銷活動都會付費以執行廣告。Each campaign will pay to run ads. 完成上述程式碼後也請在 psql 中執行下列程式碼,以新增廣告的資料表:Add a table for ads too, by running the following code in psql after the code above:

CREATE TABLE ads (
  id bigserial,
  company_id bigint,
  campaign_id bigint,
  name text NOT NULL,
  image_url text,
  target_url text,
  impressions_count bigint DEFAULT 0,
  clicks_count bigint DEFAULT 0,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,

  PRIMARY KEY (company_id, id),
  FOREIGN KEY (company_id, campaign_id)
    REFERENCES campaigns (company_id, id)
);

最後,我們會追蹤每個廣告的點擊率和廣告曝光數的相關統計資料:Finally, we'll track statistics about clicks and impressions for each ad:

CREATE TABLE clicks (
  id bigserial,
  company_id bigint,
  ad_id bigint,
  clicked_at timestamp without time zone NOT NULL,
  site_url text NOT NULL,
  cost_per_click_usd numeric(20,10),
  user_ip inet NOT NULL,
  user_data jsonb NOT NULL,

  PRIMARY KEY (company_id, id),
  FOREIGN KEY (company_id, ad_id)
    REFERENCES ads (company_id, id)
);

CREATE TABLE impressions (
  id bigserial,
  company_id bigint,
  ad_id bigint,
  seen_at timestamp without time zone NOT NULL,
  site_url text NOT NULL,
  cost_per_impression_usd numeric(20,10),
  user_ip inet NOT NULL,
  user_data jsonb NOT NULL,

  PRIMARY KEY (company_id, id),
  FOREIGN KEY (company_id, ad_id)
    REFERENCES ads (company_id, id)
);

現在,您可以在 psql 中執行下列命令,以查看在資料表清單中新建的資料表:You can see the newly created tables in the list of tables now in psql by running:

\dt

多租用戶應用程式只能就個別的租用戶強制執行唯一性,這正是所有主要和外部索引鍵都包含公司識別碼的原因。Multi-tenant applications can enforce uniqueness only per tenant, which is why all primary and foreign keys include the company ID.

進行跨節點的資料表分區Shard tables across nodes

超大規模部署會根據使用者指定的資料行值,在不同的節點上儲存資料表資料列。A hyperscale deployment stores table rows on different nodes based on the value of a user-designated column. 這個「散發資料行」標記會標示哪個租用戶擁有哪些資料列。This "distribution column" marks which tenant owns which rows.

我們要將散發資料行設定為 company_id,即租用戶識別碼。Let's set the distribution column to be company_id, the tenant identifier. 在 psql 中,執行下列函式:In psql, run these functions:

SELECT create_distributed_table('companies',   'id');
SELECT create_distributed_table('campaigns',   'company_id');
SELECT create_distributed_table('ads',         'company_id');
SELECT create_distributed_table('clicks',      'company_id');
SELECT create_distributed_table('impressions', 'company_id');

內嵌範例資料Ingest sample data

現在,在 psql 外部的一般命令列中,下載範例資料集:Outside of psql now, in the normal command line, download sample data sets:

for dataset in companies campaigns ads clicks impressions geo_ips; do
  curl -O https://examples.citusdata.com/mt_ref_arch/${dataset}.csv
done

回到 psql 內,進行資料的大量載入。Back inside psql, bulk load the data. 請務必在您下載資料檔案的相同目錄中執行 psql。Be sure to run psql in the same directory where you downloaded the data files.

\copy companies from 'companies.csv' with csv
\copy campaigns from 'campaigns.csv' with csv
\copy ads from 'ads.csv' with csv
\copy clicks from 'clicks.csv' with csv
\copy impressions from 'impressions.csv' with csv

這項資料現在會分散至背景工作節點。This data will now be spread across worker nodes.

查詢租用戶資料Query tenant data

當應用程式要求單一租用戶的資料時,資料庫可在單一背景工作節點上執行查詢。When the application requests data for a single tenant, the database can execute the query on a single worker node. 單一租用戶查詢會依單一租用戶識別碼進行篩選。Single-tenant queries filter by a single tenant ID. 例如,下列查詢會篩選 company_id = 5 的廣告和曝光數。For example, the following query filters company_id = 5 for ads and impressions. 請嘗試在 psql 中執行該查詢,並查看結果。Try running it in psql to see the results.

SELECT a.campaign_id,
       RANK() OVER (
         PARTITION BY a.campaign_id
         ORDER BY a.campaign_id, count(*) desc
       ), count(*) as n_impressions, a.id
  FROM ads as a
  JOIN impressions as i
    ON i.company_id = a.company_id
   AND i.ad_id      = a.id
 WHERE a.company_id = 5
GROUP BY a.campaign_id, a.id
ORDER BY a.campaign_id, n_impressions desc;

在租用戶之間共用資料Share data between tenants

到目前為止,所有資料表都已藉由 company_id 散發,但某些資料並非原本就特別「屬於」任何租用戶,而可以共用。Until now all tables have been distributed by company_id, but some data doesn't naturally "belong" to any tenant in particular, and can be shared. 例如,範例廣告平台中的所有公司可能都會想要根據 IP 位址取得其對象的地理資訊。For instance, all companies in the example ad platform might want to get geographical information for their audience based on IP addresses.

請建立資料表來保存共用的地理資訊。Create a table to hold shared geographic information. 在 psql 中執行下列命令:Run the following commands in psql:

CREATE TABLE geo_ips (
  addrs cidr NOT NULL PRIMARY KEY,
  latlon point NOT NULL
    CHECK (-90  <= latlon[0] AND latlon[0] <= 90 AND
           -180 <= latlon[1] AND latlon[1] <= 180)
);
CREATE INDEX ON geo_ips USING gist (addrs inet_ops);

接著,使 geo_ips 成為「參考資料表」,用以儲存每個背景工作節點上的資料表複本。Next make geo_ips a "reference table" to store a copy of the table on every worker node.

SELECT create_reference_table('geo_ips');

使用範例資料加以載入。Load it with example data. 請務必從您下載資料集的目錄內執行 psql 的此命令。Remember to run this command in psql from inside the directory where you downloaded the dataset.

\copy geo_ips from 'geo_ips.csv' with csv

在所有節點上,聯結點擊率資料表與 geo_ips 都有其效用。Joining the clicks table with geo_ips is efficient on all nodes. 以下聯結可用來尋找每個點擊廣告的使用者所在的位置Here is a join to find the locations of everyone who clicked on ad 290. 請嘗試在 psql 中執行查詢。Try running the query in psql.

SELECT c.id, clicked_at, latlon
  FROM geo_ips, clicks c
 WHERE addrs >> c.user_ip
   AND c.company_id = 5
   AND c.ad_id = 290;

自訂每一租用戶的結構描述Customize the schema per-tenant

每個租用戶都可能需要儲存其他人不需要的特殊資訊。Each tenant may need to store special information not needed by others. 不過,所有租用戶會共用具有相同資料庫結構描述的通用基礎結構。However, all tenants share a common infrastructure with an identical database schema. 其他資料的去處為何?Where can the extra data go?

方法之一是使用開放式資料行類型,例如 PostgreSQL 的 JSONB。One trick is to use an open-ended column type like PostgreSQL's JSONB. 我們的結構描述在 clicks 中有名為 user_data 的 JSONB 欄位。Our schema has a JSONB field in clicks called user_data. 一家公司 (例如公司五) 可使用資料行來追蹤使用者是否位於行動裝置上。A company (say company five), can use the column to track whether the user is on a mobile device.

以下查詢可找出哪些人點擊了更多項目:行動裝置或傳統訪客。Here's a query to find who clicks more: mobile, or traditional visitors.

SELECT
  user_data->>'is_mobile' AS is_mobile,
  count(*) AS count
FROM clicks
WHERE company_id = 5
GROUP BY user_data->>'is_mobile'
ORDER BY count DESC;

我們也可以建立部分索引,為單一公司最佳化此查詢。We can optimize this query for a single company by creating a partial index.

CREATE INDEX click_user_data_is_mobile
ON clicks ((user_data->>'is_mobile'))
WHERE company_id = 5;

在更廣泛的層面,我們可以建立資料行中每個索引鍵和值的 GIN 索引More generally, we can create a GIN indices on every key and value within the column.

CREATE INDEX click_user_data
ON clicks USING gin (user_data);

-- this speeds up queries like, "which clicks have
-- the is_mobile key present in user_data?"

SELECT id
  FROM clicks
 WHERE user_data ? 'is_mobile'
   AND company_id = 5;

清除資源Clean up resources

在前述步驟中,您在伺服器群組中建立了 Azure 資源。In the preceding steps, you created Azure resources in a server group. 如果您認為未來不需要這些資源,請刪除伺服器群組。If you don't expect to need these resources in the future, delete the server group. 在您伺服器群組的 [概觀] 頁面中,按一下 [刪除] 按鈕。Press the Delete button in the Overview page for your server group. 當快顯頁面上出現提示時,請確認伺服器群組的名稱,然後按一下最後一個 [刪除] 按鈕。When prompted on a pop-up page, confirm the name of the server group and click the final Delete button.

後續步驟Next steps

在本教學課程中,您已了解如何佈建 Hyperscale (Citus) 伺服器群組。In this tutorial, you learned how to provision a Hyperscale (Citus) server group. 您已使用 psql 連線到該群組、建立結構描述,並散發資料。You connected to it with psql, created a schema, and distributed data. 您已了解如何在租用戶內部和之間查詢資料,以及如何自訂每個租用戶的結構描述。You learned to query data both within and between tenants, and to customize the schema per tenant.

接下來,請了解超大規模的概念。Next, learn about the concepts of hyperscale.