快速入門:在 Azure 入口網站中建立適用於 PostgreSQL 的 Azure 資料庫 – Hyperscale (Citus) (預覽)Quickstart: Create an Azure Database for PostgreSQL - Hyperscale (Citus) (preview) in the Azure portal

Azure Database for PostgreSQL 是一種受控服務,您用來在雲端執行、管理及調整高可用性的 PostgreSQL 資料庫。Azure Database for PostgreSQL is a managed service that you use to run, manage, and scale highly available PostgreSQL databases in the cloud. 本快速入門說明如何使用 Azure 入口網站,建立適用於 PostgreSQL 的 Azure 資料庫 – Hyperscale (Citus) (預覽) 伺服器群組。This Quickstart shows you how to create an Azure Database for PostgreSQL - Hyperscale (Citus) (preview) server group using the Azure portal. 您將探索分散式資料:進行跨節點的資料表分區化、擷取範例資料,以及在多個節點上執行查詢。You'll explore distributed data: sharding tables across nodes, ingesting sample data, and running queries that execute on multiple nodes.

如果您沒有 Azure 訂用帳戶,請在開始前建立免費帳戶If you don't have an Azure subscription, create a free account before you begin.

登入 Azure 入口網站Sign in to the Azure portal

登入 Azure 入口網站Sign in to the Azure portal.

建立適用於 PostgreSQL 的 Azure 資料庫 – Hyperscale (Citus)Create an Azure Database for PostgreSQL - Hyperscale (Citus)

請依照下列步驟來建立「適用於 PostgreSQL 的 Azure 資料庫」伺服器:Follow these steps to create an Azure Database for PostgreSQL server:

  1. 按一下 Azure 入口網站左上角的 [建立資源] 。Click Create a resource in the upper left-hand corner of the Azure portal.

  2. 從 [新增] 頁面中選取 [資料庫] ,然後從 [資料庫] 頁面中選取 [適用於 PostgreSQL 的 Azure 資料庫] 。Select Databases from the New page, and select Azure Database for PostgreSQL from the Databases page.

  3. 針對部署選項,按一下 [Hyperscale (Citus) 伺服器群組 - 預覽] 下方的 [建立] 按鈕。For the deployment option, click the Create button under Hyperscale (Citus) server group - PREVIEW.

  4. 在新伺服器詳細資料表單中填寫下列資訊︰Fill out the new server details form with the following information:

    • 資源群組:按一下此欄位的文字方塊下方的 [新建] 連結。Resource group: click the Create new link below the text box for this field. 輸入名稱,例如 myresourcegroupEnter a name such as myresourcegroup.
    • 伺服器群組名稱:輸入新伺服器群組的唯一名稱,該名稱也會用於伺服器子網域。Server group name: enter a unique name for the new server group, which will also be used for a server subdomain.
    • 系統管理員使用者名稱:其值目前必須是 citus,且無法變更。Admin username: currently required to be the value citus, and can't be changed.
    • 密碼:長度必須至少有八個字元,且包含來自下列三種類別的字元:英文大寫字母、英文小寫字母、數字 (0-9) 和非英數字元 (!、$、#、% 等等)。Password: must be at least eight characters long and contain characters from three of the following categories – English uppercase letters, English lowercase letters, numbers (0-9), and non-alphanumeric characters (!, $, #, %, and so on.)
    • 位置:使用最接近使用者的位置,讓他們能以最快速度存取資料。Location: use the location that is closest to your users to give them the fastest access to the data.

    重要

    必須有您在此處指定的伺服器系統管理員密碼,才能登入伺服器和其資料庫。The server admin password that you specify here is required to log in to the server and its databases. 請記住或記錄此資訊,以供稍後使用。Remember or record this information for later use.

  5. 按一下 [設定伺服器群組] 。Click Configure server group. 讓該區段中的設定維持不變,然後按一下 [儲存] 。Leave the settings in that section unchanged and click Save.

  6. 按一下 [檢閱 + 建立] ,然後按一下 [建立] 以佈建伺服器。Click Review + create and then Create to provision the server. 佈建需要幾分鐘的時間。Provisioning takes a few minutes.

  7. 此頁面會重新導向以監視部署。The page will redirect to monitor deployment. 當即時狀態從 [您的部署正在進行中] 變更為 [您的部署已完成] 時,按一下頁面左側的 [輸出] 功能表項目。When the live status changes from Your deployment is underway to Your deployment is complete, click the Outputs menu item on the left of the page.

  8. [輸出] 頁面將包含協調器主機名稱,且旁邊會有按鈕可將此值複製到剪貼簿。The outputs page will contain a coordinator hostname with a button next to it to copy the value to the clipboard. 請記錄此資訊,以供後續使用。Record this information for later use.

設定伺服器層級防火牆規則Configure a server-level firewall rule

「適用於 PostgreSQL 的 Azure 資料庫 – Hyperscale (Citus) (預覽)」服務會使用伺服器層級的防火牆。The Azure Database for PostgreSQL – Hyperscale (Citus) (preview) service uses a firewall at the server-level. 根據預設,防火牆會防止所有外部應用程式和工具連線到協調器節點及其中的任何資料庫。By default, the firewall prevents all external applications and tools from connecting to the coordinator node and any databases inside. 我們必須新增規則,以針對特定 IP 位址範圍開啟防火牆。We must add a rule to open the firewall for a specific IP address range.

  1. 在您先前複製協調器節點主機名稱的 [輸出] 區段中,按一下 [上一頁] 返回 [概觀] 功能表項目。From the Outputs section where you previously copied the coordinator node hostname, click back into the Overview menu item.

  2. 尋找部署的伺服器群組名稱,然後對其按一下。Find the name of your deployment's server group and click it. (伺服器群組名稱將「不會」 有尾碼。(The server group name will not have a suffix. 舉例來說,名稱結尾為 "-c"、"-w0" 或 "-w1" 的項目不是伺服器群組)。Items with names ending in, for example, "-c", "-w0", or "-w1" are not the server group.)

  3. 按一下左側功能表中 [安全性] 下方的 [防火牆] 。Click Firewall under Security in the left-hand menu.

  4. 按一下[+ 為目前的用戶端 IP 位址新增防火牆規則] 連結。Click the link + Add firewall rule for current client IP address.

  5. 最後,按一下 [儲存] 按鈕。Finally, click the Save button.

    注意

    Azure PostgreSQL 伺服器會透過連接埠 5432 進行通訊。Azure PostgreSQL server communicates over port 5432. 如果您嘗試從公司網路內進行連線,您網路的防火牆可能不允許透過連接埠 5432 的輸出流量。If you are trying to connect from within a corporate network, outbound traffic over port 5432 may not be allowed by your network's firewall. 若情況如此,除非 IT 部門開啟連接埠 5432,否則您無法連線至 Azure SQL Database 伺服器。If so, you cannot connect to your Azure SQL Database server unless your IT department opens port 5432.

使用 psql 連線到資料庫Connect to the database using psql

當您建立 Azure Database for PostgreSQL 伺服器時,系統會建立名為 citus 的預設資料庫。When you create your Azure Database for PostgreSQL server, a default database named citus is created. 若要連線到資料庫伺服器,您需要連接字串和系統管理員密碼。To connect to your database server, you need a connection string and the admin password.

  1. 取得連接字串。Obtain the connection string. 在 [伺服器群組] 頁面中,按一下 [連接字串] 功能表項目。In the server group page click the Connection strings menu item. (其位於 [設定] 底下)。尋找標示為 C++ (libpq) 的字串。(It's under Settings.) Find the string marked C++ (libpq). 其格式將會是:It will be of the form:

    host=hostname.postgres.database.azure.com port=5432 dbname=citus user=citus password={your_password} sslmode=require
    

    複製字串。Copy the string. 您必須將 "{your_password}" 取代為您先前選擇的系統管理密碼。You will need to replace "{your_password}" with the administrative password you chose earlier. 系統不會儲存您的純文字密碼,因此無法在連接字串中對您顯示。The system doesn't store your plaintext password and so can't display it for you in the connection string.

  2. 在本機電腦上開啟終端機視窗。Open a terminal window on your local computer.

  3. 在提示字元處,使用 psql 公用程式來連線到「適用於 PostgreSQL 的 Azure 資料庫」伺服器。At the prompt, connect to your Azure Database for PostgreSQL server with the psql utility. 以引號括住來傳遞連接字串,並確定其包含密碼:Pass your connection string in quotes, being sure it contains your password:

    psql "{connection_string}"
    

    例如,下列命令會連線至伺服器群組 mydemoserver 的協調器節點:For example, the following command connects to the coordinator node of the server group mydemoserver:

    psql "host=mydemoserver-c.postgres.database.azure.com port=5432 dbname=citus user=citus password={your_password} sslmode=require"
    

建立和散發資料表Create and distribute tables

使用 psql 連線到 Hyperscale 協調器節點後,您可以完成一些基本工作。Once connected to the hyperscale coordinator node using psql, you can complete some basic tasks.

Hyperscale 伺服器內有三種類型的資料表:Within Hyperscale servers there are three types of tables:

  • 分散式或分區化資料表 (已分散有助於調整效能與平行處理)Distributed or sharded tables (spread out to help scaling for performance and parallelization)
  • 參考資料表 (維護多份)Reference tables (multiple copies maintained)
  • 本機資料表 (通常用於內部管理資料表)Local tables (often used for internal admin tables)

在本快速入門中,我們主要著重於分散式資料表並且熟悉它們。In this quickstart, we'll primarily focus on distributed tables and getting familiar with them.

我們即將使用的資料模型十分簡單:來自 GitHub 的使用者和事件資料。The data model we're going to work with is simple: user and event data from GitHub. 事件包括分支建立、組織相關 git 認可等等。Events include fork creation, git commits related to an organization, and more.

在您透過 psql 連線後,我們會建立我們的資料表。Once you've connected via psql, let's create our tables. 在 psql 主控台中執行:In the psql console run:

CREATE TABLE github_events
(
    event_id bigint,
    event_type text,
    event_public boolean,
    repo_id bigint,
    payload jsonb,
    repo jsonb,
    user_id bigint,
    org jsonb,
    created_at timestamp
);

CREATE TABLE github_users
(
    user_id bigint,
    url text,
    login text,
    avatar_url text,
    gravatar_id text,
    display_login text
);

github_eventspayload 欄位具有 JSONB 資料類型。The payload field of github_events has a JSONB datatype. JSONB 是 Postgres 中二進位格式的 JSON 資料類型。JSONB is the JSON datatype in binary form in Postgres. 資料類型可讓您在單一資料行中輕鬆地儲存更有彈性的結構描述。The datatype makes it easy to store a flexible schema in a single column.

Postgres 可以建立此類型的 GIN 索引,該索引將會檢索每個索引鍵和其值。Postgres can create a GIN index on this type, which will index every key and value within it. 使用索引,即可利用各種條件快速又輕鬆地查詢承載。With an index, it becomes fast and easy to query the payload with various conditions. 我們會繼續建立幾個索引,再載入我們的資料。Let's go ahead and create a couple of indexes before we load our data. 在 psql 中:In psql:

CREATE INDEX event_type_index ON github_events (event_type);
CREATE INDEX payload_index ON github_events USING GIN (payload jsonb_path_ops);

接著,我們會採用協調器節點上的 Postgres 資料表,並告知 Hyperscale 跨 背景工作角色進行這些資料表的分區化。Next we’ll take those Postgres tables on the coordinator node and tell Hyperscale to shard them across the workers. 為了這麼做,我們會針對每個資料表執行查詢,並指定要在其中進行分區化的索引鍵。To do so, we’ll run a query for each table specifying the key to shard it on. 在目前的範例中,我們會在 user_id 上將 events 和 users 資料表分區化:In the current example we’ll shard both the events and users table on user_id:

SELECT create_distributed_table('github_events', 'user_id');
SELECT create_distributed_table('github_users', 'user_id');

我們準備好載入資料。We're ready to load data. 同樣在 psql 中,使用 curl 和 -O 選項來下載檔案:In psql still, shell out to download the files:

\! curl -O https://examples.citusdata.com/users.csv
\! curl -O https://examples.citusdata.com/events.csv

接下來,將資料從檔案載入至分散式資料表:Next, load the data from the files into the distributed tables:

\copy github_events from 'events.csv' WITH CSV
\copy github_users from 'users.csv' WITH CSV

執行查詢Run queries

現在輪到有趣的部分,實際執行一些查詢。Now it's time for the fun part, actually running some queries. 我們會從簡單的 count (*) 著手,以查看我們所載入的資料量:Let's start with a simple count (*) to see how much data we loaded:

SELECT count(*) from github_events;

運作良好。That worked nicely. 我們很快會回到這種彙總,但現在讓我們看看其他幾個查詢。We'll come back to that sort of aggregation in a bit, but for now let’s look at a few other queries. JSONB payload 資料行內有不少資料,但資料會因事件類型而有所不同。Within the JSONB payload column there's a good bit of data, but it varies based on event type. PushEvent 事件包含的大小包括推送的相異認可數目。PushEvent events contain a size that includes the number of distinct commits for the push. 我們可使用它來尋找每小時的認可總數:We can use it to find the total number of commits per hour:

SELECT date_trunc('hour', created_at) AS hour,
       sum((payload->>'distinct_size')::int) AS num_commits
FROM github_events
WHERE event_type = 'PushEvent'
GROUP BY hour
ORDER BY hour;

到目前為止,查詢已獨家參與 github_events,但我們可以結合此資訊與 github_users。So far the queries have involved the github_events exclusively, but we can combine this information with github_users. 因為我們已將相同識別碼 (user_id) 上的 users 和 events 分區化,所以使用者識別碼相符的這兩個資料表上的資料列將共置於相同的資料庫節點上,並可輕鬆地聯結。Since we sharded both users and events on the same identifier (user_id), the rows of both tables with matching user IDs will be colocated on the same database nodes and can easily be joined.

如果我們在 user_id 上聯結,Hyperscale 可以將聯結執行向下推送到分區,進而以平行方式在背景工作角色節點上執行。If we join on user_id, Hyperscale can push the join execution down into shards for execution in parallel on worker nodes. 例如,讓我們找出建立最多存放庫的使用者:For example, let's find the users who created the greatest number of repositories:

SELECT login, count(*)
FROM github_events ge
JOIN github_users gu
ON ge.user_id = gu.user_id
WHERE event_type = 'CreateEvent' AND
      payload @> '{"ref_type": "repository"}'
GROUP BY login
ORDER BY count(*) DESC;

清除資源Clean up resources

在前述步驟中,您在伺服器群組中建立了 Azure 資源。In the preceding steps, you created Azure resources in a server group. 如果您認為未來不需要這些資源,請刪除伺服器群組。If you don't expect to need these resources in the future, delete the server group. 在您伺服器群組的 [概觀] 頁面中,按一下 [刪除] 按鈕。Press the Delete button in the Overview page for your server group. 當快顯頁面上出現提示時,請確認伺服器群組的名稱,然後按一下最後一個 [刪除] 按鈕。When prompted on a pop-up page, confirm the name of the server group and click the final Delete button.

後續步驟Next steps

在本快速入門中,您已了解如何佈建 Hyperscale (Citus) 伺服器群組。In this quickstart, you learned how to provision a Hyperscale (Citus) server group. 您已使用 psql 連線到該群組、建立結構描述,以及分散資料。You connected to it with psql, created a schema, and distributed data.

接下來,遵循教學課程以建置可擴充的多租用戶應用程式。Next, follow a tutorial to build scalable multi-tenant applications.