您现在访问的是微软AZURE全球版技术文档网站,若需要访问由世纪互联运营的MICROSOFT AZURE中国区技术文档网站,请访问 https://docs.azure.cn.

快速入门:在 Azure 门户中创建 Azure Database for PostgreSQL - 超大规模 (Citus)Quickstart: Create an Azure Database for PostgreSQL - Hyperscale (Citus) in the Azure portal

用于 PostgreSQL 的 Azure 数据库是一种托管服务,可用于在云中运行、管理和缩放具有高可用性的 PostgreSQL 数据库。Azure Database for PostgreSQL is a managed service that you use to run, manage, and scale highly available PostgreSQL databases in the cloud. 本快速入门介绍如何使用 Azure 门户创建 Azure Database for PostgreSQL - 超大规模 (Citus) 服务器组。This Quickstart shows you how to create an Azure Database for PostgreSQL - Hyperscale (Citus) server group using the Azure portal. 你将探究分布式数据:在节点之间将表分片、引入示例数据,以及运行在多个节点上执行的查询。You'll explore distributed data: sharding tables across nodes, ingesting sample data, and running queries that execute on multiple nodes.

如果没有 Azure 订阅,请在开始之前创建一个免费帐户。If you don't have an Azure subscription, create a free account before you begin.

登录到 Azure 门户Sign in to the Azure portal

登录 Azure 门户Sign in to the Azure portal.

创建 Azure Database for PostgreSQL - 超大规模 (Citus)Create an Azure Database for PostgreSQL - Hyperscale (Citus)

可以按照以下步骤创建用于 PostgreSQL 的 Azure 数据库:Follow these steps to create an Azure Database for PostgreSQL server:

  1. 在 Azure 门户的左上角单击“创建资源”。 Click Create a resource in the upper left-hand corner of the Azure portal.

  2. 从“新建”页中选择“数据库”,并从“数据库”页中选择“用于 PostgreSQL 的 Azure 数据库”。 Select Databases from the New page, and select Azure Database for PostgreSQL from the Databases page.

  3. 对于部署选项,请单击“Hyperscale (Citus) 服务器组”下的“创建”按钮 。For the deployment option, click the Create button under Hyperscale (Citus) server group.

  4. 使用以下信息填写“新服务器详细信息”窗体:Fill out the new server details form with the following information:

    • 资源组:单击此字段的文本框下的“新建”链接 。Resource group: click the Create new link below the text box for this field. 输入一个名称,例如 myresourcegroupEnter a name such as myresourcegroup.
    • 服务器组名称:输入新服务器组的唯一名称,该名称也将用于服务器子域。Server group name: enter a unique name for the new server group, which will also be used for a server subdomain.
    • 管理员用户名:当前必须是值 citus,并且不能更改。Admin username: currently required to be the value citus, and can't be changed.
    • 密码:长度必须至少为八个字符,且必须包含以下类别中三种类别的字符 - 英文大写字母、英文小写字母、数字 (0-9) 和非字母数字字符(!、$、#、%,等等。)Password: must be at least eight characters long and contain characters from three of the following categories – English uppercase letters, English lowercase letters, numbers (0-9), and non-alphanumeric characters (!, $, #, %, and so on.)
    • 位置:使用距离你的用户最近的位置,使用户可以最快速度访问数据。Location: use the location that is closest to your users to give them the fastest access to the data.

    重要

    登录到服务器及其数据库时需要使用在此处指定的服务器管理员密码。The server admin password that you specify here is required to log in to the server and its databases. 请牢记或记录此信息,以后会使用到它。Remember or record this information for later use.

  5. 单击“配置服务器组” 。Click Configure server group. 保留该部分的设置不变,单击“保存” 。Leave the settings in that section unchanged and click Save.

  6. 单击页面底部的“下一步: 网络 >”。Click Next : Networking > at the bottom of the screen.

  7. 在“网络” 选项卡中,单击“公共终结点” 单选按钮。In the Networking tab, click the Public endpoint radio button. 已选择公共终结点Public endpoint selected

  8. 单击链接“+ 添加当前客户端 IP 地址” 。Click the link + Add current client IP address. 已添加客户端 IPAdded client IP

    备注

    Azure PostgreSQL 服务器通过端口 5432 进行通信。Azure PostgreSQL server communicates over port 5432. 如果尝试从企业网络内部进行连接,则该网络的防火墙可能不允许经端口 5432 的出站流量。If you are trying to connect from within a corporate network, outbound traffic over port 5432 may not be allowed by your network's firewall. 如果是这样,则除非 IT 部门打开端口 5432,否则无法连接到超大规模 (Citus) 群集。If so, you cannot connect to your Hyperscale (Citus) cluster unless your IT department opens port 5432.

  9. 单击“查看 + 创建”,然后单击“创建”,预配服务器 。Click Review + create and then Create to provision the server. 预配需要数分钟。Provisioning takes a few minutes.

  10. 页面会重定向,以监视部署。The page will redirect to monitor deployment. 当实时状态从“部署正在进行”变为“部署已完成”时,单击页面左侧的“输出”菜单项 。When the live status changes from Your deployment is underway to Your deployment is complete, click the Outputs menu item on the left of the page.

  11. 输出页将包含协调器主机名,主机名旁边有一个按钮,用于将值复制到剪贴板。The outputs page will contain a coordinator hostname with a button next to it to copy the value to the clipboard. 记录此信息以供将来使用。Record this information for later use.

使用 psql 连接到数据库Connect to the database using psql

创建 Azure Database for PostgreSQL 服务器时,会创建名为 citus 的默认数据库。When you create your Azure Database for PostgreSQL server, a default database named citus is created. 若要连接到你的数据库服务器,需要具有连接字符串和管理员密码。To connect to your database server, you need a connection string and the admin password.

  1. 获取连接字符串值。Obtain the connection string. 在服务器组页面中,单击“连接字符串” 菜单项。In the server group page click the Connection strings menu item. (它位于“设置” 下。)查找标记为“psql” 的字符串。(It's under Settings.) Find the string marked psql. 它的形式如下:It will be of the form:

    psql "host=hostname.postgres.database.azure.com port=5432 dbname=citus user=citus password={your_password} sslmode=require"
    

    复制此字符串。Copy the string. 你需要将“{your_password}”替换为你之前选择的管理员密码。You will need to replace "{your_password}" with the administrative password you chose earlier. 系统不存储你的纯文本密码,因此无法在连接字符串中为你显示密码。The system doesn't store your plaintext password and so can't display it for you in the connection string.

  2. 在本地计算机上打开一个终端窗口。Open a terminal window on your local computer.

  3. 在提示符下,使用 psql 实用工具连接到你的 Azure Database for PostgreSQL 服务器。At the prompt, connect to your Azure Database for PostgreSQL server with the psql utility. 将连接字符串放在引号中传递,请确保其中包含密码:Pass your connection string in quotes, being sure it contains your password:

    psql "host=..."
    

    例如,以下命令连接到服务器组 mydemoserver 的协调器节点:For example, the following command connects to the coordinator node of the server group mydemoserver:

    psql "host=mydemoserver-c.postgres.database.azure.com port=5432 dbname=citus user=citus password={your_password} sslmode=require"
    

创建和分布表Create and distribute tables

使用 psql 连接到 Hyperscale 协调器节点后,可以完成一些基本任务。Once connected to the hyperscale coordinator node using psql, you can complete some basic tasks.

Hyperscale 服务器中有三种类型的表:Within Hyperscale servers there are three types of tables:

  • 分布式表或分片表(分散在不同的位置以帮助进行缩放,从而提高性能并便于并行化)Distributed or sharded tables (spread out to help scaling for performance and parallelization)
  • 引用表(保留多个副本)Reference tables (multiple copies maintained)
  • 本地表(通常用于内部管理表)Local tables (often used for internal admin tables)

本快速入门重点介绍分布式表,以帮助读者熟悉此类表。In this quickstart, we'll primarily focus on distributed tables and getting familiar with them.

我们要使用的数据模型非常简单:来自 GitHub 的用户和事件数据。The data model we're going to work with is simple: user and event data from GitHub. 事件包括分叉创建、组织相关的 git 提交,等等。Events include fork creation, git commits related to an organization, and more.

通过 psql 建立连接后,让我们来创建表。Once you've connected via psql, let's create our tables. 在 psql 控制台中运行:In the psql console run:

CREATE TABLE github_events
(
    event_id bigint,
    event_type text,
    event_public boolean,
    repo_id bigint,
    payload jsonb,
    repo jsonb,
    user_id bigint,
    org jsonb,
    created_at timestamp
);

CREATE TABLE github_users
(
    user_id bigint,
    url text,
    login text,
    avatar_url text,
    gravatar_id text,
    display_login text
);

github_eventspayload 字段包含 JSONB 数据类型。The payload field of github_events has a JSONB datatype. JSONB 是 Postgres 中采用二进制格式的 JSON 数据类型。JSONB is the JSON datatype in binary form in Postgres. 使用此数据类型,可以方便在单个列中存储灵活的架构。The datatype makes it easy to store a flexible schema in a single column.

Postgres 可以基于此类型创建 GIN 索引,以便为表中的每个键和值编制索引。Postgres can create a GIN index on this type, which will index every key and value within it. 通过索引可以使用各种条件快速轻松地查询有效负载。With an index, it becomes fast and easy to query the payload with various conditions. 在加载数据之前,让我们继续创建几个索引。Let's go ahead and create a couple of indexes before we load our data. 在 psql 中:In psql:

CREATE INDEX event_type_index ON github_events (event_type);
CREATE INDEX payload_index ON github_events USING GIN (payload jsonb_path_ops);

接下来,我们将这些 Postgres 表放到协调器节点上,并告知 Hyperscale 在工作器节点之间将这些表分片。Next we’ll take those Postgres tables on the coordinator node and tell Hyperscale to shard them across the workers. 为此,我们将针对每个表运行一个查询,指定将该表分片所依据的键。To do so, we’ll run a query for each table specifying the key to shard it on. 在当前示例中,我们要将 user_id 上的事件和用户表分片:In the current example we’ll shard both the events and users table on user_id:

SELECT create_distributed_table('github_events', 'user_id');
SELECT create_distributed_table('github_users', 'user_id');

现已准备好加载数据。We're ready to load data. 仍然在 psql 中,执行 shell 命令来下载文件:In psql still, shell out to download the files:

\! curl -O https://examples.citusdata.com/users.csv
\! curl -O https://examples.citusdata.com/events.csv

接下来,将数据从文件加载到分布式表中:Next, load the data from the files into the distributed tables:

SET CLIENT_ENCODING TO 'utf8';

\copy github_events from 'events.csv' WITH CSV
\copy github_users from 'users.csv' WITH CSV

运行查询Run queries

接下来的操作比较有趣,但实际上只是运行几个查询。Now it's time for the fun part, actually running some queries. 让我们从一个简单的 count (*) 着手,以查看加载的数据量:Let's start with a simple count (*) to see how much data we loaded:

SELECT count(*) from github_events;

查询非常顺利。That worked nicely. 稍后我们会花点时间来了解聚合,但我们先看看其他几个查询。We'll come back to that sort of aggregation in a bit, but for now let’s look at a few other queries. JSONB payload 列中包含许多数据,但具体的数据根据事件类型而异。Within the JSONB payload column there's a good bit of data, but it varies based on event type. PushEvent 事件包含一个大小,它反映了推送操作的非重复提交次数。PushEvent events contain a size that includes the number of distinct commits for the push. 我们可以使用该大小找到每小时的提交总次数:We can use it to find the total number of commits per hour:

SELECT date_trunc('hour', created_at) AS hour,
       sum((payload->>'distinct_size')::int) AS num_commits
FROM github_events
WHERE event_type = 'PushEvent'
GROUP BY hour
ORDER BY hour;

到目前为止,查询只是涉及到 github_events,但我们可将这些信息与 github_users 合并。So far the queries have involved the github_events exclusively, but we can combine this information with github_users. 由于我们基于同一个标识符 (user_id) 分片了用户和事件,因此,这两个表中具有匹配用户 ID 的行将共置到同一个数据库节点,并可以轻松联接。Since we sharded both users and events on the same identifier (user_id), the rows of both tables with matching user IDs will be colocated on the same database nodes and can easily be joined.

如果在 user_id 上联接,则 Hyperscale 可将联接执行下推到分片中,以便在工作器节点上并行执行。If we join on user_id, Hyperscale can push the join execution down into shards for execution in parallel on worker nodes. 例如,让我们查找创建了最多存储库的用户:For example, let's find the users who created the greatest number of repositories:

SELECT gu.login, count(*)
  FROM github_events ge
  JOIN github_users gu
    ON ge.user_id = gu.user_id
 WHERE ge.event_type = 'CreateEvent'
   AND ge.payload @> '{"ref_type": "repository"}'
 GROUP BY gu.login
 ORDER BY count(*) DESC;

清理资源Clean up resources

在前面的步骤中,你已在服务器组中创建了 Azure 资源。In the preceding steps, you created Azure resources in a server group. 如果你认为以后不需要这些资源,请删除该服务器组。If you don't expect to need these resources in the future, delete the server group. 在服务器组的“概述”页中,按“删除”按钮 。Press the Delete button in the Overview page for your server group. 弹出页面上出现提示时,请确认服务器组的名称,然后单击最后一个“删除”按钮 。When prompted on a pop-up page, confirm the name of the server group and click the final Delete button.

后续步骤Next steps

在本快速入门中,你已了解如何预配 Hyperscale (Citus) 服务器组。In this quickstart, you learned how to provision a Hyperscale (Citus) server group. 你已使用 psql 连接到该组,创建了架构并分布了数据。You connected to it with psql, created a schema, and distributed data.

接下来,请遵循以下教程生成可缩放的多租户应用程序。Next, follow a tutorial to build scalable multi-tenant applications.