Migrate from CouchBase to Azure Cosmos DB SQL API

Applies to: SQL API

Azure Cosmos DB is a scalable, globally distributed, fully managed database that provides guaranteed low-latency access to your data. To learn more about Azure Cosmos DB, see the overview article. This article explains how to migrate Java applications that are connected to Couchbase to a SQL API account in Azure Cosmos DB.

Differences in nomenclature

The following table maps key Couchbase concepts to their counterparts in Azure Cosmos DB:

| Couchbase | Azure Cosmos DB |
|---|---|
| Couchbase server | Account |
| Bucket | Database |
| Bucket | Container/Collection |
| JSON document | Item/Document |

Key differences

  • Azure Cosmos DB has an "id" field within the document, whereas Couchbase keeps the ID as a part of the bucket. The "id" field is unique within a partition.

  • Azure Cosmos DB scales by using the partitioning (sharding) technique, which means it splits the data into multiple partitions. These partitions are created based on the partition key property that you provide. You can choose a partition key that optimizes reads, writes, or both. To learn more, see the partitioning article.

  • In Azure Cosmos DB, the top-level hierarchy doesn't need to denote the collection, because the collection name already exists. This makes the JSON structure much simpler. The following example shows the differences in the data model between Couchbase and Azure Cosmos DB:

    Couchbase: Document ID = "99FF4444"

    {
      "TravelDocument":
      {
        "Country": "India",
        "Validity": "2022-09-01",
        "Person":
        {
          "Name": "Manish",
          "Address": "AB Road, City-z"
        },
        "Visas":
        [
          {
            "Country": "India",
            "Type": "Multi-Entry",
            "Validity": "2022-09-01"
          },
          {
            "Country": "US",
            "Type": "Single-Entry",
            "Validity": "2022-08-01"
          }
        ]
      }
    }
    

    Azure Cosmos DB: Refer to "id" within the document, as shown below:

    {
      "id": "99FF4444",
      "Country": "India",
      "Validity": "2022-09-01",
      "Person":
      {
        "Name": "Manish",
        "Address": "AB Road, City-z"
      },
      "Visas":
      [
        {
          "Country": "India",
          "Type": "Multi-Entry",
          "Validity": "2022-09-01"
        },
        {
          "Country": "US",
          "Type": "Single-Entry",
          "Validity": "2022-08-01"
        }
      ]
    }
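
The reshaping shown in the two documents above can be sketched in plain Java. This is only a minimal illustration, assuming the Couchbase document has already been parsed into nested maps and has exactly one top-level wrapper key. The class `CouchbaseDocumentFlattener` and its method `toCosmosItem` are hypothetical names, not part of any Couchbase or Azure Cosmos DB SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of any SDK): reshapes a Couchbase-style document. */
final class CouchbaseDocumentFlattener {

    /**
     * Drops the single top-level wrapper (for example "TravelDocument") and
     * stores the former Couchbase document ID in the "id" property.
     */
    static Map<String, Object> toCosmosItem(String documentId, Map<String, Object> couchbaseDoc) {
        if (couchbaseDoc.size() != 1) {
            throw new IllegalArgumentException("Expected exactly one top-level wrapper key");
        }
        Object body = couchbaseDoc.values().iterator().next();
        if (!(body instanceof Map)) {
            throw new IllegalArgumentException("Wrapper value must be a JSON object");
        }
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("id", documentId); // the ID now lives inside the item itself
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) body).entrySet()) {
            // putIfAbsent keeps the Couchbase document ID if the body has its own "id" field
            item.putIfAbsent(String.valueOf(entry.getKey()), entry.getValue());
        }
        return item;
    }

    public static void main(String[] args) {
        Map<String, Object> travel = new LinkedHashMap<>();
        travel.put("Country", "India");
        travel.put("Validity", "2022-09-01");
        Map<String, Object> couchbaseDoc = new LinkedHashMap<>();
        couchbaseDoc.put("TravelDocument", travel);

        System.out.println(toCosmosItem("99FF4444", couchbaseDoc));
    }
}
```

Nested values such as "Person" and "Visas" are copied by reference and keep their structure. Only the wrapper level is removed.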
    
    

Java SDK support

Azure Cosmos DB provides the following SDKs to support different Java frameworks:

  • Async SDK
  • Spring Boot SDK

The following sections describe when to use each of these SDKs. Consider three types of workloads:

Couchbase as a document repository with Spring Data-based custom queries

If the workload that you are migrating is based on the Spring Boot SDK, use the following steps:

  1. Add the parent to the POM.xml file:

    <parent>
       <groupId>org.springframework.boot</groupId>
       <artifactId>spring-boot-starter-parent</artifactId>
       <version>2.1.5.RELEASE</version>
       <relativePath/>
    </parent>
    
  2. Add properties to the POM.xml file:

    <azure.version>2.1.6</azure.version>
    
  3. Add dependencies to the POM.xml file:

    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>azure-cosmosdb-spring-boot-starter</artifactId>
        <version>2.1.6</version>
    </dependency>
    
  4. Add application properties under resources and specify the following settings. Make sure to replace the URL, key, and database name placeholders:

       azure.cosmosdb.uri=<your-cosmosDB-URL>
       azure.cosmosdb.key=<your-cosmosDB-key>
       azure.cosmosdb.database=<your-cosmosDB-dbName>
    
  5. Define the name of the collection in the model. You can also specify further annotations, such as the ID and partition key, to denote them explicitly:

    @Document(collection = "mycollection")
    public class User {
        @Id
        private String id;
        private String firstName;
        @PartitionKey
        private String lastName;
    }
    

The following are the code snippets for CRUD operations:

Insert and update operations

Here, _repo is the repository object and doc is an object of the POJO class. Use .save to insert the document, or to update it if a document with the specified ID already exists. The following code snippet shows how to insert or update a doc object:

_repo.save(doc);

Delete operation

Consider the following code snippet, where the doc object must contain the ID and partition key needed to locate and delete the item:

_repo.delete(doc);

Read operation

You can read a document with or without specifying the partition key. If you don't specify the partition key, the read is treated as a cross-partition query. Consider the following code samples: the first performs the operation by using the ID and partition key fields, and the second uses a regular field without specifying the partition key field.

  • _repo.findByIdAndName(objDoc.getId(),objDoc.getName());
  • _repo.findAllByStatus(objDoc.getStatus());

That's it. You can now use your application with Azure Cosmos DB. The complete code sample for the example described in this article is available in the CouchbaseToCosmosDB-SpringCosmos GitHub repo.

Couchbase as a document repository using N1QL queries

N1QL is the language used to define queries in Couchbase. The following example shows a N1QL query and its Azure Cosmos DB equivalent:

N1QL query:

    SELECT META(TravelDocument).id AS id, TravelDocument.*
    FROM TravelDocument
    WHERE _type = "com.xx.xx.xx.xxx.xxx.xxxx" AND country = 'India'
      AND ANY m IN Visas SATISFIES m.type == 'Multi-Entry' AND m.Country IN ['India', 'Bhutan'] END
    ORDER BY Validity DESC
    LIMIT 25 OFFSET 0

Azure Cosmos DB query:

    SELECT c.id, c
    FROM c
    JOIN m IN c.Visas
    WHERE c._type = "com.xx.xx.xx.xxx.xxx.xxxx" AND c.country = 'India'
      AND m.type = 'Multi-Entry' AND m.Country IN ('India', 'Bhutan')
    ORDER BY c.Validity DESC
    OFFSET 0 LIMIT 25

Notice the following changes from the N1QL query:

  • You don't need to use the META keyword or refer to the first-level document. Instead, you create your own reference to the container. This example uses "c", but any alias works. This reference is used as a prefix for all first-level fields, for example c.id and c.country.

  • Instead of "ANY", you join on the subdocument and refer to it with a dedicated alias such as "m". Once you have created an alias for a subdocument, you must use that alias to refer to its fields, for example m.Country.

  • The order of the paging clauses is different in an Azure Cosmos DB query: specify OFFSET first, then LIMIT.

If most of your queries are custom-defined, we recommend that you don't use the Spring Data SDK, because it can add unnecessary client-side overhead when passing the query to Azure Cosmos DB. Instead, use the direct Async Java SDK, which is more efficient in this case.
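
When many existing queries must be ported, the paging-clause change described above can be applied mechanically. The following is a minimal sketch that assumes the query ends with a literal `LIMIT n OFFSET m` pair. `PagingClauseRewriter` is a hypothetical helper, not an SDK class, and it doesn't handle the META or ANY changes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of any SDK): reorders the paging clause only. */
final class PagingClauseRewriter {

    // Matches a trailing "LIMIT <n> OFFSET <m>" with literal integers, case-insensitively.
    private static final Pattern LIMIT_OFFSET =
            Pattern.compile("\\bLIMIT\\s+(\\d+)\\s+OFFSET\\s+(\\d+)\\s*$", Pattern.CASE_INSENSITIVE);

    /** Returns the query with its trailing paging clause rewritten as "OFFSET m LIMIT n". */
    static String toCosmosPaging(String query) {
        Matcher m = LIMIT_OFFSET.matcher(query);
        if (!m.find()) {
            return query; // no trailing LIMIT/OFFSET pair: leave the query unchanged
        }
        return query.substring(0, m.start()) + "OFFSET " + m.group(2) + " LIMIT " + m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(
                toCosmosPaging("SELECT * FROM c ORDER BY c.Validity DESC LIMIT 25 OFFSET 0"));
    }
}
```

Queries that use parameters instead of literal integers, or that have no paging clause at all, are returned unchanged by this sketch.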

Read operation

Use the Async Java SDK with the following steps:

  1. Add the following dependency to the POM.xml file:

    <!-- https://mvnrepository.com/artifact/com.microsoft.azure/azure-cosmosdb -->
    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>azure-cosmos</artifactId>
        <version>3.0.0</version>
    </dependency>
    
  2. Create a connection object for Azure Cosmos DB by using the CosmosClient.builder() method, as shown in the following example. Put this declaration into a bean so that the following code runs only once:

    ConnectionPolicy cp = new ConnectionPolicy();
    cp.connectionMode(ConnectionMode.DIRECT);

    // Build the client only once and reuse it.
    if (client == null) {
        client = CosmosClient.builder()
            .endpoint(Host)
            .connectionPolicy(cp)
            .key(PrimaryKey)
            .consistencyLevel(ConsistencyLevel.EVENTUAL)
            .build();
    }

    container = client.getDatabase(_dbName).getContainer(_collName);
    
  3. To execute the query, run the following code snippet:

    Flux<FeedResponse<CosmosItemProperties>> objFlux= container.queryItems(query, fo);
    

With the above method, you can pass and execute multiple queries. If you need to execute one large query that can be split into multiple queries, try the following code snippet instead of the previous one:

// Assumption: the latch is sized to the number of queries, so await() returns
// only after every query has completed or failed. _docs must be thread-safe.
CountDownLatch latch = new CountDownLatch(queries.size());
for (SqlQuerySpec query : queries) {
    container.queryItems(query, fo)
        .publishOn(Schedulers.elastic())
        .subscribe(
            feedResponse -> {
                if (feedResponse.results().size() > 0) {
                    _docs.addAll(feedResponse.results());
                }
            },
            error -> {
                error.printStackTrace();
                latch.countDown(); // release the waiting thread on failure too
            },
            latch::countDown);
}
latch.await();

The previous code runs the queries in parallel, which distributes the execution across queries. You can also run insert, upsert, and delete operations with the Async SDK.
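
The fan-out and latch coordination used for the parallel queries can be tried without the SDK. The following is a minimal sketch that uses only the Java standard library. `ParallelQueryFanOut` and its `runQuery` parameter are hypothetical stand-ins for the container query call, and a fixed thread pool replaces the Reactor scheduler.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical helper (not part of any SDK): runs several queries in parallel. */
final class ParallelQueryFanOut {

    /** Runs each query on a pool thread and blocks until all have finished or failed. */
    static <T> List<T> runAll(List<String> queries, Function<String, List<T>> runQuery) {
        List<T> results = new CopyOnWriteArrayList<>();            // safe for concurrent addAll
        CountDownLatch latch = new CountDownLatch(queries.size()); // one count per query
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, Math.min(queries.size(), 8)));
        try {
            for (String query : queries) {
                pool.execute(() -> {
                    try {
                        results.addAll(runQuery.apply(query));
                    } catch (RuntimeException e) {
                        e.printStackTrace();
                    } finally {
                        latch.countDown(); // count down on success and on failure
                    }
                });
            }
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for queries", e);
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        // The lambda stands in for a real query call and returns one fake row per query.
        List<String> rows = ParallelQueryFanOut.<String>runAll(
                Arrays.asList("query-1", "query-2"),
                q -> Arrays.asList(q + ":row"));
        System.out.println(rows.size());
    }
}
```

Counting down in a `finally` block means a failed query can't leave the caller waiting forever. The result order isn't deterministic, because the queries complete independently.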

Insert operation

To insert the document, run the following code:

Mono<CosmosItemResponse> objMono= container.createItem(doc,ro);

Then subscribe to the Mono as follows:

CountDownLatch latch = new CountDownLatch(1);
objMono.subscribeOn(Schedulers.elastic())
    .subscribe(
        resourceResponse -> {
            if (resourceResponse.statusCode() != successStatus) {
                throw new RuntimeException(resourceResponse.toString());
            }
        },
        error -> {
            error.printStackTrace();
            latch.countDown(); // release the waiting thread on failure too
        },
        latch::countDown);
latch.await();

Upsert operation

An upsert operation requires you to specify the document that needs to be updated. To fetch the complete document, use the snippet under the Read operation heading, and then modify the required fields. The following code snippet upserts the document:

Mono<CosmosItemResponse> obs= container.upsertItem(doc, ro);

Then subscribe to the Mono. Refer to the Mono subscription snippet in the Insert operation section.

Delete operation

The following snippet performs the delete operation:

CosmosItem objItem= container.getItem(doc.Id, doc.Tenant);
Mono<CosmosItemResponse> objMono = objItem.delete(ro);

Then subscribe to the Mono. Refer to the Mono subscription snippet in the Insert operation section. The complete code sample is available in the CouchbaseToCosmosDB-AsyncInSpring GitHub repo.

Couchbase as a key/value pair

This is a simple type of workload in which you perform lookups instead of queries. Use the following steps for key/value pairs:

  1. Consider using "/id" as the partition key, which makes sure that each lookup goes directly to the specific partition. Create a collection and specify "/id" as the partition key.

  2. Switch off indexing completely. Because you will execute only lookup operations, there is no point in carrying the indexing overhead. To turn off indexing, sign in to the Azure portal and go to your Azure Cosmos DB account. Open the Data Explorer, and select your database and container. Open the Scale & Settings tab and select the Indexing Policy. The current indexing policy looks like the following:

    {
      "indexingMode": "consistent",
      "automatic": true,
      "includedPaths": [
        {
          "path": "/*"
        }
      ],
      "excludedPaths": [
        {
          "path": "/\"_etag\"/?"
        }
      ]
    }

    Replace the above indexing policy with the following policy:

    {
      "indexingMode": "none",
      "automatic": false,
      "includedPaths": [],
      "excludedPaths": []
    }
    
  3. Use the following code snippet to create the connection object. Place the connection object in a @Bean or make it static:

    ConnectionPolicy cp = new ConnectionPolicy();
    cp.connectionMode(ConnectionMode.DIRECT);

    // Build the client only once and reuse it.
    if (client == null) {
        client = CosmosClient.builder()
            .endpoint(Host)
            .connectionPolicy(cp)
            .key(PrimaryKey)
            .consistencyLevel(ConsistencyLevel.EVENTUAL)
            .build();
    }

    container = client.getDatabase(_dbName).getContainer(_collName);
    

Now you can execute the CRUD operations as follows:

Read operation

To read the item, use the following snippet:

CosmosItemRequestOptions ro = new CosmosItemRequestOptions();
ro.partitionKey(new PartitionKey(documentId));
CountDownLatch latch = new CountDownLatch(1);

// The document ID is also the partition key value.
var objCosmosItem = container.getItem(documentId, documentId);
Mono<CosmosItemResponse> objMono = objCosmosItem.read(ro);
objMono.subscribeOn(Schedulers.elastic())
    .subscribe(
        resourceResponse -> {
            if (resourceResponse.item() != null) {
                doc = resourceResponse.properties().toObject(UserModel.class);
            }
        },
        error -> {
            error.printStackTrace();
            latch.countDown(); // release the waiting thread on failure too
        },
        latch::countDown);
latch.await();

Insert operation

To insert an item, run the following code:

Mono<CosmosItemResponse> objMono = container.createItem(doc, ro);

Then subscribe to the Mono as follows:

CountDownLatch latch = new CountDownLatch(1);
objMono.subscribeOn(Schedulers.elastic())
    .subscribe(
        resourceResponse -> {
            if (resourceResponse.statusCode() != successStatus) {
                throw new RuntimeException(resourceResponse.toString());
            }
        },
        error -> {
            error.printStackTrace();
            latch.countDown(); // release the waiting thread on failure too
        },
        latch::countDown);
latch.await();

Upsert operation

To update the value of an item, use the following code snippet:

Mono<CosmosItemResponse> obs= container.upsertItem(doc, ro);

Then subscribe to the Mono. Refer to the Mono subscription snippet in the Insert operation section.

Delete operation

Use the following snippet to execute the delete operation:

CosmosItem objItem= container.getItem(id, id);
Mono<CosmosItemResponse> objMono = objItem.delete(ro);

Then subscribe to the Mono. Refer to the Mono subscription snippet in the Insert operation section. The complete code sample is available in the CouchbaseToCosmosDB-AsyncKeyValue GitHub repo.

Data migration

There are two ways to migrate data.

  • Use Azure Data Factory: This is the most recommended method to migrate the data. Configure the source as Couchbase and the sink as Azure Cosmos DB SQL API. For detailed steps, see the Azure Cosmos DB Data Factory connector article.

  • Use the Azure Cosmos DB data import tool: This option is recommended when you migrate a small amount of data by using VMs. For detailed steps, see the Data importer article.

Next steps