Introduction to Azure Cosmos DB: Gremlin API

Azure Cosmos DB is the globally distributed, multi-model database service from Microsoft for mission-critical applications. It supports document, key-value, graph, and column-family data models. The Azure Cosmos DB Gremlin API is used to store and work with graph data on a fully managed database service designed for any scale.

Figure: Azure Cosmos DB graph architecture

This article provides an overview of the Azure Cosmos DB Gremlin API and explains how you can use it to store massive graphs with billions of vertices and edges. You can query the graphs with millisecond latency and evolve the graph structure easily. Azure Cosmos DB's Gremlin API is based on the Apache TinkerPop graph database standard and uses the Gremlin query language.

Azure Cosmos DB's Gremlin API combines the power of graph database algorithms with highly scalable, managed infrastructure. The result is a flexible solution to the common data problems that arise from the rigidity of relational approaches.

Features of Azure Cosmos DB graph database

Azure Cosmos DB is a fully managed graph database that offers global distribution, elastic scaling of storage and throughput, automatic indexing and query, tunable consistency levels, and support for the TinkerPop standard.

The Azure Cosmos DB Gremlin API offers the following differentiating features:

  • Elastically scalable throughput and storage

    Graphs in the real world need to scale beyond the capacity of a single server. Azure Cosmos DB supports horizontally scalable graph databases that can have a virtually unlimited size in terms of storage and provisioned throughput. As the graph database grows, the data is automatically distributed using graph partitioning.

  • Multi-region replication

    Azure Cosmos DB can automatically replicate your graph data to any Azure region worldwide. Global replication simplifies the development of applications that require global access to data. In addition to minimizing read and write latency anywhere around the world, Azure Cosmos DB provides an automatic regional failover mechanism that can ensure the continuity of your application in the rare case of a service interruption in a region.

  • Fast queries and traversals with the most widely adopted graph query standard

    Store heterogeneous vertices and edges and query them through the familiar Gremlin syntax. Gremlin is an imperative, functional query language that provides a rich interface to implement common graph algorithms.

    Azure Cosmos DB enables rich real-time queries and traversals without the need to specify schema hints, secondary indexes, or views. Learn more in Query graphs by using Gremlin.

  • Fully managed graph database

    Azure Cosmos DB eliminates the need to manage database and machine resources. Most existing graph database platforms are bound by the limitations of their infrastructure and often require a high degree of maintenance to keep them running.

    As a fully managed service, Cosmos DB removes the need to manage virtual machines, update runtime software, manage sharding or replication, or deal with complex data-tier upgrades. Every graph is automatically backed up and protected against regional failures. These guarantees allow developers to focus on delivering application value instead of operating and managing their graph databases.

  • Automatic indexing

    By default, Azure Cosmos DB automatically indexes all the properties within nodes and edges in the graph and doesn't expect or require any schema or creation of secondary indexes. Learn more about indexing in Azure Cosmos DB.

  • Compatibility with Apache TinkerPop

    Azure Cosmos DB supports the open-source Apache TinkerPop standard. The TinkerPop standard has an ample ecosystem of applications and libraries that can be easily integrated with Azure Cosmos DB's Gremlin API.

  • Tunable consistency levels

    Azure Cosmos DB provides five well-defined consistency levels for queries and read operations: strong, bounded staleness, session, consistent prefix, and eventual. These granular levels let you make sound tradeoffs among consistency, availability, and latency for your application. Learn more in Tunable data consistency levels in Azure Cosmos DB.

Scenarios that can use Gremlin API

Here are some scenarios where the graph support of Azure Cosmos DB can be useful:

  • Social networks/Customer 365

    By combining data about your customers and their interactions with other people, you can develop personalized experiences, predict customer behavior, or connect people with others who have similar interests. Azure Cosmos DB can be used to manage social networks and track customer preferences and data.

  • Recommendation engines

    This scenario is common in the retail industry. By combining information about products, users, and user interactions, like purchasing, browsing, or rating an item, you can build customized recommendations. The low latency, elastic scale, and native graph support of Azure Cosmos DB make it well suited to these scenarios.

  • Geospatial

    Many applications in telecommunications, logistics, and travel planning need to find a location of interest within an area or locate the shortest or optimal route between two locations. Azure Cosmos DB is a natural fit for these problems.

  • Internet of Things

    With the network and connections between IoT devices modeled as a graph, you can build a better understanding of the state of your devices and assets. You can also learn how changes in one part of the network can potentially affect another part.

Introduction to graph databases

Data as it appears in the real world is naturally connected. Traditional data modeling focuses on defining entities separately and computing their relationships at runtime. While this model has its advantages, highly connected data can be challenging to manage under its constraints.

A graph database approach instead persists relationships in the storage layer, which leads to highly efficient graph retrieval operations. Azure Cosmos DB's Gremlin API supports the property graph model.

Property graph objects

A property graph is a structure that's composed of vertices and edges. Both objects can have an arbitrary number of key-value pairs as properties.

  • Vertices - Vertices denote discrete entities, such as a person, a place, or an event.

  • Edges - Edges denote relationships between vertices. For example, a person might know another person, be involved in an event, or have recently been at a location.

  • Properties - Properties express information about the vertices and edges. A vertex or an edge can have any number of properties, and they can be used to describe and filter the objects in a query. For example, a vertex can have a name and an age, and an edge can have a time stamp, a weight, or both.

Graph databases are often included within the NoSQL or non-relational database category, since there is no dependency on a schema or constrained data model. This lack of schema allows for modeling and storing connected structures naturally and efficiently.
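The property graph model described above can be sketched in a few lines of plain Python. This is only an illustration of the data model, not how Azure Cosmos DB stores graphs internally. The class names (`Vertex`, `Edge`, `PropertyGraph`), the method names, and the edge `weight` value are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    """A discrete entity with a label and arbitrary key-value properties."""
    id: str
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled relationship between two vertices."""
    label: str
    out_v: str  # id of the vertex the edge leaves
    in_v: str   # id of the vertex the edge enters
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory container; no schema is declared up front."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Vertex] = {}
        self.edges: List[Edge] = []

    def add_vertex(self, vertex_id: str, label: str, **properties: Any) -> Vertex:
        vertex = Vertex(vertex_id, label, dict(properties))
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, label: str, out_v: str, in_v: str, **properties: Any) -> Edge:
        # Both endpoints must already exist in the graph.
        for endpoint in (out_v, in_v):
            if endpoint not in self.vertices:
                raise KeyError(f"unknown vertex: {endpoint}")
        edge = Edge(label, out_v, in_v, dict(properties))
        self.edges.append(edge)
        return edge

    def find_vertices(self, label: str, **expected: Any) -> List[Vertex]:
        # Properties double as filters: keep vertices whose label and
        # property values all match.
        return [
            v for v in self.vertices.values()
            if v.label == label
            and all(v.properties.get(k) == val for k, val in expected.items())
        ]


graph = PropertyGraph()
graph.add_vertex("thomas.1", "person", firstName="Thomas", age=44)
graph.add_vertex("robin.1", "person", firstName="Robin")
# The weight value is illustrative only.
graph.add_edge("knows", "thomas.1", "robin.1", weight=0.5)

matches = graph.find_vertices("person", age=44)
print([v.id for v in matches])
```

Because vertices and edges carry their properties as plain dictionaries, two vertices with the same label can have different property sets, which is the schema-free behavior the paragraph above describes.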

Gremlin by example

Let's use a sample graph to understand how queries can be expressed in Gremlin. The following figure shows a business application that manages data about users, interests, and devices in the form of a graph.

Figure: Sample database showing people, devices, and interests

This graph has the following vertex types (called "labels" in Gremlin):

  • People: The graph has three people: Robin, Thomas, and Ben
  • Interests: The people's interests; in this example, the game of Football
  • Devices: The devices that people use
  • Operating Systems: The operating systems that the devices run on

We represent the relationships between these entities via the following edge types/labels:

  • Knows: For example, "Thomas knows Robin"
  • Interested: Represents the interests of the people in our graph, for example, "Ben is interested in Football"
  • RunsOS: For example, the laptop runs the Windows OS
  • Uses: Represents which device a person uses. For example, Robin uses a Motorola phone with serial number 77

Let's run some operations against this graph using the Gremlin Console. You can also perform these operations using Gremlin drivers in the platform of your choice (Java, Node.js, Python, or .NET). Before we look at what's supported in Azure Cosmos DB, let's look at a few examples to get familiar with the syntax.

First, let's look at CRUD. The following Gremlin statement inserts the "Thomas" vertex into the graph:

:> g.addV('person').property('id', 'thomas.1').property('firstName', 'Thomas').property('lastName', 'Andersen').property('age', 44)

Next, the following Gremlin statement inserts a "knows" edge between Thomas and Robin:

:> g.V('thomas.1').addE('knows').to(g.V('robin.1'))

The following query returns the "person" vertices in descending order of their first names:

:> g.V().hasLabel('person').order().by('firstName', decr)

Graphs shine when you need to answer questions like "What operating systems do friends of Thomas use?" You can run this Gremlin traversal to get that information from the graph:

:> g.V('thomas.1').out('knows').out('uses').out('runsos').group().by('name').by(count())
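To make the traversal above concrete, the following Python sketch walks the same three hops over a small in-memory edge list. It is not Gremlin and does not talk to Azure Cosmos DB. The article only states that Thomas knows Robin, that Robin uses a Motorola phone with serial number 77, and that the laptop runs Windows. The remaining edges, every vertex id other than `thomas.1` and `robin.1`, and the `Android` vertex are assumptions added so that the example produces a result.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Vertex id -> 'name' property used by the final group-by step.
names: Dict[str, str] = {
    "windows.1": "Windows",
    "android.1": "Android",  # assumed for illustration
}

# (source id, edge label, target id)
edges: List[Tuple[str, str, str]] = [
    ("thomas.1", "knows", "robin.1"),
    ("thomas.1", "knows", "ben.1"),       # assumed
    ("robin.1", "uses", "phone.77"),
    ("ben.1", "uses", "laptop.1"),        # assumed
    ("phone.77", "runsos", "android.1"),  # assumed
    ("laptop.1", "runsos", "windows.1"),
]


def out(current: List[str], label: str) -> List[str]:
    """Follow outgoing edges with the given label from every current vertex."""
    return [dst for v in current for (src, lbl, dst) in edges
            if src == v and lbl == label]


friends = out(["thomas.1"], "knows")   # whom Thomas knows
devices = out(friends, "uses")         # devices those friends use
systems = out(devices, "runsos")       # operating systems on those devices
os_counts = Counter(names[s] for s in systems)
print(dict(os_counts))
```

Each call to `out` corresponds to one `out('...')` step in the Gremlin traversal, and the `Counter` plays the role of `group().by('name').by(count())`.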

Next steps

To learn more about graph support in Azure Cosmos DB, see: