Non-relational data and NoSQL

A non-relational database is a database that does not use the tabular schema of rows and columns found in most traditional database systems. Instead, non-relational databases use a storage model that is optimized for the specific requirements of the type of data being stored. For example, data may be stored as simple key/value pairs, as JSON documents, or as a graph consisting of edges and vertices.

What all of these data stores have in common is that they don't use a relational model. Also, they tend to be more specific in the type of data they support and how data can be queried. For example, time series data stores are optimized for queries over time-based sequences of data, while graph data stores are optimized for exploring weighted relationships between entities. Neither format would generalize well to the task of managing transactional data.

The term NoSQL refers to data stores that do not use SQL for queries, and instead use other programming languages and constructs to query the data. In practice, "NoSQL" means "non-relational database," even though many of these databases do support SQL-compatible queries. However, the underlying query execution strategy is usually very different from the way a traditional RDBMS would execute the same SQL query.

The following sections describe the major categories of non-relational or NoSQL databases.

Document data stores

A document data store manages a set of named string fields and object data values in an entity referred to as a document. These data stores typically store data in the form of JSON documents. Each field value could be a scalar item, such as a number, or a compound element, such as a list or a parent-child collection. The data in the fields of a document can be encoded in a variety of ways, including XML, YAML, JSON, BSON, or even stored as plain text. The fields within documents are exposed to the storage management system, enabling an application to query and filter data by using the values in these fields.

Typically, a document contains the entire data for an entity. Which items constitute an entity is application-specific. For example, an entity could contain the details of a customer, an order, or a combination of both. A single document might contain information that would be spread across several relational tables in a relational database management system (RDBMS). A document store does not require that all documents have the same structure. This free-form approach provides a great deal of flexibility. For example, applications can store different data in documents in response to a change in business requirements.

[Figure: Example of a document data store]

The application can retrieve documents by using the document key. This is a unique identifier for the document, which is often hashed to help distribute data evenly. Some document databases create the document key automatically. Others enable you to specify an attribute of the document to use as the key. The application can also query documents based on the value of one or more fields. Some document databases support indexing to facilitate fast lookup of documents based on one or more indexed fields.

Many document databases support in-place updates, enabling an application to modify the values of specific fields in a document without rewriting the entire document. Read and write operations over multiple fields in a single document are typically atomic.
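The behavior described in this section (key-based retrieval, automatically generated keys, field queries, and in-place field updates) can be sketched with a toy in-memory store. This is only an illustration: the `DocumentStore` class, its method names, and the sample documents are hypothetical and do not correspond to any particular product's API.

```python
import copy
import uuid


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}  # document key -> document (a dict)

    def insert(self, document, key=None):
        # Generate a document key automatically unless the caller supplies one.
        key = key or uuid.uuid4().hex
        self._docs[key] = copy.deepcopy(document)
        return key

    def get(self, key):
        # Retrieve a document by its key.
        return copy.deepcopy(self._docs[key])

    def find(self, **criteria):
        # Query by the value of one or more top-level fields.
        # This sketch scans every document; a real store would use indexes.
        return [
            copy.deepcopy(doc)
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]

    def update_fields(self, key, **changes):
        # Modify specific fields without rewriting the rest of the document.
        self._docs[key].update(changes)


store = DocumentStore()
# Documents in the same store do not need to share a structure.
store.insert({"type": "customer", "name": "Ana", "city": "Lisbon"}, key="c1")
store.insert({"type": "order", "customer": "c1", "items": [{"sku": "A1", "qty": 2}]}, key="o1")

store.update_fields("c1", city="Porto")
customers_in_porto = store.find(type="customer", city="Porto")
```

A customer and an order live side by side here even though they have different fields, which is the free-form flexibility the text describes.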

Relevant Azure service:

Columnar data stores

A columnar or column-family data store organizes data into columns and rows. In its simplest form, a column-family data store can appear very similar to a relational database, at least conceptually. The real power of a column-family database lies in its denormalized approach to structuring sparse data, which stems from the column-oriented approach to storing data.

You can think of a column-family data store as holding tabular data with rows and columns, but the columns are divided into groups known as column families. Each column family holds a set of columns that are logically related and are typically retrieved or manipulated as a unit. Other data that is accessed separately can be stored in separate column families. Within a column family, new columns can be added dynamically, and rows can be sparse (that is, a row doesn't need to have a value for every column).
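As a rough sketch of this model, the following toy table groups columns into families, lets new columns appear dynamically, and allows sparse rows. The `ColumnFamilyTable` class, the column names, and the sample values are assumptions made for illustration; real column-family databases expose different APIs.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy column-family table (hypothetical, for illustration only)."""

    def __init__(self, families):
        # Column families are declared up front; the columns inside them are not.
        self._families = {name: defaultdict(dict) for name in families}

    def put(self, row_key, family, column, value):
        # Raises KeyError if the column family was never declared.
        self._families[family][row_key][column] = value

    def get_family(self, row_key, family):
        # A whole column family is read as a unit for one row key.
        return dict(self._families[family].get(row_key, {}))


table = ColumnFamilyTable(["Identity", "Contact Info"])
table.put("001", "Identity", "first_name", "Mu Bae")
table.put("001", "Contact Info", "phone", "555-0100")
table.put("002", "Identity", "first_name", "Francisco")
# Row 002 has no phone, and it introduces an email column on the fly.
table.put("002", "Contact Info", "email", "francisco@example.com")
```

Rows 001 and 002 share the same families but hold different columns, so no storage is spent on values that a row does not have.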

The following diagram shows an example with two column families, Identity and Contact Info. The data for a single entity has the same row key in each column family. This structure, where the rows for any given object in a column family can vary dynamically, is an important benefit of the column-family approach, making this form of data store highly suited for storing data with varying schemas.

[Figure: Example of column-family data]

Unlike a key/value store or a document database, most column-family databases physically store data in key order, rather than by computing a hash. The row key is considered the primary index and enables key-based access via a specific key or a range of keys. Some implementations allow you to create secondary indexes over specific columns in a column family. Secondary indexes let you retrieve data by column value, rather than by row key.
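To illustrate why key-ordered storage makes range access cheap, here is a minimal sketch that keeps row keys sorted and uses binary search for range scans, plus a simple secondary index built over one column. The `SortedRowStore` class and the key naming scheme are hypothetical.

```python
import bisect


class SortedRowStore:
    """Toy store that keeps rows in row-key order (illustrative assumption)."""

    def __init__(self):
        self._keys = []  # row keys, always kept sorted
        self._rows = {}  # row key -> dict of column values

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = dict(columns)

    def scan(self, start_key, end_key):
        # Return rows with start_key <= key < end_key using binary search,
        # so only the requested key range is touched.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]

    def build_secondary_index(self, column):
        # Map each column value to the row keys that contain it.
        index = {}
        for key in self._keys:
            value = self._rows[key].get(column)
            if value is not None:
                index.setdefault(value, []).append(key)
        return index


rows = SortedRowStore()
rows.put("user#003", {"city": "Oslo"})
rows.put("user#001", {"city": "Lima"})
rows.put("user#002", {"city": "Oslo"})
rows.put("order#001", {"total": 10})

user_range = rows.scan("user#001", "user#003")
by_city = rows.build_secondary_index("city")
```

The scan returns `user#001` and `user#002` in key order, while the secondary index answers "which rows have city Oslo?" without consulting the row keys.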

On disk, all of the columns within a column family are stored together in the same file, with a certain number of rows in each file. With large data sets, this approach creates a performance benefit by reducing the amount of data that needs to be read from disk when only a few columns are queried together at a time.

Read and write operations for a row are typically atomic within a single column family, although some implementations provide atomicity across the entire row, spanning multiple column families.

Relevant Azure service:

Key/value data stores

A key/value store is essentially a large hash table. You associate each data value with a unique key, and the key/value store uses this key to store the data by using an appropriate hashing function. The hashing function is selected to provide an even distribution of hashed keys across the data storage.
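A minimal sketch of hash-based placement follows, assuming a fixed number of partitions and SHA-256 as the hashing function. Both choices, and the `partition_for` helper itself, are illustrative assumptions; actual stores choose their own hashing and partitioning schemes.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it into the partition range.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Count how many of 10,000 hypothetical keys land in each of 4 partitions.
counts = [0] * 4
for i in range(10_000):
    counts[partition_for(f"user:{i}", 4)] += 1
```

With a well-behaved hash, the four counts come out roughly equal even though the keys themselves are sequential, which is the even distribution the text refers to.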

Most key/value stores only support simple query, insert, and delete operations. To modify a value (either partially or completely), an application must overwrite the existing data for the entire value. In most implementations, reading or writing a single value is an atomic operation. If the value is large, writing may take some time.

An application can store arbitrary data as a set of values, although some key/value stores impose limits on the maximum size of values. The stored values are opaque to the storage system software. Any schema information must be provided and interpreted by the application. Essentially, values are blobs and the key/value store simply retrieves or stores the value by key.
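The following sketch shows what opaque values mean in practice: the store only sees bytes, the application owns the encoding (JSON here, as an assumption), and a partial change requires reading, modifying, and overwriting the entire value. The `KeyValueStore` class and the `cart:42` key are hypothetical.

```python
import json


class KeyValueStore:
    """Toy key/value store whose values are opaque bytes (illustrative only)."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes; the store does not interpret them")
        self._data[key] = value  # always replaces the whole value

    def get(self, key: str) -> bytes:
        return self._data[key]

    def delete(self, key: str) -> None:
        del self._data[key]


store = KeyValueStore()
store.put("cart:42", json.dumps({"items": ["A1"], "total": 10}).encode("utf-8"))

# A "partial" update is a read-modify-write of the entire value:
# the application decodes, changes, re-encodes, and overwrites.
cart = json.loads(store.get("cart:42"))
cart["items"].append("B2")
cart["total"] = 25
store.put("cart:42", json.dumps(cart).encode("utf-8"))
```

The schema (a cart with `items` and `total`) exists only in application code; the store would accept any other byte string under the same key.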

[Figure: Example of data in a key/value store]

Key/value stores are highly optimized for applications performing simple lookups using the value of the key, or by a range of keys, but are less suitable for systems that need to query data across different tables of keys/values, such as joining data across multiple tables.

Key/value stores are also not optimized for scenarios where querying or filtering by non-key values is important, rather than performing lookups based only on keys. For example, with a relational database, you can find a record by using a WHERE clause to filter the non-key columns, but key/value stores usually do not have this type of lookup capability for values, or if they do, it requires a slow scan of all values.
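The cost difference can be sketched with a plain dictionary standing in for a key/value store. Looking up by key is a single operation, whereas filtering on a field inside the value forces the application to fetch and decode every value. The record layout and the `find_by_field` helper are assumptions for illustration.

```python
import json

# A dictionary stands in for the store; values are opaque JSON-encoded bytes.
records = {
    "user:1": json.dumps({"name": "Ana", "city": "Lisbon"}).encode("utf-8"),
    "user:2": json.dumps({"name": "Ben", "city": "Oslo"}).encode("utf-8"),
    "user:3": json.dumps({"name": "Cai", "city": "Lisbon"}).encode("utf-8"),
}


def find_by_field(store, field, value):
    """Filter on a non-key field by scanning and decoding every stored value."""
    matches = []
    for key, blob in store.items():  # visits all values: O(n) in the store size
        if json.loads(blob).get(field) == value:
            matches.append(key)
    return sorted(matches)


by_key = json.loads(records["user:2"])               # direct lookup by key
in_lisbon = find_by_field(records, "city", "Lisbon")  # full scan
```

The scan does work proportional to the number of stored values, which is why workloads dominated by such filters are usually a poor fit for a plain key/value store.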

A single key/value store can be extremely scalable, as the data store can easily distribute data across multiple nodes on separate machines.

Relevant Azure services:

Graph data stores

A graph data store manages two types of information, nodes and edges. Nodes represent entities, and edges specify the relationships between these entities. Both nodes and edges can have properties that provide information about that node or edge, similar to columns in a table. Edges can also have a direction indicating the nature of the relationship.

The purpose of a graph data store is to allow an application to efficiently perform queries that traverse the network of nodes and edges, and to analyze the relationships between entities. The following diagram shows an organization's personnel data structured as a graph. The entities are employees and departments, and the edges indicate reporting relationships and the department in which employees work. In this graph, the arrows on the edges show the direction of the relationships.

[Figure: Example of data in a graph data store]

This structure makes it straightforward to perform queries such as "Find all employees who report directly or indirectly to Sarah" or "Who works in the same department as John?" For large graphs with lots of entities and relationships, you can perform complex analyses quickly. Many graph databases provide a query language that you can use to traverse a network of relationships efficiently.
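The two example queries can be sketched as traversals over a list of directed edges. Only Sarah and John come from the text; the other employees, the department names, and the edge labels `reports_to` and `works_in` are hypothetical. A real graph database would express this in its own query language rather than in application code.

```python
from collections import deque

# Directed edges as (source, relationship, target). Sample data is hypothetical.
edges = [
    ("Alice", "reports_to", "Sarah"),
    ("Bob", "reports_to", "Alice"),
    ("John", "reports_to", "Sarah"),
    ("Carol", "reports_to", "John"),
    ("Alice", "works_in", "Sales"),
    ("Bob", "works_in", "Sales"),
    ("John", "works_in", "Engineering"),
    ("Carol", "works_in", "Engineering"),
]


def reports_to_transitively(edges, manager):
    """Everyone who reports directly or indirectly to `manager` (breadth-first)."""
    direct = {}
    for source, relation, target in edges:
        if relation == "reports_to":
            direct.setdefault(target, []).append(source)
    found, queue = set(), deque([manager])
    while queue:
        current = queue.popleft()
        for employee in direct.get(current, []):
            if employee not in found:
                found.add(employee)
                queue.append(employee)
    return found


def same_department(edges, person):
    """Everyone who works in a department that `person` works in."""
    departments = {t for s, r, t in edges if r == "works_in" and s == person}
    return {s for s, r, t in edges
            if r == "works_in" and t in departments and s != person}


sarah_org = reports_to_transitively(edges, "Sarah")
john_peers = same_department(edges, "John")
```

The first query follows `reports_to` edges against their direction for as many hops as needed, which is the kind of variable-depth traversal that is awkward to express with fixed joins.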

Relevant Azure service:

Time series data stores

Time series data is a set of values organized by time, and a time series data store is optimized for this type of data. Time series data stores must support a very high number of writes, as they typically collect large amounts of data in real time from a large number of sources. Time series data stores are optimized for storing telemetry data. Scenarios include IoT sensors or application/system counters. Updates are rare, and deletes are often done as bulk operations.

[Figure: Example of time series data]

Although the records written to a time series database are generally small, there are often a large number of records, and total data size can grow rapidly. Time series data stores also handle out-of-order and late-arriving data, automatic indexing of data points, and optimizations for queries described in terms of windows of time. This last feature enables queries to run across millions of data points and multiple data streams quickly, in order to support time series visualizations, which is a common way that time series data is consumed.
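A window-based query can be sketched as follows: each reading is assigned to a fixed-size time window by its timestamp, so late or out-of-order points still land in the correct window. The `window_averages` function, the 60-second window, and the sample readings are assumptions; real time series stores perform this kind of aggregation internally and far more efficiently.

```python
from collections import defaultdict


def window_averages(points, window_seconds):
    """Average values per fixed time window.

    `points` is an iterable of (timestamp_seconds, value) pairs in any order.
    Returns {window_start_timestamp: average}, ordered by window start.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in points:
        # The window is derived from the timestamp, not from arrival order.
        start = timestamp - (timestamp % window_seconds)
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# The reading at t=10 arrives after the one at t=65 (out of order).
readings = [(0, 20.0), (30, 22.0), (65, 21.0), (10, 24.0), (125, 19.0)]
averages = window_averages(readings, 60)
```

The late reading at t=10 is folded into the first window alongside t=0 and t=30, so the result is the same as if the data had arrived in order.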

For more information, see Time series solutions.

Relevant Azure services:

Object data stores

Object data stores are optimized for storing and retrieving large binary objects or blobs such as images, text files, video and audio streams, large application data objects and documents, and virtual machine disk images. An object consists of the stored data, some metadata, and a unique ID for accessing the object. Object stores are designed to support files that are individually very large, as well as provide large amounts of total storage to manage all files.
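The three parts of an object (stored data, metadata, and a unique ID) can be sketched as a small data structure. The `StoredObject` and `ObjectStore` names, the automatically recorded `size` entry, and the sample metadata are hypothetical and only illustrate the shape of the model.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    object_id: str  # unique ID used to access the object
    data: bytes     # the stored binary content
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory object store (illustrative only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        object_id = uuid.uuid4().hex
        # Record the size alongside any caller-supplied metadata.
        self._objects[object_id] = StoredObject(
            object_id, bytes(data), dict(metadata, size=len(data))
        )
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]


blobs = ObjectStore()
image_id = blobs.put(b"\x89PNG...", content_type="image/png", name="logo.png")
stored = blobs.get(image_id)
```

Callers hold on to the returned ID; the store never interprets the bytes, and the metadata is the only descriptive information it keeps about them.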

[Figure: Example of object data]

Some object data stores replicate a given blob across multiple server nodes, which enables fast parallel reads. This in turn enables the scale-out querying of data contained in large files, because multiple processes, typically running on different servers, can each query the large data file simultaneously.

One special case of object data stores is the network file share. Using file shares enables files to be accessed across a network using standard networking protocols like Server Message Block (SMB). Given appropriate security and concurrent access control mechanisms, sharing data in this way can enable distributed services to provide highly scalable data access for basic, low-level operations such as simple read and write requests.

Relevant Azure services:

External index data stores

External index data stores provide the ability to search for information held in other data stores and services. An external index acts as a secondary index for any data store, and can be used to index massive volumes of data and provide near real-time access to these indexes.

For example, you might have text files stored in a file system. Finding a file by its file path is quick, but searching based on the contents of the file would require a scan of all of the files, which is slow. An external index lets you create secondary search indexes and then quickly find the path to the files that match your criteria. Another example application of an external index is with key/value stores that only index by the key. You can build a secondary index based on the values in the data, and quickly look up the key that uniquely identifies each matched item.
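The file system example can be sketched as an inverted index that maps each term to the paths of the files containing it. A dictionary of path to text stands in for the file system here, and the `build_index` and `search` helpers, the tokenization rule, and the sample paths are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_index(files):
    """Build an inverted index: term -> set of file paths containing that term."""
    index = defaultdict(set)
    for path, text in files.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(path)
    return index


def search(index, *terms):
    """Return the sorted paths of files that contain every given term."""
    if not terms:
        return []
    postings = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings))


# A dict stands in for files on disk (hypothetical paths and contents).
files = {
    "/docs/graph.txt": "Graph stores manage nodes and edges",
    "/docs/kv.txt": "Key value stores use a hash of the key",
    "/docs/ts.txt": "Time series stores handle telemetry",
}
index = build_index(files)
hits = search(index, "stores", "key")
```

Once the index exists, a content search becomes a few dictionary lookups and a set intersection instead of a scan over every file. The same idea applies to a key/value store: index the values, then look up the matching keys.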

[Figure: Example of search data]

The indexes are created by running an indexing process. This can be performed using a pull model, triggered by the data store, or using a push model, initiated by application code. Indexes can be multidimensional and may support free-text searches across large volumes of text data.

External index data stores are often used to support full text and web-based search. In these cases, searching can be exact or fuzzy. A fuzzy search finds documents that match a set of terms and calculates how closely they match. Some external indexes also support linguistic analysis that can return matches based on synonyms, genre expansions (for example, matching "dogs" to "pets"), and stemming (for example, searching for "run" also matches "ran" and "running").
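One very simple way to approximate fuzzy matching is to score a document by the fraction of query terms that have a close match among its words. The sketch below uses `difflib.get_close_matches` from the Python standard library for the approximate comparison. The `fuzzy_score` function, the 0.8 cutoff, and the scoring rule are assumptions; search services use far more sophisticated ranking and linguistic analysis.

```python
import difflib


def fuzzy_score(query_terms, document_text, cutoff=0.8):
    """Fraction of query terms that approximately match some word in the document."""
    if not query_terms:
        return 0.0
    words = document_text.lower().split()
    matched = sum(
        1
        for term in query_terms
        # get_close_matches returns a non-empty list when any word is similar enough.
        if difflib.get_close_matches(term.lower(), words, n=1, cutoff=cutoff)
    )
    return matched / len(query_terms)


documents = {
    "doc1": "a graph database stores nodes",
    "doc2": "key value store",
}
# "databse" is a deliberate misspelling of "database".
scores = {name: fuzzy_score(["databse", "graph"], text) for name, text in documents.items()}
```

The misspelled term still matches `doc1`, and the score gives a basis for ranking documents by how closely they match the query.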

Relevant Azure service:

Typical requirements

Non-relational data stores often use a different storage architecture from that used by relational databases. Specifically, they tend toward having no fixed schema. Also, they tend not to support transactions, or else restrict the scope of transactions, and they generally don't include secondary indexes for scalability reasons.

The following tables compare the requirements for each of the non-relational data stores:

| Requirement | Document data | Column-family data | Key/value data | Graph data |
|---|---|---|---|---|
| Normalization | Denormalized | Denormalized | Denormalized | Normalized |
| Schema | Schema on read | Column families defined on write, column schema on read | Schema on read | Schema on read |
| Consistency (across concurrent transactions) | Tunable consistency, document-level guarantees | Column-family-level guarantees | Key-level guarantees | Graph-level guarantees |
| Atomicity (transaction scope) | Collection | Table | Table | Graph |
| Locking strategy | Optimistic (lock free) | Pessimistic (row locks) | Optimistic (ETag) | |
| Access pattern | Random access | Aggregates on tall/wide data | Random access | Random access |
| Indexing | Primary and secondary indexes | Primary and secondary indexes | Primary index only | Primary and secondary indexes |
| Data shape | Document | Tabular with column families containing columns | Key and value | Graph containing edges and vertices |
| Sparse | Yes | Yes | Yes | No |
| Wide (lots of columns/attributes) | Yes | Yes | No | No |
| Datum size | Small (KBs) to medium (low MBs) | Medium (MBs) to large (low GBs) | Small (KBs) | Small (KBs) |
| Overall maximum scale | Very large (PBs) | Very large (PBs) | Very large (PBs) | Large (TBs) |

| Requirement | Time series data | Object data | External index data |
|---|---|---|---|
| Normalization | Normalized | Denormalized | Denormalized |
| Schema | Schema on read | Schema on read | Schema on write |
| Consistency (across concurrent transactions) | N/A | N/A | N/A |
| Atomicity (transaction scope) | N/A | Object | N/A |
| Locking strategy | N/A | Pessimistic (blob locks) | N/A |
| Access pattern | Random access and aggregation | Sequential access | Random access |
| Indexing | Primary and secondary indexes | Primary index only | N/A |
| Data shape | Tabular | Blob and metadata | Document |
| Sparse | No | N/A | No |
| Wide (lots of columns/attributes) | No | Yes | Yes |
| Datum size | Small (KBs) | Large (GBs) to very large (TBs) | Small (KBs) |
| Overall maximum scale | Large (low TBs) | Very large (PBs) | Large (low TBs) |