您现在访问的是微软AZURE全球版技术文档网站,若需要访问由世纪互联运营的MICROSOFT AZURE中国区技术文档网站,请访问 https://docs.azure.cn.

具体化视图模式Materialized View pattern

当未针对所需的查询操作完美设置数据的格式时,在一个或多个数据存储中基于数据生成预填充的视图。Generate prepopulated views over the data in one or more data stores when the data isn't ideally formatted for required query operations. 这有助于支持高效查询和数据提取,并可提高应用程序性能。This can help support efficient querying and data extraction, and improve application performance.

上下文和问题Context and problem

存储数据时,开发人员和数据管理员的首要任务通常是专注于数据的存储方式,而不是其读取方式。When storing data, the priority for developers and data administrators is often focused on how the data is stored, as opposed to how it's read. 所选的存储格式通常与以下内容紧密相关:数据格式、管理数据大小和数据完整性的需求以及正在使用的存储类型。The chosen storage format is usually closely related to the format of the data, requirements for managing data size and data integrity, and the kind of store in use. 例如,使用 NoSQL 文档存储时,数据通常表示为一系列聚合,每个聚合包含该实体的所有信息。For example, when using NoSQL document store, the data is often represented as a series of aggregates, each containing all of the information for that entity.

但是,这可能对查询产生负面影响。However, this can have a negative effect on queries. 如果查询仅需要某些实体的数据子集(如多个客户的订单摘要,而不需要订单的所有详细信息),它必须从相关实体中提取所有数据,以便获取所需信息。When a query only needs a subset of the data from some entities, such as a summary of orders for several customers without all of the order details, it must extract all of the data for the relevant entities in order to obtain the required information.

解决方案Solution

若要支持高效查询,常见的解决方案是提前生成一个视图,该视图以适合所需结果集的格式将数据具体化。To support efficient querying, a common solution is to generate, in advance, a view that materializes the data in a format suited to the required results set. 具体化视图模式介绍如何在以下环境中生成预填充的数据视图:源数据的格式不适合进行查询、难以生成适合的查询或数据或数据存储的性质导致查询性能较差。The Materialized View pattern describes generating prepopulated views of data in environments where the source data isn't in a suitable format for querying, where generating a suitable query is difficult, or where query performance is poor due to the nature of the data or the data store.

这些具体化视图(仅包含查询所需的数据)允许应用程序快速获取所需信息。These materialized views, which only contain data required by a query, allow applications to quickly obtain the information they need. 除了联接表格或合并数据实体外,具体化视图还可以包含计算列或数据项当前的值、对数据项合并值或执行转换的结果以及指定为查询的一部分的值。In addition to joining tables or combining data entities, materialized views can include the current values of calculated columns or data items, the results of combining values or executing transformations on the data items, and values specified as part of the query. 甚至可以仅针对单个查询优化具体化视图。A materialized view can even be optimized for just a single query.

关键点在于,具体化视图可从源数据存储完全重新生成,所以它及其包含的信息是完全可释放的。A key point is that a materialized view and the data it contains is completely disposable because it can be entirely rebuilt from the source data stores. 具体化视图从不通过应用程序直接更新,因此它是专用缓存。A materialized view is never updated directly by an application, and so it's a specialized cache.

视图的源数据更改时,必须更新视图以使其包含新的信息。When the source data for the view changes, the view must be updated to include the new information. 可计划自动执行此操作,或在系统检测到原始数据发生更改时执行操作。You can schedule this to happen automatically, or when the system detects a change to the original data. 在某些情况下,可能需要手动重新生成视图。In some cases it might be necessary to regenerate the view manually. 下图中的示例演示了可以如何使用具体化视图模式。The figure shows an example of how the Materialized View pattern might be used.

图 1 中的示例演示了可以如何使用具体化视图模式

问题和注意事项Issues and considerations

在决定如何实现此模式时,请考虑以下几点:Consider the following points when deciding how to implement this pattern:

视图更新的方式和时间。How and when the view will be updated. 理想情况下,当有事件指示源数据发生更改时,会重新生成视图(尽管源数据更改速度过快时会导致过度开销)。Ideally it'll regenerate in response to an event indicating a change to the source data, although this can lead to excessive overhead if the source data changes rapidly. 或者,可以考虑使用计划的任务、外部触发器或通过手动操作重新生成视图。Alternatively, consider using a scheduled task, an external trigger, or a manual action to regenerate the view.

在某些系统(例如,使用事件溯源模式维护仅含修改数据的事件的存储时)中,必须使用具体化视图。In some systems, such as when using the Event Sourcing pattern to maintain a store of only the events that modified the data, materialized views are necessary. 通过检查所有事件确定当前状态来预填充视图,这可能是从事件存储获取信息的唯一方法。Prepopulating views by examining all events to determine the current state might be the only way to obtain information from the event store. 如果没有使用事件溯源,则需考虑具体化视图对你而言是否有用。If you're not using Event Sourcing, you need to consider whether a materialized view is helpful or not. 具体化视图通常专门用于一个或少量查询的情况。Materialized views tend to be specifically tailored to one, or a small number of queries. 如果使用了大量查询,具体化视图会产生不可接受的存储容量需求和存储成本。If many queries are used, materialized views can result in unacceptable storage capacity requirements and storage cost.

如果此情况定期出现,请在生成视图以及更新视图时考虑这对数据一致性的影响。Consider the impact on data consistency when generating the view, and when updating the view if this occurs on a schedule. 如果在生成视图时源数据发生更改,则此视图中的数据副本不会与原始数据完全一致。If the source data is changing at the point when the view is generated, the copy of the data in the view won't be fully consistent with the original data.

请考虑视图的存储位置。Consider where you'll store the view. 视图所在的存储或分区无需与原始数据的相同。The view doesn't have to be located in the same store or partition as the original data. 它可以是几个不同分区合并形成的子集。It can be a subset from a few different partitions combined.

如果视图丢失,可重新生成视图。A view can be rebuilt if lost. 因此,如果视图是暂时性视图,且仅用于通过反映数据的当前状态提升查询性能或提高可伸缩性,则可将其存储在缓存或可靠性较低的位置。Because of that, if the view is transient and is only used to improve query performance by reflecting the current state of the data, or to improve scalability, it can be stored in a cache or in a less reliable location.

定义具体化视图时,基于以下内容将数据项或列添加到具体化视图使值最大化:现有数据项的计算或转换、在查询中传递的值或这些值在适当情况下的组合。When defining a materialized view, maximize its value by adding data items or columns to it based on computation or transformation of existing data items, on values passed in the query, or on combinations of these values when appropriate.

如果存储机制支持,请考虑将具体化视图编制索引,进一步提高性能。Where the storage mechanism supports it, consider indexing the materialized view to further increase performance. 大多数关系数据库和基于 Apache Hadoop 的大数据解决方案都支持为视图编制索引。Most relational databases support indexing for views, as do big data solutions based on Apache Hadoop.

何时使用此模式When to use this pattern

此模式适合用于:This pattern is useful when:

  • 对难以直接查询的数据创建具体化视图,或查询必须十分复杂才能提取以规范化、半结构化或非结构化方式存储的数据。Creating materialized views over data that's difficult to query directly, or where queries must be very complex to extract data that's stored in a normalized, semi-structured, or unstructured way.
  • 创建的临时视图有以下作用时:可极大地提高查询性能,或可直接充当 UI、报告或显示的源视图或数据传输对象。Creating temporary views that can dramatically improve query performance, or can act directly as source views or data transfer objects for the UI, for reporting, or for display.
  • 支持偶尔连接或断开的方案,其中不始终提供与数据存储的连接。Supporting occasionally connected or disconnected scenarios where connection to the data store isn't always available. 在此情况下可本地缓存视图。The view can be cached locally in this case.
  • 简化查询,并以无需了解源数据格式的方式公开数据以进行试验。Simplifying queries and exposing data for experimentation in a way that doesn't require knowledge of the source data format. 例如,通过在一个或多个数据库或 NoSQL 存储中的一个或多个域中联接不同的表格,然后设置数据格式以满足其最终用途。For example, by joining different tables in one or more databases, or one or more domains in NoSQL stores, and then formatting the data to fit its eventual use.
  • 提供对源数据的特定子集的访问权限,出于安全性或隐私原因,这些子集不可公开访问、不可供修改或者不对用户完全公开。Providing access to specific subsets of the source data that, for security or privacy reasons, shouldn't be generally accessible, open to modification, or fully exposed to users.
  • 桥接不同的数据存储,充分利用其各项功能。Bridging different data stores, to take advantage of their individual capabilities. 例如,将写入效率高的云存储作为参考数据存储使用,并使用提供良好查询和读取性能的关系数据库来保存具体化视图。For example, using a cloud store that's efficient for writing as the reference data store, and a relational database that offers good query and read performance to hold the materialized views.

此模式在以下情况中不起作用:This pattern isn't useful in the following situations:

  • 源数据十分简单且易于查询。The source data is simple and easy to query.
  • 源数据更改速度很快,或可在不使用视图的情况下访问源数据。The source data changes very quickly, or can be accessed without using a view. 在这些情况下,应避免创建视图时的处理开销。In these cases, you should avoid the processing overhead of creating views.
  • 一致性是重中之重。Consistency is a high priority. 视图可能无法始终与原始数据完全一致。The views might not always be fully consistent with the original data.

示例Example

下表中的示例演示了如何使用具体化视图模式生成销售摘要。The following figure shows an example of using the Materialized View pattern to generate a summary of sales. 将 Azure 存储帐户中不同分区的“订单”、“订单项”和“客户”表格中的数据合并,生成一个视图,该视图在“电子”类别中包含了每个产品的总销售价值,还包含购买每一项的客户数量。Data in the Order, OrderItem, and Customer tables in separate partitions in an Azure storage account are combined to generate a view containing the total sales value for each product in the Electronics category, along with a count of the number of customers who made purchases of each item.

图 2:使用具体化视图模式生成销售摘要

创建此具体化视图需要复杂的查询。Creating this materialized view requires complex queries. 但是,通过具体化视图的方式公开查询结果,用户可以轻松获取结果并直接使用,或将结果纳入另一个查询。However, by exposing the query result as a materialized view, users can easily obtain the results and use them directly or incorporate them in another query. 此视图可能会用于报告系统或仪表板,也可以定期(如每周)更新。The view is likely to be used in a reporting system or dashboard, and can be updated on a scheduled basis such as weekly.

虽然此示例使用 Azure 表存储,但许多关系数据库管理系统还为具体化视图提供本机支持。Although this example uses Azure table storage, many relational database management systems also provide native support for materialized views.

实现此模式时可能,可能也会与以下模式和指南相关:The following patterns and guidance might also be relevant when implementing this pattern:

  • Data Consistency Primer(数据一致性入门)。Data Consistency Primer. 需要维护具体化视图中的摘要信息,以便其反映基础数据值。The summary information in a materialized view has to be maintained so that it reflects the underlying data values. 数据值更改时,可能无法实时更新摘要数据,此时需要采用最终一致的方法。As the data values change, it might not be practical to update the summary data in real time, and instead you'll have to adopt an eventually consistent approach. 总结了有关维护分布式数据一致性的问题,介绍了不同一致性模型的优点和权衡方案。Summarizes the issues surrounding maintaining consistency over distributed data, and describes the benefits and tradeoffs of different consistency models.
  • 命令和查询责任分离 (CQRS) 模式Command and Query Responsibility Segregation (CQRS) pattern. 用于在基础数据值更改时对发生的事件作出响应,从而更新具体化视图中的信息。Use to update the information in a materialized view by responding to events that occur when the underlying data values change.
  • 事件溯源模式Event Sourcing pattern. 与 CQRS 模式配合使用来维护具体化视图中的信息。Use in conjunction with the CQRS pattern to maintain the information in a materialized view. 具体化视图所基于的数据值更改时,系统可以引发描述这些更改的事件并将其保存到事件存储中。When the data values a materialized view is based on are changed, the system can raise events that describe these changes and save them in an event store.
  • 索引表模式Index Table pattern. 通常由主键整理具体化视图中的数据,但查询可能需要通过检查其他字段中的数据来从此视图检索信息。The data in a materialized view is typically organized by a primary key, but queries might need to retrieve information from this view by examining data in other fields. 用于对不支持本机辅助索引的数据存储的数据集创建辅助索引。Use to create secondary indexes over data sets for data stores that don't support native secondary indexes.