您现在访问的是微软AZURE全球版技术文档网站,若需要访问由世纪互联运营的MICROSOFT AZURE中国区技术文档网站,请访问 https://docs.azure.cn.

最大程度减少协调Minimize coordination

最大程度减少应用程序服务之间的协调以获得可伸缩性Minimize coordination between application services to achieve scalability

大多数云应用程序包含多个应用程序服务 — web 前端、数据库、业务流程、报告和分析等。Most cloud applications consist of multiple application services — web front ends, databases, business processes, reporting and analysis, and so on. 若要实现可伸缩性和可靠性,其中每一个服务都应在多个实例上运行。To achieve scalability and reliability, each of those services should run on multiple instances.

当两个实例尝试执行影响某种共享状态的并发操作时会发生什么?What happens when two instances try to perform concurrent operations that affect some shared state? 在某些情况下,须跨节点进行协调,例如保留 ACID 保证。In some cases, there must be coordination across nodes, for example to preserve ACID guarantees. 此图中,Node2 正在等待 Node1 释放数据库锁定:In this diagram, Node2 is waiting for Node1 to release a database lock:

数据库锁定图

协调限制了水平缩放的优点,且会形成瓶颈。Coordination limits the benefits of horizontal scale and creates bottlenecks. 在此示例中,当横向扩展应用程序并添加更多实例时,锁定争用会增加。In this example, as you scale out the application and add more instances, you'll see increased lock contention. 而最糟的情况是前端实例将花费大部分时间等待锁定。In the worst case, the front-end instances will spend most of their time waiting on locks.

“仅一次”语义是发生协调的另一个常见原因。"Exactly once" semantics are another frequent source of coordination. 例如,一个顺序必须仅处理一次。For example, an order must be processed exactly once. 两个辅助角色正在侦听新顺序。Two workers are listening for new orders. Worker1 选取一个顺序进行处理。Worker1 picks up an order for processing. 应用程序须确保 Worker2 不会重复工作,并且如果 Worker1 崩溃,不会删除顺序。The application must ensure that Worker2 doesn't duplicate the work, but also if Worker1 crashes, the order isn't dropped.

协调关系图

可使用计划程序代理监督程序等模式在辅助角色之间进行协调,但在这种情况下,采用对工作进行分区的方法可能更好。You can use a pattern such as Scheduler Agent Supervisor to coordinate between the workers, but in this case a better approach might be to partition the work. 每个辅助角色都分配有某范围的顺序(比如按照计费区域)。Each worker is assigned a certain range of orders (say, by billing region). 如果某辅助角色故障,新的实例会在前一个实例停止的位置启用,但不会出现多个实例争用的情况。If a worker crashes, a new instance picks up where the previous instance left off, but multiple instances aren't contending.

建议Recommendations

实现最终一致性Embrace eventual consistency. 分布数据时,需要协调来强制执行可靠的一致性保证。When data is distributed, it takes coordination to enforce strong consistency guarantees. 例如,假设通过一项操作更新两个数据库。For example, suppose an operation updates two databases. 最好系统能调节最终一致性(方法是在发生故障后,使用补偿事务模式进行逻辑回滚),而不是将其置于单个事务范围。Instead of putting it into a single transaction scope, it's better if the system can accommodate eventual consistency, perhaps by using the Compensating Transaction pattern to logically roll back after a failure.

使用域事件同步状态Use domain events to synchronize state. 域事件是一种事件,可记录域中发生的重要事情。A domain event is an event that records when something happens that has significance within the domain. 关注的服务会侦听事件,而不是使用全局事务来协调多个服务。Interested services can listen for the event, rather than using a global transaction to coordinate across multiple services. 如果使用此方法,系统必须允许最终一致性(请参阅上一项)。If this approach is used, the system must tolerate eventual consistency (see previous item).

请考虑使用 CQRS 和事件源等模式。Consider patterns such as CQRS and event sourcing. 这两种模式有助于减少读取工作负载和写入工作负载之间的争用。These two patterns can help to reduce contention between read workloads and write workloads.

  • CQRS 模式将读取操作从写入操作中分离。The CQRS pattern separates read operations from write operations. 在某些实现中,读取数据通过物理方式从写入数据中分离。In some implementations, the read data is physically separated from the write data.

  • 事件源模式中,状态更改作为一系列事件被记录到仅追加数据存储中。In the Event Sourcing pattern, state changes are recorded as a series of events to an append-only data store. 将事件追加到流是一种原子操作,需要最小锁定。Appending an event to the stream is an atomic operation, requiring minimal locking.

这两种模式互为补充。These two patterns complement each other. 如果 CQRS 中的只写存储使用事件源,则只读存储可以侦听相同的事件,以创建当前状态的可读快照(已针对查询进行优化)。If the write-only store in CQRS uses event sourcing, the read-only store can listen for the same events to create a readable snapshot of the current state, optimized for queries. 但是,在采用 CQRS 或事件源之前,请先了解此方法的难题。Before adopting CQRS or event sourcing, however, be aware of the challenges of this approach.

将数据分区Partition data. 避免将所有数据放入一个由多个应用程序服务共享的数据架构中。Avoid putting all of your data into one data schema that is shared across many application services. 微服务体系结构通过使每个服务对自己的数据存储负责来强制执行这一原则。A microservices architecture enforces this principle by making each service responsible for its own data store. 在单个数据库中,将数据分区到不同分片可以提高并发性,因为写入到一个分片的服务不会影响写入其他分片的服务。Within a single database, partitioning the data into shards can improve concurrency, because a service writing to one shard does not affect a service writing to a different shard.

设计幂等操作。Design idempotent operations. 如果可能,请将操作设计为幂等操作。When possible, design operations to be idempotent. 这样一来,可使用“至少一次”语义处理这些操作。That way, they can be handled using at-least-once semantics. 例如,可将工作项放入队列。For example, you can put work items on a queue. 如果某辅助角色在操作期间故障,另一个辅助角色会选取此工作项。If a worker crashes in the middle of an operation, another worker simply picks up the work item.

使用异步并行处理。Use asynchronous parallel processing. 如果某项操作需要多个异步执行的步骤(例如远程服务调用),可并行调用,然后聚合结果。If an operation requires multiple steps that are performed asynchronously (such as remote service calls), you might be able to call them in parallel, and then aggregate the results. 此方法假定每个步骤不依赖上一步的结果。This approach assumes that each step does not depend on the results of the previous step.

如果可能,请使用乐观并发。Use optimistic concurrency when possible. 悲观并发控件使用数据库锁定来防止冲突。Pessimistic concurrency control uses database locks to prevent conflicts. 这可能会导致性能不佳,可用性降低。This can cause poor performance and reduce availability. 对于乐观并发控件,每个事务修改数据的副本或快照。With optimistic concurrency control, each transaction modifies a copy or snapshot of the data. 提交事务时,数据库引擎会验证事务并拒绝会影响数据库一致性的任何事务。When the transaction is committed, the database engine validates the transaction and rejects any transactions that would affect database consistency.

通过快照隔离,Azure SQL 数据库和 SQL Server 支持乐观并发。Azure SQL Database and SQL Server support optimistic concurrency through snapshot isolation. 通过使用 Azure Cosmos DBAzure 存储等 Etag,一些 Azure 存储服务支持乐观并发。Some Azure storage services support optimistic concurrency through the use of Etags, including Azure Cosmos DB and Azure Storage.

请考虑使用 MapReduce、其他并行或分布式算法。Consider MapReduce or other parallel, distributed algorithms. 根据要执行的工作的数据和类型,可将工作拆分为独立的任务,这些任务可以由并行工作的多个节点执行。Depending on the data and type of work to be performed, you may be able to split the work into independent tasks that can be performed by multiple nodes working in parallel. 请参阅大计算的体系结构样式See Big compute architecture style.

使用领导选择进行协调。Use leader election for coordination. 如果需要协调操作,请确保协调器不会成为应用程序中的单一故障点。In cases where you need to coordinate operations, make sure the coordinator does not become a single point of failure in the application. 使用领导选择模式,一个实例始终是领导并充当协调器。Using the Leader Election pattern, one instance is the leader at any time, and acts as the coordinator. 如果该领导失败,会选择新的实例作为领导。If the leader fails, a new instance is elected to be the leader.