Monolithic Persistence antipattern

Putting all of an application's data into a single data store can hurt performance, either because it leads to resource contention or because the data store is not a good fit for some of the data.

Problem description

Historically, applications have often used a single data store, regardless of the different types of data that the application might need to store. Usually this was done to simplify the application design or to match the existing skill set of the development team.

Modern cloud-based systems often have additional functional and nonfunctional requirements, and they need to store many heterogeneous types of data, such as documents, images, cached data, queued messages, application logs, and telemetry. Following the traditional approach and putting all of this information into the same data store can hurt performance, for two main reasons:

  • Storing and retrieving large amounts of unrelated data in the same data store can cause contention, which in turn leads to slow response times and connection failures.
  • Whichever data store is chosen, it might not be the best fit for all of the different types of data, or it might not be optimized for the operations that the application performs.

The following example shows an ASP.NET Web API controller that adds a new record to a database and also records the result to a log. The log is held in the same database as the business data. You can find the complete sample here.

public class MonoController : ApiController
{
    private static readonly string ProductionDb = ...;

    public async Task<IHttpActionResult> PostAsync([FromBody]string value)
    {
        await DataAccess.InsertPurchaseOrderHeaderAsync(ProductionDb);
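        // Log to the same database that holds the business data.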
        await DataAccess.LogAsync(ProductionDb, LogTableName);
        return Ok();
    }
}

The rate at which log records are generated is likely to affect the performance of the business operations. And if another component, such as an application process monitor, regularly reads and processes the log data, that can also affect the business operations.

How to fix the problem

Separate data according to its use. For each data set, select a data store that best matches how that data set will be used. In the previous example, the application should log to a separate store from the database that holds the business data:

public class PolyController : ApiController
{
    private static readonly string ProductionDb = ...;
    private static readonly string LogDb = ...;

    public async Task<IHttpActionResult> PostAsync([FromBody]string value)
    {
        await DataAccess.InsertPurchaseOrderHeaderAsync(ProductionDb);
        // Log to a different data store.
        await DataAccess.LogAsync(LogDb, LogTableName);
        return Ok();
    }
}

Considerations

  • Separate data by the way it is used and how it is accessed. For example, don't store log information and business data in the same data store. These types of data have significantly different requirements and access patterns. Log records are inherently sequential, while business data is more likely to require random access and is often relational.

  • Consider the data access pattern for each type of data. For example, store formatted reports and documents in a document database such as Cosmos DB, but use Azure Cache for Redis to cache temporary data. (A minimal sketch of this kind of split appears after this list.)

  • If you follow this guidance but still reach the limits of the database, you may need to scale up the database. Also consider scaling horizontally and partitioning the load across database servers. However, partitioning may require redesigning the application. For more information, see Data partitioning.
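
The following sketch illustrates the kind of split described in the second consideration above: durable, formatted documents go to Cosmos DB, and short-lived data goes to Azure Cache for Redis. It is only a minimal sketch, not part of the original sample: the Report type, the ReportsDb database, the Reports container (assumed to be partitioned on /id), and the cache expiry are all assumptions, and it uses the Microsoft.Azure.Cosmos and StackExchange.Redis client libraries.

using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using StackExchange.Redis;

// Hypothetical document type; Cosmos DB items need an "id" property.
public class Report
{
    public string id { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
}

public class PolyglotStorage
{
    private readonly Container reportsContainer;
    private readonly IDatabase cache;

    public PolyglotStorage(CosmosClient cosmosClient, IConnectionMultiplexer redis)
    {
        // Durable, queryable documents go to a document database...
        reportsContainer = cosmosClient.GetContainer("ReportsDb", "Reports");
        // ...while temporary, frequently read data goes to the cache.
        cache = redis.GetDatabase();
    }

    public Task SaveReportAsync(Report report) =>
        reportsContainer.CreateItemAsync(report, new PartitionKey(report.id));

    public Task CacheTemporaryDataAsync(string key, object value) =>
        cache.StringSetAsync(key, JsonSerializer.Serialize(value), TimeSpan.FromMinutes(10));
}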

How to detect the problem

The system will likely slow down dramatically and eventually fail as it runs out of resources such as database connections.

You can perform the following steps to help identify the cause.

  1. Instrument the system to record the key performance statistics. Capture timing information for each operation, as well as the points where the application reads and writes data. (A minimal instrumentation sketch appears after this list.)
  2. If possible, monitor the system running for a few days in a production environment to get a real-world view of how the system is used. If this is not possible, run scripted load tests with a realistic volume of virtual users performing a typical series of operations.
  3. Use the telemetry data to identify periods of poor performance.
  4. Identify which data stores were accessed during those periods.
  5. Identify data storage resources that might be experiencing contention.
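
The following sketch illustrates step 1 in its simplest form: wrapping each data-access call so that its name and duration are recorded. It is an assumption-level sketch rather than the instrumentation used in the original sample; the ILogger-based logging and the MeasureAsync helper are made up for illustration.

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public class InstrumentedDataAccess
{
    private readonly ILogger logger;

    public InstrumentedDataAccess(ILogger<InstrumentedDataAccess> logger)
    {
        this.logger = logger;
    }

    // Runs a data-access operation and records its name and elapsed time,
    // even if the operation throws.
    public async Task MeasureAsync(string operationName, Func<Task> operation)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            await operation();
        }
        finally
        {
            stopwatch.Stop();
            logger.LogInformation("{Operation} took {ElapsedMs} ms",
                operationName, stopwatch.ElapsedMilliseconds);
        }
    }
}

A controller could then wrap each call, for example MeasureAsync("InsertPurchaseOrderHeader", () => DataAccess.InsertPurchaseOrderHeaderAsync(ProductionDb)), so that every point where the application reads or writes data reports its timing to the telemetry pipeline.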

Example diagnosis

The following sections apply these steps to the sample application described earlier.

Instrument and monitor the system

The following graph shows the results of load testing the sample application described earlier. The test used a step load of up to 1000 concurrent users.

Load test performance results for the SQL-based controller

As the load increases to 700 users, so does the throughput. But at that point, throughput levels off, and the system appears to be running at its maximum capacity. The average response time gradually increases with user load, showing that the system can't keep up with demand.

Identify periods of poor performance

If you are monitoring the production system, you might notice patterns. For example, response times might drop off significantly at the same time each day. This could be caused by a regular workload or scheduled batch job, or just because the system has more users at certain times. You should focus on the telemetry data for these events.

Look for correlations between increased response times and increased database activity or I/O to shared resources. If there are correlations, the database might be a bottleneck.

Identify which data stores are accessed during those periods

The next graph shows the utilization of database throughput units (DTUs) during the load test. (A DTU is a measure of available capacity and is a combination of CPU utilization, memory allocation, and I/O rate.) Utilization of DTUs quickly reached 100%. This is roughly the point where throughput peaked in the previous graph. Database utilization remained very high until the test finished. There is a slight drop toward the end, which could be caused by throttling, competition for database connections, or other factors.

Database monitor in the Azure classic portal showing resource utilization of the database
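
If you want to check these figures from the database itself rather than from the portal, Azure SQL Database exposes them through the sys.dm_db_resource_stats view (one row roughly every 15 seconds for about the last hour). The following sketch is only an illustration, with a placeholder connection string; it is not part of the original sample.

using System;
using Microsoft.Data.SqlClient;

// Placeholder; point this at the business database under investigation.
const string connectionString = "<business-database-connection-string>";

using var connection = new SqlConnection(connectionString);
connection.Open();

using var command = new SqlCommand(
    @"SELECT end_time, avg_cpu_percent, avg_data_io_percent, avg_log_write_percent
      FROM sys.dm_db_resource_stats
      ORDER BY end_time DESC;", connection);

using var reader = command.ExecuteReader();
while (reader.Read())
{
    Console.WriteLine($"{reader.GetDateTime(0):u}  cpu={reader.GetDecimal(1)}%  " +
                      $"data io={reader.GetDecimal(2)}%  log write={reader.GetDecimal(3)}%");
}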

Examine the telemetry for the data stores

Instrument the data stores to capture the low-level details of the activity. In the sample application, the data access statistics showed a high volume of insert operations performed against both the PurchaseOrderHeader table and the MonoLog table.

Data access statistics for the sample application
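
One way to gather this kind of per-table activity is to query the sys.dm_db_index_operational_stats function, which reports low-level counters such as leaf-level insert counts for each table and index. The sketch below is an assumption-level illustration with a placeholder connection string, not the tooling used to produce the statistics shown above.

using System;
using Microsoft.Data.SqlClient;

// Placeholder; point this at the database being investigated.
const string connectionString = "<business-database-connection-string>";

using var connection = new SqlConnection(connectionString);
connection.Open();

using var command = new SqlCommand(
    @"SELECT OBJECT_NAME(s.object_id) AS table_name,
             SUM(s.leaf_insert_count) AS leaf_inserts
      FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS s
      WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
      GROUP BY s.object_id
      ORDER BY leaf_inserts DESC;", connection);

using var reader = command.ExecuteReader();
while (reader.Read())
{
    Console.WriteLine($"{reader.GetString(0)}: {reader.GetInt64(1)} leaf inserts");
}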

Identify resource contention

At this point, you can review the source code, focusing on the points where contended resources are accessed by the application. Look for situations such as:

  • Data that is logically separate being written to the same store. Data such as logs, reports, and queued messages should not be held in the same database as business information.
  • A mismatch between the choice of data store and the type of data, such as large blobs or XML documents in a relational database.
  • Data with significantly different usage patterns sharing the same store, such as high-write/low-read data being stored with low-write/high-read data.

Implement the solution and verify the result

The application was changed to write logs to a separate data store. Here are the load test results:

Load test performance results using the Polyglot controller

The pattern of throughput is similar to the earlier graph, but the point at which performance peaks is approximately 500 requests per second higher. The average response time is marginally lower. However, these statistics don't tell the full story. Telemetry for the business database shows that DTU utilization peaks at around 75%, rather than 100%.

Database monitor in the Azure classic portal showing resource utilization of the database in the polyglot scenario

Similarly, the maximum DTU utilization of the log database only reaches about 70%. The databases are no longer the limiting factor in the performance of the system.

Database monitor in the Azure classic portal showing resource utilization of the log database in the polyglot scenario