Build a scalable system for massive data

Your data storage system is fundamental to the success of your applications, and therefore to the success of your enterprise. When the storage system is well architected, response is quick, data storage capacity is easily adjusted as necessary, the system is resilient to failures, and it's affordable.

A crucial consideration is whether the design scales well as data grows. As an example of data growth, consider an application that generates 6 terabytes (TB) of data its first month, with the amount increasing every month at a 10 percent yearly rate. Here's a graph that shows how data accumulates over time:

A line graph of terabytes created over time, from 6 after one month to 249 after 3 years. The 10 percent growth rate steepens the slope over time.

After three years, there's 249 TB of data. If the system is well architected, it handles such data growth gracefully, remaining responsive, resilient, and affordable.

This example isn't extreme. If your customers are businesses, data grows both as you add customers and as your customers add data. It can also grow because of application enhancements.

Handling data growth may require a mix of storage products. For example, you may need to keep rarely accessed data in low-cost services, and frequently accessed data in higher-cost services with better access times.

To design such a system on Azure, you need to be familiar with the many Azure services and with how to use them for various types of applications and various objectives. The articles in this section provide seven system architectures for web applications that use massive amounts of data and that are resilient to system failures. They serve as examples that can help you design a storage system that properly accommodates your applications.

The architectures demonstrate the use of these Azure products: Azure Table Storage, Azure Cosmos DB, Azure Data Factory, and Azure Data Lake.

This capability matrix provides links to the articles and summarizes the benefits and risks of each architecture:

Architecture Benefits Risks
Two-region web application with Table Storage failover Straightforward, low-cost implementation Limited resiliency—only two Azure regions
Multi-region web application with custom Storage Table replication Resiliency Implementation time and difficulty
Multi-region web application with Cosmos DB replication Resiliency, performance, scalability Storage costs
Optimized storage with logical data classification Resiliency, performance, scalability, storage costs Implementation time, need to design logical data classification
Optimized Storage – time based – multi writes Storage costs Limited resiliency, performance, limited scalability, implementation time, need to design time-based data retention
Optimized Storage – time based with Data Lake Resiliency, performance, scalability Implementation time, need to design time-based data retention
Minimal storage – change feed to replicate data Resiliency, performance, time-based data retention Limited scalability, implementation time

Next steps

Here are resources to help you design your storage solution and investigate its business aspects, including costs and service-level agreements.

Design storage solutions

Azure service limits, cost, service level agreements (SLA), and regional availability