Data warehousing and analytics for sales and marketing

This example scenario demonstrates a data pipeline that integrates large amounts of data from multiple sources into a unified analytics platform in Azure. This specific scenario is based on a sales and marketing solution, but the design patterns are relevant for many industries requiring advanced analytics of large datasets such as e-commerce, retail, and healthcare.

This example demonstrates a sales and marketing company that creates incentive programs. These programs reward customers, suppliers, salespeople, and employees. Data is fundamental to these programs, and the company wants to improve the insights gained through data analytics using Azure.

The company needs a modern approach to analysis data, so that decisions are made using the right data at the right time. The company's goals include:

  • Combining different kinds of data sources into a cloud-scale platform.
  • Transforming source data into a common taxonomy and structure, to make the data consistent and easily compared.
  • Loading data using a highly parallelized approach that can support thousands of incentive programs, without the high costs of deploying and maintaining on-premises infrastructure.
  • Greatly reducing the time needed to gather and transform data, so you can focus on analyzing the data.

Relevant use cases

This approach can also be used to:

  • Establish a data warehouse to be a single source of truth for your data.
  • Integrate relational data sources with other unstructured datasets.
  • Use semantic modeling and powerful visualization tools for simpler data analysis.

Architecture

Architecture for a data warehousing and analysis scenario in Azure

The data flows through the solution as follows:

  1. For each data source, any updates are exported periodically into a staging area in Azure Blob storage.
  2. Data Factory incrementally loads the data from Blob storage into staging tables in Azure Synapse Analytics. The data is cleansed and transformed during this process. Polybase can parallelize the process for large datasets.
  3. After loading a new batch of data into the warehouse, a previously created Analysis Services tabular model is refreshed. This semantic model simplifies the analysis of business data and relationships.
  4. Business analysts use Microsoft Power BI to analyze warehoused data via the Analysis Services semantic model.

Components

The company has data sources on many different platforms:

  • SQL Server on-premises
  • Oracle on-premises
  • Azure SQL Database
  • Azure table storage
  • Cosmos DB

Data is loaded from these different data sources using several Azure components:

  • Blob storage is used to stage source data before it's loaded into Azure Synapse.
  • Data Factory orchestrates the transformation of staged data into a common structure in Azure Synapse. Data Factory uses Polybase when loading data into Azure Synapse to maximize throughput.
  • Azure Synapse is a distributed system for storing and analyzing large datasets. Its use of massive parallel processing (MPP) makes it suitable for running high-performance analytics. Azure Synapse can use PolyBase to rapidly load data from Blob storage.
  • Analysis Services provides a semantic model for your data. It can also increase system performance when analyzing your data.
  • Power BI is a suite of business analytics tools to analyze data and share insights. Power BI can query a semantic model stored in Analysis Services, or it can query Azure Synapse directly.
  • Azure Active Directory (Azure AD) authenticates users who connect to the Analysis Services server through Power BI. Data Factory can also use Azure AD to authenticate to Azure Synapse via a service principal or Managed identity for Azure resources.

Alternatives

Considerations

The technologies in this architecture were chosen because they met the company's requirements for scalability and availability, while helping them control costs.

Pricing

Review a pricing sample for a data warehousing scenario via the Azure pricing calculator. Adjust the values to see how your requirements affect your costs.

  • Azure Synapse allows you to scale your compute and storage levels independently. Compute resources are charged per hour, and you can scale or pause these resources on demand. Storage resources are billed per terabyte, so your costs will increase as you ingest more data.
  • Data Factory costs are based on the number of read/write operations, monitoring operations, and orchestration activities performed in a workload. Your Data Factory costs will increase with each additional data stream and the amount of data processed by each one.
  • Analysis Services is available in developer, basic, and standard tiers. Instances are priced based on query processing units (QPUs) and available memory. To keep your costs lower, minimize the number of queries you run, how much data they process, and how often they run.
  • Power BI has different product options for different requirements. Power BI Embedded provides an Azure-based option for embedding Power BI functionality inside your applications. A Power BI Embedded instance is included in the pricing sample above.

Next Steps