Advanced Analytics Architecture

Analysis Services
Blob Storage
Cosmos DB
Databricks
Data Factory
Synapse Analytics

Solution Idea


Transform your data into actionable insights using machine learning tools. This solution lets you combine data of any type and volume, and build and deploy custom machine learning models at scale.

Architecture

Diagram of an advanced analytics architecture using Azure Synapse Analytics with Azure Data Lake Storage Gen2, Azure Analysis Services, Azure Cosmos DB, and Power BI.


Data flow

  1. Use Synapse Pipelines to bring together all your structured, unstructured, and semi-structured data (logs, files, and media) in Azure Data Lake Storage.
  2. Use Apache Spark pools to clean and transform the unstructured and semi-structured datasets and combine them with structured data from operational databases or data warehouses.
  3. Use scalable machine learning and deep learning techniques to derive deeper insights from this data, working in Python, Scala, or .NET with the notebook experience in Apache Spark pools.
  4. Use Apache Spark pools and Synapse Pipelines in Azure Synapse Analytics to access and move data at scale.
  5. Query and report on the data in Power BI.
  6. Load the insights from Apache Spark pools into Azure Cosmos DB to make them accessible through web and mobile apps.
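As a rough, single-machine illustration of steps 2, 3, and 6, the following sketch uses only the Python standard library in place of Apache Spark and Azure Cosmos DB. It is not how the solution is implemented. The log records, the customer table, the function names, and the choice of `region` as the partition key property are all hypothetical. A simple revenue aggregate stands in for the machine learning of step 3.

```python
import json
from collections import defaultdict

# Hypothetical semi-structured log lines as they might land in the data lake (step 1).
# One line is malformed and one lacks a customer_id, to exercise the cleaning step.
RAW_LOGS = [
    '{"customer_id": "c1", "event": "view", "product": "p1"}',
    '{"customer_id": "c2", "event": "purchase", "product": "p2", "amount": 30.0}',
    'not-json',
    '{"customer_id": "c1", "event": "purchase", "product": "p1", "amount": 12.5}',
    '{"event": "view", "product": "p3"}',
]

# Hypothetical structured rows standing in for an operational database table.
CUSTOMERS = {
    "c1": {"region": "west"},
    "c2": {"region": "east"},
}


def clean(lines):
    """Step 2: parse JSON lines, dropping malformed records and records without a customer_id."""
    records = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(record, dict) and record.get("customer_id"):
            records.append(record)
    return records


def enrich(records, customers):
    """Step 2: inner-join log records with structured customer data on customer_id."""
    joined = []
    for record in records:
        customer = customers.get(record["customer_id"])
        if customer is not None:
            joined.append({**record, **customer})
    return joined


def revenue_by_region(rows):
    """Stand-in for step 3: total purchase amounts per region (not a real ML model)."""
    totals = defaultdict(float)
    for row in rows:
        if row.get("event") == "purchase":
            totals[row["region"]] += float(row.get("amount", 0.0))
    return dict(totals)


def to_documents(totals):
    """Step 6: shape insights as JSON documents for Azure Cosmos DB.

    Every Cosmos DB item needs a string `id`; using `region` as the
    partition key property is an assumption made for this sketch.
    """
    return [
        {"id": f"revenue-{region}", "region": region, "revenue": total}
        for region, total in sorted(totals.items())
    ]


if __name__ == "__main__":
    docs = to_documents(revenue_by_region(enrich(clean(RAW_LOGS), CUSTOMERS)))
    print(json.dumps(docs, indent=2))
```

In an Apache Spark pool, the same steps would typically be expressed as DataFrame operations (for example `spark.read.json`, `join`, and `groupBy`) so that the work is distributed across the pool's nodes instead of running in a single process.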

Components

  • Azure Synapse Analytics is a fast, flexible, and trusted cloud data warehouse that lets you scale compute and storage elastically and independently, with a massively parallel processing architecture.
  • Synapse Pipelines lets you create, schedule, and orchestrate your ETL/ELT workflows.
  • Azure Blob Storage is massively scalable object storage for any type of unstructured data (images, videos, audio, documents, and more), stored easily and cost-effectively. Azure Data Lake Storage Gen2 is built on Azure Blob Storage.
  • Apache Spark pools in Azure Synapse Analytics provide a fast, easy, and collaborative Apache Spark-based analytics platform.
  • Azure Cosmos DB is a globally distributed, multi-model database service. It lets you replicate your data across any number of Azure regions and scale throughput independently of storage.
  • Azure Synapse Link for Azure Cosmos DB enables you to run near-real-time analytics over operational data in Azure Cosmos DB, without affecting the performance of your transactional workload, by using the two analytics engines available from your Azure Synapse workspace: serverless SQL pool and Apache Spark pools.
  • Azure Analysis Services is an enterprise-grade analytics engine as a service that lets you govern, deploy, test, and deliver your BI solution with confidence.
  • Power BI is a suite of business analytics tools that delivers insights throughout your organization. Connect to hundreds of data sources, simplify data preparation, and drive ad hoc analysis. Produce beautiful reports, then publish them for your organization to consume on the web and across mobile devices.

Alternatives

  • Azure Synapse Link for Azure Cosmos DB is Microsoft's preferred solution for running analytics on Azure Cosmos DB data.

Pricing

Next steps