Real-time analytics on big data architecture

Analysis Services
Event Hubs
Synapse Analytics

Solution Idea

Get insights from live streaming data. Capture data continuously from IoT devices or from website clickstream logs, and process it in near real time.

Architecture

Diagram of a real-time analytics on big data architecture that uses Azure Synapse Analytics with Azure Data Lake Storage Gen2, Azure Event Hubs, Azure Analysis Services, Azure Cosmos DB, and Power BI.

Data flow

  1. Ingest live streaming data from an application by using Azure Event Hubs.
  2. Use Synapse Pipelines to bring together all your structured data in Azure Blob Storage.
  3. Use Apache Spark pools to clean, transform, and analyze the streaming data, and combine it with structured data from operational databases or data warehouses.
  4. Use scalable machine learning and deep learning techniques to derive deeper insights from this data, working in Python, Scala, or .NET in notebooks on Apache Spark pools.
  5. Use Apache Spark pools and Synapse Pipelines in Azure Synapse Analytics to access and move data at scale.
  6. Build analytics dashboards and embedded reports on top of the dedicated SQL pool to share insights within your organization, and use Azure Analysis Services to serve this data to thousands of users.
  7. Send the insights from Apache Spark pools to Azure Cosmos DB to make them accessible through real-time apps.
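
The cleaning, windowing, and enrichment described in steps 1 to 3 can be illustrated with a minimal sketch. It is plain Python rather than Spark code, and it only simulates the logic: the event fields (`device_id`, `timestamp`, `temperature`), the `DEVICE_SITES` lookup table, and the 60-second window are all assumptions made for illustration, not part of the architecture itself.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # hypothetical tumbling-window size

# Hypothetical reference data, standing in for structured data
# that would come from an operational database or data warehouse.
DEVICE_SITES = {"dev-1": "Plant A", "dev-2": "Plant B"}

# Hypothetical events, standing in for what an ingestion service delivers.
SAMPLE_EVENTS = [
    {"device_id": "dev-1", "timestamp": datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc), "temperature": 20.0},
    {"device_id": "dev-1", "timestamp": datetime(2024, 1, 1, 12, 0, 35, tzinfo=timezone.utc), "temperature": 22.0},
    {"device_id": "dev-2", "timestamp": datetime(2024, 1, 1, 12, 0, 40, tzinfo=timezone.utc), "temperature": None},
    {"device_id": "dev-1", "timestamp": datetime(2024, 1, 1, 12, 1, 10, tzinfo=timezone.utc), "temperature": 30.0},
    {"device_id": "dev-3", "timestamp": datetime(2024, 1, 1, 12, 1, 20, tzinfo=timezone.utc), "temperature": 18.5},
]


def window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its tumbling window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def aggregate(events):
    """Clean, window, and enrich events; return one row per (window, device)."""
    buckets = defaultdict(list)
    for event in events:
        # Cleaning: skip events that carry no reading.
        if event.get("temperature") is None:
            continue
        key = (window_start(event["timestamp"]), event["device_id"])
        buckets[key].append(event["temperature"])

    rows = []
    for (start, device_id), readings in sorted(buckets.items()):
        rows.append({
            "window_start": start.isoformat(),
            "device_id": device_id,
            # Enrichment: join the stream with the structured reference data.
            "site": DEVICE_SITES.get(device_id, "unknown"),
            "count": len(readings),
            "avg_temperature": sum(readings) / len(readings),
        })
    return rows


if __name__ == "__main__":
    for row in aggregate(SAMPLE_EVENTS):
        print(row)
```

In a Spark pool the same shape would typically be expressed as a grouped, windowed aggregation joined to a reference table; the sketch only shows the order of operations on a small in-memory list.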

Components

  • Azure Synapse Analytics is a fast, flexible cloud data warehouse that lets you scale compute and storage elastically and independently, with a massively parallel processing architecture.
  • Synapse Pipelines lets you create, schedule, and orchestrate your ETL/ELT workflows.
  • Azure Data Lake Storage provides massively scalable, secure data lake functionality built on Azure Blob Storage.
  • Apache Spark pools in Azure Synapse Analytics provide a fast, collaborative Apache Spark-based analytics platform.
  • Azure Event Hubs is a big data streaming platform and event ingestion service.
  • Azure Cosmos DB is a globally distributed, multi-model database service. You can replicate your data across any number of Azure regions and scale throughput independently of storage.
  • Azure Synapse Link for Azure Cosmos DB enables you to run near real-time analytics over operational data in Azure Cosmos DB, without any performance or cost impact on your transactional workload, by using the two analytics engines available from your Azure Synapse workspace: serverless SQL pool and Apache Spark pools.
  • Azure Analysis Services is an enterprise-grade analytics-as-a-service offering that lets you govern, deploy, test, and deliver your BI solution.
  • Power BI is a suite of business analytics tools that deliver insights throughout your organization. Connect to hundreds of data sources, simplify data preparation, and drive ad hoc analysis. Produce reports, then publish them for your organization to consume on the web and across mobile devices.
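
As a rough illustration of how an insight produced in Spark might be shaped before it is stored in Azure Cosmos DB for serving to apps, the sketch below builds a JSON-ready document from one aggregated row. The field names, the choice of `deviceId` as the partition key property, and the `to_document` helper are assumptions for illustration; the sketch does not call any Azure SDK and does not perform the write.

```python
import json


def to_document(row: dict) -> dict:
    """Turn one aggregated insight into a JSON-ready document (hypothetical shape)."""
    return {
        # Items carry a string `id`; here it is derived from device and window
        # so that it is unique within the assumed partition.
        "id": f'{row["device_id"]}_{row["window_start"]}',
        # Assumed partition key property for the target container.
        "deviceId": row["device_id"],
        "windowStart": row["window_start"],
        "avgTemperature": row["avg_temperature"],
    }


if __name__ == "__main__":
    insight = {
        "device_id": "dev-1",
        "window_start": "2024-01-01T12:00:00Z",
        "avg_temperature": 21.0,
    }
    print(json.dumps(to_document(insight), indent=2))
```

Grouping documents by device in this way is only one possible design; the right partition key depends on how the real-time apps query the data.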

Alternatives

  • Azure Synapse Link for Azure Cosmos DB is the Microsoft-preferred solution for running analytics on top of Azure Cosmos DB data.

Pricing

Next steps