Real Time Analytics on Big Data Architecture

Solution Idea


Get insights from live streaming data with ease. Capture data continuously from IoT devices or website clickstream logs, and process it in near real time.

Architecture

Architecture diagram

Data Flow

  1. Ingest live streaming data for an application using an Apache Kafka cluster in Azure HDInsight.
  2. Use Azure Data Factory to bring all your structured data together in Azure Blob Storage.
  3. Take advantage of Azure Databricks to clean, transform, and analyze the streaming data, and combine it with structured data from operational databases or data warehouses.
  4. Use scalable machine learning and deep learning techniques to derive deeper insights from this data using Python, R, or Scala, with the built-in notebook experiences in Azure Databricks.
  5. Leverage native connectors between Azure Databricks and Azure Synapse Analytics to access and move data at scale.
  6. Build analytical dashboards and embedded reports on top of Azure Synapse Analytics to share insights within your organization, and use Azure Analysis Services to serve this data to thousands of users.
  7. Power users can use the built-in capabilities of Azure Databricks and Azure HDInsight to perform root cause determination and raw data analysis.
  8. Move the insights from Azure Databricks to Azure Cosmos DB to make them accessible through real-time apps.
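
The core of this flow (parse streaming events, combine them with structured reference data, aggregate, and shape the result for a serving store) can be illustrated in a few lines. The following is a minimal sketch in plain Python, not the Azure Databricks implementation: in the architecture above this logic would typically run as a Spark job. The event fields (`product_id`, `ts`), the `PRODUCTS` lookup, the 60-second window, and the output document shape are all assumptions made for illustration.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Hypothetical reference data, standing in for the structured data
# that step 2 lands in Azure Blob Storage.
PRODUCTS = {
    "p1": {"category": "books"},
    "p2": {"category": "games"},
}

WINDOW_SECONDS = 60  # assumed tumbling-window size

# Hypothetical clickstream events as they might arrive from Kafka (step 1).
SAMPLE_EVENTS = [
    '{"product_id": "p1", "ts": 120}',
    '{"product_id": "p1", "ts": 150}',
    '{"product_id": "p2", "ts": 185}',
    'not json',                          # malformed event
    '{"product_id": "zz", "ts": 10}',    # unknown product
]


def window_start(epoch_seconds, size=WINDOW_SECONDS):
    """Return the start of the tumbling window containing the timestamp."""
    return epoch_seconds - (epoch_seconds % size)


def enrich(raw_event, products):
    """Parse one JSON event and attach its product category (step 3).

    Returns None for malformed events or unknown products.
    """
    try:
        event = json.loads(raw_event)
        product = products[event["product_id"]]
        ts = int(event["ts"])
    except (ValueError, KeyError, TypeError):
        return None
    return {"category": product["category"], "window": window_start(ts)}


def aggregate(raw_events, products):
    """Count valid events per (window, category) pair."""
    counts = Counter()
    for raw in raw_events:
        row = enrich(raw, products)
        if row is not None:
            counts[(row["window"], row["category"])] += 1
    return counts


def to_documents(counts):
    """Shape each aggregate as a JSON-style document for a serving store (step 8)."""
    docs = []
    for (window, category), views in sorted(counts.items()):
        start = datetime.fromtimestamp(window, tz=timezone.utc).isoformat()
        docs.append({
            "id": f"{category}:{window}",  # assumed id scheme
            "category": category,
            "windowStart": start,
            "views": views,
        })
    return docs


if __name__ == "__main__":
    for doc in to_documents(aggregate(SAMPLE_EVENTS, PRODUCTS)):
        print(json.dumps(doc))
```

Malformed events and events for unknown products are dropped rather than failing the whole batch, which is one common choice for clickstream data; a production pipeline might instead route them to a separate location for inspection.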

Components

  • Azure Synapse Analytics is a fast, flexible, and trusted cloud data warehouse that lets you scale compute and storage elastically and independently, with a massively parallel processing architecture.
  • Azure Data Factory is a hybrid data integration service that allows you to create, schedule and orchestrate your ETL/ELT workflows.
  • Azure Data Lake Storage is a massively scalable, secure data lake built on Azure Blob Storage.
  • Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform.
  • Azure HDInsight is a fully managed, full-spectrum analytics service for popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more.
  • Azure Cosmos DB is a globally distributed, multi-model database service. It lets you replicate your data across any number of Azure regions and scale throughput independently of storage.
  • Azure Analysis Services is an enterprise-grade analytics-as-a-service offering that lets you govern, deploy, test, and deliver your BI solution with confidence.
  • Power BI is a suite of business analytics tools that deliver insights throughout your organization. Connect to hundreds of data sources, simplify data prep, and drive ad hoc analysis. Produce beautiful reports, then publish them for your organization to consume on the web and across mobile devices.
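
As a small illustration of how two of these components connect, the sketch below builds the source options a Spark Structured Streaming job on Azure Databricks could use to read from a Kafka cluster on Azure HDInsight. The option keys (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`) are those of Spark's Kafka source; the helper function, broker addresses, and topic name are hypothetical, and the helper deliberately accepts only the `earliest` and `latest` offset settings.

```python
def kafka_source_options(bootstrap_servers, topics, starting_offsets="latest"):
    """Build the options dict for Spark's Kafka streaming source.

    This helper is illustrative and not part of any Azure or Spark API.
    """
    if not bootstrap_servers:
        raise ValueError("at least one Kafka broker address is required")
    if not topics:
        raise ValueError("at least one topic is required")
    if starting_offsets not in ("earliest", "latest"):
        raise ValueError("starting_offsets must be 'earliest' or 'latest'")
    return {
        "kafka.bootstrap.servers": ",".join(bootstrap_servers),
        "subscribe": ",".join(topics),
        "startingOffsets": starting_offsets,
    }


# Hypothetical broker addresses and topic name.
OPTIONS = kafka_source_options(
    ["broker-0.example.internal:9092", "broker-1.example.internal:9092"],
    ["clickstream"],
    starting_offsets="earliest",
)

# In a Databricks notebook, where a SparkSession named `spark` is available,
# the options could be passed to the Kafka source like this:
#   stream = spark.readStream.format("kafka").options(**OPTIONS).load()
```

Keeping the connection settings in one validated dictionary makes it easier to reuse the same source definition across notebooks and to swap broker addresses between environments.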

Next steps

Pricing Calculator