Data streaming with AKS

Solution ideas

This article is a solution idea. If you'd like us to expand the content with more information, such as potential use cases, alternative services, implementation considerations, or pricing guidance, let us know by providing GitHub feedback.

This article presents a solution for using Azure Kubernetes Service (AKS) to quickly process and analyze a large volume of streaming data from devices.

Apache®, Apache Kafka, and Apache Spark are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Architecture

Architecture diagram that shows how streaming data from devices is ingested, processed, and analyzed.

Dataflow

  1. Sensors generate data and stream it to Azure API Management.
  2. An AKS cluster runs microservices that are deployed as containers behind a service mesh. The containers are built by using a DevOps process and are stored in Azure Container Registry.
  3. An ingest service stores data in Azure Cosmos DB.
  4. Asynchronously, an analysis service receives the data and streams it to Apache Kafka and Azure HDInsight.
  5. Data scientists use machine learning models and the Splunk platform to analyze the data.
  6. A processing service processes the data and stores the result in Azure Database for PostgreSQL. The service also caches the data in Azure Cache for Redis.
  7. A web app that runs in Azure App Service creates visualizations of the results.
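
The dataflow can be sketched in a few lines of Python. This is a minimal, in-memory sketch under stated assumptions, not the solution's implementation. Plain lists and dicts stand in for Azure Cosmos DB, Azure Database for PostgreSQL, and Azure Cache for Redis. An `asyncio.Queue` stands in for the asynchronous hand-off to Kafka. The processing step (keeping the maximum value per device) and all names, such as `Pipeline` and `run_demo`, are hypothetical.

```python
from __future__ import annotations

import asyncio
import contextlib
from dataclasses import dataclass


@dataclass
class Reading:
    device_id: str
    value: float


class Pipeline:
    """In-memory stand-ins for the Azure services (assumption, not real clients)."""

    def __init__(self) -> None:
        self.raw_store: list[Reading] = []                     # stands in for Azure Cosmos DB
        self.stream: asyncio.Queue[Reading] = asyncio.Queue()  # stands in for Kafka
        self.results: dict[str, float] = {}                    # stands in for PostgreSQL
        self.cache: dict[str, float] = {}                      # stands in for Redis

    async def ingest(self, reading: Reading) -> None:
        # Step 3: persist the raw reading.
        self.raw_store.append(reading)
        # Step 4: hand it off asynchronously; the caller doesn't wait for processing.
        await self.stream.put(reading)

    async def process(self) -> None:
        # Step 6: consume readings, compute a result, store it, and cache it.
        while True:
            reading = await self.stream.get()
            # Hypothetical processing: keep the maximum value seen per device.
            best = max(self.results.get(reading.device_id, reading.value), reading.value)
            self.results[reading.device_id] = best
            self.cache[reading.device_id] = best
            self.stream.task_done()

    def read_result(self, device_id: str) -> float | None:
        # Step 7 (assumed cache-aside read): try the cache, then the database.
        if device_id in self.cache:
            return self.cache[device_id]
        value = self.results.get(device_id)
        if value is not None:
            self.cache[device_id] = value  # populate the cache on a miss
        return value


async def run_demo(readings: list[Reading]) -> Pipeline:
    pipeline = Pipeline()  # created inside the running event loop
    worker = asyncio.create_task(pipeline.process())
    for reading in readings:
        await pipeline.ingest(reading)
    await pipeline.stream.join()  # wait until every reading has been processed
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return pipeline
```

For example, `asyncio.run(run_demo([Reading("a", 1.0), Reading("a", 3.0)]))` returns a pipeline whose `read_result("a")` is `3.0`. The article says only that results are cached, so the cache-first read path is an assumption.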

Components

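The solution uses the following key technologies:

  • Azure API Management receives the streamed sensor data.
  • Azure Kubernetes Service (AKS) runs the containerized microservices.
  • Azure Container Registry stores the container images.
  • Azure Cosmos DB stores the ingested data.
  • Apache Kafka and Azure HDInsight stream and analyze the data.
  • Splunk supports data analysis and visualization.
  • Azure Database for PostgreSQL stores the processed results.
  • Azure Cache for Redis caches the processed data.
  • Azure App Service hosts the web app that visualizes the results.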

Scenario details

This solution fits scenarios that involve millions of data points from sources such as Internet of Things (IoT) devices, sensors, and vehicles. Such scenarios pose two challenges: processing the large volume of data, and analyzing it quickly enough for organizations to gain insight into complex situations.

Containerized microservices in AKS form a key part of the solution. These self-contained services ingest and process the real-time data stream, and they scale as needed. Because containers are portable, the services can run in different environments and process data from multiple sources. DevOps practices and continuous integration/continuous delivery (CI/CD) are used to develop and deploy the microservices, which shortens the development cycle.
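
The article says only that the services scale as needed. Assuming the microservices are scaled by the Kubernetes Horizontal Pod Autoscaler (HPA), the replica count follows the rule documented for HPA: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The Python sketch below applies that rule with HPA's default tolerance of 0.1 and clamps the result to hypothetical minimum and maximum replica counts. It omits HPA details such as stabilization windows and pod readiness handling, and the function and parameter names are illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count from the HPA scaling rule (sketch; names are hypothetical).

    desired = ceil(current_replicas * current_metric / target_metric)
    """
    ratio = current_metric / target_metric
    # HPA skips scaling when the ratio is within the tolerance of 1.0.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured on the autoscaler.
    return max(min_replicas, min(max_replicas, desired))
```

For example, `desired_replicas(3, current_metric=90, target_metric=50)` returns 6: three pods averaging 90% CPU against a 50% target scale to ceil(5.4) = 6.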

To store the ingested data, the solution uses Azure Cosmos DB. This database elastically scales throughput and storage, which makes it a good choice for large volumes of data.
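
A minimal sketch of the ingest service's write path follows. It assumes the Python `azure-cosmos` SDK, whose container client exposes `upsert_item`. Cosmos DB requires every item to have a string `id`. The `deviceId` partition key, the document shape, and the `ingest_reading` name are hypothetical. Partitioning by device keeps each device's readings in one logical partition, while many devices can spread across partitions as throughput and storage grow. The function accepts any object with an `upsert_item` method so that it can run without an Azure account.

```python
from __future__ import annotations

import uuid
from datetime import datetime, timezone
from typing import Any, Protocol


class ItemContainer(Protocol):
    """The one method this sketch needs; azure-cosmos's ContainerProxy provides it."""

    def upsert_item(self, body: dict[str, Any]) -> dict[str, Any]: ...


def ingest_reading(container: ItemContainer, device_id: str, value: float,
                   timestamp: datetime | None = None) -> dict[str, Any]:
    """Store one sensor reading as a JSON document (hypothetical schema).

    Assumes the container's partition key path is /deviceId.
    """
    ts = timestamp or datetime.now(timezone.utc)
    doc = {
        "id": str(uuid.uuid4()),  # Cosmos DB requires a string "id" on every item
        "deviceId": device_id,    # hypothetical partition key property
        "value": value,
        "timestamp": ts.isoformat(),
    }
    return container.upsert_item(doc)
```

With the real SDK, the container would typically come from `CosmosClient(url, credential=key).get_database_client("telemetry").get_container_client("readings")`, where the database and container names are placeholders.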

The solution also uses Kafka, a low-latency streaming platform that handles real-time data feeds at high throughput.
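
The article doesn't say how records are keyed. Assuming readings are keyed by device ID, Kafka's default partitioner sends every record with the same key to the same partition, and Kafka preserves order within a partition, so each device's readings are consumed in the order they were produced. The in-memory `PartitionedLog` class below is a hypothetical stand-in for a topic. It uses `zlib.crc32` in place of the murmur2 hash that Kafka's Java client uses.

```python
from __future__ import annotations

import zlib
from collections import defaultdict


class PartitionedLog:
    """In-memory stand-in for a Kafka topic: one append-only log per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions: dict[int, list[tuple[str, float]]] = defaultdict(list)

    def partition_for(self, key: str) -> int:
        # Kafka's default partitioner hashes keys with murmur2; crc32 is a stand-in.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def produce(self, key: str, value: float) -> tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def consume(self, partition: int, offset: int = 0) -> list[tuple[str, float]]:
        """Read a partition from a given offset, in append order."""
        return self.partitions[partition][offset:]
```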

Another key component is HDInsight, a managed, open-source cloud analytics service. HDInsight simplifies running big data frameworks such as Apache Spark in Azure on high-volume, high-velocity data. Splunk helps with data analysis: the platform creates visualizations from real-time data and provides business intelligence.
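
As an example of the kind of analysis that Spark on HDInsight might run, the sketch below computes per-device averages over fixed, non-overlapping (tumbling) time windows. In Spark Structured Streaming, this is usually expressed by grouping on the `window` function and a device column. Here plain Python stands in for Spark, and the event format and function name are assumptions.

```python
from __future__ import annotations

from collections import defaultdict


def tumbling_window_averages(events: list[tuple[str, float, float]],
                             window_seconds: float) -> dict[tuple[str, float], float]:
    """Average each device's values over fixed, non-overlapping time windows.

    events: (device_id, epoch_seconds, value) tuples, in any order (assumed format).
    Returns {(device_id, window_start): average}.
    """
    sums: dict[tuple[str, float], float] = defaultdict(float)
    counts: dict[tuple[str, float], int] = defaultdict(int)
    for device_id, ts, value in events:
        # Each event falls in exactly one window: [start, start + window_seconds).
        start = (ts // window_seconds) * window_seconds
        sums[(device_id, start)] += value
        counts[(device_id, start)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```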

Potential use cases

This solution benefits the following areas:

  • Vehicle safety, especially in the automotive industry
  • Customer service in retail and other industries
  • Healthcare cloud solutions
  • Financial technology solutions in the finance industry

Next steps

Product documentation:

Microsoft training modules: