What is Stream Analytics?

Azure Stream Analytics is a fully managed event-processing engine that lets you set up real-time analytic computations on streaming data. The data can come from devices, sensors, websites, social media feeds, applications, infrastructure systems, and more.

What can I use Stream Analytics for?

Using Stream Analytics, you can examine high volumes of data flowing from devices or processes, extract information from the data stream, and look for patterns, trends, and relationships. Based on what's in the data, you can then perform application tasks. For example, you might raise alerts, kick off automation workflows, feed information to a reporting tool such as Power BI, or store data for later investigation.

Examples of Stream Analytics scenarios include:

  • Personalized, real-time stock-trading analysis and alerts offered by financial services companies.
  • Real-time fraud detection based on examining transaction data.
  • Data and identity protection services.
  • Analysis of data generated by sensors and actuators embedded in physical objects (Internet of Things, or IoT).
  • Web clickstream analytics.
  • Customer relationship management (CRM) applications, such as issuing alerts when a customer's experience degrades within a time frame.

How does Stream Analytics work?

The following diagram illustrates the Stream Analytics pipeline, showing how data is ingested, analyzed, and then sent for presentation or action.

Stream Analytics pipeline

Stream Analytics starts with a source of streaming data. The data can be ingested into Azure from a device by using Azure Event Hubs or Azure IoT Hub. The data can also be pulled from a data store like Azure Blob storage.

To examine the stream, you create a Stream Analytics job that specifies where the data is coming from. The job also specifies a transformation: how to look for patterns or relationships in the data. For this task, Stream Analytics supports a SQL-like query language that lets you filter, sort, aggregate, and join streaming data over a time period.
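
For example, a transformation might count events per device over consecutive time windows. The following query is a minimal sketch in the Stream Analytics query language; the input alias [iothub-input] and the field names are hypothetical placeholders:

```sql
-- Count events per device over consecutive 10-second windows.
-- [iothub-input], EventTime, and DeviceId are hypothetical names.
SELECT
    DeviceId,
    COUNT(*) AS EventCount,
    System.Timestamp() AS WindowEnd
FROM [iothub-input] TIMESTAMP BY EventTime
GROUP BY DeviceId, TumblingWindow(second, 10)
```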

Finally, the job specifies an output to which it sends the transformed data. This lets you control what to do in response to the information you've analyzed. For example, in response to analysis, you might do any of the following (a query sketch follows the list):

  • Send a command to change a device's settings.
  • Send data to a queue that's monitored by a process that takes action based on what it finds.
  • Send data to a Power BI dashboard for reporting.
  • Send data to storage such as Azure Data Lake Store, Azure SQL Database, or Azure Blob or Table storage.
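
A single job can route results to more than one destination by using multiple SELECT ... INTO statements. The following sketch archives raw events to one output and sends aggregates to another; all input and output aliases are hypothetical names that would be defined on the job:

```sql
-- Archive every event to blob storage and send per-minute averages
-- to Power BI. [sensor-input], [blob-archive], and
-- [powerbi-dashboard] are hypothetical job aliases.
SELECT *
INTO [blob-archive]
FROM [sensor-input]

SELECT DeviceId, AVG(Temperature) AS AvgTemperature
INTO [powerbi-dashboard]
FROM [sensor-input]
GROUP BY DeviceId, TumblingWindow(minute, 1)
```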

While a job is running, you can monitor it and adjust the number of streaming units to change how many events it processes per second. You can also have jobs produce diagnostic logs for troubleshooting.

Key capabilities and benefits

Stream Analytics is designed to be easy to use, flexible, scalable to any job size, and economical.

Connectivity to many inputs and outputs

Stream Analytics connects directly to Azure Event Hubs and Azure IoT Hub for stream ingestion, and to Azure Blob storage to ingest historical data. If you get data from event hubs, you can combine Stream Analytics with other data sources and processing engines.

Job input can also include reference data (static or slow-changing data). You can join streaming data to this reference data to perform lookup operations the same way you would with database queries.
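
For instance, you can enrich each event in the stream with attributes from a reference input, much as you would join to a lookup table. A minimal sketch, where [device-catalog] is a hypothetical reference data input (for example, a file in blob storage):

```sql
-- Enrich each event with device metadata from a reference input.
-- [sensor-input] and [device-catalog] are hypothetical aliases.
SELECT
    e.DeviceId,
    r.DeviceModel,
    e.Temperature
FROM [sensor-input] e TIMESTAMP BY EventTime
JOIN [device-catalog] r
    ON e.DeviceId = r.DeviceId
```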

The output of a Stream Analytics job can be routed in many directions. It can be written to storage, such as Azure Blob storage, Azure Table storage, Azure SQL Database, Azure Data Lake Store, or Azure Cosmos DB. From there, the data can be picked up for batch analytics with Azure HDInsight. You might send the output to another service for consumption by another process, such as Event Hubs, Azure Service Bus topics, or Service Bus queues. Or you might send the output to Power BI for visualization.

Ease of use

To define transformations, you use a simple, declarative Stream Analytics query language that lets you create sophisticated analyses with no programming. The query language takes streaming data as its input. You can then filter and sort the data, aggregate values, perform calculations, join data (within a stream or to reference data), and use geospatial functions. You can edit queries in the portal, using IntelliSense and syntax checking, and you can test queries using sample data that you can extract from the live stream.
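
As an illustration of joining within a stream, a join between two streaming inputs must bound how far apart in time matching events can occur by using DATEDIFF. A sketch with hypothetical input aliases and field names:

```sql
-- Match each ad impression with a click on the same ad within
-- 10 seconds. Stream-to-stream joins must bound the time window
-- with DATEDIFF. [impressions], [clicks], and the fields are
-- hypothetical names.
SELECT
    i.AdId,
    i.ImpressionTime,
    c.ClickTime
FROM [impressions] i TIMESTAMP BY ImpressionTime
JOIN [clicks] c TIMESTAMP BY ClickTime
    ON i.AdId = c.AdId
    AND DATEDIFF(second, i, c) BETWEEN 0 AND 10
```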

Extensible query language

You can extend the capabilities of the query language by defining and invoking additional functions. You can define function calls in the Azure Machine Learning service to take advantage of Azure Machine Learning solutions. You can also integrate JavaScript user-defined functions (UDFs) to perform complex calculations as part of a Stream Analytics query.
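
Once a JavaScript UDF is defined on the job, the query invokes it with the udf. prefix. A minimal sketch; the function name parsePayload and the field names are hypothetical:

```sql
-- Call a JavaScript UDF (defined on the job as 'parsePayload')
-- to extract a value from a raw JSON string. Names are hypothetical.
SELECT
    DeviceId,
    udf.parsePayload(RawPayload) AS ParsedValue
FROM [sensor-input]
```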

Scalability

Stream Analytics can handle up to 1 GB of incoming data per second. Integration with Azure Event Hubs and Azure IoT Hub allows jobs to ingest millions of events per second coming from connected devices, clickstreams, and log files, to name a few. Using the partition feature of event hubs, you can partition computations into logical steps, each with the ability to be further partitioned to increase scalability.
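
For example, when the input is an event hub, the query can be scoped to each input partition with PARTITION BY, so partitions are processed independently and the job can scale out. A sketch with a hypothetical input alias:

```sql
-- Process each event hub partition independently so the job can
-- scale out. PartitionId comes from the event hub input;
-- [eventhub-input] and the other names are hypothetical.
SELECT DeviceId, COUNT(*) AS EventCount
FROM [eventhub-input] PARTITION BY PartitionId
GROUP BY DeviceId, PartitionId, TumblingWindow(minute, 1)
```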

Low cost

As a cloud service, Stream Analytics is optimized to let you get started at low cost. You pay as you go based on streaming-unit usage and the amount of data processed by the system. Usage is calculated from the volume of events processed and the amount of compute power provisioned to handle Stream Analytics jobs.

Reliability, quick recovery, and repeatability

As a managed service in the cloud, Stream Analytics helps prevent data loss and provides business continuity. If failures occur, the service provides built-in recovery capabilities. Because the service can maintain state internally, it delivers repeatable results: you can archive events and reapply processing in the future and always get the same results. This lets you go back in time and investigate computations when you perform root-cause analysis, what-if analysis, and so on.

Next steps