Understand stream processing


Stream processing refers to the continuous ingestion, transformation, and analysis of data streams generated by applications, IoT devices and sensors, and other sources to derive actionable insights in near-real-time. Data stream analysis frequently involves temporal operations, such as windowed aggregates, temporal joins, and temporal analytic functions, to measure changes or differences over time. The goals are to:

  • Continuously monitor data using time-boxed windows to better understand how specific areas of interest change or fluctuate over time
  • Identify and react to anomalies or irregularities within data in real time
  • Continuously analyze new data to identify and respond to issues in real time
  • Trigger specific actions when certain thresholds are identified
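
The windowed aggregate and threshold ideas in the list above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular streaming service implements windowing: the `Reading` type, the function names, the 10-second window, and the threshold of 25.0 are all assumptions made for the example, and the readings are a small in-memory list rather than a live stream.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Reading(NamedTuple):
    """Hypothetical sensor reading used only for this example."""
    timestamp: float  # seconds since an arbitrary starting point
    value: float


def tumbling_window_averages(readings: Iterable[Reading], window_seconds: float) -> dict:
    """Assign each reading to a fixed, non-overlapping window and average each window."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for reading in readings:
        # Integer division maps a timestamp to the index of the window that contains it.
        window_index = int(reading.timestamp // window_seconds)
        sums[window_index] += reading.value
        counts[window_index] += 1
    return {index: sums[index] / counts[index] for index in sorted(sums)}


def windows_over_threshold(averages: dict, threshold: float) -> list:
    """Return the indexes of windows whose average exceeds the threshold."""
    return [index for index, average in averages.items() if average > threshold]


readings = [
    Reading(0, 20.0),
    Reading(4, 22.0),
    Reading(11, 30.0),
    Reading(15, 34.0),
    Reading(21, 21.0),
]
averages = tumbling_window_averages(readings, window_seconds=10)
alerts = windows_over_threshold(averages, threshold=25.0)
print(averages)  # {0: 21.0, 1: 32.0, 2: 21.0}
print(alerts)    # [1]
```

Only the second window (seconds 10 to 20) averages above the threshold, so it is the only one flagged. In a real pipeline, the flagged window would trigger an action such as sending a notification.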

The rapid proliferation of connected applications, devices, and sensors has made it necessary for organizations to analyze streaming data as it arrives and to use the knowledge contained within that data to make business decisions in near-real-time. Some example use cases of streaming data analysis include:

  • Anomaly detection to identify potentially fraudulent transactions in the financial industry
  • Making product recommendations to online customers in real time
  • Monitoring pipelines and distribution systems by oil companies
  • Generating predictive maintenance schedules for industrial and manufacturing equipment
  • Sentiment analysis of social media posts

Approaches to data stream processing

The primary approach to stream processing is to analyze new data continuously, transforming incoming data as it arrives to facilitate near-real-time insights. Computations and aggregations can be executed against the data using temporal analysis, and the results can be sent to a Power BI dashboard for real-time visualization and analysis. This approach typically also involves persisting the streaming data into a data store, such as Azure Data Lake Storage (ADLS) Gen2, for further examination or more advanced analytics workloads.
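
As a rough sketch of this continuous approach, the Python example below handles each event the moment it arrives: it persists the raw event and then emits an updated running average. The names `process_stream` and `raw_store` and the event fields are hypothetical, and a plain list stands in for a durable store such as a data lake. A real pipeline would use a streaming service and send results to a dashboard.

```python
from typing import Iterable, Iterator


def process_stream(events: Iterable[dict], store: list) -> Iterator[dict]:
    """Handle events one at a time: persist each, then emit an updated running average."""
    count = 0
    total = 0.0
    for event in events:
        store.append(event)  # stand-in for writing the raw event to a durable data store
        count += 1
        total += event["value"]
        # Emit a result immediately instead of waiting for the full data set.
        yield {"device": event["device"], "running_avg": total / count}


raw_store: list = []  # hypothetical in-memory substitute for a data store
incoming = iter([
    {"device": "pump-1", "value": 10.0},
    {"device": "pump-1", "value": 14.0},
    {"device": "pump-1", "value": 12.0},
])
results = list(process_stream(incoming, raw_store))
print([r["running_avg"] for r in results])  # [10.0, 12.0, 12.0]
```

Because `process_stream` is a generator, each result is available as soon as its event has been consumed, and the raw events remain in `raw_store` for later analysis.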

An alternative approach for processing streaming data is to persist incoming data in a data store, such as Azure Data Lake Storage (ADLS) Gen2. You can then process the static data in batches at a later time. This approach is frequently used to take advantage of lower compute costs when processing large sets of existing data.
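
The batch alternative can be sketched in the same spirit: events are only appended to storage as they arrive, and a separate job computes aggregates later over everything that has accumulated. In this illustrative Python example a temporary newline-delimited JSON file stands in for the data store. The file name, function names, and event fields are assumptions for the sketch.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def persist_events(events: list, path: Path) -> None:
    """Append events to a newline-delimited JSON file without processing them."""
    with path.open("a", encoding="utf-8") as handle:
        for event in events:
            handle.write(json.dumps(event) + "\n")


def batch_average_by_device(path: Path) -> dict:
    """Read all persisted events in one pass and average the values per device."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            event = json.loads(line)
            totals[event["device"]] += event["value"]
            counts[event["device"]] += 1
    return {device: totals[device] / counts[device] for device in totals}


with tempfile.TemporaryDirectory() as tmp:
    landing_file = Path(tmp) / "events.jsonl"  # hypothetical landing location
    # Events arrive at different times and are only stored.
    persist_events([{"device": "pump-1", "value": 10.0},
                    {"device": "pump-2", "value": 20.0}], landing_file)
    persist_events([{"device": "pump-1", "value": 14.0}], landing_file)
    # Later, a scheduled job processes the accumulated data in one batch.
    batch_result = batch_average_by_device(landing_file)

print(batch_result)  # {'pump-1': 12.0, 'pump-2': 20.0}
```

The trade-off is latency: no result exists until the batch job runs, but the computation can be scheduled for when compute is cheapest.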