Query examples for common Stream Analytics usage patterns

Introduction

Queries in Azure Stream Analytics are expressed in a SQL-like query language, which is documented in the Stream Analytics Query Language Reference guide. This article outlines solutions to several common query patterns based on real world scenarios. It is a work in progress and will continue to be updated with new patterns on an ongoing basis.

Query example: Data type conversions

Description: Define the types of the properties on the input stream. For example, car weight is coming on the input stream as strings and needs to be converted to INT to perform SUM it up.

Input:

Make Time Weight
Honda 2015-01-01T00:00:01.0000000Z "1000"
Honda 2015-01-01T00:00:02.0000000Z "2000"

Output:

Make Weight
Honda 3000

Solution:

SELECT
    Make,
    SUM(CAST(Weight AS BIGINT)) AS Weight
FROM
    Input TIMESTAMP BY Time
GROUP BY
    Make,
    TumblingWindow(second, 10)

Explanation: Use a CAST statement on the Weight field to specify its type (see the list of supported Data Types here).

Query example: Using Like/Not like to do pattern matching

Description: Check that a field value on the event matches a certain pattern For example, return license plates that start with A and end with 9

Input:

Make LicensePlate Time
Honda ABC-123 2015-01-01T00:00:01.0000000Z
Toyota AAA-999 2015-01-01T00:00:02.0000000Z
Nissan ABC-369 2015-01-01T00:00:03.0000000Z

Output:

Make LicensePlate Time
Toyota AAA-999 2015-01-01T00:00:02.0000000Z
Nissan ABC-369 2015-01-01T00:00:03.0000000Z

Solution:

SELECT
    *
FROM
    Input TIMESTAMP BY Time
WHERE
    LicensePlate LIKE 'A%9'

Explanation: Use the LIKE statement to check that the LicensePlate field value starts with A then has any string of zero or more characters and it ends with 9.

Query example: Specify logic for different cases/values (CASE statements)

Description: Provide different computation for a field based on some criteria. For example, provide a string description for how many cars passed of the same make with a special case for 1.

Input:

Make Time
Honda 2015-01-01T00:00:01.0000000Z
Toyota 2015-01-01T00:00:02.0000000Z
Toyota 2015-01-01T00:00:03.0000000Z

Output:

CarsPassed Time
1 Honda 2015-01-01T00:00:10.0000000Z
2 Toyotas 2015-01-01T00:00:10.0000000Z

Solution:

SELECT
    CASE
        WHEN COUNT(*) = 1 THEN CONCAT('1 ', Make)
        ELSE CONCAT(CAST(COUNT(*) AS NVARCHAR(MAX)), ' ', Make, 's')
    END AS CarsPassed,
    System.TimeStamp AS Time
FROM
    Input TIMESTAMP BY Time
GROUP BY
    Make,
    TumblingWindow(second, 10)

Explanation: The CASE clause allows us to provide a different computation based on some criteria (in our case the count of cars in the aggregate window).

Query example: Send data to multiple outputs

Description: Send data to multiple output targets from a single job. For example, analyze data for a threshold-based alert and archive all events to blob storage

Input:

Make Time
Honda 2015-01-01T00:00:01.0000000Z
Honda 2015-01-01T00:00:02.0000000Z
Toyota 2015-01-01T00:00:01.0000000Z
Toyota 2015-01-01T00:00:02.0000000Z
Toyota 2015-01-01T00:00:03.0000000Z

Output1:

Make Time
Honda 2015-01-01T00:00:01.0000000Z
Honda 2015-01-01T00:00:02.0000000Z
Toyota 2015-01-01T00:00:01.0000000Z
Toyota 2015-01-01T00:00:02.0000000Z
Toyota 2015-01-01T00:00:03.0000000Z

Output2:

Make Time Count
Toyota 2015-01-01T00:00:10.0000000Z 3

Solution:

SELECT
    *
INTO
    ArchiveOutput
FROM
    Input TIMESTAMP BY Time

SELECT
    Make,
    System.TimeStamp AS Time,
    COUNT(*) AS [Count]
INTO
    AlertOutput
FROM
    Input TIMESTAMP BY Time
GROUP BY
    Make,
    TumblingWindow(second, 10)
HAVING
    [Count] >= 3

Explanation: The INTO clause tells Stream Analytics which of the outputs to write the data from this statement. The first query is a pass-through of the data we received to an output that we named ArchiveOutput. The second query does some simple aggregation and filtering and sends the results to a downstream alerting system. Note: You can also reuse results of CTEs (i.e. WITH statements) in multiple output statements – this has the added benefit of opening fewer readers to the input source. For example,

WITH AllRedCars AS (
    SELECT
        *
    FROM
        Input TIMESTAMP BY Time
    WHERE
        Color = 'red'
)
SELECT * INTO HondaOutput FROM AllRedCars WHERE Make = 'Honda'
SELECT * INTO ToyotaOutput FROM AllRedCars WHERE Make = 'Toyota'

Query example: Counting unique values

Description: count the number of unique field values that appear in the stream within a time window. For example, how many unique make of cars passed through the toll booth in a 2 second window?

Input:

Make Time
Honda 2015-01-01T00:00:01.0000000Z
Honda 2015-01-01T00:00:02.0000000Z
Toyota 2015-01-01T00:00:01.0000000Z
Toyota 2015-01-01T00:00:02.0000000Z
Toyota 2015-01-01T00:00:03.0000000Z

Output:

Count Time
2 2015-01-01T00:00:02.000Z
1 2015-01-01T00:00:04.000Z

Solution:

SELECT
     COUNT(DISTINCT Make) AS CountMake,
     System.TIMESTAMP AS TIME
FROM Input TIMESTAMP BY TIME
GROUP BY 
     TumblingWindow(second, 2)

Explanation: COUNT(DISTINCT Make) returns the number of distinct values of the “Make” column within a time window.

Query example: Determine if a value has changed

Description: Look at a previous value to determine if it is different than the current value For example, is the previous car on the Toll Road the same make as the current car?

Input:

Make Time
Honda 2015-01-01T00:00:01.0000000Z
Toyota 2015-01-01T00:00:02.0000000Z

Output:

Make Time
Toyota 2015-01-01T00:00:02.0000000Z

Solution:

SELECT
    Make,
    Time
FROM
    Input TIMESTAMP BY Time
WHERE
    LAG(Make, 1) OVER (LIMIT DURATION(minute, 1)) <> Make

Explanation: Use LAG to peek into the input stream one event back and get the Make value. Then compare it to the Make on the current event and output the event if they are different.

Query example: Find first event in a window

Description: Find first car in every 10 minute interval?

Input:

LicensePlate Make Time
DXE 5291 Honda 2015-07-27T00:00:05.0000000Z
YZK 5704 Ford 2015-07-27T00:02:17.0000000Z
RMV 8282 Honda 2015-07-27T00:05:01.0000000Z
YHN 6970 Toyota 2015-07-27T00:06:00.0000000Z
VFE 1616 Toyota 2015-07-27T00:09:31.0000000Z
QYF 9358 Honda 2015-07-27T00:12:02.0000000Z
MDR 6128 BMW 2015-07-27T00:13:45.0000000Z

Output:

LicensePlate Make Time
DXE 5291 Honda 2015-07-27T00:00:05.0000000Z
QYF 9358 Honda 2015-07-27T00:12:02.0000000Z

Solution:

SELECT 
    LicensePlate,
    Make,
    Time
FROM 
    Input TIMESTAMP BY Time
WHERE 
    IsFirst(minute, 10) = 1

Now let’s change the problem and find first car of particular Make in every 10 minute interval.

LicensePlate Make Time
DXE 5291 Honda 2015-07-27T00:00:05.0000000Z
YZK 5704 Ford 2015-07-27T00:02:17.0000000Z
YHN 6970 Toyota 2015-07-27T00:06:00.0000000Z
QYF 9358 Honda 2015-07-27T00:12:02.0000000Z
MDR 6128 BMW 2015-07-27T00:13:45.0000000Z

Solution:

SELECT 
    LicensePlate,
    Make,
    Time
FROM 
    Input TIMESTAMP BY Time
WHERE 
    IsFirst(minute, 10) OVER (PARTITION BY Make) = 1

Query example: Find last event in a window

Description: Find last car in every 10 minute interval.

Input:

LicensePlate Make Time
DXE 5291 Honda 2015-07-27T00:00:05.0000000Z
YZK 5704 Ford 2015-07-27T00:02:17.0000000Z
RMV 8282 Honda 2015-07-27T00:05:01.0000000Z
YHN 6970 Toyota 2015-07-27T00:06:00.0000000Z
VFE 1616 Toyota 2015-07-27T00:09:31.0000000Z
QYF 9358 Honda 2015-07-27T00:12:02.0000000Z
MDR 6128 BMW 2015-07-27T00:13:45.0000000Z

Output:

LicensePlate Make Time
VFE 1616 Toyota 2015-07-27T00:09:31.0000000Z
MDR 6128 BMW 2015-07-27T00:13:45.0000000Z

Solution:

WITH LastInWindow AS
(
    SELECT 
        MAX(Time) AS LastEventTime
    FROM 
        Input TIMESTAMP BY Time
    GROUP BY 
        TumblingWindow(minute, 10)
)
SELECT 
    Input.LicensePlate,
    Input.Make,
    Input.Time
FROM
    Input TIMESTAMP BY Time 
    INNER JOIN LastInWindow
    ON DATEDIFF(minute, Input, LastInWindow) BETWEEN 0 AND 10
    AND Input.Time = LastInWindow.LastEventTime

Explanation: There are two steps in the query – the first one finds latest timestamp in 10 minute windows. The second step joins results of the first query with original stream to find events matching last timestamps in each window.

Query example: Detect the absence of events

Description: Check that a stream has no value that matches a certain criteria. For example, have 2 consecutive cars from the same make entered the toll road within 90 seconds?

Input:

Make LicensePlate Time
Honda ABC-123 2015-01-01T00:00:01.0000000Z
Honda AAA-999 2015-01-01T00:00:02.0000000Z
Toyota DEF-987 2015-01-01T00:00:03.0000000Z
Honda GHI-345 2015-01-01T00:00:04.0000000Z

Output:

Make Time CurrentCarLicensePlate FirstCarLicensePlate FirstCarTime
Honda 2015-01-01T00:00:02.0000000Z AAA-999 ABC-123 2015-01-01T00:00:01.0000000Z

Solution:

SELECT
    Make,
    Time,
    LicensePlate AS CurrentCarLicensePlate,
    LAG(LicensePlate, 1) OVER (LIMIT DURATION(second, 90)) AS FirstCarLicensePlate,
    LAG(Time, 1) OVER (LIMIT DURATION(second, 90)) AS FirstCarTime
FROM
    Input TIMESTAMP BY Time
WHERE
    LAG(Make, 1) OVER (LIMIT DURATION(second, 90)) = Make

Explanation: Use LAG to peek into the input stream one event back and get the Make value. Then compare it to the Make on the current event and output the event if they are the same and use LAG to get data about the previous car.

Query example: Detect duration between events

Description: Find the duration of a given event. For example, given a web clickstream determine time spent on a feature.

Input:

User Feature Event Time
user@location.com RightMenu Start 2015-01-01T00:00:01.0000000Z
user@location.com RightMenu End 2015-01-01T00:00:08.0000000Z

Output:

User Feature Duration
user@location.com RightMenu 7

Solution

    SELECT
        [user], feature, DATEDIFF(second, LAST(Time) OVER (PARTITION BY [user], feature LIMIT DURATION(hour, 1) WHEN Event = 'start'), Time) as duration
    FROM input TIMESTAMP BY Time
    WHERE
        Event = 'end'

Explanation: Use LAST function to retrieve last Time value when event type was ‘Start’. Note that LAST function uses PARTITION BY [user] to indicate that result shall be computed per unique user. The query has a 1 hour maximum threshold for time difference between ‘Start’ and ‘Stop’ events but is configurable as needed (LIMIT DURATION(hour, 1).

Query example: Detect duration of a condition

Description: Find out how long a condition occurred for. For example, suppose that a bug that resulted in all cars having an incorrect weight (above 20,000 pounds) – we want to compute the duration of the bug.

Input:

Make Time Weight
Honda 2015-01-01T00:00:01.0000000Z 2000
Toyota 2015-01-01T00:00:02.0000000Z 25000
Honda 2015-01-01T00:00:03.0000000Z 26000
Toyota 2015-01-01T00:00:04.0000000Z 25000
Honda 2015-01-01T00:00:05.0000000Z 26000
Toyota 2015-01-01T00:00:06.0000000Z 25000
Honda 2015-01-01T00:00:07.0000000Z 26000
Toyota 2015-01-01T00:00:08.0000000Z 2000

Output:

StartFault EndFault
2015-01-01T00:00:02.000Z 2015-01-01T00:00:07.000Z

Solution:

    WITH SelectPreviousEvent AS
    (
    SELECT
    *,
        LAG([time]) OVER (LIMIT DURATION(hour, 24)) as previousTime,
        LAG([weight]) OVER (LIMIT DURATION(hour, 24)) as previousWeight
    FROM input TIMESTAMP BY [time]
    )

    SELECT 
        LAG(time) OVER (LIMIT DURATION(hour, 24) WHEN previousWeight < 20000 ) [StartFault],
        previousTime [EndFault]
    FROM SelectPreviousEvent
    WHERE
        [weight] < 20000
        AND previousWeight > 20000

Explanation: Use LAG to view the input stream for 24 hours and look for instances where StartFault and StopFault are spanned by weight < 20000.

Query example: Fill missing values

Description: For the stream of events that have missing values, produce a stream of events with regular intervals. For example, generate event every 5 seconds that will report the most recently seen data point.

Input:

t value
"2014-01-01T06:01:00" 1
"2014-01-01T06:01:05" 2
"2014-01-01T06:01:10" 3
"2014-01-01T06:01:15" 4
"2014-01-01T06:01:30" 5
"2014-01-01T06:01:35" 6

Output (first 10 rows):

windowend lastevent.t lastevent.value
2014-01-01T14:01:00.000Z 2014-01-01T14:01:00.000Z 1
2014-01-01T14:01:05.000Z 2014-01-01T14:01:05.000Z 2
2014-01-01T14:01:10.000Z 2014-01-01T14:01:10.000Z 3
2014-01-01T14:01:15.000Z 2014-01-01T14:01:15.000Z 4
2014-01-01T14:01:20.000Z 2014-01-01T14:01:15.000Z 4
2014-01-01T14:01:25.000Z 2014-01-01T14:01:15.000Z 4
2014-01-01T14:01:30.000Z 2014-01-01T14:01:30.000Z 5
2014-01-01T14:01:35.000Z 2014-01-01T14:01:35.000Z 6
2014-01-01T14:01:40.000Z 2014-01-01T14:01:35.000Z 6
2014-01-01T14:01:45.000Z 2014-01-01T14:01:35.000Z 6

Solution:

SELECT
    System.Timestamp AS windowEnd,
    TopOne() OVER (ORDER BY t DESC) AS lastEvent
FROM
    input TIMESTAMP BY t
GROUP BY HOPPINGWINDOW(second, 300, 5)

Explanation: This query will generate events every 5 second and will output the last event that was received before. Hopping Window duration determines how far back the query will look to find the latest event (300 seconds in this example).

Get help

For further assistance, try our Azure Stream Analytics forum

Next steps