Stream Analytics: batching events and trigger on "last"
For a side project, I had to write a few Azure Stream Analytics queries. Not being a SQL -- let alone Stream Analytics -- person, it required some effort and substantial assistance. Hope it helps.
The goal of this first query is to collect all incoming events from an eventhub over a duration of 10 seconds and then forward the collection to another eventhub as output.
SELECT i1.Id, Collect() as messages INTO output FROM input group by i1.id, SessionWindow(second, 10, 15)
The second one is a bit more interesting. The purpose is to wait for a “last” event in a stream. If no event with a specific ID arrives within 15 seconds, the query needs to trigger an output. Here it is:
SELECT i1.Id, i1.EventEnqueuedUtcTime as t into output FROM Input as i1 TIMESTAMP BY EventEnqueuedUtcTime LEFT outer JOIN Input as i2 TIMESTAMP BY EventEnqueuedUtcTime ON i1.Id = i2.Id AND DATEDIFF(second,i1,i2) BETWEEN 0 AND 15 and i2.EventEnqueuedUtcTime > i1.EventEnqueuedUtcTime WHERE i2 IS NULL
There is something counter intuitive about the last line, but it works beautifully. It is important to put the time comparison in the ON clause and not in the WHERE clause. In the last case, you won't get any output.