Examine the Azure Stream Analytics query syntax
Queries in Azure Stream Analytics are expressed in a SQL-like query language.
Queries can be designed for simple pass-through logic that moves event data from one input stream into an output data store. An advanced query might use rich pattern matching and temporal analysis to calculate aggregates over various time windows. You can also use queries to join data from multiple inputs. For example, you could combine streaming events, or you could implement lookups against static reference data to enrich the event values. Queries are also used to write data to multiple outputs.
Queries are written using a combination of query syntax elements and functions. Queries process data from an input and use the data to produce a desired output.
Query language elements
Azure Stream Analytics provides a variety of elements for building queries. They are summarized below.
Element | Summary |
---|---|
APPLY | The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. There are two forms of APPLY: CROSS APPLY returns only rows from the outer table that produce a result set from the table-valued function. OUTER APPLY returns both rows that produce a result set, and rows that do not, with NULL values in the columns produced by the table-valued function. |
CASE | CASE evaluates a list of conditions and returns one of multiple possible result expressions |
COALESCE | COALESCE evaluates the arguments in order and returns the value of the first expression that initially does not evaluate to NULL. |
CREATE TABLE | CREATE TABLE is used to define the schema of the payload of the events coming into Azure Stream Analytics. |
FROM | FROM specifies the input stream or a step name associated in a WITH clause. The FROM clause is always required for any SELECT statement. |
GROUP BY | GROUP BY groups a selected set of rows into a set of summary rows grouped by the values of one or more columns or expressions. |
HAVING | HAVING specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT expression. |
INTO | INTO explicitly specifies an output stream, and is always associated with an SELECT expression. If not specified, the default output stream is "output". |
JOIN and Reference Data JOIN |
JOIN is used to combine records from two or more input sources. JOIN is temporal in nature, meaning that each JOIN must define how far the matching rows can be separated in time. JOIN is also used to correlate persisted historical data or a slow changing dataset (also known as reference data) with the real-time event stream to make smarter decisions about the system. For example, join an event stream to a static dataset which maps IP Addresses to locations. This is the only JOIN supported in Stream Analytics where a temporal bound is not necessary. |
MATCH_RECOGNIZE | MATCH_RECOGNIZE is used to search for a set of events over a data stream. |
NULLIF | NULLIF evaluates two arguments and returns null if they are equal. |
OVER | OVER defines the grouping of rows before an associated aggregate or analytic function is applied. |
SELECT | SELECT is used to retrieve rows from input streams and enables the selection of one or many columns from one or many input streams in Azure Stream Analytics. |
UNION | UNION combines two or more queries into a single result set that includes all the rows that belong to all queries in the union. |
WHERE | WHERE specifies the search condition for the rows returned by the query. |
WITH | WITH specifies a temporary named result set which can be referenced by a FROM clause in the query. This is defined within the execution scope of a single SELECT statement. |
Built-in functions
Azure Stream Analytics provides some built-in functions. The categories of built-in functions are:
Types of Functions
Function Category | Description |
---|---|
Aggregate Functions | Operate on a collection of values but return a single, summarizing value. |
Analytic Functions | Return a value based on defined constraints. |
Array Functions | Returns information from an array. |
GeoSpatial Functions | Perform specialized GeoSpatial functions. |
Input Metadata Functions | Query the metadata of property in the data input. |
Record Functions | Returns record properties or values. |
Windowing Functions | Perform operations on events within a time window. |
Scalar Functions | Operate on a single value and then return a single value. Scalar functions can be used wherever an expression is valid. |
Scalar Functions
A scalar function operates on a single value and then return a single value. Scalar functions can be used wherever an expression is valid.
Function Category | Description |
---|---|
Conversion Functions | These functions allow you to cast data into different formats. |
Date and Time Functions | These functions allow you to perform operations on DateTime formats. |
Mathematical Functions | Represent the scalar functions that perform a calculation, usually based on input values that are provided as arguments, and return a numeric value. |
String Functions | These functions allow you to convert strings to upper or lower case. |
Query syntax examples
The following queries are examples taken from the Stream Analytics query language reference. They were selected because they help to illustrate common query patterns and show basic syntax.
All query pattern examples included in the reference guide are based on the following toll booth scenario:
A vehicle tolling station is a common phenomenon – we encounter them in many expressways, bridges, and tunnels across the world. Each toll station has multiple toll booth lanes that a car can enter, which may be manual – meaning that you stop to pay the toll to an attendant, or automated – where a sensor placed on top of the booth scans an RFID card affixed to the windshield of your vehicle as you pass the toll booth. The passage of vehicles through the toll stations provides us with an event stream that we can visualize. We can use this event stream to perform some interesting operations.
Pattern 1 - Convert data types
For this example, the input stream includes the make of the car, the time when the car passes through the booth, and the weight of the car. The goal is to calculate the sum of vehicle weight that passes through an individual booth during a defined time period. In this example, the car weight reported on the input stream is represented as a string data type. The weight values must be converted to INT before the SUM can be calculated.
Input
Make | Time | Weight |
---|---|---|
Honda | 2015-01-01T00:00:01.0000000Z | "1000" |
Honda | 2015-01-01T00:00:02.0000000Z | "2000" |
Query solution
SELECT
Make,
SUM(CAST(Weight AS BIGINT)) AS Weight
FROM
Input TIMESTAMP BY Time
GROUP BY
Make,
TumblingWindow(second, 10)
Output
Make | Weight |
---|---|
Honda | 3000 |
Explanation
We use a CAST statement in the Weight
field to specify its data type.
Pattern 2 - Send data to multiple outputs
Description
In this example, the input stream includes the make of a car and the time when that car passes through the booth. We have two goals. The first goal it to archive the input data to cold storage. The second goal is to support a downstream alerting system that sends alerts whenever the traffic rate for a particular make of vehicle exceeds a defined threshold value. To achieve both goals we need to process the data for multiple output targets. The first target output will receive all of the input data. For the second target output will receive a record indicating the make of the vehicle, the time for the event, and the associated number of vehicles passing through the booth.
Input
Make | Time |
---|---|
Honda | 2015-01-01T00:00:01.0000000Z |
Honda | 2015-01-01T00:00:02.0000000Z |
Toyota | 2015-01-01T00:00:01.0000000Z |
Toyota | 2015-01-01T00:00:02.0000000Z |
Toyota | 2015-01-01T00:00:03.0000000Z |
Query solution
SELECT
*
INTO
ArchiveOutput
FROM
Input TIMESTAMP BY Time
SELECT
Make,
System.TimeStamp() AS AsaTime,
COUNT(*) AS [Count]
INTO
AlertOutput
FROM
Input TIMESTAMP BY Time
GROUP BY
Make,
TumblingWindow(second, 10)
HAVING
[Count] >= 3
Output 1
Make | Time |
---|---|
Honda | 2015-01-01T00:00:01.0000000Z |
Honda | 2015-01-01T00:00:02.0000000Z |
Toyota | 2015-01-01T00:00:01.0000000Z |
Toyota | 2015-01-01T00:00:02.0000000Z |
Toyota | 2015-01-01T00:00:03.0000000Z |
Output 2
Make | Time | Count |
---|---|---|
Toyota | 2015-01-01T00:00:10.0000000Z | 3 |
Explanation
The INTO clause tells Stream Analytics the outputs to write the data to from this statement. The first query is a pass-through of the data received to an output named ArchiveOutput
. The second query does some simple aggregation and filtering, and it sends the results to a downstream alerting system, AlertOutput
.
The full Stream Analytics query language reference guide can be found here: Stream Analytics Query Language Reference.