Troubleshoot Azure Stream Analytics queries

This article describes common issues with developing Stream Analytics queries and how to troubleshoot them.

Query is not producing expected output

  1. Examine errors by testing the query locally.

  2. If you use TIMESTAMP BY, verify that the events have timestamps later than the job start time.

  3. Eliminate common pitfalls, such as:

    • A WHERE clause in the query filters out all events, so no output is generated.
    • A CAST operation fails, causing the job to fail. To avoid type cast failures, use TRY_CAST instead.
    • When you use window functions, wait for the entire window duration to see an output from the query.
    • The timestamp for events precedes the job start time and, therefore, events are being dropped.
  4. Ensure event ordering policies are configured as expected. Go to Settings and select Event ordering. The policy is not applied when you use the Test button to test the query; this is one difference between testing in the browser and running the job in production.

  5. Debug by using audit and diagnostic logs.
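
As an illustration of the cast pitfall above, TRY_CAST returns NULL when a conversion fails instead of failing the job. The following sketch uses hypothetical input and column names; only TRY_CAST and TIMESTAMP BY are actual Stream Analytics language features:

```sql
-- Hypothetical input, output, and column names, for illustration only.
-- TRY_CAST returns NULL on an invalid conversion instead of failing the job.
SELECT
    deviceId,
    TRY_CAST (temperature AS float) AS temperature
INTO
    [output]
FROM
    [input] TIMESTAMP BY eventTime
WHERE
    TRY_CAST (temperature AS float) IS NOT NULL
```

The WHERE clause here also shows how a filter can silently drop every event: if no row has a convertible temperature, the job runs but produces no output.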

Job is consuming too many Streaming Units

Make sure you take advantage of parallelization in Azure Stream Analytics. You can scale Stream Analytics jobs through query parallelization by configuring input partitions and tuning the query definition.
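
As a sketch, a query that partitions its input and groups on the same partition key can run fully in parallel across partitions, which typically lowers the Streaming Unit pressure per node. The input, output, and key names here are illustrative assumptions:

```sql
-- Hypothetical names; assumes the input is partitioned by PartitionId.
-- Grouping on the partition key lets each partition be processed independently.
SELECT
    COUNT(*) AS eventCount,
    PartitionId
INTO
    [output]
FROM
    [input] PARTITION BY PartitionId
GROUP BY
    TumblingWindow(minute, 1),
    PartitionId
```

For an embarrassingly parallel job, the output should also be partitioned on the same key so that no repartitioning step is needed between query and sink.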

Debug queries progressively

In real-time data processing, knowing what the data looks like in the middle of the query can be helpful. Because inputs or steps of an Azure Stream Analytics job can be read multiple times, you can write extra SELECT INTO statements. Doing so outputs intermediate data into storage and lets you inspect the correctness of the data, just as watch variables do when you debug a program.

The following example query in an Azure Stream Analytics job has one stream input, two reference data inputs, and an output to Azure Table Storage. The query joins data from the event hub and two reference blobs to get the name and category information:

Example Stream Analytics SELECT INTO query
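
The exact query is shown only in the screenshot above, but a query of this shape might look like the following sketch. The input, reference, and column names are assumptions for illustration, not the ones in the original job:

```sql
-- Illustrative sketch: enrich streaming events with two reference data inputs.
SELECT
    input.[from],
    ref1.name,
    ref2.category
INTO
    [output1]
FROM
    [input]
JOIN [refdata1] ref1
    ON input.[from] = ref1.id
JOIN [refdata2] ref2
    ON input.[from] = ref2.id
```

Joins against reference data don't need a DATEDIFF time bound the way stream-to-stream joins do, because reference data is treated as a slowly changing lookup table.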

Note that the job is running, but no events are being produced in the output. On the Monitoring tile, shown here, you can see that the input is producing data, but you don’t know which step of the JOIN caused all the events to be dropped.

The Stream Analytics Monitoring tile

In this situation, you can add a few extra SELECT INTO statements to "log" the intermediate JOIN results and the data that's read from the input.

In this example, we've added two new "temporary outputs." They can be any sink you like. Here we use Azure Storage as an example:

Adding extra SELECT INTO statements to Stream Analytics query

You can then rewrite the query like this:

Rewritten SELECT INTO Stream Analytics query
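
A rewritten version of such a query might look like the following sketch, where temp1 and temp2 are the temporary storage outputs and all other names are illustrative:

```sql
-- Name the first JOIN as a step so it can be both logged and reused.
WITH firstJoin AS (
    SELECT input.[from], ref1.name
    FROM [input]
    JOIN [refdata1] ref1
        ON input.[from] = ref1.id
)

-- "Log" the raw input data.
SELECT * INTO [temp1] FROM [input]

-- "Log" the intermediate result of the first JOIN.
SELECT * INTO [temp2] FROM firstJoin

-- The original output, now built on the named step.
SELECT firstJoin.[from], firstJoin.name, ref2.category
INTO [output1]
FROM firstJoin
JOIN [refdata2] ref2
    ON firstJoin.[from] = ref2.id
```

Because inputs and named steps can be read multiple times, the extra SELECT INTO statements don't change the final output; they only tap the data flow at each stage.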

Now start the job again, and let it run for a few minutes. Then query temp1 and temp2 with Visual Studio Cloud Explorer to produce the following tables:

SELECT INTO temp1 table in the Stream Analytics query

SELECT INTO temp2 table in the Stream Analytics query

As you can see, temp1 and temp2 both have data, and the name column is populated correctly in temp2. However, because there is still no data in output, something is wrong:

SELECT INTO output1 table with no data Stream Analytics query

By sampling the data, you can be almost certain that the issue is with the second JOIN. You can download the reference data from the blob and take a look:

SELECT INTO ref table Stream Analytics query

As you can see, the format of the GUID in this reference data is different from the format of the [from] column in temp2. That’s why the data didn’t arrive in output1 as expected.

You can fix the data format, upload it to the reference blob, and try again:

SELECT INTO temp table Stream Analytics query

This time, the data in the output is formatted and populated as expected.

SELECT INTO final table Stream Analytics query

Get help

For further assistance, try our Azure Stream Analytics forum.

Next steps