Outputs from Azure Stream Analytics
An Azure Stream Analytics job consists of an input, a query, and an output. There are several output types to which you can send transformed data. This article lists the supported Stream Analytics outputs. When you design your Stream Analytics query, refer to the name of the output by using the INTO clause. You can use a single output per job, or multiple outputs per streaming job if you need them, by adding multiple INTO clauses to the query.
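As a minimal sketch, the following query fans the same input out to two outputs by using two SELECT statements, each with its own INTO clause. The aliases [iothub-input], [blob-output], and [powerbi-output], and the Temperature field, are placeholders; they must match the input and output names you configure on your job.

```sql
-- Route raw events to one output and aggregated results to another.
-- [iothub-input], [blob-output], and [powerbi-output] are placeholder aliases
-- that must match the input and output names configured on the job.
SELECT
    *
INTO
    [blob-output]
FROM
    [iothub-input]

SELECT
    AVG(Temperature) AS AvgTemperature,
    System.Timestamp() AS WindowEnd
INTO
    [powerbi-output]
FROM
    [iothub-input]
GROUP BY
    TumblingWindow(minute, 1)
```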
To create, edit, and test Stream Analytics job outputs, you can use the Azure portal, Azure PowerShell, .NET API, REST API, Visual Studio, and Visual Studio Code.
Note
We strongly recommend using Stream Analytics tools for Visual Studio Code for the best local development experience. There are known feature gaps in Stream Analytics tools for Visual Studio 2019 (version 2.6.3000.0), and the tools won't be improved going forward.
Some output types support partitioning, and output batch sizes vary to optimize throughput. The following table shows the features that are supported for each output type:
| Output type | Partitioning | Security |
|---|---|---|
| Azure Data Lake Storage Gen 1 | Yes | Azure Active Directory user, Managed Identity |
| Azure Data Explorer | Yes | Managed Identity |
| Azure Database for PostgreSQL | Yes | Username and password auth |
| Azure SQL Database | Yes, optional | SQL user auth, Managed Identity |
| Azure Synapse Analytics | Yes | SQL user auth, Managed Identity |
| Blob storage and Azure Data Lake Gen 2 | Yes | Access key, Managed Identity |
| Azure Event Hubs | Yes, need to set the partition key column in the output configuration | Access key, Managed Identity |
| Power BI | No | Azure Active Directory user, Managed Identity |
| Azure Table storage | Yes | Account key |
| Azure Service Bus queues | Yes | Access key |
| Azure Service Bus topics | Yes | Access key |
| Azure Cosmos DB | Yes | Access key |
| Azure Functions | Yes | Access key |
Partitioning
Stream Analytics supports partitions for all outputs except for Power BI. For more information on partition keys and the number of output writers, see the article for the specific output type you're interested in. All output articles are linked in the previous section.
Additionally, for more advanced tuning of the partitions, you can control the number of output writers by using an INTO <partition count> (see INTO) clause in your query, which can help you achieve a desired job topology. If your output adapter isn't partitioned, a lack of data in one input partition causes a delay of up to the late arrival amount of time. In such cases, the output is merged into a single writer, which might cause bottlenecks in your pipeline. To learn more about the late arrival policy, see Azure Stream Analytics event order considerations.
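For illustration, the following sketch uses the repartitioning syntax to request a specific number of output writers by adding a partition count to the query. The aliases [input] and [output] and the key column DeviceId are placeholders for your own job's names, and this syntax assumes a compatibility level that supports repartitioning.

```sql
-- Repartition the input on DeviceId into 10 partitions so that up to
-- 10 output writers are used; [input], [output], and DeviceId are placeholders.
SELECT
    *
INTO
    [output]
FROM
    [input]
PARTITION BY DeviceId
INTO 10
```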
Output batch size
All outputs support batching, but only some support setting the batch size explicitly. Azure Stream Analytics uses variable-size batches to process events and write to outputs. Typically, the Stream Analytics engine doesn't write one message at a time; it uses batches for efficiency. When the rate of both incoming and outgoing events is high, Stream Analytics uses larger batches. When the egress rate is low, it uses smaller batches to keep latency low.
Parquet output batching window properties
When using Azure Resource Manager template deployment or the REST API, the two batching window properties are:
timeWindow
The maximum wait time per batch. The value should be a string in Timespan format, for example "00:02:00" for two minutes. After this time, the batch is written to the output even if the minimum rows requirement isn't met. The default value is 1 minute, and the allowed maximum is 2 hours. If your blob output has path pattern frequency, the wait time can't be higher than the partition time range.
sizeWindow
The minimum number of rows per batch. For Parquet, every batch creates a new file. The current default value is 2,000 rows, and the allowed maximum is 10,000 rows.
These batching window properties are only supported by API version 2017-04-01-preview. Below is an example of the JSON payload for a REST API call:
"type": "stream",
"serialization": {
"type": "Parquet",
"properties": {}
},
"timeWindow": "00:02:00",
"sizeWindow": "2000",
"datasource": {
"type": "Microsoft.Storage/Blob",
"properties": {
"storageAccounts" : [
{
"accountName": "{accountName}",
"accountKey": "{accountKey}",
}
],