Configure a SQL Server source stage in Azure IoT Data Processor Preview

Important

Azure IoT Operations Preview – enabled by Azure Arc is currently in PREVIEW. You shouldn't use this preview software in production environments.

See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

The source stage is the first and required stage in an Azure IoT Data Processor Preview pipeline. The source stage gets data into the data processing pipeline and prepares it for further processing. The SQL Server source stage lets you read data from a Microsoft SQL Server database at a user-defined interval.

In the source stage, you define:

  • Connection details for SQL Server.
  • The interval at which to query the SQL Server database. The stage waits for a result before it resets the interval timer.
  • A partitioning configuration based on your specific data processing requirements.

Prerequisites

  • A functioning instance of Data Processor is deployed.
  • A SQL Server database with all necessary raw data available is operational and reachable.

Prepare the SQL Server database

Before you can read from your SQL Server database, you need to grant access to the database from the pipeline. You can use Username/Password, Managed identity, or Service principal to authenticate the pipeline to the database. The advantage of using a managed identity is that you don't need to manage the lifecycle of the service principal or the password details. The managed identity is automatically managed by Azure and is tied to the lifecycle of the resource it's assigned to.

To create a service principal with a client secret:

  1. Use the following Azure CLI command to create a service principal.

    az ad sp create-for-rbac --name <YOUR_SP_NAME> 
    
  2. The output of this command includes an appId, displayName, password, and tenant. Make a note of these values to use when you grant the service principal access to the database, create a secret, and configure the source stage:

    {
        "appId": "<app-id>",
        "displayName": "<name>",
        "password": "<client-secret>",
        "tenant": "<tenant-id>"
    }
    

To grant the principal access to your SQL Server database, run the following command in the database:

CREATE USER processor FROM EXTERNAL PROVIDER WITH OBJECT_ID=<principalId>;

Note

To run the previous command, you must use the Microsoft Entra account listed as the admin in SQL Server.

For the source stage to connect to your SQL Server database, it needs access to a secret that contains the authentication details. To create a secret:

  1. Use the following command to add a secret to your Azure Key Vault that contains the client secret you made a note of when you created the service principal:

    az keyvault secret set --vault-name <your-key-vault-name> --name AccessADXSecret --value <client-secret>
    
  2. Add the secret reference to your Kubernetes cluster by following the steps in Manage secrets for your Azure IoT Operations deployment.

Configure the SQL Server source

To configure the SQL Server source:

  • Provide details of the SQL Server database. This configuration includes the server name and a query to retrieve the data.
  • Specify the authentication method. Currently limited to username/password-based or service principal-based authentication.

The following table describes the SQL Server source configuration parameters:

| Field | Type | Description | Required | Default | Example |
| --- | --- | --- | --- | --- | --- |
| Name | String | A customer-visible name for the source stage. | Required | NA | erp-database |
| Description | String | A customer-visible description of the source stage. | Optional | NA | Enterprise database |
| Server host | String | The URL to use to connect to the server. | Required | NA | https://contoso.com/some/url/path |
| Server port | Integer | The port number to connect to on the server. | Required | 1433 | 1433 |
| Authentication | Authentication type | The authentication method for connecting to the server. One of: None, Username/Password, Managed identity, or Service principal. | Optional | NA | Username/Password |
| Username/Password > Username | String | The username for the username/password authentication. | Yes | NA | myuser |
| Username/Password > Secret | String | Reference to the password stored in Azure Key Vault. | Yes | NA | AKV_USERNAME_PASSWORD |
| Service principal > Tenant ID | String | The tenant ID of the service principal. | Yes | NA | <Tenant ID> |
| Service principal > Client ID | String | The client ID of the service principal. | Yes | NA | <Client ID> |
| Service principal > Secret | String | Reference to the service principal client secret stored in Azure Key Vault. | Yes | NA | AKV_SERVICE_PRINCIPAL |
| Database | String | The name of the SQL Server database to query. | Required | NA | erp_db |
| Data query | String | The query to run against the database. | Required | NA | SELECT * FROM your_table WHERE column_name = foo |
| Query interval | Duration | String representation of the time to wait before the next query runs. | Required | 10s | 24h |
| Data format | Format | Data format of the incoming data. | Required | NA | {"type": "json"} |
| Partitioning | Partitioning | Partitioning configuration for the source stage. | Required | NA | See partitioning |
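
The previous table describes the fields as they appear in the Data Processor stage editor. Purely as an illustration, a source stage that uses the example values from the table might be represented in JSON along the following lines. The field names and nesting here are assumptions based on the table, not a documented schema:

    {
        "name": "erp-database",
        "description": "Enterprise database",
        "host": "https://contoso.com/some/url/path",
        "port": 1433,
        "authentication": {
            "type": "usernamePassword",
            "username": "myuser",
            "secret": "AKV_USERNAME_PASSWORD"
        },
        "database": "erp_db",
        "query": "SELECT * FROM your_table WHERE column_name = foo",
        "interval": "10s",
        "format": {
            "type": "json"
        },
        "partitioning": {
            "type": "id",
            "expression": "0",
            "count": 1
        }
    }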

To learn more about secrets, see Manage secrets for your Azure IoT Operations Preview deployment.

Note

Requests time out after 30 seconds if there's no response from the SQL server.

Select data format

In a Data Processor pipeline, the format field in the source stage specifies how to deserialize the incoming data. By default, the Data Processor pipeline uses the raw format, which means it doesn't convert the incoming data. To use many Data Processor features, such as Filter or Enrich stages in a pipeline, you must deserialize your data in the input stage. You can choose to deserialize your incoming data from JSON, jsonStream, MessagePack, CBOR, CSV, or Protobuf formats into a Data Processor readable message to use the full Data Processor functionality.

The following tables describe the different deserialization configuration options:

| Field | Description | Required | Default | Value |
| --- | --- | --- | --- | --- |
| Data Format | The type of the data format. | Yes | Raw | Raw, JSON, jsonStream, MessagePack, CBOR, CSV, or Protobuf |

The Data Format field is mandatory and its value determines the other required fields.
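
For example, JSON deserialization requires no fields beyond the data format itself, so the Data format example from the source configuration table earlier is a complete configuration:

    {
        "type": "json"
    }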

To deserialize CSV messages, you also need to specify the following fields:

| Field | Description | Required | Value | Example |
| --- | --- | --- | --- | --- |
| Header | Whether the CSV data includes a header line. | Yes | Yes or No | No |
| Name | Name of the column in the CSV. | Yes | - | temp, asset |
| Path | The jq path in the message where the column information is added. | No | - | The default jq path is the column name |
| Data Type | The data type of the data in the column and how it's represented inside the Data Processor pipeline. | No | String, Float, Integer, Boolean, Bytes | Default: String |
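
As an illustrative sketch, a CSV deserializer that sets these fields for the two example columns might look like the following. The column list structure and lowercase field names are assumptions based on the table, not a documented schema:

    {
        "type": "csv",
        "header": true,
        "columns": [
            {
                "name": "temp",
                "path": ".temp",
                "type": "float"
            },
            {
                "name": "asset",
                "type": "string"
            }
        ]
    }

In this sketch, the asset column omits the path, so the default jq path, the column name, applies.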

To deserialize Protobuf messages, you also need to specify the following fields:

| Field | Description | Required | Value | Example |
| --- | --- | --- | --- | --- |
| Descriptor | The base64-encoded descriptor for the protobuf definition. | Yes | - | Zhf... |
| Message | The name of the message type that's used to format the data. | Yes | - | pipeline |
| Package | The name of the package in the descriptor where the type is defined. | Yes | - | schedulerv1 |

Note

Data Processor supports only one message type in each .proto file.
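
Combined, a Protobuf deserializer that uses the example values from the preceding table might look like this sketch; the lowercase field names are assumptions based on the table:

    {
        "type": "protobuf",
        "descriptor": "Zhf...",
        "message": "pipeline",
        "package": "schedulerv1"
    }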

Configure partitioning

Partitioning in a pipeline divides the incoming data into separate partitions. Partitioning enables data parallelism in the pipeline, which can improve throughput and reduce latency. Partitioning strategies affect how the data is processed in the other stages of the pipeline. For example, the last known value stage and aggregate stage operate on each logical partition.

To partition your data, specify a partitioning strategy and the number of partitions to use:

| Field | Description | Required | Default | Example |
| --- | --- | --- | --- | --- |
| Partition type | The type of partitioning to be used: Partition ID or Partition Key. | Required | ID | ID |
| Partition expression | The jq expression to use on the incoming message to compute the partition ID or partition key. | Required | 0 | .payload.header |
| Number of partitions | The number of partitions in a Data Processor pipeline. | Required | 1 | 1 |

Data Processor adds metadata to the incoming message. See Data Processor message structure overview to understand how to correctly specify the partitioning expression that runs on the incoming message. By default, the partitioning expression is set to 0 with the Partition type as ID to send all the incoming data to a single partition.
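
For example, to spread messages across partitions by a value in the payload instead, you could switch to a partition key computed by a jq expression. The following sketch reuses the example expression from the table; the serialized field names are assumptions, not a documented schema:

    {
        "type": "key",
        "expression": ".payload.header",
        "count": 2
    }

Messages that produce the same .payload.header value always land in the same logical partition, which is what stages such as last known value and aggregate operate on.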

For recommendations and to learn more, see What is partitioning?.