Azure Stream Analytics needs a better choice for the key column in an auto-created table
I have the following tech stack:
SQL Server 2019 running on-premises
Change Data Capture (CDC) is enabled on one table for all columns
Debezium connects those changes to Kafka for transport to AEH
Azure Event Hubs (AEH) is my message broker for all changes
Using Azure Stream Analytics (ASA) to stream those changes to my sink
Azure SQL Database is my sink
Issue layer from the list above:
Azure Stream Analytics
Experience
CDC is working beautifully, as is the Debezium + Kafka piece. For the data structure in the sink, I let ASA create the table, providing just the table name. It turns out the key it chose is a field called "Schema" with a data type of NVARCHAR(8000). That is a problem: SQL Server warns when a column that wide is used as an index key, because any value longer than the key size limit, 900 bytes for a clustered index or 1,700 bytes for a nonclustered index, fails at insert time. My very first record had a key of 4,780 bytes, even though the table structure described in that Schema column is tiny, just 4 columns. My fix was to drop the index, add a surrogate key (the column ASA chose was the only unique field), and recreate the index on the new column.
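For reference, the workaround was roughly the following T-SQL; the table, column, and index names here are placeholders for illustration, not the actual names ASA generated.

-- Drop the index ASA created on the wide NVARCHAR(8000) [Schema] column.
DROP INDEX IX_CdcSink_Schema ON dbo.CdcSink;

-- Add a narrow surrogate key column.
ALTER TABLE dbo.CdcSink
    ADD CdcSinkId BIGINT IDENTITY(1,1) NOT NULL;

-- Recreate the index on the surrogate column; [Schema] remains as a plain data column.
CREATE UNIQUE CLUSTERED INDEX IX_CdcSink_CdcSinkId
    ON dbo.CdcSink (CdcSinkId);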
Proposed idea for a solution
When the table is auto-created from a data set and the inferred key column is too wide for a clustered index key, fall back to a surrogate key instead. I'm simply suggesting what worked for me; you may have a better idea for mitigating this issue.
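As a rough sketch of what the auto-created DDL could look like under this proposal (the names and column types are illustrative only, not what ASA actually emits today):

CREATE TABLE dbo.CdcSink (
    -- Surrogate key generated because the natural key column exceeds the 900-byte clustered key limit.
    CdcSinkId BIGINT IDENTITY(1,1) NOT NULL,
    -- The wide column ASA currently picks as the key, kept as a plain data column.
    [Schema]  NVARCHAR(MAX) NULL,
    CONSTRAINT PK_CdcSink PRIMARY KEY CLUSTERED (CdcSinkId)
);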
