Azure Stream Analytics needs a better choice for the key column in an auto-created table

Goldner, Eli 1 Reputation point
2021-06-09T14:57:31.117+00:00

I have the following tech stack:

  • SQL Server on-prem running 2019
  • Change Data Capture (CDC) is enabled on one table for all columns
  • Debezium is connecting that to Kafka for transport to AEH
  • Azure Event Hubs (AEH) is my message broker for all changes
  • Using Azure Stream Analytics (ASA) to stream those changes to my sink
  • Azure SQL Server is my sink

Issue layer from the list above:
Azure Stream Analytics

Experience
CDC is working beautifully, as is the Debezium + Kafka piece. For the data structure in the sink, I opted to let ASA create the table structure, providing just the name. It turns out the key it chose is a field called "Schema" with a data type of NVARCHAR(8000). That can be an issue: a key that wide leaves room for error, because an inserted key value can exceed the maximum index key size, which is 900 bytes for a clustered index and 1,700 bytes for a nonclustered index. My very first record had a key of 4,780 bytes, and the table structure described in the Schema column is tiny, at just 4 columns. My solution was to drop the index, add a surrogate key (the column ASA had used was the only unique field), then recreate the index on my new field.
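For reference, the workaround described above can be sketched in T-SQL. The table, column, and index names here are illustrative placeholders, not the actual schema:

```sql
-- Assumed: ASA auto-created dbo.ChangeFeed keyed on the wide [Schema] column.
-- 1. Drop the index that uses the oversized NVARCHAR column as its key.
DROP INDEX IX_ChangeFeed_Schema ON dbo.ChangeFeed;

-- 2. Add a narrow surrogate key; an IDENTITY column is unique and only 8 bytes wide.
ALTER TABLE dbo.ChangeFeed
    ADD ChangeFeedId BIGINT IDENTITY(1,1) NOT NULL;

-- 3. Recreate the index on the surrogate key, well under the
--    900-byte clustered index key limit.
CREATE UNIQUE CLUSTERED INDEX IX_ChangeFeed_Id
    ON dbo.ChangeFeed (ChangeFeedId);
```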

Proposed idea for a solution
When the table gets auto-created from a data set and the key field provided is too wide for a clustered index key, use a surrogate key instead. I'm simply suggesting what I used for my own solution; you may have a better idea for mitigating this potential issue.

Azure Stream Analytics
An Azure real-time analytics service designed for mission-critical workloads.

1 answer

  1. PRADEEPCHEEKATLA-MSFT 77,751 Reputation points Microsoft Employee
    2021-06-16T11:02:57.56+00:00

    Hello @Goldner, Eli ,

    Apologies for the delay in response.

    Your scenario is interesting. We do realize that the key column is not optimized in some cases, and we will look at how to improve this in future releases.

    In the meantime, for these scenarios, it may be best to create the table manually and point the ASA output at the existing table.
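    A minimal sketch of pre-creating such a table with a narrow key (all names and column types here are illustrative assumptions, not taken from the original scenario):

    ```sql
    -- Pre-create the sink table with a surrogate clustered key so the wide
    -- NVARCHAR payload column is never used as an index key.
    CREATE TABLE dbo.ChangeFeed
    (
        ChangeFeedId BIGINT IDENTITY(1,1) NOT NULL,
        [Schema]     NVARCHAR(MAX)        NULL,  -- wide payload, not indexed
        Payload      NVARCHAR(MAX)        NULL,
        Op           NVARCHAR(10)         NULL,
        CONSTRAINT PK_ChangeFeed PRIMARY KEY CLUSTERED (ChangeFeedId)
    );
    ```

    The ASA output can then be configured to use this existing table instead of auto-creating one.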

    The ASA portal does not create any key for the new table, only an index. Also, when ASA creates a table, the maximum NVARCHAR size it sets is 4000, not 8000 as you describe, so I'm wondering whether the behavior you saw was caused by some other product.

    As for the index column, you can pick which column you want to use before creating the table:

    [Screenshot 106155-image.png: selecting the index column before the table is created]

    Hope this helps. Do let us know if you have any further queries.
