question

GoldnerEli-3839 asked PRADEEPCHEEKATLA-MSFT commented

Azure Stream Analytics needs a better choice for the key column in an auto-created table

I have the following tech stack:

  • SQL Server 2019 running on-premises

  • Change Data Capture (CDC) is enabled on one table for all columns

  • Debezium is connecting that to Kafka for transport to AEH

  • Azure Event Hubs (AEH) is my message broker for all changes

  • Using Azure Stream Analytics (ASA) to stream those changes to my sink

  • Azure SQL Server is my sink

Issue layer from the list above: Azure Stream Analytics

Experience
CDC is working beautifully, as is the Debezium + Kafka piece. For the data structure in the sink, I opted to let ASA create the table, with me providing just the name.

It turns out that the key ASA chose is a field called "Schema" with a data type of NVARCHAR(8000). That is a problem: using a column this wide as an index key triggers a warning, because an insert fails whenever the key value exceeds the index limit of 900 bytes for a clustered index or 1,700 bytes for a non-clustered one. My very first record had a key of 4,780 bytes, even though the table structure described in the Schema column is tiny, at just 4 columns.

My fix was to drop the index, add a surrogate key (the column ASA chose was the only unique field), and recreate the index on the new column.
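In T-SQL, that retrofit looked roughly like this. The table, index, and column names here are illustrative, not the exact ones from my database:

```sql
-- Drop the index ASA created on the over-wide [Schema] column.
DROP INDEX IX_CdcSink_Schema ON dbo.CdcSink;

-- Add a narrow surrogate key column.
ALTER TABLE dbo.CdcSink
    ADD SinkId BIGINT IDENTITY(1,1) NOT NULL;

-- Recreate the index on the new 8-byte column instead.
CREATE UNIQUE CLUSTERED INDEX IX_CdcSink_SinkId
    ON dbo.CdcSink (SinkId);
```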

Proposed idea for a solution
When the table gets auto-created from a data set and the key field is too wide for a clustered key, fall back to a surrogate key. I'm simply suggesting what I used for my own fix; you may have a better idea for mitigating this issue.

azure-stream-analytics

Hello @GoldnerEli-3839,

Welcome to the Microsoft Q&A platform.

We are reaching out to the internal team to get help on this issue and will update you as soon as we have a response.

Thank you for your patience.


Hello @GoldnerEli-3839,

We are still awaiting a response from the internal team. I will update you as soon as we have one.

Stay tuned!


1 Answer

PRADEEPCHEEKATLA-MSFT answered PRADEEPCHEEKATLA-MSFT commented

Hello @GoldnerEli-3839,

Apologies for the delayed response.

Your scenario is interesting. We do realize that the key column is not optimized in some cases, and we will look at how to improve this in future releases.

In the meantime, for these scenarios, it may be best to create the table manually and choose an existing table.

The ASA portal does not create any key for the new table, only an index. Also, when ASA creates a table, the maximum nvarchar size it sets is 4000, not 8000 as you described, so I wonder whether the behavior you saw was caused by some other product.
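For reference, pre-creating the sink table with a narrow surrogate key, then selecting it as an existing table in ASA, might look like this. All object names are illustrative:

```sql
CREATE TABLE dbo.CdcSink
(
    SinkId    BIGINT IDENTITY(1,1) NOT NULL,
    [Schema]  NVARCHAR(MAX)        NULL,  -- too wide to index; stored only
    Payload   NVARCHAR(MAX)        NULL,
    EventTime DATETIME2(7)         NULL,
    CONSTRAINT PK_CdcSink PRIMARY KEY CLUSTERED (SinkId)
);
```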

As for the index column, you can pick which column you want to use before creating the table:

[Screenshot: choosing the index column in the ASA portal]

Hope this helps. Do let us know if you have any further queries.


  • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.

  • Want a reminder to come back and check responses? Here is how to subscribe to a notification.



Hello @GoldnerEli-3839,

Just checking in to see if the above answer helped. If it answers your query, please click Accept Answer and up-vote it; if you have any further queries, do let us know.


Hi,
My experience differs from what you describe. The wide column was NVARCHAR(8000), not 4000 as you mention. Creating an index instead of a key makes it less bad, but still leaves plenty of room for error: the limit becomes 1,700 bytes instead of 900, and the field in question is still well over twice that width. Your table-creation process clearly samples the data, since it inferred the data types correctly, yet every single value in the indexed field was over 4,000 bytes, so there was no sensible way to use that column as a key. It is not a viable index key, clustered or non-clustered, so the issue still stands.
I did not use any tool other than ASA, and I'm wondering if this "feature" is one of the reasons why this product is still listed as being in preview.
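A quick way to confirm those widths against the sink, assuming an illustrative table name of dbo.CdcSink:

```sql
-- Check the byte widths actually stored in the key column.
SELECT MIN(DATALENGTH([Schema])) AS MinBytes,
       MAX(DATALENGTH([Schema])) AS MaxBytes
FROM dbo.CdcSink;
-- NVARCHAR stores 2 bytes per character, so any value over 850
-- characters already exceeds the 1,700-byte non-clustered limit.
```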


Hello @GoldnerEli-3839,

Thanks for the details.

We are reaching out to the internal team to get help on this issue and will update you as soon as we have a response.

Thank you for your patience.
