I have a general question about whether my setup is a good fit for the scenario below:
1 streaming IoT source feeding an Azure Event Hub
The raw format is byte-encoded and needs to be decoded.
Need to build a scalable and reusable data structure
Output to 2 Azure Cosmos DB collections: 1 fed by streaming and 1 by batch, each with a different set of data.
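For context, the decoding step I have in mind looks roughly like this (a minimal sketch only; the payload layout, field names, and struct format string are made up for illustration, since the real byte schema depends on the device):

```python
import struct

# Hypothetical payload layout (illustration only): a 1-byte schema
# version, a 4-byte little-endian device id, an 8-byte timestamp
# (epoch millis), and a 4-byte float reading.
RECORD_FORMAT = "<BIqf"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 17 bytes

def decode_record(payload: bytes) -> dict:
    """Decode one raw Event Hub body into a flat dict."""
    version, device_id, ts_millis, value = struct.unpack(
        RECORD_FORMAT, payload[:RECORD_SIZE]
    )
    return {
        "schema_version": version,
        "device_id": device_id,
        "event_ts_millis": ts_millis,
        "value": value,
    }

# Round-trip example
raw = struct.pack(RECORD_FORMAT, 1, 42, 1700000000000, 21.5)
decoded = decode_record(raw)
```

In a Databricks notebook this function could be wrapped in a UDF (or rewritten with Spark's binary functions) and applied in the Bronze-to-Silver step.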
I have the following in mind
1 streaming job 24/7 - Data from Azure Event Hub --> append to Bronze Delta table
1 streaming job 24/7 - Select from Bronze Delta table, decode the data, and merge into --> Silver Delta table
1 streaming job 24/7 - Select specifics from Silver Delta table, transform, aggregate, and upsert into --> Azure Cosmos DB collection 1
1 batch job once per day - Select specifics from Silver Delta table, transform, aggregate, and upsert into --> Azure Cosmos DB collection 2
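To make the Bronze-to-Silver merge concrete: what I mean by "merge" is an upsert keyed on record identity. A minimal sketch of that semantics in plain Python (this is not Delta's MERGE INTO, and the key columns are hypothetical; it only illustrates the update-or-insert behavior per micro-batch):

```python
def upsert(silver: dict, batch: list) -> dict:
    """Merge a micro-batch of decoded records into a Silver 'table',
    keyed on (device_id, event_ts_millis): update the row if the key
    already exists, insert it otherwise."""
    for rec in batch:
        key = (rec["device_id"], rec["event_ts_millis"])
        silver[key] = rec  # last write wins within the batch
    return silver

silver = {}
upsert(silver, [{"device_id": 1, "event_ts_millis": 100, "value": 20.0}])
upsert(silver, [{"device_id": 1, "event_ts_millis": 100, "value": 21.0},
                {"device_id": 2, "event_ts_millis": 100, "value": 19.5}])
# silver now holds 2 rows: device 1 updated in place, device 2 inserted
```

In the actual job this would be a Delta MERGE driven from a streaming query (e.g. via foreachBatch), with the same keys in the ON clause.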
Is this the correct way of doing it?
Should I save the data in Bronze in its raw encoded form, or decode it first and store the decoded version as my raw layer?
How should I handle schema and decode-logic changes from the IoT source?
Can/should you cohost the first 2 streaming jobs in 1 to save cost? Or is the merge between Bronze and Silver too heavy to run in the same job?