question

0 Votes
sashachernin-6111 asked · MughundhanRaveendran-MSFT commented

Sending huge amount of custom logs to Azure Monitor via Data Collector API causes duplicates

I have a function app that collects custom logs from one source and sends them concurrently to Azure Monitor via the Data Collector API. The function app runs every hour and sends about 100k+ rows of logs.

For example, when I send 100,000 rows, Azure shows that the number of rows received is slightly higher than the original, e.g. 100,050.

What could be the possible reason?


azure-functions · azure-monitor

1 Answer

1 Vote
MughundhanRaveendran-MSFT answered · MughundhanRaveendran-MSFT commented

Hi @sashachernin-6111 ,

Thanks for reaching out to Q&A forum.

The additional logs are most likely duplicates. Log Analytics is, at its core, a Big Data system running at cloud scale, and as is typical in the Big Data world, it has to trade off between multiple factors, including ingestion reliability (no data loss), ingestion speed, duplicate records, and more. Log Analytics is optimized for reliability and ingestion speed, and typically does not introduce duplicate records.

To make sure data is not lost, multiple protection mechanisms (storage-backed "staging areas") sit in the path of data flowing into the system. In rare cases of local instability (typically unnoticeable and by design), the service re-ingests data from staging to ensure nothing is lost, and this can introduce duplicate records. These occurrences are quite rare and can usually be ignored, but they are why you might occasionally see duplicated records in your Log Analytics workspace. This can happen anywhere in the ingestion pipeline and is considered reasonable and expected behavior.


Additional publicly available information can be found here:
https://docs.microsoft.com/en-us/azure/azure-monitor/faq#why-am-i-seeing-duplicate-records-in-azure-monitor-logs-


The way to handle duplicated data is to filter out the duplicate rows at query time, which can be done in several ways. One example is this sort of query:
https://docs.microsoft.com/en-us/azure/data-explorer/dealing-with-duplicates#solution-2-handle-duplicate-rows-during-query
Note that duplicated records cannot be deleted from the workspace.

To get good performance from the solution mentioned above, it is important to run the summarize operator over the smallest set of records (i.e. the smallest input size) you can filter down to.
Reducing the number of columns listed after the 'by' keyword helps as well.
You may also project away unnecessary columns when you use the wildcard as the ExprToReturn parameter of the arg_max() function.
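Putting those tips together, a query-time dedup could look like the sketch below. The table name MyCustomLog_CL and the key column RowId_s are assumptions for illustration, not names from the original question; substitute your own custom log table and whatever uniquely identifies a logical row.

```kusto
// Assumed names: MyCustomLog_CL (custom log table), RowId_s (unique per-row key).
MyCustomLog_CL
| where TimeGenerated > ago(1h)                   // narrow the input as much as possible first
| summarize arg_max(TimeGenerated, *) by RowId_s  // keep one record per logical row
```

The `where` clause ahead of `summarize` is what keeps the input size low, and grouping by a single key column keeps the 'by' list short, per the advice above.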

You can also look into this article: https://docs.microsoft.com/en-us/azure/data-explorer/dealing-with-duplicates
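On the sending side, query-time dedup is much more reliable if every row carries a unique key before it is posted to the Data Collector API. A minimal sketch, assuming a hypothetical RowId field name (Log Analytics will surface it with a type suffix such as RowId_g):

```python
import json
import uuid

def tag_rows(rows):
    """Attach a unique RowId to each log record so any duplicates
    introduced during ingestion can be filtered out at query time."""
    # RowId is a hypothetical field name chosen for this example.
    return [dict(row, RowId=str(uuid.uuid4())) for row in rows]

rows = [
    {"level": "info", "message": "batch started"},
    {"level": "info", "message": "batch finished"},
]
# The JSON payload you would POST to the Data Collector API endpoint.
payload = json.dumps(tag_rows(rows))
```

With a key like this in place, the `summarize arg_max(...) by <key>` pattern from the linked article collapses any re-ingested copies back to one row each.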

I hope this helps!

Please 'Accept as answer' and 'Upvote' if it helped, so that it can help others in the community looking for help on similar topics.


@sashachernin-6111 ,

Following up to see if the above answer helps. Do let me know if you have any queries.

Please 'Accept as answer' and 'Upvote' if it helped, so that it can help others in the community looking for help on similar topics.
