question

RONALDONASCIMENTO-1394 avatar image
0 Votes"
RONALDONASCIMENTO-1394 asked RONALDONASCIMENTO-1394 commented

Use Hybrid Runbook Worker or make shorter queries to Log Analytics?

I am running a functional powershell script that queries Log Analytics for a specified time range (usually within 2 monhts), retrieves the data from Log Analytics and send it to Event Hub.

Problem:
Query results are very large. The following are the counts:
24 hours of security logs query = count is 15,446,817 rows
1 hour of security logs query = count is 363,303 rows
1 minute of security logs query = count is 6,166 rows

If I run the Powershell code in Azure Automation, it is runs fine if query period is less than 3 minutes. If I attempt queries larger than 3 minutes, it times out (it exceeds 400MB memory limit). If I run the same code from my local machine, I can do queries for a period of 10 hours without timeouts.

Question:
1) Should the best solution in this case break long query period and do increments of up to 3 minutes? For example, if the user requests a query for the entire month of March 2021, I should start a counter to increment time by 3 minutes until I reach the entire month?

2)Another suggestion I received is to implement a Hybrid Runbook Worker and use more computing resources from a VM running agent-job. But if I can avoid a VM additional cost by break long queries into small segments, that would be preferred.

Please let me know best approach in this large data set scenario.

Blockquote
function QueryLogAnalyticsSend {

Param(
$TableName,
$startperiod,
$endperiod,
$LogAnalyticsWorkspaceId
)

$makehub = $TableName.ToLower()
$EventHub = "am-"+$makehub


$elapsedvar1 = Get-Date

Write-Host "Executing query on Log Analytics, table $TableName..."

$query = "$($TableName)| where TimeGenerated >= datetime($startperiod) and TimeGenerated <= datetime($endperiod)"
Write-Host "Query is $query"

Try {

$output = (Invoke-AzOperationalInsightsQuery -WorkspaceId $LogAnalyticsWorkspaceId -Query $query).Results
}

Catch {
Write-Host "Error during Invoke-AzOperationalInsightsQuery. Verify 1) PowerShell Module Az is installed 2) LA Work ID is correct 3) DateTime syntax is correct. "
}


$arrayrows = @()
$total_size_sent = 0
$current_size = 0
$allowed_size = 1024 1024 - 3000

foreach ($row in $output)
{

$irow++
$converted_row = $row | ConvertTo-Json
$row_size = [System.Text.Encoding]::UTF8.GetByteCount($converted_row)
$current_size = $current_size+$row_size


IF ($current_size -gt $allowed_size)
{
Write-Host "
Current size to be sent: $current_size Bytes Table is: $TableName "

$SAStoken = get-SAStoken -EventHubNamespace $EventHubNamespace -EventHub $EventHub -Access_Policy_Name $AccessPolicyName -Access_Policy_Key $Access_Policy_Key

SendDataToEventHub -token $SAStoken -row $arrayrows -Namespace $EventHubNamespace -EventHub $EventHub

$total_size_sent = $total_size_sent + $current_size
$current_size = 0
$arrayrows = @()
}

$arrayrows = $arrayrows + $converted_row

}

Write-Host "
Current size to be sent: $current_size Bytes Table is: $TableName * "

$SAStoken = get-SAStoken -EventHubNamespace $EventHubNamespace -EventHub $EventHub -Access_Policy_Name $AccessPolicyName -Access_Policy_Key $Access_Policy_Key

SendDataToEventHub -token $SAStoken -row $arrayrows -Namespace $EventHubNamespace -EventHub $EventHub

$elapsedvar2 = Get-Date
$elapsedtime = $elapsedvar2 - $elapsedvar1

$total_size_sent = $total_size_sent + $current_size
Write-Host "Elapsed time:$elapsedtime Total size sent:$total_size_sent Bytes Start:$elapsedvar1 End:$elapsedvar2 Rows:$irow"
Write-Host "Original query: $query"

}

Blockquote



azure-automation
· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi @RONALDONASCIMENTO-1394 / @MrSky,

Did you get chance to review my earlier response? Let me know if you have any further queries regarding it.

0 Votes 0 ·

Hi there, thanks. Yes, it makes sense. I still may break queries into smaller interactions so that I can better control rows to be resent in case of errors. But I have no doubt the Hybrid Runbook Worker is necessary. Deploying it today.

0 Votes 0 ·

1 Answer

tbgangav-MSFT avatar image
0 Votes"
tbgangav-MSFT answered

Hi @RONALDONASCIMENTO-1394 / @MrSky,

Yes, as per this, maximum amount of memory given to a sandbox is 400MB hence the situation. Even though the intention of HRW (Hybrid Runbook Worker) is different as mentioned here but in this case I would recommend to go with option 2 i.e., HRW rather than option 1 i.e., incremental sandbox runbook execution.

On the other hand, you may try to leverage this new feature that's currently in preview.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.