Container Insights에서 로그 쿼리

아티클
04/10/2024

컨테이너 인사이트는 컨테이너 호스트 및 컨테이너에서 성능 메트릭, 인벤토리 데이터 및 상태 정보를 수집합니다. 데이터는 3분 마다 수집되고 Azure Monitor의 Log Analytics를 사용하여 로그 쿼리에 사용할 수 있는 Azure Monitor의 Log Analytics 작업 영역으로 전달됩니다.

마이그레이션 계획, 용량 분석, 검색 및 주문형 성능 문제 해결을 포함하는 시나리오에 이 데이터를 적용할 수 있습니다. Azure Monitor Logs에서는 현재 클러스터 구성이 최적 상태로 실행되고 있는지를 파악하는 데 도움이 되도록 추세를 찾아보거나, 병목 상태를 진단하거나, 예측하거나 데이터 간 상관 관계를 파악할 수 있습니다.

해당 쿼리 사용에 대한 자세한 정보는 Azure Monitor Log Analytics에서 쿼리 사용을 참조하세요. Log Analytics를 사용한 쿼리 실행 및 결과 작업에 대한 전체 자습서는 Log Analytics 자습서를 참조하세요.

Important

이 문서의 쿼리는 Container Insights에서 수집되고 Log Analytics 작업 영역에 저장된 데이터에 따라 달라집니다. 기본 데이터 수집 설정을 수정한 경우 쿼리가 예상 결과를 반환하지 않을 수 있습니다. 특히 클러스터에 대해 Prometheus 메트릭을 사용하도록 설정한 이후 성능 데이터 수집을 사용하지 않도록 설정한 경우 Perf 테이블을 사용하는 모든 쿼리는 결과를 반환하지 않습니다.

성능 데이터 수집을 사용하지 않도록 설정하는 등 미리 설정된 구성은 데이터 수집 규칙을 사용하여 Container Insights에서 데이터 수집 구성을 참조하세요. 추가 데이터 수집 옵션은 ConfigMap을 사용하여 Container Insights에서 데이터 수집 구성을 참조하세요.

Log Analytics 열기

몇 가지 옵션으로 Log Analytics를 시작할 수 있습니다. 각 옵션은 다른 범위로 시작합니다. 작업 영역에서 모든 데이터에 액세스하려면 모니터링 메뉴에서 로그를 선택합니다. 단일 Kubernetes 클러스터로 데이터를 제한하려면, 해당 클러스터의 메뉴에서 로그를 선택합니다.

기존 로그 쿼리

Log Analytics를 사용하기 위해 로그 쿼리 작성 방법을 반드시 이해할 필요는 없습니다. 미리 빌드된 여러 쿼리 중에서 선택할 수 있습니다. 이러한 쿼리를 수정 없이 실행하거나 또는 사용자 지정 쿼리의 출발점으로 사용할 수 있습니다. Log Analytics 화면의 맨 위에 있는 쿼리를 선택하고 Kubernetes Services의 리소스 유형을 사용하여 쿼리를 봅니다.

컨테이너 테이블

Container Insights에서 사용하는 테이블 목록 및 자세한 설명은 Azure Monitor 테이블 참조를 참조하세요. 이러한 테이블은 모두 로그 쿼리에 사용할 수 있습니다.

샘플 로그 쿼리

한두 가지 예제로 시작하는 쿼리를 작성한 다음, 요구 사항에 맞게 수정하는 것이 유용한 경우가 많습니다. 고급 쿼리를 작성하는 데 도움이 되도록 다음과 같은 샘플 쿼리를 테스트해 볼 수 있습니다.

컨테이너의 수명 주기 정보 모두 나열

ContainerInventory
| project Computer, Name, Image, ImageTag, ContainerState, CreatedTime, StartedTime, FinishedTime
| render table

kubernetes 이벤트

참고 항목

기본적으로 Normal 이벤트 형식은 수집되지 않으므로 collect_all_kube_events ConfigMap 설정을 사용하도록 설정하지 않으면 KubeEvents 테이블을 쿼리할 때 표시되지 않습니다. Normal 이벤트를 수집해야 하는 경우 container-azm-ms-agentconfig ConfigMap에서 collect_all_kube_events 설정을 사용하도록 설정합니다. ConfigMap을 구성하는 방법에 대한 내용은 컨테이너 인사이트에 대한 에이전트 데이터 수집 구성을 참조하세요.

KubeEvents
| where not(isempty(Namespace))
| sort by TimeGenerated desc
| render table

컨테이너 CPU

Perf
| where ObjectName == "K8SContainer" and CounterName == "cpuUsageNanoCores" 
| summarize AvgCPUUsageNanoCores = avg(CounterValue) by bin(TimeGenerated, 30m), InstanceName

컨테이너 메모리

이 쿼리는 Linux 노드에만 사용할 수 있는 memoryRssBytes를 사용합니다.

Perf
| where ObjectName == "K8SContainer" and CounterName == "memoryRssBytes"
| summarize AvgUsedRssMemoryBytes = avg(CounterValue) by bin(TimeGenerated, 30m), InstanceName

사용자 지정 메트릭의 분당 요청 수

InsightsMetrics
| where Name == "requests_count"
| summarize Val=any(Val) by TimeGenerated=bin(TimeGenerated, 1m)
| sort by TimeGenerated asc
| project RequestsPerMinute = Val - prev(Val), TimeGenerated
| render barchart

이름 및 네임스페이스별 Pod

let startTimestamp = ago(1h);
KubePodInventory
| where TimeGenerated > startTimestamp
| project ContainerID, PodName=Name, Namespace
| where PodName contains "name" and Namespace startswith "namespace"
| distinct ContainerID, PodName
| join
(
    ContainerLog
    | where TimeGenerated > startTimestamp
)
on ContainerID
// at this point before the next pipe, columns from both tables are available to be "projected". Due to both
// tables having a "Name" column, we assign an alias as PodName to one column which we actually want
| project TimeGenerated, PodName, LogEntry, LogEntrySource
| summarize by TimeGenerated, LogEntry
| order by TimeGenerated desc

Pod 스케일 아웃(HPA)

이 쿼리는 각 배포에서 스케일 아웃된 복제본의 수를 반환합니다. HPA에 구성된 최대 복제본 수를 사용하여 스케일 아웃 비율을 계산합니다.

let _minthreshold = 70; // minimum threshold goes here if you want to setup as an alert
let _maxthreshold = 90; // maximum threshold goes here if you want to setup as an alert
let startDateTime = ago(60m);
KubePodInventory
| where TimeGenerated >= startDateTime 
| where Namespace !in('default', 'kube-system') // List of non system namespace filter goes here.
| extend labels = todynamic(PodLabel)
| extend deployment_hpa = reverse(substring(reverse(ControllerName), indexof(reverse(ControllerName), "-") + 1))
| distinct tostring(deployment_hpa)
| join kind=inner (InsightsMetrics 
    | where TimeGenerated > startDateTime 
    | where Name == 'kube_hpa_status_current_replicas'
    | extend pTags = todynamic(Tags) //parse the tags for values
    | extend ns = todynamic(pTags.k8sNamespace) //parse namespace value from tags
    | extend deployment_hpa = todynamic(pTags.targetName) //parse HPA target name from tags
    | extend max_reps = todynamic(pTags.spec_max_replicas) // Parse maximum replica settings from HPA deployment
    | extend desired_reps = todynamic(pTags.status_desired_replicas) // Parse desired replica settings from HPA deployment
    | summarize arg_max(TimeGenerated, *) by tostring(ns), tostring(deployment_hpa), Cluster=toupper(tostring(split(_ResourceId, '/')[8])), toint(desired_reps), toint(max_reps), scale_out_percentage=(desired_reps * 100 / max_reps)
    //| where scale_out_percentage > _minthreshold and scale_out_percentage <= _maxthreshold
    )
    on deployment_hpa

Nodepool 스케일 아웃

이 쿼리는 각 노드 풀의 활성 노드 수를 반환합니다. 자동 크기 조정기에서 사용 가능한 활성 노드 수와 최대 노드 구성 수를 계산하여 스케일 아웃 비율을 결정합니다. 결과 수 경고 규칙에 사용하려면, 쿼리의 주석 처리된 줄을 참조하세요.

let nodepoolMaxnodeCount = 10; // the maximum number of nodes in your auto scale setting goes here.
let _minthreshold = 20;
let _maxthreshold = 90;
let startDateTime = 60m;
KubeNodeInventory
| where TimeGenerated >= ago(startDateTime)
| extend nodepoolType = todynamic(Labels) //Parse the labels to get the list of node pool types
| extend nodepoolName = todynamic(nodepoolType[0].agentpool) // parse the label to get the nodepool name or set the specific nodepool name (like nodepoolName = 'agentpool)'
| summarize nodeCount = count(Computer) by ClusterName, tostring(nodepoolName), TimeGenerated
//(Uncomment the below two lines to set this as a log search alert)
//| extend scaledpercent = iff(((nodeCount * 100 / nodepoolMaxnodeCount) >= _minthreshold and (nodeCount * 100 / nodepoolMaxnodeCount) < _maxthreshold), "warn", "normal")
//| where scaledpercent == 'warn'
| summarize arg_max(TimeGenerated, *) by nodeCount, ClusterName, tostring(nodepoolName)
| project ClusterName, 
    TotalNodeCount= strcat("Total Node Count: ", nodeCount),
    ScaledOutPercentage = (nodeCount * 100 / nodepoolMaxnodeCount),  
    TimeGenerated, 
    nodepoolName

시스템 컨테이너(replicaset) 가용성

이 쿼리는 시스템 컨테이너(replicasets)를 반환하고 사용할 수 없는 비율을 보고합니다. 결과 수 경고 규칙에 사용하려면, 쿼리의 주석 처리된 줄을 참조하세요.

let startDateTime = 5m; // the minimum time interval goes here
let _minalertThreshold = 50; //Threshold for minimum and maximum unavailable or not running containers
let _maxalertThreshold = 70;
KubePodInventory
| where TimeGenerated >= ago(startDateTime)
| distinct ClusterName, TimeGenerated
| summarize Clustersnapshot = count() by ClusterName
| join kind=inner (
    KubePodInventory
    | where TimeGenerated >= ago(startDateTime)
    | where Namespace in('default', 'kube-system') and ControllerKind == 'ReplicaSet' // the system namespace filter goes here
    | distinct ClusterName, Computer, PodUid, TimeGenerated, PodStatus, ServiceName, PodLabel, Namespace, ContainerStatus
    | summarize arg_max(TimeGenerated, *), TotalPODCount = count(), podCount = sumif(1, PodStatus == 'Running' or PodStatus != 'Running'), containerNotrunning = sumif(1, ContainerStatus != 'running')
        by ClusterName, TimeGenerated, ServiceName, PodLabel, Namespace
    )
    on ClusterName
| project ClusterName, ServiceName, podCount, containerNotrunning, containerNotrunningPercent = (containerNotrunning * 100 / podCount), TimeGenerated, PodStatus, PodLabel, Namespace, Environment = tostring(split(ClusterName, '-')[3]), Location = tostring(split(ClusterName, '-')[4]), ContainerStatus
//Uncomment the below line to set for automated alert
//| where PodStatus == "Running" and containerNotrunningPercent > _minalertThreshold and containerNotrunningPercent < _maxalertThreshold
| summarize arg_max(TimeGenerated, *), c_entry=count() by PodLabel, ServiceName, ClusterName
//Below lines are to parse the labels to identify the impacted service/component name
| extend parseLabel = replace(@'k8s-app', @'k8sapp', PodLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/component', @'appkubernetesiocomponent', parseLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/instance', @'appkubernetesioinstance', parseLabel)
| extend tags = todynamic(parseLabel)
| extend tag01 = todynamic(tags[0].app)
| extend tag02 = todynamic(tags[0].k8sapp)
| extend tag03 = todynamic(tags[0].appkubernetesiocomponent)
| extend tag04 = todynamic(tags[0].aadpodidbinding)
| extend tag05 = todynamic(tags[0].appkubernetesioinstance)
| extend tag06 = todynamic(tags[0].component)
| project ClusterName, TimeGenerated,
    ServiceName = strcat( ServiceName, tag01, tag02, tag03, tag04, tag05, tag06),
    ContainerUnavailable = strcat("Unavailable Percentage: ", containerNotrunningPercent),
    PodStatus = strcat("PodStatus: ", PodStatus), 
    ContainerStatus = strcat("Container Status: ", ContainerStatus)

시스템 컨테이너(daemonsets) 가용성

이 쿼리는 시스템 컨테이너(daemonsets)를 반환하고 사용할 수 없는 비율을 보고합니다. 결과 수 경고 규칙에 사용하려면, 쿼리의 주석 처리된 줄을 참조하세요.

let startDateTime = 5m; // the minimum time interval goes here
let _minalertThreshold = 50; //Threshold for minimum and maximum unavailable or not running containers
let _maxalertThreshold = 70;
KubePodInventory
| where TimeGenerated >= ago(startDateTime)
| distinct ClusterName, TimeGenerated
| summarize Clustersnapshot = count() by ClusterName
| join kind=inner (
    KubePodInventory
    | where TimeGenerated >= ago(startDateTime)
    | where Namespace in('default', 'kube-system') and ControllerKind == 'DaemonSet' // the system namespace filter goes here
    | distinct ClusterName, Computer, PodUid, TimeGenerated, PodStatus, ServiceName, PodLabel, Namespace, ContainerStatus
    | summarize arg_max(TimeGenerated, *), TotalPODCount = count(), podCount = sumif(1, PodStatus == 'Running' or PodStatus != 'Running'), containerNotrunning = sumif(1, ContainerStatus != 'running')
        by ClusterName, TimeGenerated, ServiceName, PodLabel, Namespace
    )
    on ClusterName
| project ClusterName, ServiceName, podCount, containerNotrunning, containerNotrunningPercent = (containerNotrunning * 100 / podCount), TimeGenerated, PodStatus, PodLabel, Namespace, Environment = tostring(split(ClusterName, '-')[3]), Location = tostring(split(ClusterName, '-')[4]), ContainerStatus
//Uncomment the below line to set for automated alert
//| where PodStatus == "Running" and containerNotrunningPercent > _minalertThreshold and containerNotrunningPercent < _maxalertThreshold
| summarize arg_max(TimeGenerated, *), c_entry=count() by PodLabel, ServiceName, ClusterName
//Below lines are to parse the labels to identify the impacted service/component name
| extend parseLabel = replace(@'k8s-app', @'k8sapp', PodLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/component', @'appkubernetesiocomponent', parseLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/instance', @'appkubernetesioinstance', parseLabel)
| extend tags = todynamic(parseLabel)
| extend tag01 = todynamic(tags[0].app)
| extend tag02 = todynamic(tags[0].k8sapp)
| extend tag03 = todynamic(tags[0].appkubernetesiocomponent)
| extend tag04 = todynamic(tags[0].aadpodidbinding)
| extend tag05 = todynamic(tags[0].appkubernetesioinstance)
| extend tag06 = todynamic(tags[0].component)
| project ClusterName, TimeGenerated,
    ServiceName = strcat( ServiceName, tag01, tag02, tag03, tag04, tag05, tag06),
    ContainerUnavailable = strcat("Unavailable Percentage: ", containerNotrunningPercent),
    PodStatus = strcat("PodStatus: ", PodStatus), 
    ContainerStatus = strcat("Container Status: ", ContainerStatus)

컨테이너 로그

AKS에 대한 컨테이너 로그는 ContainerLogV2 테이블에 저장됩니다. 다음 샘플 쿼리를 실행하여 대상 Pod, 배포 또는 네임스페이스에서 stderr/stdout 로그 출력을 찾을 수 있습니다.

특정 Pod, 네임스페이스 및 컨테이너에 대한 컨테이너 로그

ContainerLogV2
| where _ResourceId =~ "clusterResourceID" //update with resource ID
| where PodNamespace == "podNameSpace" //update with target namespace
| where PodName == "podName" //update with target pod
| where ContainerName == "containerName" //update with target container
| project TimeGenerated, Computer, ContainerId, LogMessage, LogSource

특정 배포에 대한 컨테이너 로그

let KubePodInv = KubePodInventory
| where _ResourceId =~ "clusterResourceID" //update with resource ID
| where Namespace == "deploymentNamespace" //update with target namespace
| where ControllerKind == "ReplicaSet"
| extend deployment = reverse(substring(reverse(ControllerName), indexof(reverse(ControllerName), "-") + 1))
| where deployment == "deploymentName" //update with target deployment
| extend ContainerId = ContainerID
| summarize arg_max(TimeGenerated, *)  by deployment, ContainerId, PodStatus, ContainerStatus
| project deployment, ContainerId, PodStatus, ContainerStatus;

KubePodInv
| join
(
    ContainerLogV2
  | where TimeGenerated >= startTime and TimeGenerated < endTime
  | where PodNamespace == "deploymentNamespace" //update with target namespace
  | where PodName startswith "deploymentName" //update with target deployment
) on ContainerId
| project TimeGenerated, deployment, PodName, PodStatus, ContainerName, ContainerId, ContainerStatus, LogMessage, LogSource

특정 네임스페이스의 실패한 Pod에 대한 컨테이너 로그

    let KubePodInv = KubePodInventory
    | where TimeGenerated >= startTime and TimeGenerated < endTime
    | where _ResourceId =~ "clustereResourceID" //update with resource ID
    | where Namespace == "podNamespace" //update with target namespace
    | where PodStatus == "Failed"
    | extend ContainerId = ContainerID
    | summarize arg_max(TimeGenerated, *)  by  ContainerId, PodStatus, ContainerStatus
    | project ContainerId, PodStatus, ContainerStatus;

    KubePodInv
    | join
    (
        ContainerLogV2
    | where TimeGenerated >= startTime and TimeGenerated < endTime
    | where PodNamespace == "podNamespace" //update with target namespace
    ) on ContainerId
    | project TimeGenerated, PodName, PodStatus, ContainerName, ContainerId, ContainerStatus, LogMessage, LogSource

컨테이너 인사이트 기본 시각화 쿼리

이러한 쿼리는 컨테이너 인사이트의 기본 시각화에서 생성됩니다. 기본 차트 대신 사용자 지정 비용 최적화 설정을 사용하도록 설정한 경우 이를 사용하도록 선택할 수 있습니다.

상태별 노드 수

이 차트에 필요한 테이블에는 KubeNodeInventory가 포함됩니다.

 let trendBinSize = 5m;
 let maxListSize = 1000;
 let clusterId = 'clusterResourceID'; //update with resource ID
 
 let rawData = KubeNodeInventory 
| where ClusterId =~ clusterId 
| distinct ClusterId, TimeGenerated 
| summarize ClusterSnapshotCount = count() by Timestamp = bin(TimeGenerated, trendBinSize), ClusterId 
| join hint.strategy=broadcast ( KubeNodeInventory 
| where ClusterId =~ clusterId 
| summarize TotalCount = count(), ReadyCount = sumif(1, Status contains ('Ready')) by ClusterId, Timestamp = bin(TimeGenerated, trendBinSize) 
| extend NotReadyCount = TotalCount - ReadyCount ) on ClusterId, Timestamp 
| project ClusterId, Timestamp, TotalCount = todouble(TotalCount) / ClusterSnapshotCount, ReadyCount = todouble(ReadyCount) / ClusterSnapshotCount, NotReadyCount = todouble(NotReadyCount) / ClusterSnapshotCount;

 rawData 
| order by Timestamp asc 
| summarize makelist(Timestamp, maxListSize), makelist(TotalCount, maxListSize), makelist(ReadyCount, maxListSize), makelist(NotReadyCount, maxListSize) by ClusterId 
| join ( rawData 
| summarize Avg_TotalCount = avg(TotalCount), Avg_ReadyCount = avg(ReadyCount), Avg_NotReadyCount = avg(NotReadyCount) by ClusterId ) on ClusterId 
| project ClusterId, Avg_TotalCount, Avg_ReadyCount, Avg_NotReadyCount, list_Timestamp, list_TotalCount, list_ReadyCount, list_NotReadyCount

상태별 Pod 수

이 차트에 필요한 테이블에는 KubePodInventory가 포함됩니다.

 let trendBinSize = 5m;
 let maxListSize = 1000;
 let clusterId = 'clusterResourceID'; //update with resource ID
 
 let rawData = KubePodInventory 
| where ClusterId =~ clusterId 
| distinct ClusterId, TimeGenerated 
| summarize ClusterSnapshotCount = count() by bin(TimeGenerated, trendBinSize), ClusterId 
| join hint.strategy=broadcast ( KubePodInventory 
| where ClusterId =~ clusterId 
| summarize PodStatus=any(PodStatus) by TimeGenerated, PodUid, ClusterId 
| summarize TotalCount = count(), PendingCount = sumif(1, PodStatus =~ 'Pending'), RunningCount = sumif(1, PodStatus =~ 'Running'), SucceededCount = sumif(1, PodStatus =~ 'Succeeded'), FailedCount = sumif(1, PodStatus =~ 'Failed'), TerminatingCount = sumif(1, PodStatus =~ 'Terminating') by ClusterId, bin(TimeGenerated, trendBinSize) ) on ClusterId, TimeGenerated 
| extend UnknownCount = TotalCount - PendingCount - RunningCount - SucceededCount - FailedCount - TerminatingCount 
| project ClusterId, Timestamp = TimeGenerated, TotalCount = todouble(TotalCount) / ClusterSnapshotCount, PendingCount = todouble(PendingCount) / ClusterSnapshotCount, RunningCount = todouble(RunningCount) / ClusterSnapshotCount, SucceededCount = todouble(SucceededCount) / ClusterSnapshotCount, FailedCount = todouble(FailedCount) / ClusterSnapshotCount, TerminatingCount = todouble(TerminatingCount) / ClusterSnapshotCount, UnknownCount = todouble(UnknownCount) / ClusterSnapshotCount;

 let rawDataCached = rawData;
 
 rawDataCached 
| order by Timestamp asc 
| summarize makelist(Timestamp, maxListSize), makelist(TotalCount, maxListSize), makelist(PendingCount, maxListSize), makelist(RunningCount, maxListSize), makelist(SucceededCount, maxListSize), makelist(FailedCount, maxListSize), makelist(TerminatingCount, maxListSize), makelist(UnknownCount, maxListSize) by ClusterId 
| join ( rawDataCached 
| summarize Avg_TotalCount = avg(TotalCount), Avg_PendingCount = avg(PendingCount), Avg_RunningCount = avg(RunningCount), Avg_SucceededCount = avg(SucceededCount), Avg_FailedCount = avg(FailedCount), Avg_TerminatingCount = avg(TerminatingCount), Avg_UnknownCount = avg(UnknownCount) by ClusterId ) on ClusterId 
| project ClusterId, Avg_TotalCount, Avg_PendingCount, Avg_RunningCount, Avg_SucceededCount, Avg_FailedCount, Avg_TerminatingCount, Avg_UnknownCount, list_Timestamp, list_TotalCount, list_PendingCount, list_RunningCount, list_SucceededCount, list_FailedCount, list_TerminatingCount, list_UnknownCount

상태별 컨테이너 목록

이 차트에 필요한 테이블에는 KubePodInventory 및 Perf가 포함됩니다.

 let startDateTime = datetime('start time');
 let endDateTime = datetime('end time');
 let trendBinSize = 15m;
 let maxResultCount = 10000;
 let metricUsageCounterName = 'cpuUsageNanoCores';
 let metricLimitCounterName = 'cpuLimitNanoCores';
 
 let KubePodInventoryTable = KubePodInventory 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where isnotempty(ClusterName) 
| where isnotempty(Namespace) 
| where isnotempty(Computer) 
| project TimeGenerated, ClusterId, ClusterName, Namespace, ServiceName, ControllerName, Node = Computer, Pod = Name, ContainerInstance = ContainerName, ContainerID, ReadySinceNow = format_timespan(endDateTime - ContainerCreationTimeStamp , 'ddd.hh:mm:ss.fff'), Restarts = ContainerRestartCount, Status = ContainerStatus, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), ControllerKind = ControllerKind, PodStatus;

 let startRestart = KubePodInventoryTable 
| summarize arg_min(TimeGenerated, *) by Node, ContainerInstance 
| where ClusterId =~ 'clusterResourceID' //update with resource ID
| project Node, ContainerInstance, InstanceName = strcat(ClusterId, '/', ContainerInstance), StartRestart = Restarts;

 let IdentityTable = KubePodInventoryTable 
| summarize arg_max(TimeGenerated, *) by Node, ContainerInstance 
| where ClusterId =~ 'clusterResourceID' //update with resource ID
| project ClusterName, Namespace, ServiceName, ControllerName, Node, Pod, ContainerInstance, InstanceName = strcat(ClusterId, '/', ContainerInstance), ContainerID, ReadySinceNow, Restarts, Status = iff(Status =~ 'running', 0, iff(Status=~'waiting', 1, iff(Status =~'terminated', 2, 3))), ContainerStatusReason, ControllerKind, Containers = 1, ContainerName = tostring(split(ContainerInstance, '/')[1]), PodStatus, LastPodInventoryTimeGenerated = TimeGenerated, ClusterId;

 let CachedIdentityTable = IdentityTable;
 
 let FilteredPerfTable = Perf 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where ObjectName == 'K8SContainer' 
| where InstanceName startswith 'clusterResourceID' 
| project Node = Computer, TimeGenerated, CounterName, CounterValue, InstanceName ;

 let CachedFilteredPerfTable = FilteredPerfTable;
 
 let LimitsTable = CachedFilteredPerfTable 
| where CounterName =~ metricLimitCounterName 
| summarize arg_max(TimeGenerated, *) by Node, InstanceName 
| project Node, InstanceName, LimitsValue = iff(CounterName =~ 'cpuLimitNanoCores', CounterValue/1000000, CounterValue), TimeGenerated;
 let MetaDataTable = CachedIdentityTable 
| join kind=leftouter ( LimitsTable ) on Node, InstanceName 
| join kind= leftouter ( startRestart ) on Node, InstanceName 
| project ClusterName, Namespace, ServiceName, ControllerName, Node, Pod, InstanceName, ContainerID, ReadySinceNow, Restarts, LimitsValue, Status, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), ControllerKind, Containers, ContainerName, ContainerInstance, StartRestart, PodStatus, LastPodInventoryTimeGenerated, ClusterId;

 let UsagePerfTable = CachedFilteredPerfTable 
| where CounterName =~ metricUsageCounterName 
| project TimeGenerated, Node, InstanceName, CounterValue = iff(CounterName =~ 'cpuUsageNanoCores', CounterValue/1000000, CounterValue);

 let LastRestartPerfTable = CachedFilteredPerfTable 
| where CounterName =~ 'restartTimeEpoch' 
| summarize arg_max(TimeGenerated, *) by Node, InstanceName 
| project Node, InstanceName, UpTime = CounterValue, LastReported = TimeGenerated;

 let AggregationTable = UsagePerfTable 
| summarize Aggregation = max(CounterValue) by Node, InstanceName 
| project Node, InstanceName, Aggregation;

 let TrendTable = UsagePerfTable 
| summarize TrendAggregation = max(CounterValue) by bin(TimeGenerated, trendBinSize), Node, InstanceName 
| project TrendTimeGenerated = TimeGenerated, Node, InstanceName , TrendAggregation 
| summarize TrendList = makelist(pack("timestamp", TrendTimeGenerated, "value", TrendAggregation)) by Node, InstanceName;

 let containerFinalTable = MetaDataTable 
| join kind= leftouter( AggregationTable ) on Node, InstanceName 
| join kind = leftouter (LastRestartPerfTable) on Node, InstanceName 
| order by Aggregation desc, ContainerName 
| join kind = leftouter ( TrendTable) on Node, InstanceName 
| extend ContainerIdentity = strcat(ContainerName, ' ', Pod) 
| project ContainerIdentity, Status, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), Aggregation, Node, Restarts, ReadySinceNow, TrendList = iif(isempty(TrendList), parse_json('[]'), TrendList), LimitsValue, ControllerName, ControllerKind, ContainerID, Containers, UpTimeNow = datetime_diff('Millisecond', endDateTime, datetime_add('second', toint(UpTime), make_datetime(1970,1,1))), ContainerInstance, StartRestart, LastReportedDelta = datetime_diff('Millisecond', endDateTime, LastReported), PodStatus, InstanceName, Namespace, LastPodInventoryTimeGenerated, ClusterId;
containerFinalTable 
| limit 200

상태별 컨트롤러 목록

이 차트에 필요한 테이블에는 KubePodInventory 및 Perf가 포함됩니다.

 let endDateTime = datetime('start time');
 let startDateTime = datetime('end time');
 let trendBinSize = 15m;
 let metricLimitCounterName = 'cpuLimitNanoCores';
 let metricUsageCounterName = 'cpuUsageNanoCores';
 
 let primaryInventory = KubePodInventory 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where isnotempty(ClusterName) 
| where isnotempty(Namespace) 
| extend Node = Computer 
| where ClusterId =~ 'clusterResourceID' //update with resource ID
| project TimeGenerated, ClusterId, ClusterName, Namespace, ServiceName, Node = Computer, ControllerName, Pod = Name, ContainerInstance = ContainerName, ContainerID, InstanceName, PerfJoinKey = strcat(ClusterId, '/', ContainerName), ReadySinceNow = format_timespan(endDateTime - ContainerCreationTimeStamp, 'ddd.hh:mm:ss.fff'), Restarts = ContainerRestartCount, Status = ContainerStatus, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), ControllerKind = ControllerKind, PodStatus, ControllerId = strcat(ClusterId, '/', Namespace, '/', ControllerName);

let podStatusRollup = primaryInventory 
| summarize arg_max(TimeGenerated, *) by Pod 
| project ControllerId, PodStatus, TimeGenerated 
| summarize count() by ControllerId, PodStatus = iif(TimeGenerated < ago(30m), 'Unknown', PodStatus) 
| summarize PodStatusList = makelist(pack('Status', PodStatus, 'Count', count_)) by ControllerId;

let latestContainersByController = primaryInventory 
| where isnotempty(Node) 
| summarize arg_max(TimeGenerated, *) by PerfJoinKey 
| project ControllerId, PerfJoinKey;

let filteredPerformance = Perf 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where ObjectName == 'K8SContainer' 
| where InstanceName startswith 'clusterResourceID' //update with resource ID
| project TimeGenerated, CounterName, CounterValue, InstanceName, Node = Computer ;

let metricByController = filteredPerformance 
| where CounterName =~ metricUsageCounterName 
| extend PerfJoinKey = InstanceName 
| summarize Value = percentile(CounterValue, 95) by PerfJoinKey, CounterName 
| join (latestContainersByController) on PerfJoinKey 
| summarize Value = sum(Value) by ControllerId, CounterName 
| project ControllerId, CounterName, AggregationValue = iff(CounterName =~ 'cpuUsageNanoCores', Value/1000000, Value);

let containerCountByController = latestContainersByController 
| summarize ContainerCount = count() by ControllerId;

let restartCountsByController = primaryInventory 
| summarize Restarts = max(Restarts) by ControllerId;

let oldestRestart = primaryInventory 
| summarize ReadySinceNow = min(ReadySinceNow) by ControllerId;

let trendLineByController = filteredPerformance 
| where CounterName =~ metricUsageCounterName 
| extend PerfJoinKey = InstanceName 
| summarize Value = percentile(CounterValue, 95) by bin(TimeGenerated, trendBinSize), PerfJoinKey, CounterName 
| order by TimeGenerated asc 
| join kind=leftouter (latestContainersByController) on PerfJoinKey 
| summarize Value=sum(Value) by ControllerId, TimeGenerated, CounterName 
| project TimeGenerated, Value = iff(CounterName =~ 'cpuUsageNanoCores', Value/1000000, Value), ControllerId 
| summarize TrendList = makelist(pack("timestamp", TimeGenerated, "value", Value)) by ControllerId;

let latestLimit = filteredPerformance 
| where CounterName =~ metricLimitCounterName 
| extend PerfJoinKey = InstanceName 
| summarize arg_max(TimeGenerated, *) by PerfJoinKey 
| join kind=leftouter (latestContainersByController) on PerfJoinKey 
| summarize Value = sum(CounterValue) by ControllerId, CounterName 
| project ControllerId, LimitValue = iff(CounterName =~ 'cpuLimitNanoCores', Value/1000000, Value);

let latestTimeGeneratedByController = primaryInventory 
| summarize arg_max(TimeGenerated, *) by ControllerId 
| project ControllerId, LastTimeGenerated = TimeGenerated;

primaryInventory 
| distinct ControllerId, ControllerName, ControllerKind, Namespace 
| join kind=leftouter (podStatusRollup) on ControllerId 
| join kind=leftouter (metricByController) on ControllerId 
| join kind=leftouter (containerCountByController) on ControllerId 
| join kind=leftouter (restartCountsByController) on ControllerId 
| join kind=leftouter (oldestRestart) on ControllerId 
| join kind=leftouter (trendLineByController) on ControllerId 
| join kind=leftouter (latestLimit) on ControllerId 
| join kind=leftouter (latestTimeGeneratedByController) on ControllerId 
| project ControllerId, ControllerName, ControllerKind, PodStatusList, AggregationValue, ContainerCount = iif(isempty(ContainerCount), 0, ContainerCount), Restarts, ReadySinceNow, Node = '-', TrendList, LimitValue, LastTimeGenerated, Namespace 
| limit 250;

상태별 노드 목록

이 차트에 필요한 테이블에는 KubeNodeInventory, KubePodInventory 및 Perf가 포함됩니다.

 let endDateTime = datetime('start time');
 let startDateTime = datetime('end time');
 let binSize = 15m;
 let limitMetricName = 'cpuCapacityNanoCores';
 let usedMetricName = 'cpuUsageNanoCores'; 
 
 let materializedNodeInventory = KubeNodeInventory 
| where TimeGenerated < endDateTime 
| where TimeGenerated >= startDateTime 
| project ClusterName, ClusterId, Node = Computer, TimeGenerated, Status, NodeName = Computer, NodeId = strcat(ClusterId, '/', Computer), Labels 
| where ClusterId =~ 'clusterResourceID'; //update with resource ID

 let materializedPerf = Perf 
| where TimeGenerated < endDateTime 
| where TimeGenerated >= startDateTime 
| where ObjectName == 'K8SNode' 
| extend NodeId = InstanceName;

 let materializedPodInventory = KubePodInventory 
| where TimeGenerated < endDateTime 
| where TimeGenerated >= startDateTime 
| where isnotempty(ClusterName) 
| where isnotempty(Namespace) 
| where ClusterId =~ 'clusterResourceID'; //update with resource ID

 let inventoryOfCluster = materializedNodeInventory 
| summarize arg_max(TimeGenerated, Status) by ClusterName, ClusterId, NodeName, NodeId;

 let labelsByNode = materializedNodeInventory 
| summarize arg_max(TimeGenerated, Labels) by ClusterName, ClusterId, NodeName, NodeId;

 let countainerCountByNode = materializedPodInventory 
| project ContainerName, NodeId = strcat(ClusterId, '/', Computer) 
| distinct NodeId, ContainerName 
| summarize ContainerCount = count() by NodeId;

 let latestUptime = materializedPerf 
| where CounterName == 'restartTimeEpoch' 
| summarize arg_max(TimeGenerated, CounterValue) by NodeId 
| extend UpTimeMs = datetime_diff('Millisecond', endDateTime, datetime_add('second', toint(CounterValue), make_datetime(1970,1,1))) 
| project NodeId, UpTimeMs;

 let latestLimitOfNodes = materializedPerf 
| where CounterName == limitMetricName 
| summarize CounterValue = max(CounterValue) by NodeId 
| project NodeId, LimitValue = CounterValue;

 let actualUsageAggregated = materializedPerf 
| where CounterName == usedMetricName 
| summarize Aggregation = percentile(CounterValue, 95) by NodeId //This line updates to the desired aggregation
| project NodeId, Aggregation;

 let aggregateTrendsOverTime = materializedPerf 
| where CounterName == usedMetricName 
| summarize TrendAggregation = percentile(CounterValue, 95) by NodeId, bin(TimeGenerated, binSize) //This line updates to the desired aggregation
| project NodeId, TrendAggregation, TrendDateTime = TimeGenerated;

 let unscheduledPods = materializedPodInventory 
| where isempty(Computer) 
| extend Node = Computer 
| where isempty(ContainerStatus) 
| where PodStatus == 'Pending' 
| order by TimeGenerated desc 
| take 1 
| project ClusterName, NodeName = 'unscheduled', LastReceivedDateTime = TimeGenerated, Status = 'unscheduled', ContainerCount = 0, UpTimeMs = '0', Aggregation = '0', LimitValue = '0', ClusterId;

 let scheduledPods = inventoryOfCluster 
| join kind=leftouter (aggregateTrendsOverTime) on NodeId 
| extend TrendPoint = pack("TrendTime", TrendDateTime, "TrendAggregation", TrendAggregation) 
| summarize make_list(TrendPoint) by NodeId, NodeName, Status 
| join kind=leftouter (labelsByNode) on NodeId 
| join kind=leftouter (countainerCountByNode) on NodeId 
| join kind=leftouter (latestUptime) on NodeId 
| join kind=leftouter (latestLimitOfNodes) on NodeId 
| join kind=leftouter (actualUsageAggregated) on NodeId 
| project ClusterName, NodeName, ClusterId, list_TrendPoint, LastReceivedDateTime = TimeGenerated, Status, ContainerCount, UpTimeMs, Aggregation, LimitValue, Labels 
| limit 250;

 union (scheduledPods), (unscheduledPods) 
| project ClusterName, NodeName, LastReceivedDateTime, Status, ContainerCount, UpTimeMs = UpTimeMs_long, Aggregation = Aggregation_real, LimitValue = LimitValue_real, list_TrendPoint, Labels, ClusterId

Prometheus 메트릭

다음 예제에서는 컨테이너 인사이트를 사용하여 Prometheus 메트릭을 Log Analytics 작업 영역으로 보내기에 설명된 구성이 필요합니다.

Azure Monitor가 스크래핑하고 네임스페이스로 필터링된 Prometheus 메트릭을 보려면 "prometheus"를 지정합니다. 다음은 default kubernetes 네임스페이스에서 Prometheus 메트릭을 보는 샘플 쿼리입니다.

InsightsMetrics 
| where Namespace contains "prometheus"
| extend tags=parse_json(Tags)
| summarize count() by Name

또한 이름으로 프로메테우스 데이터를 직접 쿼리할 수 있습니다.

InsightsMetrics 
| where Namespace contains "prometheus"
| where Name contains "some_prometheus_metric"

각 메트릭의 수집 볼륨 크기를 GB/일 단위로 식별하여 높은지 여부를 파악할 수 있도록 다음 쿼리가 제공됩니다.

InsightsMetrics
| where Namespace contains "prometheus"
| where TimeGenerated > ago(24h)
| summarize VolumeInGB = (sum(_BilledSize) / (1024 * 1024 * 1024)) by Name
| order by VolumeInGB desc
| render barchart

출력은 다음 예제와 유사한 결과를 표시합니다.

1달 동안의 GB 단위의 각 메트릭 크기를 평가하여 작업 영역에서 수신되고 수집된 데이터의 양이 높은지 여부를 파악할 수 있도록 다음 쿼리가 제공됩니다.

InsightsMetrics
| where Namespace contains "prometheus"
| where TimeGenerated > ago(24h)
| summarize EstimatedGBPer30dayMonth = (sum(_BilledSize) / (1024 * 1024 * 1024)) * 30 by Name
| order by EstimatedGBPer30dayMonth desc
| render barchart

출력은 다음 예제와 유사한 결과를 표시합니다.

구성 또는 스크래핑 오류

구성 또는 스크래핑 오류를 조사하기 위해, 다음 예제 쿼리는 KubeMonAgentEvents 테이블에서 정보 이벤트를 반환합니다.

KubeMonAgentEvents | where Level != "Info"

출력은 다음 예제와 유사한 결과를 표시합니다.

자주 묻는 질문

이 섹션에서는 일반적인 질문에 대한 답변을 제공합니다.

Grafana에서 수집된 메트릭을 볼 수 있나요?

컨테이너 인사이트는 Grafana 대시보드의 Log Analytics 작업 영역에 저장된 메트릭을 볼 수 있도록 지원합니다. Grafana의 대시보드 리포지토리에서 다운로드할 수 있는 템플릿이 제공되었습니다. 시작 및 참조를 위해 이 템플릿을 사용하여 사용자 지정 Grafana 대시보드에서 시각화를 위해 모니터링되는 클러스터에서 데이터를 쿼리하는 방법을 알아 볼 수 있습니다.

16KB보다 큰 로그 줄이 Log Analytics에서 여러 레코드로 분할되는 이유는 무엇인가요?

에이전트는 Docker JSON 파일 로깅 드라이버를 사용하여 컨테이너의 stdout 및 stderr을 캡처합니다. 이 로깅 드라이버는 stdout 또는 stderr에서 파일로 복사될 때 16KB보다 큰 로그 줄을 여러 줄로 분할합니다.

다음 단계

Container Insights에는 미리 정의된 경고 집합이 없습니다. Container Insights로 성능 경고 만들기를 검토하여 DevOps 또는 운영 프로세스 및 절차를 지원하기 위해 높은 CPU 및 메모리 사용률에 대한 권장 경고를 만드는 방법을 알아봅니다.

Container Insights에서 로그 쿼리

Log Analytics 열기

기존 로그 쿼리

컨테이너 테이블

샘플 로그 쿼리

컨테이너의 수명 주기 정보 모두 나열

kubernetes 이벤트

컨테이너 CPU

컨테이너 메모리

사용자 지정 메트릭의 분당 요청 수

이름 및 네임스페이스별 Pod

Pod 스케일 아웃(HPA)

Nodepool 스케일 아웃

시스템 컨테이너(replicaset) 가용성

시스템 컨테이너(daemonsets) 가용성

컨테이너 로그

특정 Pod, 네임스페이스 및 컨테이너에 대한 컨테이너 로그

특정 배포에 대한 컨테이너 로그

특정 네임스페이스의 실패한 Pod에 대한 컨테이너 로그

컨테이너 인사이트 기본 시각화 쿼리

상태별 노드 수

상태별 Pod 수

상태별 컨테이너 목록

상태별 컨트롤러 목록

상태별 노드 목록

Prometheus 메트릭

구성 또는 스크래핑 오류

자주 묻는 질문

Grafana에서 수집된 메트릭을 볼 수 있나요?

16KB보다 큰 로그 줄이 Log Analytics에서 여러 레코드로 분할되는 이유는 무엇인가요?

다음 단계

추가 리소스