Kernrate 를 이용한 성능 분석

안녕하세요 오늘은 Kernrate 라는 성능분석 툴에 대해서 설명을 해보려고 합니다.
필터 드라이버들을 개발 하다 보면 성능 문제를 만나는 경우가 많이 있습니다. Filesystem Filter Driver 를 작성하였는데 Load 해보니 system 이 느려지더라 등의 문제가 있을 수 있는데 이때 어디가 병목지점인지 찾기가 힘듭니다. 왜냐하면 Filesystem Filter 의 경우 1초에도 수천개의 I/O 가 지나고 있기 때문에 전부를 비교할 수 없습니다. 그리고 어떤 지점에서 얼마의 시간이 걸렸는지는 알수 없습니다. 무식하게 작성한다면 모든 함수에 시간을 출력하게 해서 각 시간을 비교해 보는 방법이 있을 수 있겠지만 수 많은 (수만개)의 I/O 를 분석한다는 것은 정말 힘든 일 입니다.
이럴때 사용하기 좋은 툴이 Kernrate 입니다. 이 툴은 MS 사이트에서 공짜로 받을 수 있는 좋은 툴 입니다. MS 는 정말로 많은 툴을 만들어 놓네요 (참고로 Compuware 에 TrueTime 이라는 툴도 있습니다. 이것은 상용이고 좀더 보기좋은 GUI 를 가진 결과를 보여 줍니다.)
다음은 dir /s c: 를 하고 난 후 Kernrate 를 실행 시켜본 결과 입니다.

 

C:\KrView\Kernrates>Kernrate_i386_XP.exe
/==============================\
<         KERNRATE LOG           >
\==============================/
Date: 2004/12/21   Time: 15:33:21
Machine Name: AOLTEAN-H4
Number of Processors: 1
PROCESSOR_ARCHITECTURE: x86
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 0800
Physical Memory: 480 MB
Pagefile Total: 1125 MB
Virtual Total: 2047 MB
PageFile1: \??\E:\pagefile.sys, 720MB
OS Version: 5.1 Build 2600 Service-Pack: 2.0
WinDir: E:\WINDOWS

Kernrate User-Specified Command Line:
Kernrate_i386_XP.exe

Kernel Profile (PID = 0): Source= Time,
Using Kernrate Default Rate of 25000 events/hit
Starting to collect profile data

***> Press ctrl-c to finish collecting profile data
===> Finished Collecting Data, Starting to Process Results

------------Overall Summary:--------------

P0     K 0:00:13.656 (38.2%)  U 0:00:02.484 ( 7.0%)  I 0:00:19.578 (54.8%)  DPC 0:00:00.312 ( 0.9%)  Interrupt 0:00:00.296 ( 0.8%)
       Interrupts= 107928, Interrupt Rate= 3022/sec.

Total Profile Time = 35718 msec

                                       BytesStart          BytesStop         BytesDiff.
    Available Physical Memory   ,       107065344,       112259072,         5193728
    Available Pagefile(s)       ,       374460416,       371945472,        -2514944
    Available Virtual           ,      2132889600,      2131841024,        -1048576
    Available Extended Virtual  ,               0,               0,               0

                                  Total      Avg. Rate
    Context Switches     ,       395167,         11063/sec.
    System Calls         ,      1106131,         30968/sec.
    Page Faults          ,        20595,         577/sec.
    I/O Read Operations  ,         2346,         66/sec.
    I/O Write Operations ,         1102,         31/sec.
    I/O Other Operations ,        32185,         901/sec.
    I/O Read Bytes       ,       346338,         148/ I/O
    I/O Write Bytes      ,        83614,         76/ I/O
    I/O Other Bytes      ,      5832690,         181/ I/O

-----------------------------

Results for Kernel Mode:
-----------------------------

OutputResults: KernelModuleCount = 135
Percentage in the following table is based on the Total Hits for the Kernel

Time   13080 hits, 25000 events per hit --------
Module                                Hits   msec  %Total  Events/Sec
amdk7                                  7381      35718    56 %     5166162
nv4_disp                               3834      35718    29 %     2683520
ntoskrnl                                819      35718     6 %      573240
win32k                                  341      35718     2 %      238675
hal                                     288      35718     2 %      201579
Ntfs                                    180      35718     1 %      125986
NVENET                                   64      35718     0 %       44795
USBPORT                                  38      35718     0 %       26597
atapi                                    20      35718     0 %       13998
ino_fltr                                 19      35718     0 %       13298
nv4_mini                                 17      35718     0 %       11898
usbohci                                  11      35718     0 %        7699
watchdog                                  9      35718     0 %        6299
tcpip                                     7      35718     0 %        4899
HIDPARSE                                  7      35718     0 %        4899
Npfs                                      5      35718     0 %        3499
HIDCLASS                                  4      35718     0 %        2799
NDIS                                      4      35718     0 %        2799
sr                                        4      35718     0 %        2799
ftdisk                                    4      35718     0 %        2799
usbhub                                    3      35718     0 %        2099
PCIIDEX                                   3      35718     0 %        2099
ACPI                                      3      35718     0 %        2099
mouhid                                    2      35718     0 %        1399
hidusb                                    2      35718     0 %        1399
mouclass                                  2      35718     0 %        1399
TDI                                       2      35718     0 %        1399
PartMgr                                   2      35718     0 %        1399
rdbss                                     1      35718     0 %         699
psched                                    1      35718     0 %         699
VIDEOPRT                                  1      35718     0 %         699
imapi                                     1      35718     0 %         699
CLASSPNP                                  1      35718     0 %         699

================================= END OF RUN ==================================
============================== NORMAL END OF RUN ==============================

결과를 보면 nv4_disp.sys 라는 드라이버가 29% 의 시간을 소모하고 있는 것을 볼 수 있습니다.(NVidia chipset 과 관련된 드라이버 입니다.)
명령을 추가하여 Module 의 내부 함수에 대해서 알고 싶다면 아래와 같은 명령을 실행하면 됩니다.

 

C:\KrView\Kernrates>Kernrate_i386_XP.exe -z ntoskrnl
/==============================\
<         KERNRATE LOG           >
\==============================/
Date: 2004/12/21   Time: 15:37:38
Machine Name: AOLTEAN-H4
Number of Processors: 1
PROCESSOR_ARCHITECTURE: x86
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 0800
Physical Memory: 480 MB
Pagefile Total: 1125 MB
Virtual Total: 2047 MB
PageFile1: \??\E:\pagefile.sys, 720MB
OS Version: 5.1 Build 2600 Service-Pack: 2.0
WinDir: E:\WINDOWS

Kernrate User-Specified Command Line:
Kernrate_i386_XP.exe -z ntoskrnl

Kernel Profile (PID = 0): Source= Time,
Using Kernrate Default Rate of 25000 events/hit
CallBack: Finished Attempt to Load symbols for 804d7000 \WINDOWS\system32\ntoskrnl.exe

Starting to collect profile data

***> Press ctrl-c to finish collecting profile data
===> Finished Collecting Data, Starting to Process Results

------------Overall Summary:--------------

P0     K 0:00:01.406 (24.3%)  U 0:00:00.859 (14.8%)  I 0:00:03.531 (60.9%)  DPC 0:00:00.031 ( 0.5%)  Interrupt 0:00:00.062 ( 1.1%)
       Interrupts= 23804, Interrupt Rate= 4106/sec.

Total Profile Time = 5796 msec

                                       BytesStart          BytesStop         BytesDiff.
    Available Physical Memory   ,       117850112,       114122752,        -3727360
    Available Pagefile(s)       ,       370819072,       368578560,        -2240512
    Available Virtual           ,      2132889600,      2130681856,        -2207744
    Available Extended Virtual  ,               0,               0,               0

                                  Total      Avg. Rate
    Context Switches     ,       206440,         35612/sec.
    System Calls         ,       372915,         64330/sec.
    Page Faults          ,        14872,         2566/sec.
    I/O Read Operations  ,          186,         32/sec.
    I/O Write Operations ,          180,         31/sec.
    I/O Other Operations ,        19183,         3309/sec.
    I/O Read Bytes       ,        39296,         211/ I/O
    I/O Write Bytes      ,        11940,         66/ I/O
    I/O Other Bytes      ,      3000748,         156/ I/O

-----------------------------

Results for Kernel Mode:
-----------------------------

OutputResults: KernelModuleCount = 135
Percentage in the following table is based on the Total Hits for the Kernel

Time   1893 hits, 25000 events per hit --------
Module                                Hits   msec  %Total  Events/Sec
amdk7                                  1309       5796    69 %     5646135
ntoskrnl                                292       5796    15 %     1259489
hal                                      91       5796     4 %      392512
Ntfs                                     79       5796     4 %      340752
win32k                                   56       5796     2 %      241545
NVENET                                   28       5796     1 %      120772
ino_fltr                                  8       5796     0 %       34506
atapi                                     6       5796     0 %       25879
CLASSPNP                                  4       5796     0 %       17253
Npfs                                      3       5796     0 %       12939
PCIIDEX                                   3       5796     0 %       12939
watchdog                                  2       5796     0 %        8626
nv4_mini                                  2       5796     0 %        8626
sr                                        2       5796     0 %        8626
PartMgr                                   2       5796     0 %        8626
ftdisk                                    2       5796     0 %        8626
nv4_disp                                  1       5796     0 %        4313
tcpip                                     1       5796     0 %        4313
USBPORT                                   1       5796     0 %        4313
NDIS                                      1       5796     0 %        4313

===> Processing Zoomed Module ntoskrnl.exe...

----- Zoomed module ntoskrnl.exe (Bucket size = 16 bytes, Rounding Down) --------
Percentage in the following table is based on the Total Hits for this Zoom Module

Time   292 hits, 25000 events per hit --------
Module                                Hits   msec  %Total  Events/Sec
CcUnpinDataForThread                     32       5796    10 %      138026
KiDispatchInterrupt                      27       5796     8 %      116459
ZwYieldExecution                         18       5796     5 %       77639
FsRtlIsNameInExpression                  14       5796     4 %       60386
KiIpiServiceRoutine                      13       5796     4 %       56073
SeUnlockSubjectContext                    9       5796     2 %       38819
NtAllocateVirtualMemory                   8       5796     2 %       34506
ObReferenceObjectByHandle                 8       5796     2 %       34506
ExAllocatePoolWithTag                     8       5796     2 %       34506
NtRequestWaitReplyPort                    7       5796     2 %       30193
ExInterlockedPopEntrySList                7       5796     2 %       30193
SeDeleteAccessState                       6       5796     1 %       25879
ExAcquireResourceExclusiveLite            6       5796     1 %       25879
ExReleaseResourceLite                     6       5796     1 %       25879
NtOpenProcessTokenEx                      5       5796     1 %       21566
ObCreateObject                            5       5796     1 %       21566
ObfDereferenceObject                      5       5796     1 %       21566
wcstombs                                  4       5796     1 %       17253
MmMapLockedPagesSpecifyCache              4       5796     1 %       17253
IoBuildPartialMdl                         4       5796     1 %       17253
RtlCopyUnicodeString                      4       5796     1 %       17253

 

마지막으로 자신이 만든 모듈의 내부를 알고 싶다면
kernrate -z module -j module의 심볼 패스
명령을 수행한다면 내부 모듈중 어떤 것에서 병목 현상이 발생하고 있는지 확인할 수 있을 것입니다.
https://www.microsoft.com/downloads/details.aspx?FamilyID=d6e95259-8d9d-4c22-89c4-fad382eddcd1&DisplayLang=en