Troubleshooting Cross Platform Discovery and Agent Installation (part 4)
This is part of a multi-part article (if you didn't gather that from the title).
Scenario #4 – "New computer shows as Platform: Unknown and Version: Unknown"
When I go to the "Unix/Linux Servers" folder in the administration pane, I see my new computer as "Unmonitored" and in the Platform/Version Unknown grouping. The agent deployment was successful, so why is it showing as unmonitored? DebugView is no longer helpful since it doesn't show the agent communication processes. The module debug logs are local, though, so I check whether any of them in C:\Windows\temp have useful info. Looking in the SCXLogModule.log file, sure enough, the last line says this:
13: 06/10/10 10:33:31 : ResourceStore::FetchNowAndDeliverIfTime - /var/log/secure - EXCEPTION: Access is denied.
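If the log is long, scanning it by eye gets old quickly. Here is a minimal sketch of pulling the EXCEPTION entries out of SCXLogModule.log with Python. The line pattern is inferred from the single sample line above, so treat it as an assumption; real entries may be laid out differently. The names `LINE_RE`, `LogException` and `find_exceptions` are mine, not part of OpsMgr.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Pattern guessed from one sample line; adjust if your log differs.
LINE_RE = re.compile(
    r"^(?P<seq>\d+):\s+"
    r"(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+:\s+"
    r"(?P<source>\S+)\s+-\s+(?P<target>.+?)\s+-\s+"
    r"EXCEPTION:\s+(?P<message>.*)$"
)


class LogException(NamedTuple):
    timestamp: str  # date and time exactly as written in the log
    source: str     # module function that reported the problem
    target: str     # resource it was working on, e.g. a log file path
    message: str    # text after "EXCEPTION:"


def find_exceptions(lines: Iterable[str]) -> Iterator[LogException]:
    """Yield one record per log line that matches the EXCEPTION pattern."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield LogException(
                f"{m['date']} {m['time']}",
                m["source"],
                m["target"],
                m["message"],
            )


if __name__ == "__main__":
    sample = [
        "13: 06/10/10 10:33:31 : ResourceStore::FetchNowAndDeliverIfTime"
        " - /var/log/secure - EXCEPTION: Access is denied.",
    ]
    for hit in find_exceptions(sample):
        print(hit.timestamp, hit.target, hit.message)
```

To run it against the real file, pass an open file handle for C:\Windows\temp\SCXLogModule.log instead of the sample list.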
Access denied? Hmm. I wonder if I forgot to set up the Run As account correctly… You know, I do remember having some other machines on here that *might* have used a different root password. I'll just go reset the root password in the Run As account and see what happens. I'll also restart the System Center Management Configuration service to make OpsMgr refresh the discovery. I'll give it a few minutes to go through that process and check again. I'll just wait for the SCXLogModule.log file to change again and that should signal it's tried to connect.
Ok, good. It's now at the "Warning" stage, which means it's being monitored, but it's still under the "Unknown" group, and the icon is grayed out, not in color. So something is still wrong. Looking in the Health Explorer, I see that the basic connection monitors have changed state, but not the OS, Application or hardware monitors:
Looking at the CentOS Computers diagram, I see my new computer, so I know that at least computer and group discovery is working correctly. Next I need to make sure the Linux computer is giving me enough logging info. On that computer I open a terminal window, go to /opt/microsoft/scx/bin/tools, and run ./scxadmin -log-set all verbose, then restart the agent and CIM database with ./scxadmin -restart all.
Looking at the scx.log file (located at /var/opt/microsoft/scx/log/scx.log), I can see lots of activity, including calls to the log file provider, OS provider, and it's doing things like EnumInstances. So let me check to see if anything is actually showing up in the CIM database. I can do this by using a WinRM (WS-Man) command from the OpsMgr server:
winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx -username:root -password:<mypassword> -r:https://centos55-x86:1270/wsman -auth:basic -skipCACheck -encoding:utf-8 -format:#pretty
I get an immediate response:
<wsman:Results xmlns:wsman="http://schemas.dmtf.org/wbem/wsman/1/wsman/results">
  <p:SCX_OperatingSystem xml:lang="" xmlns:p="http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <p:CSCreationClassName>SCX_ComputerSystem</p:CSCreationClassName>
    <p:CSName>centos55-x86</p:CSName>
    <p:Caption>CentOS release 5.5 (Final)</p:Caption>
    <p:CreationClassName>SCX_OperatingSystem</p:CreationClassName>
    <p:CurrentTimeZone>-420</p:CurrentTimeZone>
    <p:Description>CentOS release 5.5 (Final)</p:Description>
    <p:Distributed xsi:nil="true"></p:Distributed>
    <p:ElementName xsi:nil="true"></p:ElementName>
    <p:EnabledDefault>2</p:EnabledDefault>
    <p:EnabledState>5</p:EnabledState>
    <p:FreePhysicalMemory>882688</p:FreePhysicalMemory>
    <p:FreeSpaceInPagingFiles>2096128</p:FreeSpaceInPagingFiles>
    <p:FreeVirtualMemory>2978816</p:FreeVirtualMemory>
    <p:HealthState xsi:nil="true"></p:HealthState>
    <p:InstallDate xsi:nil="true"></p:InstallDate>
    <p:LastBootUpTime>2010-06-10T09:33:15.329647Z</p:LastBootUpTime>
    <p:LocalDateTime>2010-06-10T11:29:25.019647Z</p:LocalDateTime>
    <p:MaxNumberOfProcesses>32766</p:MaxNumberOfProcesses>
    <p:MaxProcessMemorySize>0</p:MaxProcessMemorySize>
    <p:MaxProcessesPerUser>999</p:MaxProcessesPerUser>
    <p:Name>Red Hat Distribution</p:Name>
    <p:NumberOfLicensedUsers>0</p:NumberOfLicensedUsers>
    <p:NumberOfProcesses>113</p:NumberOfProcesses>
    <p:NumberOfUsers>2</p:NumberOfUsers>
    <p:OSType>36</p:OSType>
    <p:OperatingSystemCapability>32 bit</p:OperatingSystemCapability>
    <p:OperationalStatus xsi:nil="true"></p:OperationalStatus>
    <p:OtherEnabledState xsi:nil="true"></p:OtherEnabledState>
    <p:OtherTypeDescription>2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:35 EDT 2010</p:OtherTypeDescription>
    <p:RequestedState>12</p:RequestedState>
    <p:SizeStoredInPagingFiles>2096128</p:SizeStoredInPagingFiles>
    <p:Status xsi:nil="true"></p:Status>
    <p:StatusDescriptions xsi:nil="true"></p:StatusDescriptions>
    <p:SystemUpTime>6969</p:SystemUpTime>
    <p:TimeOfLastStateChange xsi:nil="true"></p:TimeOfLastStateChange>
    <p:TotalSwapSpaceSize>2096128</p:TotalSwapSpaceSize>
    <p:TotalVirtualMemorySize>3130368</p:TotalVirtualMemorySize>
    <p:TotalVisibleMemorySize>1034240</p:TotalVisibleMemorySize>
    <p:Version>2.6.18-194.el5</p:Version>
  </p:SCX_OperatingSystem>
</wsman:Results>
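That response is a lot of XML to read when all I want is to know which properties the agent actually filled in. Below is a rough Python sketch that reduces a response like this to a dictionary of the non-nil properties. It assumes the output has been saved as well-formed XML (namespace values quoted); the `SAMPLE` string is a trimmed copy of the response above, and `os_properties` is a name I made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the response above.
SCX_OS_NS = (
    "http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem"
)
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def os_properties(xml_text: str) -> dict:
    """Return {property name: text} for every property not marked xsi:nil."""
    root = ET.fromstring(xml_text)
    props = {}
    for instance in root.iter(f"{{{SCX_OS_NS}}}SCX_OperatingSystem"):
        for child in instance:
            if child.attrib.get(XSI_NIL) == "true":
                continue  # the agent returned no value for this property
            name = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
            props[name] = (child.text or "").strip()
    return props


# Trimmed copy of the response shown above, for demonstration only.
SAMPLE = """<wsman:Results xmlns:wsman="http://schemas.dmtf.org/wbem/wsman/1/wsman/results">
<p:SCX_OperatingSystem xml:lang="" xmlns:p="http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<p:Caption>CentOS release 5.5 (Final)</p:Caption>
<p:HealthState xsi:nil="true"></p:HealthState>
<p:Version>2.6.18-194.el5</p:Version>
</p:SCX_OperatingSystem>
</wsman:Results>"""

if __name__ == "__main__":
    for name, value in os_properties(SAMPLE).items():
        print(f"{name}: {value}")
```

Skipping the nil entries makes it easier to see at a glance that Caption and Version, the values that should end up as platform and version, are present on the agent side.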
I try a few other classes and I get information back from them too. This tells me that the agent, the providers and CIM are all working, and that I can retrieve the information by going outside of OpsMgr and using WS-Man. So something in OpsMgr is either not gathering the information or not linking it to the right classes for display in the monitors.
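Since the only thing that changes between these queries is the class name, the command can be assembled by a small helper. This is just a sketch: `winrm_enumerate_args` is a hypothetical function of mine, and the flags are copied unchanged from the winrm command shown earlier rather than taken from any other source.

```python
def winrm_enumerate_args(cim_class, host, user, password, port=1270):
    """Build the argument list for the winrm enumerate call used above."""
    resource = (
        "http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/"
        f"{cim_class}?__cimnamespace=root/scx"
    )
    return [
        "winrm", "enumerate", resource,
        f"-username:{user}",
        f"-password:{password}",
        f"-r:https://{host}:{port}/wsman",
        "-auth:basic",
        "-skipCACheck",
        "-encoding:utf-8",
        "-format:#pretty",
    ]


if __name__ == "__main__":
    # Reproduces the command from earlier; substitute your own password.
    args = winrm_enumerate_args(
        "SCX_OperatingSystem", "centos55-x86", "root", "<mypassword>"
    )
    print(" ".join(args))
```

Swapping the first argument for another SCX class name gives the equivalent query for that class. Printing the command instead of running it keeps the password out of any script output you did not intend.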
I decide to bring out the "big guns" and enable OpsMgr trace logging. I go to C:\Program Files\System Center Operations Manager 2007\Tools and run the StartTracing.cmd script. Once tracing is running, I see a bunch of errors in one of the logs (in C:\Windows\temp\OpsMgrTrace) around the Data Warehouse functions that say, in effect, "access denied":
[SecureStorageManager_cpp5654]GetLogonToken on user SCXDEM1\administrator failed with code 1326(ERROR_LOGON_FAILURE)
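Trace logs are noisy, so it helps to filter for just the failed logons and the accounts behind them. A minimal sketch in Python follows; the pattern is based only on the one trace line above, and `TRACE_RE` and `logon_failures` are illustrative names, not anything that ships with OpsMgr.

```python
import re

# Pattern inferred from the single trace line above; other entries may differ.
TRACE_RE = re.compile(
    r"GetLogonToken on user (?P<user>\S+) failed with code "
    r"(?P<code>\d+)\((?P<name>\w+)\)"
)


def logon_failures(lines):
    """Return (user, numeric code, symbolic name) for each failed logon."""
    failures = []
    for line in lines:
        m = TRACE_RE.search(line)
        if m:
            failures.append((m["user"], int(m["code"]), m["name"]))
    return failures


if __name__ == "__main__":
    # Raw string so the backslash in the account name is kept literally.
    sample = [
        r"[SecureStorageManager_cpp5654]GetLogonToken on user "
        r"SCXDEM1\administrator failed with code 1326(ERROR_LOGON_FAILURE)",
    ]
    for user, code, name in logon_failures(sample):
        print(f"{user}: {code} ({name})")
```

The account name is the useful part here: it tells me which stored credential OpsMgr failed to log on with.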
OMG… I went in and changed the OpsMgr administrator account and reset the password on the Unix Action accounts, but didn't reset the password on the Data Warehouse Action Account (which also runs off the Administrator login). I go reset that account configuration with the current password and now my CentOS computer shows as healthy! But wait… all it did was turn the gray icons to color. The same monitors that were blank before are still blank:
So it looks like I only solved part of the problem. I still need to figure out why my monitors aren't getting filled. Perhaps it's just a matter of waiting for the discovery processes to complete now that all the underlying stuff is working. As it turns out, this is really what's going on because I went to lunch and came back and now my computer is showing up under a proper OS grouping (not "Unknown"):
All of my monitors that should be working are now working:
And my diagram view changed to reflect the monitors:
So, finally, it looks like I'm done and everything is working! I will be writing more of these kinds of articles in the future to help you debug issues. Please comment below if you find them useful.
Until next time…