How does BizTalk detect if a host instance is dead in BTS2K6, 2K6R2 and 2K9?
Detection is done through cooperation between the front-end BizTalk host instance process
and the back-end BizTalk SQL job.
1. When a BizTalk host instance is started, it calls the stored procedure
bts_ProcessHeartbeat_<HostName> once with @dwCommand=1.
Here @dwCommand=1 means Process Startup.
2. When a BizTalk host instance is stopped, it calls the same stored procedure
bts_ProcessHeartbeat_<HostName> once again, this time with @dwCommand=2.
Here @dwCommand=2 means Process Shutdown.
3. Looking at the code of the stored procedure bts_ProcessHeartbeat_<HostName>, you can see that int_ProcessCleanup_<HostName> is called if dwCommand=1 (Process
Startup) or dwCommand=2 (Process Shutdown). The stored procedure int_ProcessCleanup_<HostName> is used internally
to clean up the records related to the host instance process and to release all
messages and service instances that are locked by this process.
4. It is easy to understand why int_ProcessCleanup_<HostName> is called when a host instance shuts down: the
messages and service instances locked by the host instance are freed before
the shutdown, so they can be picked up again by the same host instance after a restart,
or by other host instances that are still running in a
multi-box BizTalk environment.
5. The question is why int_ProcessCleanup_<HostName> is also called
when a host instance starts up, if the locked messages and service instances
were already freed by the same SP at shutdown. Yes, it is redundant if the
SP was already called when the instance shut down normally, but the problem
is that the process could be terminated or could crash unexpectedly without calling
the SP. You can easily reproduce such an abnormal shutdown by killing a host process:
the SP bts_ProcessHeartbeat_<HostName> with dwCommand=2 is not called
if the host process is killed by the command “kill /f”. As we can’t guarantee
that the internal cleanup SP was called before the host instance is started,
calling it again at startup is a good choice,
although it is sometimes redundant.
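The reasoning in steps 3 to 5 can be pictured with a small in-memory model. This is only a sketch in Python, not BizTalk's actual T-SQL: the class `MessageBoxModel`, its `locks` dictionary, the method names and the ids are all made up for illustration. Only the two @dwCommand values and the names of the stored procedures being imitated come from the text above.

```python
PROCESS_STARTUP = 1   # @dwCommand=1
PROCESS_SHUTDOWN = 2  # @dwCommand=2


class MessageBoxModel:
    """Toy stand-in for the MessageBox; every name here is hypothetical."""

    def __init__(self):
        # message id -> id of the host instance process holding the lock
        self.locks = {}

    def lock_message(self, message_id, process_id):
        self.locks[message_id] = process_id

    def process_cleanup(self, process_id):
        """Imitates int_ProcessCleanup_<HostName>: release every lock held by the process."""
        released = [m for m, p in self.locks.items() if p == process_id]
        for message_id in released:
            del self.locks[message_id]
        return released

    def process_heartbeat(self, process_id, command):
        """Imitates bts_ProcessHeartbeat_<HostName> for the startup and shutdown commands."""
        if command in (PROCESS_STARTUP, PROCESS_SHUTDOWN):
            return self.process_cleanup(process_id)
        return []


box = MessageBoxModel()
box.process_heartbeat("host-instance-1", PROCESS_STARTUP)  # normal start
box.lock_message("msg-A", "host-instance-1")
box.lock_message("msg-B", "host-instance-1")

# The process is killed here: the call with PROCESS_SHUTDOWN never happens,
# so both messages stay locked.
locked_after_crash = dict(box.locks)

# On restart, the startup call performs the cleanup that the shutdown never did.
released_on_restart = box.process_heartbeat("host-instance-1", PROCESS_STARTUP)
```

After the simulated kill both messages are still locked, and they are released only by the startup call of the restarted instance. This is the case the redundant cleanup at startup is there to cover.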
Another question arises at this point: if a host instance process is completely hung or dead, or
has crashed or been terminated unexpectedly without restarting, who
detects the situation and releases these locked messages and service instances?
And how? Let’s continue.
6. In each running BizTalk host
instance, you can see one thread that is responsible for the heartbeat.
7. The thread calls the
stored procedure bts_ProcessHeartbeat_<HostName> with @dwCommand=0 every
60 seconds by default. The interval can be modified through the column
ConfigurationCacheRefreshInterval of the table adm_Group in BizTalkMGMTDb, or
through the settings of the BizTalk group in the BizTalk admin UI.
Here @dwCommand=0 means Process Live.
8. Looking at the code of the
stored procedure bts_ProcessHeartbeat_<HostName>, the internal SP int_ProcessCleanup_<HostName> is not called
if dwCommand=0 (Process Live). In that case bts_ProcessHeartbeat_<HostName>
only updates the record for the host instance in
the table ProcessHeartbeats in BizTalkMsgBoxDb,
or inserts a new record if there is no existing one.
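The update-or-insert behaviour for @dwCommand=0 can be pictured with a small Python sketch. The dictionary standing in for the ProcessHeartbeats table and the function name `record_heartbeat` are hypothetical, and the real row holds more columns than the single timestamp kept here.

```python
from datetime import datetime


def record_heartbeat(heartbeats, process_id, now):
    """Update the row for process_id, or insert one if none exists.

    heartbeats is a dict standing in for the ProcessHeartbeats table
    (hypothetical, simplified layout). No cleanup is done on this path.
    Returns "inserted" or "updated".
    """
    action = "updated" if process_id in heartbeats else "inserted"
    heartbeats[process_id] = {"last_heartbeat": now}
    return action


table = {}
first = record_heartbeat(table, "host-instance-1", datetime(2009, 11, 25, 4, 51, 44))
second = record_heartbeat(table, "host-instance-1", datetime(2009, 11, 25, 4, 52, 44))
```

The first call creates the row and the second only refreshes it, so a running host instance never has more than one row in the table.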
9. Let’s look at a record
in the table ProcessHeartbeats and see what it
looks like.
11/25/2009 4:52:44 AM | 11/25/2009 4:52:44 AM | 11/25/2009 5:02:44 AM
The last two values are the dtLastHeartbeatTime and the dtNextHeartbeatTime of the record.
uidProcessID is the unique GUID for each host instance. You can get the uidProcessID for a
host instance by querying the UniqueId column of the table adm_HostInstance in
BizTalkMGMTDb. You can also get it by checking
the service property “Path to Executable” of the corresponding BizTalk service
in the Windows Service Control Manager on a BizTalk box. For example, the following
is the “Path to Executable” setting for my "BizTalkServerApplication"
in my BizTalk testing box. The ID is the value that follows the command line option -btsapp:
Files (x86)\Microsoft BizTalk Server 2006\BTSNTSvc.exe" -group
"BizTalk Group" -name "BizTalkServerApplication" -btsapp
10. Note that dtNextHeartbeatTime
is approximately equal to (dtLastHeartbeatTime + 10*HeartbeatInterval).
Since the HeartbeatInterval is
about 60 seconds by default (we mentioned above how to modify that interval),
dtNextHeartbeatTime is about 10 minutes after
dtLastHeartbeatTime by default. Of course, if the HeartbeatInterval is changed to 30 seconds, dtNextHeartbeatTime
would be 5 minutes after dtLastHeartbeatTime.
Normally the record for a given host instance in the table ProcessHeartbeats
is updated every [HeartbeatInterval] seconds through the SP bts_ProcessHeartbeat_<HostName>
by the running host instance process.
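The arithmetic can be checked with a few lines of Python. This is only a worked example of the formula stated above, not the stored procedure's code; the function and constant names are made up, and the timestamp is the one from the sample record.

```python
from datetime import datetime, timedelta

MISSED_HEARTBEATS_ALLOWED = 10  # the multiplier in the formula above


def next_heartbeat_time(last_heartbeat, heartbeat_interval_seconds):
    """dtNextHeartbeatTime ~= dtLastHeartbeatTime + 10 * HeartbeatInterval."""
    return last_heartbeat + timedelta(
        seconds=MISSED_HEARTBEATS_ALLOWED * heartbeat_interval_seconds
    )


last = datetime(2009, 11, 25, 4, 52, 44)
print(next_heartbeat_time(last, 60))  # 2009-11-25 05:02:44
print(next_heartbeat_time(last, 30))  # 2009-11-25 04:57:44
```

With the default 60 seconds the result matches the 5:02:44 AM value of the sample record, and with 30 seconds the window shrinks to 5 minutes.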
11. Take a
look at the SQL job MessageBox_DeadProcesses_Cleanup_BizTalkMsgBoxDb:
it is configured to execute the stored procedure bts_CleanupDeadProcesses every minute.
12. Looking at the code of the SP bts_CleanupDeadProcesses, it simply goes
through the table ProcessHeartbeats and checks whether there is a host instance whose
dtNextHeartbeatTime is older
than the current time. If there is, it means we have already lost at least 10 heartbeats
from this host instance. The host instance is then considered dead,
and the stored procedure int_ProcessCleanup_<HostName>
for that instance is called by the SQL job to clean up the records related to
the host instance process and release all messages and service instances that
are locked by the process.
If this SQL job is disabled
or runs into a problem while it is running, BizTalk loses the capability to detect
a “dead” host instance.
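The scan described in step 12 can be sketched in Python as follows. This is an illustration of the logic only, not the T-SQL of bts_CleanupDeadProcesses: the dictionary, the callback and the instance ids are hypothetical, and dropping the stale row after cleanup is an assumption of this sketch.

```python
from datetime import datetime


def cleanup_dead_processes(heartbeats, now, process_cleanup):
    """Model of the scan done by bts_CleanupDeadProcesses.

    heartbeats: dict of process id -> dtNextHeartbeatTime (datetime)
    process_cleanup: callback standing in for int_ProcessCleanup_<HostName>
    Returns the ids of the processes treated as dead.
    """
    dead = [pid for pid, next_time in heartbeats.items() if next_time < now]
    for pid in dead:
        process_cleanup(pid)
        # Assumption: the stale row is dropped so the process is not cleaned up twice.
        del heartbeats[pid]
    return dead


heartbeats = {
    "instance-alive": datetime(2009, 11, 25, 5, 2, 44),
    "instance-dead": datetime(2009, 11, 25, 4, 40, 0),
}
cleaned = []
dead = cleanup_dead_processes(
    heartbeats, datetime(2009, 11, 25, 4, 53, 0), cleaned.append
)
```

Only the instance whose dtNextHeartbeatTime is already in the past is handed to the cleanup callback. The other instance is left alone until it, too, misses its window.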