Question: How to Troubleshoot Alerts in WSS 3.0 / MOSS
The most common issue with alerts is that the user receives the initial email but gets no email when changes are made to the list on which the alert was configured. Check the following settings in order and make sure each one is correct.
1. Is it an upgrade from V2 to V3? If yes, check the upgrade section at the bottom of this page.
2. If this is a new install of V3 (WSS or MOSS), then do the following:
a. Create a new web app and see if the alerts work there. If yes, you can move the content db of the non-working site to this one. If this is not possible, then try the following.
b. Open the command prompt and go to the 12\Bin folder. Run this command and see whether alerts are enabled for the web application.
stsadm.exe -o getproperty -url http://problemsite -pn alerts-enabled
The expected output is <Property Exist="Yes" Value="yes" />. If you don’t get this, run the following command to change the value.
stsadm.exe -o setproperty -pn alerts-enabled -pv "true" -url http://problemsite
If the property is already Yes and alerts are still not sent, toggle the property from Yes to No and then from No back to Yes. Note that this may delete all the existing alerts.
c. Check the schedule stored in the job-immediate-alerts property from the command prompt. Run this command from the 12\Bin folder. Even though properties such as job-daily-alerts and job-weekly-alerts exist, the only timer job that exists is job-immediate-alerts.
stsadm.exe -o getproperty -url http://ProblemSite -pn job-immediate-alerts
The expected output is <Property Exist="Yes" Value="every 5 minutes between 0 and 59"/>. If you don’t get this, run the following command to change the value.
stsadm.exe -o setproperty -pn job-immediate-alerts -pv "every 5 minutes between 0 and 59" -url http://ProblemSite
d. Confirm the above step through the UI. Go to Central Administration > Operations > Timer Job Definitions and ensure that a job named Immediate Alerts is present for the web application.
e. Configure the alert for a user in a list. Instead of typing the domain name\user name, type the email address of the user and see whether he gets the initial email. Then make a change to the list and see whether he gets the email related to the change.
f. If he gets the alert as expected, then create a new alert and this time select the user from the people picker or type the domain name\user name.
g. If the alert is still not working after the above step, check the ImmedSubscriptions or SchedSubscriptions table (depending on the type of alert) in the content db of the web application. Verify that a new record has been added and that the email field for the user is populated correctly. If it is not, check whether the email address is present in the user’s profile through the SSP admin page.
h. The initial alert is not security trimmed, so it is sent regardless of whether the user has privileges on the list. If the user is not getting alerts for any changes, check whether the user has Read permission on the list.
i. This can also happen if there is an issue with the mail provider (for example, a third-party email provider). In this case, capture the ULS log in verbose mode with all information events and look for lines that say Alert has been sent.
j. Email-enabled security groups can also be used for configuring alerts. If changes are not notified to the members of the group, check whether the group has been added to the list with at least Read permission. Also, some email providers block email to groups if the number of members exceeds a certain limit.
k. Open SQL Query Analyzer and connect to the content database of the problematic site. Run the following query.
Select * from timerlock
The server returned by this query is the one responsible for processing the timer service for this content db. You can follow KB 934838 (http://support.microsoft.com/kb/934838) to sync the accounts and passwords across the farm. Each content db can have a different server present in the timerlock table.
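The two property checks in steps b and c can be scripted once the stsadm output has been captured. The following is a minimal sketch, assuming the output has exactly the <Property .../> form quoted above. The function names are hypothetical, and treating "true" as equivalent to "yes" is an assumption based on the value passed to setproperty in step b.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_property(output: str) -> Tuple[bool, str]:
    """Parse one <Property .../> element as quoted in steps b and c."""
    element = ET.fromstring(output.strip())
    exists = element.get("Exist", "").lower() == "yes"
    value = element.get("Value", "")
    return exists, value


def alerts_enabled(output: str) -> bool:
    """True if the alerts-enabled property exists and is switched on."""
    exists, value = parse_property(output)
    # Assumption: "true" is accepted as well as "yes".
    return exists and value.lower() in ("yes", "true")


def immediate_alert_interval(output: str) -> Optional[int]:
    """Return the interval in minutes, or None if the schedule is missing
    or does not match the form shown in step c."""
    exists, value = parse_property(output)
    match = re.fullmatch(r"every (\d+) minutes between 0 and 59", value.strip())
    if not exists or match is None:
        return None
    return int(match.group(1))
```

A result of None from immediate_alert_interval, or False from alerts_enabled, means the corresponding setproperty command from step b or c should be run.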
The internal working mechanism of how an alert should work in v3:
The EventCache table records the SQL level events as they occur and the EventData and ACL columns are not NULL for an alert event. There is an alerts timer job that runs periodically to process the records in the EventCache. After the alerts timer job runs, it nulls out the EventData and ACL columns. Then, it will log an event into the EventLog table. So check the following in SQL.
select * from eventcache where EventData is not null
This returns all the events that have not been processed yet, which shows whether some alerts are still waiting to be processed.
select * from eventlog where ListID = 'xxx'
You can get the ListID from the EventCache table by running the following query and checking the documents that correspond to the problematic list.
select * from EventCache
If you cannot find any record, perform the following tests:
Run filemon on the MOSS server that is responsible for the timer service and check whether the timer service picks up the alert template during the whole process.
Upload a new document to the document library that is supposed to have the alerts. Start filemon and analyze the logs.
select * from eventcache order by EventTime DESC
Check whether the latest record is the one that corresponds to your uploaded document. Make sure the EventData and ACL columns are not NULL.
After 5 minutes or more, check the EventCache table again to see whether the EventData and ACL columns have been set to NULL.
If so, stop filemon and review its log.
If it is a scheduled alert (Daily, Weekly), then you need to check the entries in the EventSubsmatches table as well.
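The EventCache lifecycle described above can be illustrated with a toy model. This is only a sketch, assuming an in-memory SQLite database with simplified stand-in tables that contain just the columns named in the text. The real content database runs on SQL Server and has many more columns. The Id column and the process_pending function are hypothetical and merely imitate what the alerts timer job is described as doing.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create simplified stand-ins for EventCache and EventLog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EventCache ("
        "Id INTEGER PRIMARY KEY, ListID TEXT, EventTime TEXT, "
        "EventData BLOB, ACL BLOB)"
    )
    conn.execute("CREATE TABLE EventLog (Id INTEGER, ListID TEXT)")
    return conn


def record_event(conn: sqlite3.Connection, list_id: str, event_time: str) -> None:
    """A new event arrives with EventData and ACL populated."""
    conn.execute(
        "INSERT INTO EventCache (ListID, EventTime, EventData, ACL) "
        "VALUES (?, ?, ?, ?)",
        (list_id, event_time, b"data", b"acl"),
    )


def pending_events(conn: sqlite3.Connection) -> list:
    """Same filter as the query above: rows that are not yet processed."""
    return conn.execute(
        "SELECT Id, ListID FROM EventCache "
        "WHERE EventData IS NOT NULL ORDER BY EventTime DESC"
    ).fetchall()


def process_pending(conn: sqlite3.Connection) -> int:
    """Hypothetical stand-in for the timer job: null out EventData and ACL,
    then log one EventLog row per processed event."""
    rows = pending_events(conn)
    for event_id, list_id in rows:
        conn.execute(
            "UPDATE EventCache SET EventData = NULL, ACL = NULL WHERE Id = ?",
            (event_id,),
        )
        conn.execute(
            "INSERT INTO EventLog (Id, ListID) VALUES (?, ?)",
            (event_id, list_id),
        )
    return len(rows)
```

In this model, rows that keep a non-NULL EventData long after the schedule interval correspond to the situation where the timer job is not picking up events.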
Upgrade from V2 to V3
1. Check this KB - http://support.microsoft.com/kb/936759/en-us
2. If no alerts are getting triggered even for a new web app, then run the Psconfig Wizard without changing any options.
3. If the alert works on new web apps, then you can move the site collections from the upgraded web app to the newly created web app through stsadm -o backup / restore or stsadm -o export / import. This is practical only if you have a small number of site collections, and it is only a workaround.
Some interesting behaviours
Alerts were working for one web app but not for any of the other web applications. The timerlock table in the content db of the working web app listed ServerX, while all the other content dbs listed ServerY. We shut down ServerY and waited for the timerlock table of the non-working content dbs to pick up ServerX. Once that happened, we restarted ServerY and the alerts started working. This is a workaround and not a fix. To find the actual issue, look into the logs of ServerY.
All the elements of the alert had been checked and the alerts still did not fire. We then found that the time on the servers in the farm differed, and the difference was only 2 minutes. Once we synchronized the time with the domain time server using the Net Time command, the alerts started flowing.
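A quick way to spot this kind of drift is to compare clock readings from each server. The sketch below assumes the readings were collected at the same instant (for example, noted by hand from each machine). The function names and the one-minute tolerance in the usage note are illustrative assumptions; the only figure from the case above is that a 2 minute difference was enough to stop alerts.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def max_clock_skew(server_times: Dict[str, datetime]) -> timedelta:
    """Largest difference between any two clock readings."""
    if not server_times:
        return timedelta(0)
    times = list(server_times.values())
    return max(times) - min(times)


def servers_out_of_sync(
    server_times: Dict[str, datetime],
    reference: str,
    tolerance: timedelta,
) -> List[str]:
    """Names of servers whose clock differs from the reference server
    by more than the given tolerance."""
    ref_time = server_times[reference]
    return sorted(
        name
        for name, reading in server_times.items()
        if abs(reading - ref_time) > tolerance
    )
```

For example, with readings of 10:00:00 on ServerX and 10:02:00 on ServerY and a tolerance of one minute, servers_out_of_sync reports ServerY when ServerX is used as the reference.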