Usage Event Logging in Windows SharePoint Services 3.0

Summary: Parse the log files that Windows SharePoint Services 3.0 produces when logging is enabled to effectively analyze the usage event data. (9 printed pages)

Radu Rusu, Microsoft Corporation

Erick R. Lerma, Microsoft Corporation

Les W. Smith, Microsoft Corporation

September 2007

Applies to: Windows SharePoint Services 3.0

Contents:

  • Introduction to Usage Event Logging

  • Examining the Usage Log File Format

  • Parsing the Usage Event Logs

  • Testing a Code Example

  • Conclusion

  • Additional Resources

Introduction to Usage Event Logging

This article describes the best way to obtain usage event data from Windows SharePoint Services 3.0, which is to parse the log files that are created when logging is enabled in a deployment. The article provides information about the format of the log files generated by Windows SharePoint Services 3.0, and provides an example that demonstrates how to create a Microsoft Visual C++ application that extracts the usage data from these files.

Important

The example in this document is provided for information only and Microsoft makes no warranties, either expressed or implied, in this document. The entire risk of the use or the results of the use of the example in this document remains with the user.

You can obtain usage event data in Windows SharePoint Services 3.0 in other ways, such as through the following:

  • Managed code that implements the GetUsageData method of the SPWeb class

  • Windows SharePoint Services 3.0 remote procedure call (RPC) protocol that posts the GetUsageBlob method

  • Microsoft Office SharePoint Designer 2007

However, each of these approaches has limitations when compared to parsing the log files. For information about how to use the GetUsageBlob RPC method, and to view a Windows SharePoint Services 2.0 example that demonstrates how to post this method within managed code, download the Usage BLOB Parser, which is available in the Microsoft Download Center.

Note

The GetUsageData method of the SPWeb class returns usage data in formats that differ depending on the type of report or time period specified, but this method has a 2000-row limit that restricts its usefulness in site usage analysis. The GetUsageBlob method returns the same data and does not have a row limit, but this method does not return data in a useful format and is difficult to parse. You can use Office SharePoint Designer 2007, which parses the same data and summarizes usage in a compressed format. However, the SharePoint Designer client does not expose this data through its object model. Therefore, the data returned has no practical use for a server application. Each of these methods for returning usage data is additionally limited because its information applies only to a single SharePoint Web site and not to a Web application. The data accumulates usage information over a long period of time, and therefore no longer stores the correlation between the fields on a hit (for example, which user saw which page at a particular time).

Windows SharePoint Services 3.0 generates usage event logs daily for each Web application when Enable logging is selected on the Usage Analysis Processing page in SharePoint Central Administration. When logging is enabled, Windows SharePoint Services by default creates log files in the C:\WINDOWS\system32\LogFiles\STS path on the front-end Web server, although you can specify an alternative location. The STS directory contains a folder for each Web application on the Web server, each named with a GUID that identifies the respective Web application. Each Web application folder contains a subfolder for each day, which in turn contains the daily usage log for that Web application. In addition to containing information for each Web application, the Windows SharePoint Services 3.0 logs are also useful because they associate users with page hits and with time stamps.
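Because the Web application GUID and the log date are both encoded in the folder names, a tool can recover them from the path of a log file. The following is a minimal sketch of that idea, not part of the original example; the names LogPathInfo and parseLogPath are hypothetical, and the code assumes only the folder nesting described above (Web application folder, then day folder, then log file).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type: the two path components that identify a log.
struct LogPathInfo {
    std::string webAppGuid;  // name of the Web application folder
    std::string logDate;     // name of the per-day subfolder
};

// Splits a path on '\\' or '/' and returns the folder names that sit two and
// one levels above the log file. Returns false if the path is too short.
bool parseLogPath(const std::string& path, LogPathInfo& out) {
    std::vector<std::string> parts;
    std::string current;
    for (char ch : path) {
        if (ch == '\\' || ch == '/') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current.push_back(ch);
        }
    }
    if (!current.empty()) parts.push_back(current);

    // Need at least: <GUID folder>, <date folder>, <file name>.
    if (parts.size() < 3) return false;
    out.webAppGuid = parts[parts.size() - 3];
    out.logDate = parts[parts.size() - 2];
    return true;
}
```

The helper does not validate that the folder names really are a GUID and a date; a production tool would probably want to check both before trusting them.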

Note

To view the log files generated in Windows SharePoint Services 3.0 you must be an administrator (or a member of the STS_WPG group, which includes but is not limited to the administrators) for the computer that contains the files.

Supplementing Usage Event Logging with IIS Logs

The information provided through usage event logging in Windows SharePoint Services 3.0 can be supplemented with information provided through the logs generated by Internet Information Services (IIS). IIS logs include the IP address of the server, the type of request, the port number for the request, and other information. For more information about IIS logging, see Logging Site Activity (IIS 6.0).

Examining the Usage Log File Format

A Windows SharePoint Services 3.0 log file consists of separate entries for each page hit that occurs in a Web application. Each log entry starts with a structure in binary format whose fields indicate the number of bytes used for each subsequent part in the entry.

Note

Windows SharePoint Services 3.0 adds 300 bytes of padding to the top of the log file, which starts with the string "Windows SharePoint Services HTTP log file". This padding helps prevent the introduction of malicious script into the file.
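A parser has to step over this padding before it reaches the first entry. The sketch below shows one way to do that on a buffer that already holds the file contents. The names kBanner, kPaddingSize, and firstEntryOffset are hypothetical, and checking only the banner text (rather than every padding byte) is a simplification chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Values taken from the article: the banner text and the 300-byte padding.
static const char kBanner[] = "Windows SharePoint Services HTTP log file";
static const std::size_t kPaddingSize = 300;

// Returns the offset of the first log entry: kPaddingSize when the buffer
// starts with the banner, otherwise 0 (assumption: no padding is present).
std::size_t firstEntryOffset(const char* data, std::size_t size) {
    const std::size_t bannerLen = sizeof(kBanner) - 1;  // exclude the NUL
    if (size < kPaddingSize) return 0;  // too short to hold the padding
    if (std::memcmp(data, kBanner, bannerLen) != 0) return 0;
    return kPaddingSize;
}
```

Returning an offset instead of advancing the base pointer keeps the original buffer address available for later cleanup.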

The following table shows the name and data type of each field in the structure and describes the information that is contained in each field of an entry.

Table 1. Fields represented in the structure

Name | Data Type | Description
pPrev | uint64 | Points to the previous entry.
bitFlags | uint8 | Flag that indicates the type of hit, which can be one of the following values: 0 = Regular hit. 1 = Used by the Microsoft Office SharePoint Designer client application to indicate a visit (whether or not from a site referrer URL). 2 = List update. 4 = List operation (for example, a post to owssvr.dll) that is not an update. 8 = Discussion request made through the Office Server Extensions (OSE) Discussion button in Internet Explorer.
cbEntry | uint16 | Number of bytes to skip ahead to next entry.
cbSiteUrl | uint16 | Absolute URL of the top-level site in the site collection that contains the site from which the request was made. For example, http://Server/sites/Top_Site.
cbWeb | uint16 | URL of the site or subsite relative to the top-level site. For example, Subsite_1/Subsite_2/Subsite_3.
cbDoc | uint16 | Site-relative URL of the page that is visited. For example, lists/List_Name/allitems.aspx.
cBytes | uint32 | Bandwidth consumed by the request, including bytes received and bytes sent.
httpStatus | uint16 | HTTP status code, which is the same as in IIS logs. Windows SharePoint Services 3.0 logs only successful hits, so its value is always between 200 and 299 (in other words, never 304, 401, or 404). Almost all recorded hits have an HTTP status code equal to 200.
cbUser | uint16 | Name of the user making the request. For example, DOMAIN\User_Alias.
cbRefQS | uint16 | When applicable, query string that is used by a referring URL.
cbRef | uint16 | When applicable, URL from which the user navigated to the page. Excludes cases in which the referring URL subsumes the site from which the request was made, and cases in which the URL is typed fully in the browser.
cbUAS | int16 | User agent.
cbQS | uint16 | When applicable, query string that is used in the request URL.
version | uint32 | ETag indicating the version of the requested URL.
reserved | uint16 | Reserved. No definition required. IIS instance ID.
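When reporting on hits, it can help to turn the bitFlags value into readable text. The following sketch uses the numeric values listed in Table 1 and treats the field as a bit mask, which is an assumption for illustration; the names HitFlag and describeHitFlags are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Flag values as listed in Table 1.
enum HitFlag : std::uint8_t {
    kDesignerVisit = 1,
    kListUpdate    = 2,
    kListOperation = 4,
    kDiscussion    = 8
};

// Builds a '+'-separated description of every Table 1 flag that is set.
std::string describeHitFlags(std::uint8_t bitFlags) {
    if (bitFlags == 0) return "regular hit";
    std::string text;
    auto append = [&text](const char* name) {
        if (!text.empty()) text += "+";
        text += name;
    };
    if (bitFlags & kDesignerVisit) append("designer visit");
    if (bitFlags & kListUpdate)    append("list update");
    if (bitFlags & kListOperation) append("list operation");
    if (bitFlags & kDiscussion)    append("discussion request");
    if (text.empty()) text = "unknown";  // only bits outside Table 1 are set
    return text;
}
```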

In addition to containing a structure with byte counts, each log entry also contains the following:

  • A carriage return/line feed (\r\n), to make the entry more readable to human readers.

  • The GUID of the site in which the request was made.

  • The time stamp of the request expressed in the local time zone of the server.

  • Null-terminated strings containing values that correspond, respectively, to each field in the structure.
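To illustrate the last item, the text portion of an entry can be viewed as a run of strings separated by NUL bytes. The sketch below splits such a run into individual strings; the function name splitNulTerminated is hypothetical, and the sample data in the usage note is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits [data, data + size) into strings at each NUL byte. A trailing run
// without a terminating NUL is ignored as incomplete.
std::vector<std::string> splitNulTerminated(const char* data, std::size_t size) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] == '\0') {
            fields.emplace_back(data + start, i - start);
            start = i + 1;
        }
    }
    return fields;
}
```

For example, passing the 15 bytes of "site\0web\0\0user" (including the final NUL) yields four strings, the third of which is empty. A real parser would more likely rely on the byte counts in the structure than on scanning for NUL bytes alone, because the counts also allow the entry to be validated.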

Windows SharePoint Services 3.0 inserts an ampersand (&) between the top-level site URL and the subsite URL when it processes the log files. This marks the log file as "processed" and prevents data from being counted twice if the usage processing job is accidentally run again on the same day. For example, if someone changes the processing time from 01:00 (1 A.M.) to 23:00 (11 P.M.) in the middle of the day, the previous day's logs are not counted twice.
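A parsing tool could use this marker to tell whether an entry has already been processed. The sketch below assumes that the ampersand occupies the single separator byte that immediately follows the top-level site URL; that placement is an inference for illustration, not something the format description states byte for byte. The names SeparatorState and classifySeparator are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum class SeparatorState { Unprocessed, Processed, Unexpected };

// siteUrl points at the first byte of the top-level site URL inside an entry;
// cbSiteUrl is the byte count taken from the entry's structure.
SeparatorState classifySeparator(const char* siteUrl, std::size_t cbSiteUrl) {
    const char separator = siteUrl[cbSiteUrl];
    if (separator == '\0') return SeparatorState::Unprocessed;
    if (separator == '&')  return SeparatorState::Processed;  // assumed marker position
    return SeparatorState::Unexpected;
}
```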

Parsing the Usage Event Logs

Because each entry begins with a binary structure, the usage event log files are not easy for humans to read directly, but the logs can be consumed by a tool that parses the files and provides information about site usage in a human-readable format.

You can create such a tool in Visual C++ or Microsoft Visual C# that reprocesses the usage logs and runs on the same server that generates the logs. The tool can deliver output in various formats for additional querying, such as in a database or as emitted data in a .csv file.

The tool must read a structure in the following format that precedes each log entry.

typedef struct _VLogFileEntry
{
    unsigned long long  pPrev;     /* Pointer to previous entry (8 bytes) */
    unsigned char       bitFlags;  /* Flags describing the current log entry (1 byte) */
    unsigned short      cbEntry;   /* Number of bytes to skip ahead to next entry (2 bytes) */
    unsigned short      cbSiteUrl; // (2 bytes)
    unsigned short      cbWeb;     // (2 bytes)
    unsigned short      cbDoc;     // (2 bytes)
    unsigned long       cBytes;    /* Bandwidth consumed (bytes in + bytes out) (4 bytes) */
    unsigned short      httpStatus;// (2 bytes)
    unsigned short      cbUser;    // (2 bytes)
    unsigned short      cbRefQS;   // (2 bytes)
    unsigned short      cbRef;     // (2 bytes)
    short               cbUAS;     // (2 bytes)
    unsigned short      cbQS;      // (2 bytes)
    unsigned long       version;   // (4 bytes)
    unsigned short      reserved;  // (2 bytes)
} VLogFileEntry;

/*Note: The bitFlags field is a bitwise mask of the following values:
0x00000000 - Generic request
0x00000001 - List update request
0x00000010 - List non update request
0x00000100 - Discussion request
0x00001000 - Windows SharePoint 2007 request
*/
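The sizes in the comments above hold for Visual C++, where unsigned long is 4 bytes; on compilers where unsigned long is 8 bytes the same declaration would have a different layout. The following sketch restates the structure with fixed-width types and copies it out of a raw buffer. The names UsageLogHeader and readHeader are hypothetical, and the code assumes natural alignment and a little-endian machine, under which the structure typically occupies 48 bytes including padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same field order as VLogFileEntry, written with fixed-width types.
struct UsageLogHeader {
    std::uint64_t pPrev;
    std::uint8_t  bitFlags;
    std::uint16_t cbEntry;
    std::uint16_t cbSiteUrl;
    std::uint16_t cbWeb;
    std::uint16_t cbDoc;
    std::uint32_t cBytes;
    std::uint16_t httpStatus;
    std::uint16_t cbUser;
    std::uint16_t cbRefQS;
    std::uint16_t cbRef;
    std::int16_t  cbUAS;
    std::uint16_t cbQS;
    std::uint32_t version;
    std::uint16_t reserved;
};

// Copies the header out of a raw buffer instead of casting the pointer, so
// the source address does not need to be suitably aligned.
UsageLogHeader readHeader(const unsigned char* entry) {
    UsageLogHeader header;
    std::memcpy(&header, entry, sizeof(header));
    return header;
}
```

Copying with memcpy costs a few bytes per entry but avoids depending on how the mapped file happens to be aligned in memory.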

After the file is mapped to a memory address, code such as the following can then traverse each entry in a log and return site usage information.

unsigned long cbEntrySize = 0;

for(pCur = pBase; 
    pCur < pEnd; 
    pCur += cbEntrySize)
{
    pLFE = (VLogFileEntry *)pCur;

    pszSiteGuid = pCur + sizeof(VLogFileEntry) + 2;
    pszTS   = pszSiteGuid + cbSiteGuid + 1;
    pszSite = pszTS + cbTimeStamp + 1; 
    *(pszSite + pLFE->cbSiteUrl) = '\0';
    pszWeb = pszSite + pLFE->cbSiteUrl + 1;
    pszDoc  = pszWeb + pLFE->cbWeb + 1;
    pszUser = pszDoc + pLFE->cbDoc + 1;

    // ... compute cbEntrySize and emit the fields, as in the complete example ...
}

After casting the current entry as a structure, the example proceeds to gather the site GUID, time stamp, URL of the top-level site, relative URL of the subsite, file name of the page that was visited, and name of the user. The example considers the two bytes used for the carriage return/line feed that appears between the binary structure and site GUID in each entry, and also the single byte used in null separators between the different parts of the entry.
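The same arithmetic can be expressed as offsets from the start of an entry, which makes it easier to check against the size of the file before any pointer is dereferenced. This is a minimal sketch; the names FieldOffsets and locateFields are hypothetical, and the fixed lengths of 36 and 8 bytes are the GUID and time stamp sizes used by the sample code in this article.

```cpp
#include <cassert>
#include <cstddef>

// Fixed sizes used by the article's sample code.
const std::size_t kSiteGuidLen  = 36;  // GUID text without braces
const std::size_t kTimeStampLen = 8;

struct FieldOffsets {
    std::size_t siteGuid, timeStamp, siteUrl, web, doc, user;
};

// All offsets are relative to the first byte of the entry.
FieldOffsets locateFields(std::size_t headerSize, std::size_t cbSiteUrl,
                          std::size_t cbWeb, std::size_t cbDoc) {
    FieldOffsets o;
    o.siteGuid  = headerSize + 2;                 // skip \r\n
    o.timeStamp = o.siteGuid + kSiteGuidLen + 1;  // +1 for the separator
    o.siteUrl   = o.timeStamp + kTimeStampLen + 1;
    o.web       = o.siteUrl + cbSiteUrl + 1;
    o.doc       = o.web + cbWeb + 1;
    o.user      = o.doc + cbDoc + 1;
    return o;
}
```

With a 48-byte structure, a 10-byte site URL, a 3-byte subsite URL, and a 5-byte document URL, the user name starts at offset 117.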

The preceding for loop should also include error handling for cases of corrupted log data. The following code example computes the expected total size of an entry from the byte counts in the structure, and then flags the entry as corrupt if that size exceeds a specified maximum, if the structure is not immediately followed by a carriage return and line feed, or if the cbEntry field does not equal the computed size.

const unsigned long maxCbEntrySize = 4096;

cbEntrySize = sizeof(VLogFileEntry) \
        + cbWebAppGuid + cbSiteGuid + cbTimeStamp \
        + pLFE->cbSiteUrl + pLFE->cbWeb + pLFE->cbDoc \
        + pLFE->cbUser + pLFE->cbQS \
        + pLFE->cbRefQS + pLFE->cbRef \
        + pLFE->cbUAS + 13; /* 11 NULLs (1 per field) and 2 bytes for \r\n */

// Check for corrupt log files.
fError  = (cbEntrySize > maxCbEntrySize ||
    !(*(pCur + sizeof(VLogFileEntry)) == '\r') ||
    !(*(pCur +  sizeof(VLogFileEntry) + 1) == '\n') ||
    !(pLFE->cbEntry == cbEntrySize));

if (fError)
{
    printf("Error reading WSS log file, aborting.\n");
    goto cleanup;
}
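The size check can also be factored into small functions that are easy to test in isolation. The sketch below is one possible arrangement, not the article's method: the names VariableSizes, expectedEntrySize, and entryLooksValid are hypothetical, and the checks for a negative cbUAS and for the bytes remaining in the file are additions that the preceding code does not perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct VariableSizes {
    std::uint16_t cbSiteUrl, cbWeb, cbDoc, cbUser, cbQS, cbRefQS, cbRef;
    std::int16_t  cbUAS;  // signed in the article's structure
};

// Returns the computed entry size, or 0 if cbUAS is negative.
std::size_t expectedEntrySize(std::size_t headerSize, const VariableSizes& v) {
    if (v.cbUAS < 0) return 0;
    const std::size_t fixedText = 36 + 36 + 8;  // two GUIDs and the time stamp
    const std::size_t separators = 11 + 2;      // 11 NULs plus \r\n
    return headerSize + fixedText + separators
         + v.cbSiteUrl + v.cbWeb + v.cbDoc + v.cbUser
         + v.cbQS + v.cbRefQS + v.cbRef
         + static_cast<std::size_t>(v.cbUAS);
}

// True when the computed size matches cbEntry, stays within the 4096-byte
// limit used by the sample, and fits in the bytes left in the file.
bool entryLooksValid(std::size_t headerSize, const VariableSizes& v,
                     std::uint16_t cbEntry, std::size_t bytesRemaining) {
    const std::size_t size = expectedEntrySize(headerSize, v);
    return size != 0 && size == cbEntry && size <= 4096 && size <= bytesRemaining;
}
```

With a 48-byte structure and all variable-length fields empty, the computed size is 141 bytes.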

Testing a Code Example

The following example shows code that can be used in the Project_Name.cpp file of a C++ application to parse a Windows SharePoint Services 3.0 log file and emit the usage data as a .csv file.

Important

The primary purpose of this example is to demonstrate basic considerations for how to write code that parses the log files. This example does not include all the code that would typically be found in a full production system; much of the usual data validation and error handling is removed to focus this example on what your code must accomplish. Technical support is not available for this example.

To test the example, open Microsoft Visual Studio 2005 on the server that contains the log files and create a Microsoft Visual C++ console application.

To create a Visual C++ console application

  1. On the File menu in Visual Studio, point to New, and then click Project.

  2. In the New Project dialog box, under Project Types, click Visual C++ Projects. In Templates, click Console Application.

  3. In the Name box, type a name for the project (Project_Name), in the Location box, type the path in which to create the application, and then click OK.

  4. In Solution Explorer, double-click the Project_Name.cpp file that is produced and replace the code that Visual Studio includes by default with the following code.

    #include "stdafx.h"
    #include "windows.h"
    #include "assert.h"
    #include <stdio.h>
    #include <string.h>
    
    const char szPaddingLogFile[] = \
    "Windows SharePoint Services HTTP log file                   "\
    "                                                            "\
    "                                                            "\
    "                                                            "\
    "                                                           ";
    
    typedef struct _VLogFileEntry
    {
        unsigned long long  pPrev; // Pointer to previous entry.
        unsigned char       bitFlags;
        unsigned short      cbEntry; // Number of bytes to skip ahead to next entry.
        unsigned short      cbSiteUrl;
        unsigned short      cbWeb;
        unsigned short      cbDoc;
        unsigned long       cBytes; // Bandwidth consumed (bytes in + bytes out).
        unsigned short      httpStatus;
        unsigned short      cbUser;
        unsigned short      cbRefQS;
        unsigned short      cbRef;
        short               cbUAS;
        unsigned short      cbQS;
        unsigned long       version;
        unsigned short      reserved;
    } VLogFileEntry;
    
    
    int main(int argc, char * argv[])
    {
        bool fError = FALSE;
        if (argc < 3)
        {
            printf(
                "\nUsage: %s wsslogfile csvfile optionalField1 optionalField2\n", 
                argv[0]);
            return(1);
        }
    
        char *szFile = argv[1];
        char *szCsvFile = argv[2];
        char *szOptionalField1 = argc > 3 ? argv[3] : NULL;
        char *szOptionalField2 = argc > 4 ? argv[4] : NULL;
        char *szGuid = NULL;
        char *szReplace = NULL;
    
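        /* Format of each .csv line. Each "%s," specifier is 3 characters
        long, so advancing the pointer by 3 drops one field for each
        optional command-line argument that was not passed. The file is
        opened in text mode, which expands \n to \r\n on output. */
        const char *szFormat = "%s,%s,%s,%s,%s,%s,%s,%s\n";
        if (NULL == szOptionalField1)
            szFormat += 3;
        if (NULL == szOptionalField2)
            szFormat += 3;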
    
        FILE *csvFile = NULL;
        if (fopen_s(&csvFile, szCsvFile, "a") != 0 || NULL == csvFile)
        {
            printf("Cannot open output file %s", szCsvFile);
            return (1);
        }
    
        // Fixed field lengths in bytes (GUIDs are stored without braces).
        static const unsigned short cbWebAppGuid = 36; 
        static const unsigned short cbSiteGuid   = 36; 
        static const unsigned short cbTimeStamp  = 8;
    
        printf("\r\nParsing %s to %s \r\n",  szFile, szCsvFile);
        char *pBase, *pEnd;
        HANDLE hF, hFM;
        if ((hF = CreateFile(
                szFile,
                GENERIC_READ, 
                0, 
                NULL, 
                OPEN_EXISTING, 
                FILE_ATTRIBUTE_NOT_CONTENT_INDEXED,
                NULL)) == INVALID_HANDLE_VALUE)
        {
            printf(
                "Cannot open file %s (perhaps because it doesn't exist)",
                szFile);
            return (1);
        }
    
        DWORD dwFileSize, dwFileSizeHigh = 0;
        dwFileSize = GetFileSize(hF, &dwFileSizeHigh);
    
        /* We should never encounter a file larger than about 1 GB. */
        if (dwFileSizeHigh || dwFileSize > 1000000000)
        {
            printf(" File too large %s", szFile);
            CloseHandle(hF);
            return (1);
        }
    
        if (dwFileSize == 0)
        {
            printf(" Skipping empty file %s", szFile);
            CloseHandle(hF);
            return (1);
        }
    
        hFM = CreateFileMapping(hF, NULL, PAGE_WRITECOPY, 0, 0, NULL);
        if ((NULL == hFM) || 
            (NULL == (pBase = (char *)MapViewOfFile(hFM, FILE_MAP_COPY, 0, 0, 0))))
        {
            printf(" Can't map file %s", szFile);    
            if (hFM)
                CloseHandle(hFM);
            CloseHandle(hF);
            return (1);
        }
    
        pEnd = pBase + dwFileSize - sizeof(VLogFileEntry);
    
        char *pCur, *pszSite, *pszSiteGuid, *pszTS;
        char *pszWeb, *pszDoc, *pszUser;
        VLogFileEntry *pLFE;
        unsigned long cItemsProcessed = 0;
        unsigned long cbEntrySize = 0;
        const unsigned long maxCbEntrySize = 4096;
    
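        /* Skip ahead to the end of the file padding. Keep pBase unchanged,
        because UnmapViewOfFile requires the address that MapViewOfFile
        returned. */
        unsigned long cbPadding = sizeof(szPaddingLogFile)/sizeof(szPaddingLogFile[0]);
        char *pStart = pBase;
        if (!strncmp(pBase, szPaddingLogFile, cbPadding))
            pStart += cbPadding;
    
        for(pCur = pStart;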
            pCur < pEnd;
            pCur += cbEntrySize)
        {
            pLFE = (VLogFileEntry *)pCur;
    
            cbEntrySize = sizeof(VLogFileEntry) \
                + cbWebAppGuid + cbSiteGuid + cbTimeStamp \
                + pLFE->cbSiteUrl + pLFE->cbWeb + pLFE->cbDoc \
                + pLFE->cbUser + pLFE->cbQS \
                + pLFE->cbRefQS + pLFE->cbRef \
                + pLFE->cbUAS + 13; /* 11 NULLs (1 per field) and 2 bytes for \r\n */
    
            // Check for corrupt log files.
            fError  = (cbEntrySize > maxCbEntrySize ||
                !(*(pCur + sizeof(VLogFileEntry)) == '\r') ||
                !(*(pCur +  sizeof(VLogFileEntry) + 1) == '\n') ||
                !(pLFE->cbEntry == cbEntrySize));
    
            if (fError)
            {
                printf("Error reading WSS log file, aborting.\n");
                goto cleanup;
            }
    
            // Skip 2 bytes for \r\n.
            pszSiteGuid = pCur + sizeof(VLogFileEntry) + 2;
            // Skip 1 byte for the NULL separator.
            pszTS = pszSiteGuid + cbSiteGuid + 1;
            pszSite = pszTS + cbTimeStamp + 1;
            /* Terminate the site URL string. The view is mapped copy-on-write
             (FILE_MAP_COPY), so this write does not change the log file. */
            *(pszSite + pLFE->cbSiteUrl) = '\0';
            // Skip 1 byte for the NULL separator.
            pszWeb = pszSite + pLFE->cbSiteUrl + 1;
            pszDoc  = pszWeb + pLFE->cbWeb + 1;
            pszUser = pszDoc + pLFE->cbDoc + 1;
    
            /* Output is in this format: timestamp, site guid, siteUrl, 
             subsite, document, user, optional1, optional2*/
            fprintf(csvFile, 
                szFormat,
                pszTS,
                pszSiteGuid,
                pszSite,
                pszWeb,
                pszDoc,
                pszUser,
                szOptionalField1,
                szOptionalField2);
        }
    
        cleanup:
        UnmapViewOfFile(pBase);
        CloseHandle(hFM);
        CloseHandle(hF);
        fclose(csvFile);
    
        return fError;
    }
    
  5. On the Build menu, click Build Solution.

  6. At a command prompt, navigate to the folder that contains the new .exe file of the project.

  7. At the prompt, type Project_Name.exe, followed by a space, the complete path of the log file to parse, another space, and the path of the .csv file to create. The following example, which assumes a project named WssLogParser, creates a .csv file in the root directory that contains usage information for April 1, 2007.

    WssLogParser.exe C:\WINDOWS\system32\LogFiles\STS\33AEF972-56BA-4294-98C7-0ACCF64585B8\2007-04-01\00.log c:\2007_04_01.csv
    

You can optionally pass two parameters to include two additional fields in the .csv file, such as the log date or the GUID of the Web application, which both serve as part of the path of the log file. These extra parameters can be useful to keep the Web application or day of the usage data clear in a scenario in which the tool is used on many log files spanning multiple Web applications or days, and the output is directed to a single .csv file.
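Note that the example writes each field to the .csv file as-is. Fields such as URLs, query strings, or user agent strings may contain commas or quotation marks, which would shift the columns when the file is read back. The following sketch shows one common way to guard against that, using the quoting convention described in RFC 4180; the function name csvEscape is hypothetical and the example in this article does not include this step.

```cpp
#include <cassert>
#include <string>

// Quotes a field when it contains a comma, a double quote, or a line break,
// and doubles any embedded double quotes.
std::string csvEscape(const std::string& field) {
    if (field.find_first_of(",\"\r\n") == std::string::npos) return field;
    std::string out = "\"";
    for (char ch : field) {
        if (ch == '"') out += '"';  // double the embedded quote
        out += ch;
    }
    out += '"';
    return out;
}
```

Fields without special characters pass through unchanged, so output for typical site and user names looks the same as before.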

Conclusion

The logs that Windows SharePoint Services 3.0 generates provide the most convenient access to usage event logging for a site. The example in this article shows how to parse these logs and generate output in a specified format. You can also create a tool that, instead of outputting data in .csv format, exports information to a database for additional processing in the context of a larger operation.

Additional Resources

For more information about Windows SharePoint Services 3.0, see the following resources: