Uniform Resource Locators (URLs) in WinHTTP

A URL is a compact representation of the location and access method for a resource located on the Internet. Each URL consists of a scheme (HTTP, HTTPS, FTP, or Gopher) and a scheme-specific string. This string can also include a combination of a directory path, search string, or name of the resource. The Microsoft Windows HTTP Services (WinHTTP) functions provide the ability to create, combine, break down, and canonicalize URLs. For more information, see RFC 1738, Uniform Resource Locators and RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax.

What Is a Canonicalized URL?

The specified syntax and semantics of URLs leaves room for variation and error. Canonicalization is the process of normalizing an actual URL into a correct, standard, "canonical" form.

This involves coding some characters as "escape sequences." Alphanumeric US-ASCII characters need not be encoded (the digits 0-9, the capital letters A-Z, and the lowercase letters a-z). Most other characters must be escaped, including control characters, the space character, the percent sign, "unsafe characters" ( <, >, ", #, {, }, |, \, ^, ~, [, ], and ' ), and all characters with a code point above 127.

Using the WinHTTP Functions to Handle URLs

WinHTTP provides two functions for handling URLs. WinHttpCrackUrl separates a URL into its component parts, and WinHttpCreateUrl creates a URL from components.

Separating URLs

The WinHttpCrackUrl function separates a URL into its component parts and returns the components indicated by the URL_COMPONENTS structure that is passed to the function.

The components that make up the URL_COMPONENTS structure are the scheme number, host name, port number, user name, password, URL path, and additional information such as search parameters. Each component, except the scheme and port numbers, has a string member that holds the information and a member that holds the length of the string member. The scheme and port numbers have only a member that stores the corresponding value; both the scheme and port numbers are returned on all successful calls to WinHttpCrackUrl.

To retrieve the value of a particular component in the URL_COMPONENTS structure, the member that stores the string length of that component must be set to a nonzero value. The string member can be either a pointer to a buffer or NULL.

If the pointer member contains a pointer to a buffer, the string length member must contain the size of that buffer. The WinHttpCrackUrl function returns the component information as a string in the buffer and stores the string length in the string length member.

If the pointer member is set to NULL, the string length member can be set to any nonzero value. The WinHttpCrackUrl function stores a pointer to the first character of the URL string that contains the component information and sets the string length to the number of characters in the remaining part of the URL string that pertains to the component.

All pointer members set to NULL with a nonzero length member point to the appropriate starting point in the URL string. The length stored in the length member must be used to determine the end of the individual component's information.

To finish initializing the URL_COMPONENTS structure properly, the dwStructSize member must be set to the size of the URL_COMPONENTS structure.

Creating URLs

The WinHttpCreateUrl function uses the information in the previously described URL_COMPONENTS structure to create a URL.

For each required component, the pointer member should contain a pointer to the buffer that holds the information. The length member should be set to zero if the pointer member contains a pointer to a zero-terminated string; the length member should be set to the string length if the pointer member contains a pointer to a string that is not zero-terminated. The pointer member of any components that are not required must be set to NULL.

Sample code

The following sample code shows how to use the WinHttpCrackUrl and WinHttpCreateUrl to disassemble an existing URL, modify one of its components, and reassemble it into a new URL.

  URL_COMPONENTS urlComp;
  LPCWSTR pwszUrl1 = 
    L"https://search.msn.com/results.asp?RS=CHECKED&FORM=MSNH&v=1&q=wininet";
  DWORD dwUrlLen = 0;

  // Initialize the URL_COMPONENTS structure.
  ZeroMemory(&urlComp, sizeof(urlComp));
  urlComp.dwStructSize = sizeof(urlComp);

  // Set required component lengths to non-zero so that they are cracked.
  urlComp.dwSchemeLength    = (DWORD)-1;
  urlComp.dwHostNameLength  = (DWORD)-1;
  urlComp.dwUrlPathLength   = (DWORD)-1;
  urlComp.dwExtraInfoLength = (DWORD)-1;

  // Crack the URL.
  if( !WinHttpCrackUrl( pwszUrl1, (DWORD)wcslen(pwszUrl1), 0, &urlComp ) )
      printf( "Error %u in WinHttpCrackUrl.\n", GetLastError( ) );
  else
  {
    // Change the search information.  New info is the same length.
    urlComp.lpszExtraInfo = L"?RS=CHECKED&FORM=MSNH&v=1&q=winhttp";

    // Obtain the size of the new URL and allocate memory.
    WinHttpCreateUrl( &urlComp, 0, NULL, &dwUrlLen );
    LPWSTR pwszUrl2 = new WCHAR[dwUrlLen];

    // Create a new URL.
    if( !WinHttpCreateUrl( &urlComp, 0, pwszUrl2, &dwUrlLen ) )
      printf( "Error %u in WinHttpCreateUrl.\n", GetLastError( ) );
    else
    {
      // Show both URLs.
      printf( "Old URL:  %S\nNew URL:  %S\n", pwszUrl1, pwszUrl2 );
    }

    // Free allocated memory.
    delete [] pwszUrl2;
  }