Improving String Handling Performance in ASP Applications

 

James Musson
Developer Services, Microsoft UK

March 2003

Applies to:
   Microsoft® Active Server Pages®
   Microsoft Visual Basic®

Summary: Most Active Server Pages (ASP) applications rely on string concatenation to build HTML-formatted data that is then presented to users. This article contains a comparison of several ways to create this HTML data stream, some of which provide better performance than others for a given situation. A reasonable knowledge of ASP and Visual Basic programming is assumed. (11 printed pages)

Contents

Introduction
ASP Design
String Concatenation
The Quick and Easy Solution
The StringBuilder
The Built-in Method
Testing
Results
Conclusion

Introduction

When writing ASP pages, the developer is really just creating a stream of formatted text that is written to the Web client via the Response object provided by ASP. You can build this text stream in several different ways and the method you choose can have a large impact on both the performance and the scalability of the Web application. On numerous occasions in which I have helped customers with performance-tuning their Web applications, I have found that one of the major wins has come from changing the way that the HTML stream is created. In this article I will show a few of the common techniques and test what effect they have on the performance of a simple ASP page.

ASP Design

Many ASP developers have followed good software engineering principles and modularized their code wherever possible. This design normally takes the form of a number of include files that contain functions modeling particular discrete sections of a page. The string outputs from these functions, usually HTML table code, can then be used in various combinations to build a complete page. Some developers have taken this a stage further and moved these HTML functions into Visual Basic COM components, hoping to benefit from the extra performance that compiled code can offer.

Although this is certainly a good design practice, the method used to build the strings that form these discrete HTML code components can have a large bearing on how well the Web site performs and scales—regardless of whether the actual operation is performed from within an ASP include file or a Visual Basic COM component.

String Concatenation

Consider the following code fragment taken from a function called WriteHTML. The parameter named Data is simply an array of strings containing some data that needs to be formatted into a table structure (data returned from a database, for instance).

Function WriteHTML( Data )

Dim nRep

For nRep = 0 to 99
   sHTML = sHTML & vbcrlf _ 
         & "<TR><TD>" & (nRep + 1) & "</TD><TD>" _ 
         & Data( 0, nRep ) & "</TD><TD>" _ 
         & Data( 1, nRep ) & "</TD><TD>" _ 
         & Data( 2, nRep ) & "</TD><TD>" _ 
         & Data( 3, nRep ) & "</TD><TD>" _ 
         & Data( 4, nRep ) & "</TD><TD>" _ 
         & Data( 5, nRep ) & "</TD></TR>"
Next

WriteHTML = sHTML

End Function

This is typical of how many ASP and Visual Basic developers build HTML code. The text contained in the sHTML variable is returned to the calling code and then written to the client using Response.Write. Of course, this could also be expressed as similar code embedded directly within the page without the indirection of the WriteHTML function. The problem with this code lies in the fact that the string data type used by ASP and Visual Basic, the BSTR or Basic String, cannot actually change length. This means that every time the length of the string is changed, the original representation of the string in memory is destroyed, and a new one is created containing the new string data: this results in a memory allocation operation and a memory de-allocation operation. Of course, in ASP and Visual Basic this is all taken care of for you, so the true cost is not immediately apparent. Allocating and de-allocating memory requires the underlying runtime code to take out exclusive locks and therefore can be expensive. This is especially apparent when strings get big and large blocks of memory are being allocated and de-allocated in quick succession, as happens during heavy string concatenation. While this may present no major problems in a single user environment, it can cause serious performance and scalability issues when used in a server environment such as in an ASP application running on a Web server.

So back to the code fragment above: how many string allocations are being performed here? In fact the answer is 16. In this situation every application of the '&' operator causes the string pointed to by the variable sHTML to be destroyed and recreated. I have already mentioned that string allocation is expensive, becoming increasingly more so as the string grows; armed with this knowledge, we can improve upon the code above.

The Quick and Easy Solution

There are two ways to mitigate the effect of string concatenations, the first is to try and decrease the size of the strings being manipulated and the second is to try and reduce the number of string allocation operations being performed. Look at the revised version of the WriteHTML code shown below.

Function WriteHTML( Data )

Dim nRep

For nRep = 0 to 99
   sHTML = sHTML & ( vbcrlf _ 
         & "<TR><TD>" & (nRep + 1) & "</TD><TD>" _ 
         & Data( 0, nRep ) & "</TD><TD>" _       
         & Data( 1, nRep ) & "</TD><TD>" _ 
         & Data( 2, nRep ) & "</TD><TD>" _ 
         & Data( 3, nRep ) & "</TD><TD>" _ 
         & Data( 4, nRep ) & "</TD><TD>" _ 
         & Data( 5, nRep ) & "</TD></TR>" )
Next

WriteHTML = sHTML

End Function

At first glance it may be difficult to spot the difference between this piece of code and the previous sample. This one simply has the addition of parentheses around everything after sHTML = sHTML &. This actually reduces the size of strings being manipulated in most of the string concatenation operations by changing the order of precedence. In the original code sample the ASP complier will look at the expression to the right of the equals sign and just evaluate it from left to right. This results in 16 concatenation operations per iteration involving sHTML which is growing all the time. In the new version we are giving the compiler a hint by changing the order in which it should carry out the operations. Now it will evaluate the expression from left to right but also inside out, i.e. inside the parentheses first. This technique results in 15 concatenation operations per iteration involving smaller strings which are not growing and only one with the large, and growing, sHTML. Figure 1 shows an impression of the memory usage patterns of this optimization against the standard concatenation method.

Figure 1   Comparison of memory usage pattern between standard and parenthesized concatenation

Using parentheses can make quite a marked difference in performance and scalability in certain circumstances, as I will demonstrate later in this article.

The StringBuilder

We have seen the quick and easy solution to the string concatenation problem, and for many situations this may provide the best trade-off between performance and effort to implement. If we want to get serious about improving the performance of building large strings, however, then we need to take the second alternative, which is to cut down the number of string allocation operations. In order to achieve this a StringBuilder is required. This is a class that maintains a configurable string buffer and manages insertions of new pieces of text into that buffer, causing string reallocation only when the length of the text exceeds the length of the string buffer. The Microsoft .NET framework provides such a class for free (System.Text.StringBuilder) that is recommended for all string concatenation operations in that environment. In the ASP and classic Visual Basic world we do not have access to this class, so we need to build our own. Below is a sample StringBuilder class created using Visual Basic 6.0 (error-handling code has been omitted in the interest of brevity).

Option Explicit

' default initial size of buffer and growth factor
Private Const DEF_INITIALSIZE As Long = 1000
Private Const DEF_GROWTH As Long = 1000

' buffer size and growth
Private m_nInitialSize As Long
Private m_nGrowth As Long

' buffer and buffer counters
Private m_sText As String
Private m_nSize As Long
Private m_nPos As Long

Private Sub Class_Initialize()
   ' set defaults for size and growth
   m_nInitialSize = DEF_INITIALSIZE
   m_nGrowth = DEF_GROWTH
   ' initialize buffer
   InitBuffer
End Sub

' set the initial size and growth amount
Public Sub Init(ByVal InitialSize As Long, ByVal Growth As Long)
   If InitialSize > 0 Then m_nInitialSize = InitialSize
   If Growth > 0 Then m_nGrowth = Growth
End Sub

' initialize the buffer
Private Sub InitBuffer()
   m_nSize = -1
   m_nPos = 1
End Sub

' grow the buffer
Private Sub Grow(Optional MinimimGrowth As Long)
   ' initialize buffer if necessary
   If m_nSize = -1 Then
      m_nSize = m_nInitialSize
      m_sText = Space$(m_nInitialSize)
   Else
      ' just grow
      Dim nGrowth As Long
      nGrowth = IIf(m_nGrowth > MinimimGrowth, 
            m_nGrowth, MinimimGrowth)
      m_nSize = m_nSize + nGrowth
      m_sText = m_sText & Space$(nGrowth)
   End If
End Sub

' trim the buffer to the currently used size
Private Sub Shrink()
   If m_nSize > m_nPos Then
      m_nSize = m_nPos - 1
      m_sText = RTrim$(m_sText)
   End If
End Sub

' add a single text string
Private Sub AppendInternal(ByVal Text As String)
   If (m_nPos + Len(Text)) > m_nSize Then Grow Len(Text)
   Mid$(m_sText, m_nPos, Len(Text)) = Text
   m_nPos = m_nPos + Len(Text)
End Sub

' add a number of text strings
Public Sub Append(ParamArray Text())
   Dim nArg As Long
   For nArg = 0 To UBound(Text)
      AppendInternal CStr(Text(nArg))
   Next nArg
End Sub
 
' return the current string data and trim the buffer
Public Function ToString() As String
   If m_nPos > 0 Then
      Shrink
      ToString = m_sText
   Else
      ToString = ""
   End If
End Function

' clear the buffer and reinit
Public Sub Clear()
   InitBuffer
End Sub

The basic principle used in this class is that a variable (m_sText) is held at the class level that acts as a string buffer and this buffer is set to a certain size by filling it with space characters using the Space$ function. When more text needs to be concatenated with the existing text, the Mid$ function is used to insert the text at the correct position, after checking that our buffer is big enough to hold the new text. The ToString function returns the text currently stored in the buffer, also trimming the size of the buffer to the correct length for this text. The ASP code to use the StringBuilder would look like that shown below.

Function WriteHTML( Data )

Dim oSB
Dim nRep

Set oSB = Server.CreateObject( "StringBuilderVB.StringBuilder" )
' initialize the buffer with size and growth factor
oSB.Init 15000, 7500

For nRep = 0 to 99
   oSB.Append "<TR><TD>", (nRep + 1), "</TD><TD>", _ 
         Data( 0, nRep ), "</TD><TD>", _ 
         Data( 1, nRep ), "</TD><TD>", _ 
         Data( 2, nRep ), "</TD><TD>", _ 
         Data( 3, nRep ), "</TD><TD>", _ 
         Data( 4, nRep ), "</TD><TD>", _ 
         Data( 5, nRep ), "</TD></TR>"
Next

WriteHTML = oSB.ToString()
Set oSB = Nothing

End Function

There is a definite overhead for using the StringBuilder because an instance of the class must be created each time it is used (and the DLL containing the class must be loaded on the first class instance creation). There is also the overhead involved with making the extra method calls to the StringBuilder instance. How the StringBuilder performs against the parenthesized '&' method depends on a number of factors including the number of concatenations, the size of the string being built, and how well the initialization parameters for the StringBuilder string buffer are chosen. Note that in most cases it is going to be far better to overestimate the amount of space needed in the buffer than to have it grow often.

The Built-in Method

ASP includes a very fast way of building up your HTML code, and it involves simply using multiple calls to Response.Write. The Write function uses an optimized string buffer under the covers that provides very good performance characteristics. The revised WriteHTML code would look like the code shown below.

Function WriteHTML( Data )

Dim nRep

For nRep = 0 to 99
   Response.Write "<TR><TD>" 
   Response.Write (nRep + 1) 
   Response.Write "</TD><TD>"
   Response.Write Data( 0, nRep ) 
   Response.Write "</TD><TD>"
   Response.Write Data( 1, nRep ) 
   Response.Write "</TD><TD>" 
   Response.Write Data( 2, nRep ) 
   Response.Write "</TD><TD>"
   Response.Write Data( 3, nRep ) 
   Response.Write "</TD><TD>"
   Response.Write Data( 4, nRep ) 
   Response.Write "</TD><TD>"
   Response.Write Data( 5, nRep ) 
   Response.Write "</TD></TR>"
Next

End Function

Although this will most likely provide us with the best performance and scalability, we have broken the encapsulation somewhat because we now have code inside our function writing directly to the Response stream and thus the calling code has lost a degree of control. It also becomes more difficult to move this code (into a COM component for example) because the function has a dependency on the Response stream being available.

Testing

The four methods presented above were tested against each other using a simple ASP page with a single table fed from a dummy array of strings. The tests were performed using Application Center Test® (ACT) from a single client (Windows® XP Professional, PIII-850MHz, 512MB RAM) against a single server (Windows 2000 Advanced Server, dual PIII-1000MHz, 256MB RAM) over a 100Mb/sec network. ACT was configured to use 5 threads so as to simulate a load of 5 users connecting to the web site. Each test consisted of a 20 second warm up period followed by a 100 second load period in which as many requests as possible were made.

The test runs were repeated for various numbers of concatenation operations by varying the number of iterations in the main table loop, as shown in the code fragments for the WriteHTML function. Each test run was performed with all of the various concatenation methods described so far.

Results

Below is a series of charts showing the effect of each method on the throughput of the application and also the response time for the ASP page. This gives some idea of how many requests the application could support and also how long the users would be waiting for pages to be downloaded to their browser.

Table 1   Key to concatenation method abbreviations used

Method Abbreviation Description
RESP The built-in Response.Write method
CAT The standard concatenation ('&') method
PCAT The parenthesized concatenation ('&') method
BLDR The StringBuilder method

Whilst this test is far from realistic in terms of simulating the workload for a typical ASP application, it is evident from Table 2 that even at 420 repetitions the page is not particularly large; and there are many complex ASP pages in existence today that fall in the higher ranges of these figures and may even exceed the limits of this testing range.

Table 2   Page sizes and number of concatenations for test samples

No. of iterations No. of concatenations Page size (bytes)
15 240 2,667
30 480 4,917
45 720 7,167
60 960 9,417
75 1,200 11,667
120 1,920 18,539
180 2,880 27,899
240 3,840 37,259
300 4,800 46,619
360 5,760 55,979
420 6,720 62,219

Figure 2   Chart showing throughput results

We can see from the chart shown in Figure 2 that, as expected, the multiple Response.Write method (RESP) gives us the best throughput throughout the entire range of iterations tested. What is surprising, though, is how quickly the standard string concatenation method (CAT) degrades and how much better the parenthesized version (PCAT) performs up to over 300 iterations. At somewhere around 220 iterations the overhead inherent in the StringBuilder method (BLDR) begins to be outweighed by the performance gains due to the string buffering and at anything above this point it would most likely be worth investing the extra effort to use a StringBuilder in this ASP page.

Figure 3   Chart showing response time results

Figure 4   Chart showing response time results with CAT omitted

The charts in Figure 3 and 4 show response time as measured by Time-To-First-Byte in milliseconds. The response times for the standard string concatenation method (CAT) increase so quickly that the chart is repeated without this method included (Figure 4) so that the differences between the other methods can be examined. It is interesting to note that the multiple Response.Write method (RESP) and the StringBuilder method (BLDR) give what looks like a fairly linear progression as the iterations increase whereas the standard concatenation method (CAT) and the parenthesized concatenation method (PCAT) both increase very rapidly once a certain threshold has been passed.

Conclusion

During this discussion I have focused on how different string building techniques can be applied within the ASP environment; but don't forget that this applies to any scenario where you are creating large strings in Visual Basic code, such as manually creating XML documents. The following guidelines should help you decide which method might work best for your situation.

  • Try the parenthesized '&' method first, especially when dealing with existing code. This will have very little impact on the structure of the code and you might well find that this increases the performance of the application such that it exceeds your targets.
  • If it is possible without compromising the encapsulation level you require, use Response.Write. This will always give you the best performance by avoiding unnecessary in-memory string manipulation.
  • Use a StringBuilder for building really large, or concatenation-intensive, strings.

Although you may not see exactly the same kind of performance increase shown in this article, I have used these techniques in real-world ASP web applications to deliver very good improvements in both performance and scalability for very little extra effort.