Insight into String Concatenation in JScript
Have you read my post on the String Concatenation issue? If so, I can sense your curiosity to know a little more about what went on behind the scenes. For example: what new algorithm produces the performance results shown in that post, what issues did we face, how hard or easy was it to implement, and so on. I didn’t talk much about any of this last time because we were still testing the fix and things were not yet stable. But now that the IE8 Beta is out and you are already running it on your machine (if not, go ahead and install it) and loving it :), why not talk a little more and try to answer a few of those questions?
In the new implementation, which is running on your machine if you are reading this in IE8, we no longer produce an actual regular string for each append operation. Instead, we keep track of all the strings that took part in the append operations. Producing the actual regular string is deferred until later.
Let’s apply this to the same example I used in the previous blog post…
resultStr = "Jscript" + "String" + "Append" + "problem";
In the previous implementation, we produced the actual string and stored it in resultStr. In the current implementation, resultStr points to a structure (let’s call it a pseudo-string) that records which strings are needed to produce the actual string and in what order.
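The post doesn’t describe how the pseudo-string is laid out inside the engine, so what follows is only a minimal JavaScript model, assuming a binary concatenation node (similar to a rope) whose two operands are either regular strings or other pseudo-strings. The names `PseudoString`, `append` and `flatten` are hypothetical and are not part of JScript.

```javascript
// Hypothetical pseudo-string: a concatenation node that remembers its two
// operands (regular strings or other pseudo-strings) instead of copying them.
class PseudoString {
  constructor(left, right) {
    this.left = left;
    this.right = right;
    // The length is known up front, without producing any characters.
    this.length = left.length + right.length;
  }

  // Produce the regular string: visit the leaves left to right and join them.
  flatten() {
    const parts = [];
    const stack = [this];
    while (stack.length > 0) {
      const node = stack.pop();
      if (node instanceof PseudoString) {
        stack.push(node.right); // pushed first, so visited after left
        stack.push(node.left);
      } else {
        parts.push(node);
      }
    }
    return parts.join("");
  }
}

// Hypothetical stand-in for the engine's "+" on strings.
function append(a, b) {
  return new PseudoString(a, b);
}

// resultStr = "Jscript" + "String" + "Append" + "problem"; ("+" is left-associative)
const resultStr = append(append(append("Jscript", "String"), "Append"), "problem");

console.log(resultStr.length); // 26
console.log(resultStr.flatten()); // JscriptStringAppendproblem
```

In this model each append only allocates a small node; the characters are copied once, in `flatten`, when the regular string is finally produced.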
The logic looks simple. However, there were a few issues we faced while implementing it that I think are worth sharing here…
When to Convert a Pseudo-String Back to a Regular String
As Eric mentioned in his blog, their ‘Fancy-String’ implementation was heavy because they had to convert ‘Fancy-Strings’ back to regular strings every time one was passed by variable reference. We faced the same big question: when should pseudo-strings be converted back to regular strings? Should it be done as soon as a pseudo-string has grown to a certain size, or right after the append operation is done? Or should it be done when the pseudo-string is passed by variable reference? We evaluated all of these approaches, but none of them gave us the kind of gains we wanted. Later we thought: why not defer the conversion to the point where the pseudo-string can no longer serve the purpose and we must have a regular string? In other words, do the conversion only when an actual regular string is needed. For example, do the conversion when a ‘substr’ or ‘trim’ operation is performed on the pseudo-string, or when the pseudo-string needs to be passed somewhere outside the engine, like IE or VBScript, that doesn’t understand pseudo-strings. We tried it and, yes, with this we hit the bull’s-eye :).
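Here is a minimal sketch of the "convert only when needed" policy, again in JavaScript and assuming a hypothetical pseudo-string that chains each appended piece onto everything appended before it. Appends never convert; only operations that need the actual characters (here `substr` and `trim`) do. Passing the string outside the engine would be another conversion point and is not modeled, and the `conversions` counter exists only to make the behavior visible. In this sketch every such call converts again; it does not cache the result.

```javascript
// Counts how many times a regular string had to be produced.
let conversions = 0;

// Hypothetical pseudo-string: everything appended so far (prefix) plus
// the most recently appended piece (suffix).
class PseudoString {
  constructor(prefix, suffix) {
    this.prefix = prefix; // regular string or PseudoString
    this.suffix = suffix; // regular string
  }

  // The conversion itself: walk back through the chain, then join in order.
  toRegular() {
    conversions++;
    const parts = [];
    let node = this;
    while (node instanceof PseudoString) {
      parts.push(node.suffix);
      node = node.prefix;
    }
    parts.push(node); // the first regular string in the chain
    return parts.reverse().join("");
  }

  // Operations that need real characters are the only conversion points.
  substr(start, length) {
    return this.toRegular().substr(start, length);
  }

  trim() {
    return this.toRegular().trim();
  }
}

// Appending only records the pieces; nothing is converted here.
function append(a, b) {
  return new PseudoString(a, b);
}

let s = " Jscript";
for (const word of [" String", " Append", " problem "]) {
  s = append(s, word);
}

console.log(conversions); // 0
console.log(s.trim()); // Jscript String Append problem
console.log(s.substr(1, 7)); // Jscript
console.log(conversions); // 2
```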
Multiple References to a Single Pseudo-String
If there are multiple references to a single ‘pseudo-string’, then we want to do the conversion only once. For example…
X = "Hi" + "Microsoft";
Y = X;
Z = X;
In this example, X, Y and Z all point to the same pseudo-string. Now if X.toString() is called, one approach is to allocate a buffer of 11 characters and copy the strings "Hi" and "Microsoft" into it one by one. Later, when Y.toString() and Z.toString() are called, we would again allocate a new buffer for each of them and copy the strings again.
Hmmm. With all these allocations and copy operations, aren’t we back to the very problem we were trying to solve? Yes, we are.
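To make the cost concrete, here is a hypothetical JavaScript model of that naive approach, assuming `Z = X;` as the third assignment and simply counting buffer allocations and copied characters. Since "Hi" + "Microsoft" is 2 + 9 = 11 characters, three conversions allocate three buffers and copy 33 characters for what is logically one string.

```javascript
// Counters that make the naive approach's cost visible.
let buffersAllocated = 0;
let charsCopied = 0;

// Hypothetical naive pseudo-string: every toString() call allocates a fresh
// buffer and copies all pieces into it, even if this was done before.
class PseudoString {
  constructor(pieces) {
    this.pieces = pieces; // regular strings, in concatenation order
  }

  toString() {
    const total = this.pieces.reduce((n, p) => n + p.length, 0);
    buffersAllocated++; // one new buffer of `total` characters
    charsCopied += total; // every piece is copied into it
    return this.pieces.join("");
  }
}

// X = "Hi" + "Microsoft"; Y = X; Z = X; (all three share one pseudo-string)
const X = new PseudoString(["Hi", "Microsoft"]);
const Y = X;
const Z = X;

X.toString();
Y.toString();
Z.toString();

console.log(buffersAllocated); // 3
console.log(charsCopied); // 33
```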
So how did we address this? We decided not to have X, Y and Z point directly to the pseudo-string. Instead, they point to a proxy object that holds the actual reference to the pseudo-string. When X.toString() is called, the pseudo-string is converted to a regular string (a new buffer of sufficient size is allocated and the strings are copied into it one by one), and the proxy object is updated to point to the regular string. Later, when Y.toString() and Z.toString() are called, there is no need to convert again, because they point to the same proxy object, which already points to the regular string.
So X, Y and Z always point to the proxy object. The proxy object contains either a pseudo-string or a regular string, depending on whether the conversion has happened yet.
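A minimal JavaScript sketch of the proxy idea, under the same assumptions as before: `StringProxy` is a hypothetical name, every variable holds the same proxy, and the proxy swaps its pseudo-string for the regular string the first time one is requested.

```javascript
// Counts how many times a pseudo-string is actually converted.
let conversions = 0;

// Hypothetical pseudo-string: the pieces to join, in order.
class PseudoString {
  constructor(pieces) {
    this.pieces = pieces;
  }

  flatten() {
    conversions++; // one buffer allocation plus a copy of every piece
    return this.pieces.join("");
  }
}

// Hypothetical proxy: holds a PseudoString before conversion and the
// regular string afterwards.
class StringProxy {
  constructor(pseudo) {
    this.value = pseudo;
  }

  toString() {
    if (this.value instanceof PseudoString) {
      // First request: convert once and remember the regular string.
      this.value = this.value.flatten();
    }
    return this.value;
  }
}

const X = new StringProxy(new PseudoString(["Hi", "Microsoft"]));
const Y = X; // Y and Z share X's proxy rather than copying it
const Z = X;

console.log(X.toString()); // HiMicrosoft
console.log(Y.toString()); // HiMicrosoft
console.log(Z.toString()); // HiMicrosoft
console.log(conversions); // 1
```

The design choice is that the variables never own the pseudo-string directly, so the one-time conversion is visible to every reference at once.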
What if a Pseudo-String Is Partially Converted to a Regular String?
X = "Jscript" + "Team";
Z = X + "Microsoft";
In the above snippet, X is a pseudo-string and is used to form another pseudo-string, Z. Now suppose toString() is called on X, so X has to be converted to a regular string. Fine, I can do that. But wait a moment: X is used by another pseudo-string, Z. Can I just convert it like that?
Well, as I said, all references to a pseudo-string go through the proxy object. Hence Z also refers to the proxy object for X, not to the actual pseudo-string. So even after X is converted to a regular string, Z’s reference to the proxy object remains valid and things work perfectly.
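The same hypothetical JavaScript model, extended to the nested case: Z’s pseudo-string stores X’s proxy as one of its pieces, so converting X in place doesn’t disturb Z, and converting Z later simply reuses X’s already-converted string. `PseudoString` and `StringProxy` remain illustrative names, not engine internals.

```javascript
// Counts how many times a pseudo-string is actually converted.
let conversions = 0;

// Hypothetical pseudo-string whose pieces are regular strings or proxies.
class PseudoString {
  constructor(pieces) {
    this.pieces = pieces;
  }

  flatten() {
    conversions++;
    // toString() on a proxy returns its (possibly just converted) string;
    // toString() on a regular string returns the string itself.
    return this.pieces.map((p) => p.toString()).join("");
  }
}

// Hypothetical proxy: pseudo-string before conversion, regular string after.
class StringProxy {
  constructor(pseudo) {
    this.value = pseudo;
  }

  toString() {
    if (this.value instanceof PseudoString) {
      this.value = this.value.flatten();
    }
    return this.value;
  }
}

// X = "Jscript" + "Team";
const X = new StringProxy(new PseudoString(["Jscript", "Team"]));
// Z = X + "Microsoft"; Z's pieces refer to X's proxy, not X's pseudo-string
const Z = new StringProxy(new PseudoString([X, "Microsoft"]));

console.log(X.toString()); // JscriptTeam (X is converted in place)
console.log(Z.toString()); // JscriptTeamMicrosoft
console.log(conversions); // 2
```

If Z were converted first instead, flattening it would convert X through the same proxy, so a later X.toString() would again cost nothing.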
And yes, we had to test it :)
‘Maintaining compatibility is priority no. 1’ is the mantra everyone on the team swears by. We had to ensure, at any cost, that we wouldn’t break real-world scripts when the fix shipped. Our test team did an awesome job of ensuring that. They found a couple of cool crashes that were not in the new implementation itself but occurred because the new implementation broke a few assumptions made elsewhere in the engine. The interesting part is that they were reported on the last day of one of our internal milestones, so we had to sit up the whole night staring at our screens, fixing and testing them.
That’s all about the string append problem, and I hope it was an interesting read. Another performance improvement I worked on in this release, and would like to share with you all, is ‘Arrays’, but we’ll leave that for another post, so stay tuned.