The Astonishing S"Literal" String Type
One of the astonishing infelicities of the original language design was the unflagged overhead incurred by the seemingly trivial failure to place an S in front of a string literal assigned to a managed reference object. For example, given the following two System::String declarations,
String *ps1 = "hello";
String *ps2 = S"goodbye";
here is their MSIL representation as seen through ildasm. Notice the astonishing performance difference.
// String *ps1 = "hello";
ldsflda valuetype $ArrayType$0xd61117dd
newobj instance void [mscorlib]System.String::.ctor(int8*)
// String *ps2 = S"goodbye";
ldstr "goodbye"
That’s a pretty remarkable savings for just remembering [or learning] to prefix a literal string with an S; or, to look at it another way, that’s a durn stern penalty for not doing so. [In addition, if S"goodbye" occurs five times, they are collapsed into a single shared instance.] And ignorance is not a mitigating defense! Using the default Visual Studio settings for a project, this compiles without any warning, as the following illustrates:
nettest - 0 error(s), 0 warning(s)
What’s perhaps equally remarkable is that in another common corner of the language, implicit value type boxing was explicitly not supported because it was felt that it would result in a false sense of security for the programmer who would not realize its run-time overhead. For example,
Object *po = ival; // error
Object *po = __box( ival ); // ok
Of course, these two design corners are not really at all the same – in fact, they seem to illustrate opposite design philosophies. In the one case, a trivial detail that is context sensitive silently causes a truly astonishing inflation of the run-time program. In the other case, there is no underlying gain or loss in the behavior of the program by having the explicit __box operator – only in the behavior of the programmer. It is a pedagogical design intended to teach the programmer about the nature of the CLR’s unified type system.
The solution in both cases is to make the behavior transparent. A reference type assigned with or initialized to a value type results in a boxing operation. This is as fundamental to the unified type system of the CLR as the copy constructor and copy assignment operator are to native C++. Ignore them at your peril. If you assign a literal string in a context where an S should be, the S is implicitly present.
What about cases in which we need to explicitly direct the compiler to one interpretation or another, as in the case of an overloaded pair of functions?
f("ABC"); // calls f(char*)
The language design team decided to drop the S and instead require the user to cast the literal string explicitly, as in
f(( String^ )"ABC");