# Improved Interpolated Strings

## Summary

We introduce a new pattern for creating and using interpolated string expressions to allow for efficient formatting and use in both general string scenarios and more specialized scenarios such as logging frameworks, without incurring unnecessary allocations from formatting the string in the framework.

## Motivation

Today, string interpolation mainly lowers down to a call to string.Format. This, while general purpose, can be inefficient for a number of reasons:

1. It boxes any struct arguments, unless the runtime has happened to introduce an overload of string.Format that takes exactly the correct types of arguments in exactly the correct order.
   - This ordering is why the runtime is hesitant to introduce generic versions of the method, as it would lead to a combinatoric explosion of generic instantiations of a very common method.
2. It has to allocate an array for the arguments in most cases.
3. There is no opportunity to avoid creating the string if it's not needed. Logging frameworks, for example, will recommend avoiding string interpolation because it will cause a string to be realized that may not be needed, depending on the current log level of the application.
4. It can never use Span or other ref struct types today, because ref structs are not allowed as generic type parameters, meaning that if a user wants to avoid copying to intermediate locations they have to manually format strings.
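
To make points 1 and 2 concrete, here is a small sketch of what an interpolation such as $"{x} = {y}" amounts to when it is lowered to string.Format. TraditionalLowering is a hypothetical name, and the explicit object[] is for illustration: for up to three holes the compiler may instead pick an overload such as string.Format(string, object, object), which avoids the array but still boxes each int.

```csharp
using System;

public static class TraditionalLowering
{
    // Hypothetical helper: roughly what $"{x} = {y}" means when lowered to string.Format
    public static string Format(int x, int y)
    {
        // Allocates an array and boxes both ints into object slots
        object[] args = new object[] { x, y };
        return string.Format("{0} = {1}", args);
    }
}
```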

Internally, the runtime has a type called ValueStringBuilder to help deal with the first 2 of these scenarios. Callers pass a stackalloc'd buffer to the builder, repeatedly call AppendFormat with every part, and then get a final string out. If the resulting string outgrows the stack buffer, the builder can then move to a rented array on the heap. However, this type is dangerous to expose directly, as incorrect usage could cause a rented array to be double-disposed, which then causes all sorts of undefined behavior in the program as two locations think they have sole access to the rented array. This proposal creates a way to use this type safely from native C# code by just writing an interpolated string literal, leaving written code unchanged while improving every interpolated string that a user writes. It also extends this pattern to allow interpolated strings passed as arguments to other methods to use a handler pattern, defined by the receiver of the method, that will allow things like logging frameworks to avoid allocating strings that will never be needed, while giving C# users familiar, convenient interpolation syntax.
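
The runtime's ValueStringBuilder is internal, so the following is only a hypothetical, heavily simplified sketch of the stack-buffer-then-heap idea it implements; SketchStringBuilder and ToStringAndDispose are invented names, and the real type has a much richer API. The sketch rents from ArrayPool<char>.Shared once the caller's buffer is full, and its comments point at the double-return hazard described above.

```csharp
#nullable enable
using System;
using System.Buffers;

// Hypothetical, simplified sketch of the idea behind the runtime's internal
// ValueStringBuilder; this is not its real API.
public ref struct SketchStringBuilder
{
    private char[]? _rented;   // pool array, once the initial buffer is too small
    private Span<char> _chars; // the caller's (often stackalloc'd) buffer, or _rented
    private int _pos;          // number of chars written so far

    public SketchStringBuilder(Span<char> initialBuffer)
    {
        _rented = null;
        _chars = initialBuffer;
        _pos = 0;
    }

    public void Append(ReadOnlySpan<char> value)
    {
        if (_pos + value.Length > _chars.Length)
            Grow(_pos + value.Length);
        value.CopyTo(_chars.Slice(_pos));
        _pos += value.Length;
    }

    // Moves the contents to a larger array rented from the shared pool
    private void Grow(int required)
    {
        char[] newArray = ArrayPool<char>.Shared.Rent(Math.Max(required, _chars.Length * 2));
        _chars.Slice(0, _pos).CopyTo(newArray);
        char[]? previous = _rented;
        _rented = newArray;
        _chars = newArray;
        if (previous is not null)
            ArrayPool<char>.Shared.Return(previous);
    }

    // Returns the built string and gives any rented array back to the pool.
    // If a copy of this struct also calls this method, the same array is returned
    // to the pool twice: the misuse that makes the real type unsafe to expose.
    public string ToStringAndDispose()
    {
        string result = new string(_chars.Slice(0, _pos));
        char[]? toReturn = _rented;
        this = default;
        if (toReturn is not null)
            ArrayPool<char>.Shared.Return(toReturn);
        return result;
    }
}
```

Because the builder is a ref struct holding a Span<char>, it cannot escape to the heap, but nothing stops a caller from copying the struct by value and disposing both copies.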

## Detailed Design

### The handler pattern

We introduce a new handler pattern that can represent an interpolated string passed as an argument to a method. The simple English of the pattern is as follows:

When an interpolated_string_expression is passed as an argument to a method, we look at the type of the parameter. If that type:

- has a constructor that can be invoked with 2 int parameters, literalLength and formattedCount, optionally followed by additional parameters specified by an attribute on the original parameter, and optionally ending with a trailing out bool parameter, and
- has instance AppendLiteral and AppendFormatted methods that can be invoked for every part of the interpolated string,

then we lower the interpolation using that type, instead of into a traditional call to string.Format(formatStr, args). A more concrete example is helpful for picturing this:

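```csharp
// The handler that will actually "build" the interpolated string
[InterpolatedStringHandler]
public ref struct TraceLoggerParamsInterpolatedStringHandler
{
    // Storage for the built-up string

    private bool _logLevelEnabled;

    public TraceLoggerParamsInterpolatedStringHandler(int literalLength, int formattedCount, Logger logger, out bool handlerIsValid)
    {
        // Assumes LogLevel orders Trace as its most verbose (lowest) level
        _logLevelEnabled = logger.EnabledLevel <= LogLevel.Trace;

        // When this is false, none of the Append calls are made
        handlerIsValid = _logLevelEnabled;
    }

    public void AppendLiteral(string s)
    {
        // Store and format part as required
    }

    public void AppendFormatted<T>(T t)
    {
        // Store and format part as required
    }
}

// The logger class. The user has an instance of this, accesses it via static state, or some other access
// mechanism
public class Logger
{
    // Initialization code omitted
    public LogLevel EnabledLevel;

    public void LogTrace([InterpolatedStringHandlerArgument("")] TraceLoggerParamsInterpolatedStringHandler handler)
    {
        // Impl of logging
    }
}

Logger logger = GetLogger(LogLevel.Info);

// Given the above definitions, usage looks like this:
var name = "Fred Silberberg";

// With the logger at LogLevel.Info, trace is disabled: the constructor sets
// handlerIsValid to false and no Append calls are made
logger.LogTrace($"{name} will not be printed at this log level");

// Following the lowering rules below, this is converted to roughly:
var receiverTemp = logger;
var handler = new TraceLoggerParamsInterpolatedStringHandler(literalLength: 38, formattedCount: 1, receiverTemp, out var handlerIsValid);
if (handlerIsValid)
{
    handler.AppendFormatted(name);
    handler.AppendLiteral(" will not be printed at this log level");
}
receiverTemp.LogTrace(handler);
```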
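
The handler above elides its storage. As a sketch of how the pieces fit together, here is a small runnable variant, assuming .NET 6 or later (which ships InterpolatedStringHandlerAttribute and InterpolatedStringHandlerArgumentAttribute in System.Runtime.CompilerServices). LogLevel, Logger, and TraceHandler here are hypothetical stand-ins rather than a real logging library, and TraceHandler buffers into a StringBuilder for simplicity.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Text;

// Hypothetical log levels for this sketch; lower values are more verbose
public enum LogLevel { Trace, Info, Error }

public class Logger
{
    public LogLevel EnabledLevel;
    public readonly List<string> Messages = new();

    public Logger(LogLevel enabledLevel) => EnabledLevel = enabledLevel;

    // "" names the receiver, so the compiler passes this Logger to the handler constructor
    public void LogTrace([InterpolatedStringHandlerArgument("")] TraceHandler handler)
    {
        if (handler.IsEnabled)
            Messages.Add(handler.GetFormattedText());
    }
}

[InterpolatedStringHandler]
public ref struct TraceHandler
{
    private readonly StringBuilder? _builder; // null when trace logging is disabled

    public TraceHandler(int literalLength, int formattedCount, Logger logger, out bool isEnabled)
    {
        isEnabled = logger.EnabledLevel <= LogLevel.Trace;
        _builder = isEnabled ? new StringBuilder(literalLength) : null;
    }

    public bool IsEnabled => _builder is not null;

    // Only called when the constructor reported isEnabled == true
    public void AppendLiteral(string s) => _builder!.Append(s);
    public void AppendFormatted<T>(T value) => _builder!.Append(value);

    public string GetFormattedText() => _builder?.ToString() ?? "";
}
```

When the constructor reports isEnabled as false, the compiler skips every Append call, so expressions in the holes (such as a call to an expensive method) are never evaluated.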

#### Performing the conversion

Given an applicable_interpolated_string_handler_type T and an interpolated_string_expression i that had a valid constructor Fc and Append... methods Fa resolved, lowering for i is performed as follows:

1. Any arguments to Fc that occur lexically before i are evaluated and stored into temporary variables in lexical order. In order to preserve lexical ordering, if i occurred as part of a larger expression e, any components of e that occurred before i will be evaluated as well, again in lexical order.
2. Fc is called with the length of the interpolated string literal components, the number of interpolation holes, any previously evaluated arguments, and a bool out argument (if Fc was resolved with one as the last parameter). The result is stored into a temporary value ib.
   1. The length of the literal components is calculated after replacing any open_brace_escape_sequence with a single {, and any close_brace_escape_sequence with a single }.
3. If Fc ended with a bool out argument, a check on that bool value is generated. If true, the methods in Fa will be called. Otherwise, they will not be called.
4. For every Fax in Fa, Fax is called on ib with either the current literal component or interpolation expression, as appropriate. If Fax returns a bool, the result is logically and-ed with all preceding Fax calls, so once a call returns false, no later Fax is called and its interpolation expression is not evaluated.
   1. If Fax is a call to AppendLiteral, the literal component is unescaped by replacing any open_brace_escape_sequence with a single {, and any close_brace_escape_sequence with a single }.
5. The result of the conversion is ib.
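
The steps above can be traced by hand. In this sketch, LimitedHandler and ManualLowering are hypothetical: LimitedHandler stands in for T, its constructor for Fc (with maxHoles playing the role of an argument passed through the attribute), and its bool-returning Append methods for Fa. The body of Lower mirrors steps 1 through 5 for $"a={a}, b={b}"; it is written out manually and is not claimed to be the exact code the compiler emits.

```csharp
using System;
using System.Text;

// Hypothetical handler with bool-returning Append methods; it stops accepting
// parts once it has seen maxHoles interpolation holes.
public struct LimitedHandler
{
    private readonly StringBuilder _sb;
    private readonly int _maxHoles;
    private int _holes;

    public LimitedHandler(int literalLength, int formattedCount, int maxHoles, out bool shouldAppend)
    {
        _sb = new StringBuilder(literalLength);
        _maxHoles = maxHoles;
        _holes = 0;
        shouldAppend = maxHoles > 0; // checked in step 3
    }

    public bool AppendLiteral(string s)
    {
        _sb.Append(s);
        return true;
    }

    public bool AppendFormatted<T>(T value)
    {
        _sb.Append(value);
        _holes++;
        return _holes < _maxHoles; // false ends the && chain in step 4
    }

    public override string ToString() => _sb.ToString();
}

public static class ManualLowering
{
    // Hand-written equivalent of passing maxHoles and then $"a={a}, b={b}" to a
    // hypothetical method whose handler parameter asks for maxHoles
    public static string Lower(int maxHoles, int a, int b)
    {
        // Step 1: arguments that occur before the interpolated string go into temps
        int tmpMaxHoles = maxHoles;

        // Step 2: literal length is "a=".Length + ", b=".Length = 6; there are 2 holes
        var ib = new LimitedHandler(6, 2, tmpMaxHoles, out bool shouldAppend);

        // Steps 3 and 4: call the Append methods only while everything returns true
        _ = shouldAppend
            && ib.AppendLiteral("a=")
            && ib.AppendFormatted(a)
            && ib.AppendLiteral(", b=")
            && ib.AppendFormatted(b);

        // Step 5: ib is the result of the conversion
        return ib.ToString();
    }
}
```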

Again, note that arguments passed to Fc and arguments passed to e are the same temp. Conversions may occur on top of the temp to convert to a form that Fc requires, but for example lambdas cannot be bound to a different delegate type between Fc and e.
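
A sketch of this sharing, assuming .NET 6 or later: IdHandler and SharedTemp are hypothetical names. The id argument comes from a side-effecting call, yet both the handler constructor and the method observe the same value, and the call runs once.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical handler that only records the id its constructor received
[InterpolatedStringHandler]
public struct IdHandler
{
    public readonly int SeenId;

    public IdHandler(int literalLength, int formattedCount, int id) => SeenId = id;

    public void AppendLiteral(string s) { }
    public void AppendFormatted<T>(T value) { }
}

public static class SharedTemp
{
    // The id argument is evaluated once into a temp, and that same temp is passed
    // to both IdHandler's constructor and this method
    public static (int MethodId, int HandlerId) Log(int id, [InterpolatedStringHandlerArgument("id")] IdHandler handler)
        => (id, handler.SeenId);
}
```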

Open Question

This lowering means that subsequent parts of the interpolated string after a false-returning Append... call don't get evaluated. This could potentially be very confusing, particularly if the format hole is side-effecting. We could instead evaluate all format holes first, then repeatedly call Append... with the results, stopping if it returns false. This would ensure that all expressions get evaluated as one might expect, but we call as few methods as we need to. While the partial evaluation might be desirable for some more advanced cases, it is perhaps non-intuitive for the general case.

Another alternative, if we want to always evaluate all format holes, is to remove the Append... version of the API and just do repeated Format calls. The handler can track whether it should just be dropping the argument and immediately returning for this version.

Answer: We will have conditional evaluation of the holes.
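
A sketch of what conditional evaluation looks like from user code, assuming .NET 6 or later: TruncatingHandler and Truncation are hypothetical names. Its Append methods return false once the output would exceed a limit taken from the limit parameter, so the remaining parts are skipped, including the second call to a side-effecting method in the holes.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;
using System.Text;

[InterpolatedStringHandler]
public struct TruncatingHandler
{
    private readonly StringBuilder _sb;
    private readonly int _limit;

    public TruncatingHandler(int literalLength, int formattedCount, int limit)
    {
        _sb = new StringBuilder();
        _limit = limit;
    }

    // Returning false means every remaining part (and its hole expression) is skipped
    public bool AppendLiteral(string s) => TryAppend(s);
    public bool AppendFormatted<T>(T value) => TryAppend(value is null ? "" : value.ToString() ?? "");

    private bool TryAppend(string s)
    {
        if (_sb.Length + s.Length > _limit) return false;
        _sb.Append(s);
        return true;
    }

    public override string ToString() => _sb.ToString();
}

public static class Truncation
{
    public static string Render(int limit, [InterpolatedStringHandlerArgument("limit")] TruncatingHandler handler)
        => handler.ToString();
}
```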

Open Question

Do we need to dispose of disposable handler types, and wrap calls with try/finally to ensure that Dispose is called? For example, the interpolated string handler in the BCL might have a rented array inside it, and if one of the interpolation holes throws an exception during evaluation, that rented array could be leaked if it wasn't disposed.

Answer: No. Handlers can be assigned to locals (such as MyHandler handler = $"{MyCode()}";), and the lifetime of such handlers is unclear. This is unlike foreach enumerators, where the lifetime is obvious and no user-defined local is created for the enumerator.

### Impact on nullable reference types

To minimize complexity of the implementation, we have a few limitations on how we perform nullable analysis on interpolated string handler constructors used as arguments to a method or indexer. In particular, we do not flow information from the constructor back through to the original slots of parameters or arguments from the original context, and we do not use constructor parameter types to inform generic type inference for type parameters in the containing method. An example of where this can have an impact is:

```csharp
string s = "";
C c = new C();
c.M(s, $"", c.ToString(), s.ToString()); // No warnings on c.ToString() or s.ToString(), as the MaybeNull does not flow back.

public class C
{
    public void M(string s1, [InterpolatedStringHandlerArgument("", "s1")] CustomHandler c1, string s2, string s3) { }
}

[InterpolatedStringHandler]
public partial struct CustomHandler
{
    public CustomHandler(int literalLength, int formattedCount, [MaybeNull] C c, [MaybeNull] string s) : this()
    {
    }
}
```

A similar example involving generic type inference:

```csharp
string? s = null;
M(s, $""); // Infers string for T because of the T? parameter, not string?, as flow analysis does not consider the unannotated T parameter of the constructor

void M<T>(T? t, [InterpolatedStringHandlerArgument("t")] CustomHandler<T> c) { }

[InterpolatedStringHandler]
public partial struct CustomHandler<T>
{
    public CustomHandler(int literalLength, int formattedCount, T t) : this()
    {
    }
}
```

## Other considerations

### Allow string types to be convertible to handlers as well

For type author simplicity, we could consider allowing expressions of type string to be implicitly convertible to applicable_interpolated_string_handler_types. As proposed today, authors will likely need to overload on both that handler type and regular string types, so their users don't have to understand the difference. This may be an annoying and non-obvious overhead, as a string expression can be viewed as an interpolation with expression.Length prefilled length and 0 holes to be filled.

This would allow new APIs to only expose a handler, without also having to expose a string-accepting overload. However, it won't get around the need for changes to better conversion from expression, so while it would work it may be unnecessary overhead.

Answer: We think that this could end up being confusing, and there's an easy workaround for custom handler types: add a user-defined conversion from string.

### Incorporating spans for heap-less strings

ValueStringBuilder as it exists today has 2 constructors: one that takes a count and allocates on the heap eagerly, and one that takes a Span<char>. That Span<char> is usually a fixed size in the runtime codebase, around 250 elements on average. To truly replace that type, we should consider an extension to this where we also recognize GetInterpolatedString methods that take a Span<char>, instead of just the count version. However, we see a few potential thorny cases to resolve here:

- We don't want to stackalloc repeatedly in a hot loop. If we were to do this extension to the feature, we'd likely want to share the stackalloc'd span between loop iterations. We know this is safe, as Span<T> is a ref struct that can't be stored on the heap, and users would have to be pretty devious to manage to extract a reference to that Span (such as creating a method that accepts such a handler then deliberately retrieving the Span from the handler and returning it to the caller). However, allocating ahead of time produces other questions:
  - Should we eagerly stackalloc? What if the loop is never entered, or exits before it needs the space?
  - If we don't eagerly stackalloc, does that mean we introduce a hidden branch on every loop? Most loops likely won't care about this, but it could affect some tight loops that don't want to pay the cost.
- Some strings can be quite big, and the appropriate amount to stackalloc is dependent on a number of factors, including runtime factors. We don't really want the C# compiler and specification to have to determine this ahead of time, so we'd want to resolve https://github.com/dotnet/runtime/issues/25423 and add an API for the compiler to call in these cases. It also adds more pros and cons to the loop-related points from the previous bullet, as we don't want to potentially allocate large arrays on the heap many times or before one is needed.

Answer: This is out of scope for C# 10. We can look at this in general when we look at the more general params Span<T> feature.

### Non-try version of the API

For simplicity, this spec currently just proposes recognizing an Append... method, and things that always succeed (like InterpolatedStringHandler) would always return true from the method. This was done to support partial formatting scenarios where the user wants to stop formatting if an error occurs or if it's unnecessary, such as the logging case, but could potentially introduce a bunch of unnecessary branches in standard interpolated string usage.
We could consider an addendum where we use just FormatX methods if no Append... method is present, but it does present questions about what we do if there's a mix of both Append... and FormatX calls.

Answer: We want the non-try version of the API. The proposal has been updated to reflect this.

### Passing previous arguments to the handler

There is an unfortunate lack of symmetry in the proposal as it currently exists: invoking an extension method in reduced form produces different semantics than invoking the extension method in normal form. This is different from most other locations in the language, where reduced form is just sugar. We propose adding an attribute to the framework that we will recognize when binding a method, that informs the compiler that certain parameters should be passed to the constructor on the handler. Usage looks like this:

```csharp
namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Parameter, AllowMultiple = false, Inherited = false)]
    public sealed class InterpolatedStringHandlerArgumentAttribute : Attribute
    {
        public InterpolatedStringHandlerArgumentAttribute(string argument);
        public InterpolatedStringHandlerArgumentAttribute(params string[] arguments);

        public string[] Arguments { get; }
    }
}
```

Usage of this is then:

```csharp
namespace System
{
    public sealed class String
    {
        public static string Format(IFormatProvider? provider, [InterpolatedStringHandlerArgument("provider")] ref DefaultInterpolatedStringHandler handler);
        …
    }
}

namespace System.Runtime.CompilerServices
{
    public ref struct DefaultInterpolatedStringHandler
    {
        public DefaultInterpolatedStringHandler(int baseLength, int holeCount, IFormatProvider? provider); // additional factory …
    }
}

var formatted = string.Format(CultureInfo.InvariantCulture, $"{X} = {Y}");

// Is lowered to

var tmp1 = CultureInfo.InvariantCulture;
var handler = new DefaultInterpolatedStringHandler(3, 2, tmp1);
handler.AppendFormatted(X);
handler.AppendLiteral(" = ");
handler.AppendFormatted(Y);
var formatted = string.Format(tmp1, ref handler);
```
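
The proposed string.Format overload above is part of this design discussion; to try the pattern without relying on any particular shipped overload, here is a sketch with a hypothetical ProviderHandler whose constructor receives the provider argument named in the attribute. It assumes .NET 6 or later for the attributes, and formats IFormattable holes with the captured provider.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Runtime.CompilerServices;
using System.Text;

[InterpolatedStringHandler]
public ref struct ProviderHandler
{
    private readonly StringBuilder _sb;
    private readonly IFormatProvider? _provider;

    public ProviderHandler(int literalLength, int formattedCount, IFormatProvider? provider)
    {
        _sb = new StringBuilder(literalLength);
        _provider = provider;
    }

    public void AppendLiteral(string s) => _sb.Append(s);

    public void AppendFormatted<T>(T value)
    {
        // IFormattable values (numbers, dates, ...) use the provider captured by the constructor
        if (value is IFormattable formattable)
            _sb.Append(formattable.ToString(null, _provider));
        else if (value is not null)
            _sb.Append(value.ToString());
    }

    public override string ToString() => _sb.ToString();
}

public static class ProviderFormatting
{
    // The provider argument is evaluated first and handed to ProviderHandler's constructor
    public static string Format(IFormatProvider? provider, [InterpolatedStringHandlerArgument("provider")] ProviderHandler handler)
        => handler.ToString();
}
```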


The questions we need to answer:

1. Do we like this pattern in general?
2. Do we want to allow these arguments to come from after the handler parameter? Some existing patterns in the BCL, such as Utf8Formatter, put the value to be formatted before the thing needed to format into. To fit in best with these patterns, we'd likely want to allow this, but we need to decide if this out-of-order evaluation is OK.

We want to support this. The spec has been updated to reflect this. Arguments will be required to be specified in lexical order at the call site, and if a needed argument to the create method is specified after the interpolated string literal, an error is produced.

Because $"{await A()}" is a valid expression today, we need to rationalize how interpolation holes interact with await. We could solve this with a few rules:

1. If an interpolated string used as a string, IFormattable, or FormattableString has an await in an interpolation hole, fall back to the old-style formatter.
2. If an interpolated string is subject to an implicit_string_handler_conversion and applicable_interpolated_string_handler_type is a ref struct, await is not allowed to be used in the format holes.

Fundamentally, this desugaring could use a ref struct in an async method as long as we guarantee that the ref struct will not need to be saved to the heap, which should be possible if we forbid awaits in the interpolation holes.

Alternatively, we could simply make all handler types non-ref structs, including the framework handler for interpolated strings. This would, however, preclude us from someday recognizing a Span version that does not need to allocate any scratch space at all.

Answer: We will treat interpolated string handlers the same as any other type: if the handler type is a ref struct and the current context doesn't allow the usage of ref structs, it is illegal to use the handler there. The spec around lowering of string literals used as strings is intentionally vague to allow the compiler to decide on what rules it deems appropriate, but custom handler types will have to follow the same rules as the rest of the language.

### Handlers as ref parameters

Some handlers might want to be passed as ref parameters (either in or ref). Should we allow either? And if so, what will a ref handler look like? ref $"" is confusing, as you're not actually passing the string by ref; you're passing the handler that is created from the string by ref, and it has similar potential issues with async methods.

We want to support this. The spec has been updated to reflect this. The rules should reflect the same rules that apply to extension methods on value types.
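
A sketch of a ref handler parameter, assuming .NET 6 or later: PartCountingHandler and RefHandlerDemo are hypothetical names. Render declares its handler as ref, and, in line with the extension-method rule above, the call site passes the interpolated string without writing ref.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

[InterpolatedStringHandler]
public ref struct PartCountingHandler
{
    private readonly StringBuilder _sb;
    public int Parts; // number of Append calls made so far

    public PartCountingHandler(int literalLength, int formattedCount)
    {
        _sb = new StringBuilder(literalLength);
        Parts = 0;
    }

    public void AppendLiteral(string s) { _sb.Append(s); Parts++; }
    public void AppendFormatted<T>(T value) { _sb.Append(value); Parts++; }

    // Mutates the handler in place, which works on the caller's copy because it is passed by ref
    public string ToStringAndClear()
    {
        string result = _sb.ToString();
        _sb.Clear();
        Parts = 0;
        return result;
    }
}

public static class RefHandlerDemo
{
    // Declared with ref, yet call sites write Render($"...") without the ref keyword
    public static string Render(ref PartCountingHandler handler)
    {
        int parts = handler.Parts;
        return parts + ":" + handler.ToStringAndClear();
    }
}
```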

### Interpolated strings through binary expressions and conversions

Because this proposal makes interpolated strings context sensitive, we would like to allow the compiler to treat a binary expression composed entirely of interpolated strings, or an interpolated string subjected to a cast, as an interpolated string literal for the purposes of overload resolution. For example, take the following scenario:

```csharp
[InterpolatedStringHandler]
struct Handler1
{
    public Handler1(int literalLength, int formattedCount, C c) => ...;
    // AppendX... methods as necessary
}

[InterpolatedStringHandler]
struct Handler2
{
    public Handler2(int literalLength, int formattedCount, C c) => ...;
    // AppendX... methods as necessary
}

class C
{
    void M([InterpolatedStringHandlerArgument("")] Handler1 handler) => ...;
    void M([InterpolatedStringHandlerArgument("")] Handler2 handler) => ...;
}

c.M($"{X}"); // Ambiguous between the M overloads
```


This would be ambiguous, necessitating a cast to either Handler1 or Handler2 in order to resolve. However, in making that cast, we would potentially throw away the information that there is context from the method receiver, meaning that the cast would fail because there is nothing to fill in the information of c. A similar issue arises with binary concatenation of strings: the user could want to format the literal across several lines to avoid line wrapping, but would not be able to because that would no longer be an interpolated string literal convertible to the handler type.

To resolve these cases, we make the following changes:

- An additive_expression composed entirely of interpolated_string_expressions and using only + operators is considered to be an interpolated_string_literal for the purposes of conversions and overload resolution. The final interpolated string is created by logically concatenating all individual interpolated_string_expression components, from left to right.
- A cast_expression or a relational_expression with operator as whose operand is an interpolated_string_expression is considered an interpolated_string_expression for the purposes of conversions and overload resolution.
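
A sketch of the first rule, assuming a compiler that implements it: RecordingHandler and ConcatDemo are hypothetical names. The handler wraps each hole in brackets, so the output shows that both halves of the + expression went through a single handler instead of being concatenated into a string first (which would not compile, as string has no conversion to RecordingHandler).

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

[InterpolatedStringHandler]
public struct RecordingHandler
{
    private readonly StringBuilder _sb;

    public RecordingHandler(int literalLength, int formattedCount) => _sb = new StringBuilder();

    public void AppendLiteral(string s) => _sb.Append(s);

    // Brackets mark which parts arrived through AppendFormatted
    public void AppendFormatted<T>(T value) => _sb.Append('[').Append(value).Append(']');

    public override string ToString() => _sb.ToString();
}

public static class ConcatDemo
{
    public static string Render(RecordingHandler handler) => handler.ToString();
}
```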

Open Questions:

Do we want to do this? We don't do this for System.FormattableString, for example, but that can be broken out onto a different line, whereas this can be context-dependent and therefore not able to be broken out into a different line. There are also no overload resolution concerns with FormattableString and IFormattable.