String Interpolation Trickery and Magic with C# 10 and .NET 6
This blog is one of The December 17th entries on the 2021 C# Advent Calendar. Thanks for having me again Matt!
For the last few years we’ve gotten a new version of C# paired with a new version of .NET each November. And every year this new version is packed with great new features. For me, one of the coolest features is interpolated string handlers.
Interpolated string handlers are primarily designed to provide a performance boost building strings. But is there more to them than meets the eye? I believe that they lay the groundwork for doing much more than just building strings faster.
Interpolated String Handler Overview
First, let’s start with an overview of how interpolated string handlers work. For a more in-depth look, see the blog post from Stephen Toub.
When using C# 9, interpolating a string is optimized by the compiler in a variety of ways. However, in many cases
the optimizations aren’t an option, and a call to string.Format(...)
is used. string.Format
brings a lot of overhead,
such as interpreting the format string every call, potentially allocating an object[]
on the heap, boxing value types,
and generating temporary intermediate strings.
For projects targeting .NET 6, even upgrading existing projects, string interpolation gets an immediate performance
boost because they will use the DefaultInterpolatedStringHandler
to build strings. This structure has a much better performance profile overall than string.Format
.
// Example code, from Stephen Toub's post
public static string FormatVersion(int major, int minor, int build, int revision) =>
$"{major}.{minor}.{build}.{revision}";
// Example equivalent code the compiler generates with .NET 6, from Stephen Toub's post
public static string FormatVersion(int major, int minor, int build, int revision)
{
var handler = new DefaultInterpolatedStringHandler(literalLength: 3, formattedCount: 4);
handler.AppendFormatted(major);
handler.AppendLiteral(".");
handler.AppendFormatted(minor);
handler.AppendLiteral(".");
handler.AppendFormatted(build);
handler.AppendLiteral(".");
handler.AppendFormatted(revision);
return handler.ToStringAndClear();
}
Custom Interpolated String Handlers
The DefaultInterpolatedStringHandler
is really just the beginning, though. Methods which accept a string
may have an overload which accepts a custom interpolated string handler. When present, this causes C# to
use the custom handler you define rather than the default, allowing more advanced behaviors.
These behaviors can include stack allocations of scratch space, accepting other parameters to the method
call as constructor arguments to change behaviors, returning a boolean from the constructor that skips
the AppendXXX
steps if we know they’ll be unused, or short-circuiting additional AppendXXX
steps if
we want to stop early.
A great example is AssertInterpolatedStringHandler
which is available on Debug.Assert
calls. It can
suppress most of the work building the string in the case where the first parameter to the Assert call
is true
.
public void Example()
{
var count = 0;
// The interpolated string below will never be constructed in .NET 6, even when compiled in DEBUG mode
Debug.Assert(count == 0, $"The count should be 0, but is {count}.");
}
The equivalent code for the above statement is something like:
public void Example()
{
var count = 0;
var condition = count == 0;
var handler = new AssertInterpolatedStringHandler(31, 1, condition, out bool shouldAppend);
if (shouldAppend)
{
handler.AppendLiteral("The count should be 0, but is ");
handler.AppendFormatted(count);
handler.AppendLiteral(".");
}
Debug.Assert(condition, handler);
}
If the AppendXXX
methods return bool
the result is checked after each call and can short-circuit part way through the operation.
For example, this might occur if the destination buffer runs out of space.
Now Bring On The Magic
Now, I’d like you to take a moment and consider the AssertInterpolatedStringHandler
example above. What does the example
have to do with strings? At what point are strings involved?
Wait for it…
The answer is “Only the literal segments are strings.” Of course, Debug.Assert
ends up making a string out of it by calling
handler.ToStringAndClear();
But the AssertInterpolatedStringHandler
is what’s passed to Debug.Assert
, not a string.
It can do whatever it likes with the handler. Additionally, the implication of AppendFormatted
is that it will format count
as a string and append it. But in reality it may do whatever it likes.
Therefore, if we can imagine a “thing” which is built up of string literals and expressions presented in order, then we can build it using an interpolated string. Even if what we’re creating isn’t a string at all.
An Example: Parameterized SQL Queries
Have you ever needed to build a parameterized SQL query? Or N1QL Query if you’re a Couchbase user? It’s very important to parameterize user input to prevent injection attacks, so the answer for most database users should be yes. Even if you use an ODM or ORM like Entity Framework it’s often necessary to hand-write queries for special cases.
Building a parameterized query can be a pain. Take this relatively simple example for SqlCommand
:
public async Task<string> GetOptionValue(int optionSet, string optionName)
{
await using var cmd = new SqlCommand("SELECT OptionValue FROM Options WHERE OptionSet = @OptionSet AND OptionName = @OptionName", _connection);
cmd.Parameters.Add("@OptionSet", SqlDbType.Int).Value = optionSet;
cmd.Parameters.Add("@OptionName", SqlDbType.NVarChar).Value = optionName;
return (await cmd.ExecuteScalarAsync()).ToString();
}
If you hate the duplication involved in the parameter names, you may even pull those out to constants, which is even more convoluted.
private const string OptionSetParamName = "@OptionSet";
private const string OptionNameParamName = "@OptionName";
public async Task<string> GetOptionValue(int optionSet, string optionName)
{
await using var cmd = new SqlCommand($"SELECT OptionValue FROM Options WHERE OptionSet = {OptionSetParamName} AND OptionName = {OptionNameParamName}", _connection);
cmd.Parameters.Add(OptionSetParamName, SqlDbType.Int).Value = optionSet;
cmd.Parameters.Add(OptionNameParamName, SqlDbType.NVarChar).Value = optionName;
return (await cmd.ExecuteScalarAsync()).ToString();
}
What if this could be written instead as follows, but still retained all the security of parameterized queries?
public async Task<string> GetOptionValue(int optionSet, string optionName)
{
await using cmd = _connection.CreateCommand($"SELECT OptionValue FROM Options WHERE OptionSet = {optionSet} AND OptionName = {optionName}");
return (await cmd.ExecuteScalarAsync()).ToString();
}
Example Implementation
This example gets somewhat complicated, so I’ve tried to annotate it throughout with comments. First, the builder itself.
// This attribute let's C# know we're making an interpolated string handler
[InterpolatedStringHandler]
// The handler should usually be a "ref struct", meaning it only lives on the stack.
// This may be a limitation if you want to allow "await" within the holes in the expression, so "ref" may be removed in that case.
// However, this example requires "ref struct" because it includes a DefaultInterpolatedStringHandler in its fields.
public ref struct SqlCommandInterpolatedStringHandler
{
// Internally we'll use DefaultInterpolatedStringHandler to build the query string.
// This be more performant than reinventing the wheel.
private DefaultInterpolatedStringHandler _innerHandler;
// This will maintain a list of parameters as we build the query string
public SqlParameter[] Parameters { get; }
// The number of parameters added so far
private int _parameterCount;
public SqlCommandInterpolatedStringHandler(int literalLength, int formattedCount)
{
// Construct the inner handler, forwarding the same hints
_innerHandler = new DefaultInterpolatedStringHandler(literalLength, formattedCount);
// Build an empty list of parameters with the capacity we'll need
Parameters = new SqlParameter[formattedCount];
_parameterCount = 0;
}
public void AppendLiteral(string value) =>
// Forward literals to the inner handler to be added to the query string
// In this example, literals represent query text like "SELECT ..."
_innerHandler.AppendLiteral(value);
public void AppendFormatted(ReadOnlySpan<char> value) =>
// SqlParameters need strings not char spans, so forward to that implementation
// Other backing implementations may be able to optimize this to avoid allocating a string
AppendFormatted(value.ToString());
public void AppendFormatted<T>(T value)
{
switch (value)
{
case int intValue:
AppendParameter(SqlDbType.Int, intValue);
break;
case bool boolValue:
AppendParameter(SqlDbType.Bit, boolValue);
break;
case string stringValue:
AppendFormatted(stringValue);
break;
// Add support for more types here
default:
// Fallback for other types, we could make this smarter or throw an exception
AppendFormatted(value?.ToString());
break;
}
}
// There are a lot of AppendFormatted overloads we're required to implement
// We could use alignment and format parameters for our own purposes, here we ignore them
public void AppendFormatted<T>(T value, string? format) =>
AppendFormatted(value);
public void AppendFormatted<T>(T value, int alignment, string? format) =>
AppendFormatted(value);
public void AppendFormatted<T>(T value, int alignment) =>
AppendFormatted(value);
public void AppendFormatted(string? value) =>
AppendParameter(SqlDbType.NVarChar, value);
public void AppendFormatted(string? value, int alignment = 0, string? format = null) =>
AppendParameter(SqlDbType.NVarChar, value);
// Main handler for formatted segments
private void AppendParameter(SqlDbType paramType, object? value)
{
// Since this is intended for use from compiler-generated code, we'll leave out typical runtime
// preconditions like _parameterCount vs array length. We'll use Debug.Assert instead, and assume
// the compiler used the type correctly for release builds.
Debug.Assert(_parameterCount < Parameters.Length, "Exceeded formattedCount");
// Create a unique parameter name, use an interpolated string builder with a stack-allocated buffer
Span<char> paramNameBuffer = stackalloc char[8];
var paramName = string.Create(null, paramNameBuffer, $"@Param{_parameterCount}");
// Add the parameter name reference to the query string
_innerHandler.AppendFormatted(paramName);
// Add the parameter to the collection
Parameters[_parameterCount] = new SqlParameter(paramName, paramType)
{
Value = value
};
_parameterCount++;
}
// Forward to the inner handler
public readonly override string ToString() => _innerHandler.ToString();
// Forward to the inner handler
public string ToStringAndClear() => _innerHandler.ToStringAndClear();
}
This builder can then be invoked using this extension method:
public static class SqlConnectionExtensions
{
public static SqlCommand CreateCommand(this SqlConnection connection,
// The handler should be the last argument.
// Where possible (i.e. non-async methods) it should usually be a by-ref argument.
ref SqlCommandInterpolatedStringHandler handler)
{
// We must use ToStringAndClear(), not ToString(), to ensure we release resources
var commandText = handler.ToStringAndClear();
// Create the command and add the parameters stored in the handler
var cmd = new SqlCommand(commandText, connection);
cmd.Parameters.AddRange(handler.Parameters);
return cmd;
}
}
The generated SQL command text looks like this:
SELECT OptionValue FROM Options WHERE OptionSet = @Param0 AND OptionName = @Param1
It’s that easy! Okay, maybe not quite easy, but still very powerful. Of course, there’s also a lot of room for improvement on this quick example.
- The parameter array could come from the array pool, though I’d want to measure that with benchmarks to be sure it’s advantageous
- An overload that takes a stack-allocated
Span<SqlParameter>
(probably overkill given all the other heap allocations and boxing related toSqlParameter
) - A pre-built, static set of parameter names for reuse
- Support for more parameter types
- Using format strings to specify parameter types, i.e. varchar vs nvarchar
- Another static method to create and execute the command rather than just create it
- And probably much more I haven’t considered
Other Random Ideas
Here are a few random ideas. Some of these ideas are probably be silly in practice. I’m hoping they’ll get everyone’s creative juices flowing.
Please, don’t write back telling me how dumb the ideas are :). If you have any other ideas, I’d love to see them in the comments!
Random Idea #1: JSON Building Without POCOs
This idea is interesting, but the double curly braces get a bit gnarly. Also, in most cases you probably want a POCO, but I can see simple scenarios where POCOs are overkill. This example also assumes that the literals are parsed at runtime to add double quotes around attribute names. This probably doesn’t have great performance, so it’s more of a thought experiment, but who knows.
public string GetPersonJson(string name, int age, IEnumerable<Child> children)
{
// name will get wrapped in quotes and special characters escaped, age would be a plain number,
// children gets serialized as an array
return JsonInterpolatedSerializer.Serialize($"{{name:{name},age:{age},children:{children}}}");
}
Random Idea #2: Safely Building HTML
In most cases, we build HTML in Razor views or pages. However, sometimes we need to build HTML in code. This can be done with a TagBuilder, but that can feel unwieldy. Building a string that just applies appropriate escaping would be nice. It could even use a TagBuilder under the hood. Though it probably wouldn’t be quite as performant given that the literals would need parsing.
public IHtmlContent CreateParagraph(string content, string class)
{
// name will get wrapped in quotes and special characters escaped, age would be a plain number,
// children gets serialized as an array
return HtmlInterpolatedBuilder.Build($"<p class={class}>{content}</p>");
}
Random Other Idea #3: Hash Codes
This is an example that doesn’t involve strings at all. When overriding object.Equals(object other)
its generally accepted that
you should also override object.GetHashCode()
. However, calculating a hash code for your object may be cumbersome, especially
if there are a lot of fields. System.HashCode makes
this process easier, but still uses a lot of boilerplate.
public override int GetHashCode()
{
// I know, I know, there's also HashCode.Combine<...>(...)
// But that method can't override the comparer, and is limited to 8 fields.
var hash = new HashCode();
hash.Add(fieldA);
hash.Add(fieldB);
hash.Add(fieldC, StringComparer.OrdinalIgnoreCase);
return hash.ToHashCode();
}
What if we had this syntax:
// In theory, if AppendLiteral is an inlined no-op JIT should optimize away the call
// This would drop any literal segments, such as whitespace, without perf penalty.
// Just a theory, I haven't tested it.
// Also, note the use of "i" as a format string, which in this case indicates case-insensitive comparison.
public override int GetHashCode() => HashCodeInterpolated.Calculate($"{fieldA} {fieldB} {fieldC:i}");
Conclusion
In conclusion, I think the .NET community, especially library developers, should embrace the idea that interpolated string handlers can be valuable for much more than just boosting string building performance. They open up a wide range of exciting new possibilities.
Comments