Using Roslyn to Power C# SDK Generation from OpenAPI Specifications
This post is one of the December 9th entries on the 2022 C# Advent Calendar. Believe it or not, this is my 6th year participating. Thanks for having me again, Matt and Calvin!
The OpenAPI set of standards (formerly known as Swagger) for defining HTTP-based APIs is a great set of tools. At CenterEdge Software we normally use OpenAPI 3 specifications to describe many of our services, both internal and external, making it easy for applications to reach those services. We also typically use SDK generators to create C# SDKs directly from the API specifications.
One day I asked myself “Hey, I wonder if I can use Roslyn to make an even better, faster OpenAPI SDK generator?” A couple years later and Yardarm is a real thing and the go-to choice at CenterEdge for C# SDK generation.
What is Roslyn
For those that don’t know, Roslyn is the modern C# compiler which is itself written in C# (dogfooding at its finest). It powers everything from compilation to C# source generators to the syntax hints in Visual Studio.
There is a lot to Roslyn, and it offers an incredible depth of functionality. But when it comes to its primary job, compiling a project, you can think of it as having three main components:
- A set of types to represent source code as an immutable syntax tree
- A parser that can read source code and turn it into a syntax tree
- A compiler that takes a syntax tree and produces output DLL files (and other related files like PDB debug files)
For our purposes, we are primarily interested in creating a syntax tree. Here is some example code which builds a part of a syntax tree to define a method in a class:
MethodDeclaration(
    attributeLists: default, // No attributes
    modifiers: SyntaxTokenList.Create(Token(SyntaxKind.PublicKeyword)), // Public keyword
    returnType: PredefinedType(Token(SyntaxKind.StringKeyword)), // Returns a string
    explicitInterfaceSpecifier: null,
    identifier: Identifier("BuildUri"), // Name of the method
    typeParameterList: default, // No type parameters (this isn't a generic method)
    parameterList: ParameterList(), // No parameters so provide an empty list
    constraintClauses: default, // No generic type constraints
    body: Block(
        // List of statements within the block body of the method
        ReturnStatement(LiteralExpression(SyntaxKind.StringLiteralExpression, Literal("uri")))
    ),
    expressionBody: null, // If we're using => expression syntax instead of a { ... } block, this would go here
    semicolonToken: default // No trailing semicolon since we're not using expression syntax
);
This produces the following C# code:
public string BuildUri()
{
    return "uri";
}
This seems very verbose, but there are also plenty of overloads that help with shorter code. I included a more complete example for clarity.
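As a rough sketch of the shorter style, the same method can be built with the convenience overloads plus the fluent Add... and With... helpers. This assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced; NormalizeWhitespace is used here only to make the printed result readable.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

// Shorter overload: only the return type and the name are required up front
MethodDeclarationSyntax method =
    MethodDeclaration(PredefinedType(Token(SyntaxKind.StringKeyword)), "BuildUri")
        .AddModifiers(Token(SyntaxKind.PublicKeyword))
        .WithBody(Block(
            ReturnStatement(
                LiteralExpression(SyntaxKind.StringLiteralExpression, Literal("uri")))));

// NormalizeWhitespace inserts default spacing and line breaks for display
string code = method.NormalizeWhitespace().ToFullString();
Console.WriteLine(code);
```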
A single syntax tree, from the root node down, covers every detail behind a single C# code file, from using directives to class declarations to the statements in a method body. A collection of syntax trees, combined with other information like referenced assemblies and compiler options, makes up a CSharpCompilation.
A note for all the VB folks: Roslyn isn’t only for C#. It also includes the Visual Basic compiler, and much of its API is shared between the two languages. (F# has its own separate compiler.)
Oh, and here’s a great online tool for turning C# into Roslyn creation code.
Performance
The Roslyn C# compiler is fast. Microsoft and others have spent an inordinate amount of time optimizing it. Building syntax trees directly, as shown above, lets us take full advantage of that speed during SDK generation.
A typical SDK generator works using this flow:
- Apply templates to the OpenAPI specification to generate C# files
- Write the C# files and a supporting csproj file to disk
- Run the Roslyn compiler as a separate process
- Roslyn reads all the files from disk
- Roslyn runs the files through a parser to generate a syntax tree
- Roslyn generates the compiled output from the syntax tree and writes to disk
Alternatively, Yardarm works using this much shorter flow:
- Generate a syntax tree in memory directly from the OpenAPI specification
- Roslyn generates the compiled output from the syntax tree and writes to disk
This avoids the expense of launching two separate processes, the expense of parsing/rendering text-based templates, and the expense of writing and then reading the code from disk followed by a C# parse. In exchange, we simply need to build the syntax tree ourselves.
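The shorter flow can be sketched in a few lines. This is only an illustration of the idea, not Yardarm's actual pipeline: the class name, assembly name, and the single reference to the core library are assumptions made for the example, and a real generator would write the DLL to disk rather than to a MemoryStream.

```csharp
using System;
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Emit;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

// Step 1: build the syntax tree in memory (no templates, no files on disk)
ClassDeclarationSyntax client = ClassDeclaration("DemoClient")
    .AddModifiers(Token(SyntaxKind.PublicKeyword))
    .AddMembers(
        MethodDeclaration(PredefinedType(Token(SyntaxKind.StringKeyword)), "BuildUri")
            .AddModifiers(Token(SyntaxKind.PublicKeyword))
            .WithBody(Block(ReturnStatement(
                LiteralExpression(SyntaxKind.StringLiteralExpression, Literal("uri"))))));

SyntaxTree tree = CSharpSyntaxTree.Create(CompilationUnit().AddMembers(client));

// Step 2: hand the tree straight to the compiler, skipping the parser entirely
CSharpCompilation compilation = CSharpCompilation.Create(
    assemblyName: "DemoSdk",
    syntaxTrees: new[] { tree },
    references: new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
    options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

using var output = new MemoryStream();
EmitResult result = compilation.Emit(output);

Console.WriteLine($"Success: {result.Success}, bytes emitted: {output.Length}");
foreach (Diagnostic diagnostic in result.Diagnostics)
{
    // Any compiler errors or warnings are reported here
    Console.WriteLine(diagnostic);
}
```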
Extensibility
Earlier, I mentioned that a syntax tree, once built, is immutable. This is technically true but practically false. The classes and structures themselves are immutable once created, making the tree immutable, but a new tree can be created based on the existing tree. Roslyn provides a variety of helper methods designed to make this easier, such as ReplaceNode, ReplaceNodes, Add..., and With....
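A minimal sketch of what that looks like in practice, using a throwaway class and method invented for the example: the original tree still holds the old name after the "mutation", and the change only exists in the new tree.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

MethodDeclarationSyntax method =
    MethodDeclaration(PredefinedType(Token(SyntaxKind.VoidKeyword)), "OldName")
        .WithBody(Block());
ClassDeclarationSyntax original = ClassDeclaration("Demo").AddMembers(method);

// With... returns a modified copy; the node it was called on is untouched
MethodDeclarationSyntax existing = (MethodDeclarationSyntax)original.Members[0];
MethodDeclarationSyntax renamed = existing.WithIdentifier(Identifier("NewName"));

// ReplaceNode builds a new tree with the node swapped out
ClassDeclarationSyntax updated = original.ReplaceNode(existing, renamed);

string before = ((MethodDeclarationSyntax)original.Members[0]).Identifier.Text;
string after = ((MethodDeclarationSyntax)updated.Members[0]).Identifier.Text;
Console.WriteLine($"{before} -> {after}"); // prints "OldName -> NewName"
```

Note that the node passed to ReplaceNode must come from the tree being changed (original.Members[0] here), not from the detached node used to build it.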
Yardarm uses this to its advantage, using a combination of the visitor pattern, aggregators, and dependency injection to allow easy customization of the generated code. I call the types “enrichers”, and their purpose is to enrich a particular syntax tree (or subset of a syntax tree) by examining it and either returning the original tree or creating a replacement. This is also a much more powerful tool for extensibility than typical template engine approaches, which often struggle with defining enough extension points or require replacing entire swaths of templates for a simple change.
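To make the idea concrete, here is a deliberately simplified sketch of the pattern. INodeEnricher and PublicModifierEnricher are hypothetical names invented for this example; Yardarm's real enricher interfaces carry more context and are resolved through dependency injection.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

// In a real generator the enrichers would come from a DI container
var enrichers = new INodeEnricher<MethodDeclarationSyntax>[]
{
    new PublicModifierEnricher(),
    new PublicModifierEnricher() // second pass is a no-op: the modifier is already there
};

MethodDeclarationSyntax method =
    MethodDeclaration(PredefinedType(Token(SyntaxKind.VoidKeyword)), "Ping").WithBody(Block());

// Aggregate: feed each enricher the output of the previous one
MethodDeclarationSyntax enriched =
    enrichers.Aggregate(method, (node, enricher) => enricher.Enrich(node));

Console.WriteLine(enriched.NormalizeWhitespace().ToFullString());

// Hypothetical interface, much simpler than Yardarm's real enricher interfaces
public interface INodeEnricher<TNode> where TNode : SyntaxNode
{
    TNode Enrich(TNode node);
}

// Returns the original node if nothing needs to change, otherwise a replacement
public sealed class PublicModifierEnricher : INodeEnricher<MethodDeclarationSyntax>
{
    public MethodDeclarationSyntax Enrich(MethodDeclarationSyntax node) =>
        node.Modifiers.Any(SyntaxKind.PublicKeyword)
            ? node
            : node.AddModifiers(Token(SyntaxKind.PublicKeyword));
}
```

Because each enricher only sees a node and returns a node, a new customization is just one more small class in the list rather than a fork of a template.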
This approach is used extensively within the Yardarm internals in addition to allowing the injection of custom extensions. This provides better separation of concerns within the Yardarm source code. Here is an example built-in enricher which adds nullable annotations or default initialization to properties which represent request parameters. An IOpenApiSyntaxNodeEnricher<PropertyDeclarationSyntax, OpenApiParameter> is applied to nodes in the tree which A) are property declarations and B) were generated by a parameter definition on an OpenAPI request.
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.OpenApi.Interfaces;
using Microsoft.OpenApi.Models;
using Yardarm.Helpers;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

namespace Yardarm.Enrichment.Requests;

public class RequiredParameterEnricher : IOpenApiSyntaxNodeEnricher<PropertyDeclarationSyntax, OpenApiParameter>
{
    public PropertyDeclarationSyntax Enrich(PropertyDeclarationSyntax syntax, OpenApiEnrichmentContext<OpenApiParameter> context)
    {
        if (!context.Element.Required || context.Element.Schema.Nullable)
        {
            // If the parameter is optional OR nullable then the property should be nullable
            // This is because .NET does not have a good way to differentiate between missing and null
            syntax = syntax.MakeNullable();
        }
        else
        {
            // The value needs to be initialized to avoid nullable ref type warnings
            syntax = syntax.MakeNullableOrInitializeIfReferenceType(context.Compilation);
        }

        if (context.Element.Required)
        {
            // Explicitly annotate as required if the parameter is required
            syntax = AddRequiredAttribute(syntax);
        }

        return syntax;
    }

    private PropertyDeclarationSyntax AddRequiredAttribute(PropertyDeclarationSyntax syntax) =>
        syntax.AddAttributeLists(
            AttributeList(SingletonSeparatedList(
                    Attribute(WellKnownTypes.System.ComponentModel.DataAnnotations.RequiredAttribute.Name)))
                .WithTrailingTrivia(ElasticCarriageReturnLineFeed));
}
This class also uses several Yardarm-specific extension methods which perform some common actions on a syntax tree. Of particular interest is the MakeNullableOrInitializeIfReferenceType extension method, which makes use of the SemanticModel.
The SemanticModel is an analysis of the syntax trees, referenced assemblies, etc. to determine additional information which isn’t apparent from the syntax tree itself. In this example it is used to determine if the type of the property is a value type or a reference type as well as what constructors are available for that type.
/// <summary>
/// If the given property is a reference type and is not initialized, it should *either* be marked as nullable or
/// be initialized to a non-null value. This method will do its best to initialize the property using a default constructor
/// or empty string, and failing that will mark the type as nullable.
/// </summary>
/// <param name="property">The <see cref="PropertyDeclarationSyntax"/> to update. Must be on a <see cref="SyntaxTree"/>.</param>
/// <param name="semanticModel"><see cref="SemanticModel"/> used to perform type analysis.</param>
/// <returns>The mutated property declaration, or the original if no mutation was required.</returns>
public static PropertyDeclarationSyntax MakeNullableOrInitializeIfReferenceType(this PropertyDeclarationSyntax property,
    SemanticModel semanticModel)
{
    if (property == null)
    {
        throw new ArgumentNullException(nameof(property));
    }
    if (semanticModel == null)
    {
        throw new ArgumentNullException(nameof(semanticModel));
    }

    if (property.Initializer != null || property.ExpressionBody != null)
    {
        // No need if already initialized or expression body only
        return property;
    }

    if (property.Type is NullableTypeSyntax)
    {
        // Already nullable
        return property;
    }

    var typeInfo = semanticModel.GetTypeInfo(property.Type);
    if (typeInfo.Type?.IsReferenceType ?? false)
    {
        if (typeInfo.Type.SpecialType == SpecialType.System_String)
        {
            // Initialize to an empty string
            property = property
                .WithInitializer(EqualsValueClause(SyntaxHelpers.StringLiteral("")))
                .WithSemicolonToken(Token(SyntaxKind.SemicolonToken));
        }
        else if (!typeInfo.Type.IsAbstract && typeInfo.Type.GetMembers()
            .Where(p => p.Kind == SymbolKind.Method && p.Name == ".ctor")
            .Cast<IMethodSymbol>()
            .Any(p => p.Parameters.Length == 0 && p.DeclaredAccessibility == Accessibility.Public))
        {
            // Build a default object using the default constructor
            property = property
                .WithInitializer(EqualsValueClause(ObjectCreationExpression(property.Type)))
                .WithSemicolonToken(Token(SyntaxKind.SemicolonToken));
        }
        else
        {
            // Mark the types as nullable, even if the parameter is required
            // This will encourage SDK consumers to check for nulls and prevent NREs
            property = property.MakeNullable();
        }
    }

    return property;
}
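The semantic lookup at the heart of that method can be tried in isolation. The following is a small standalone sketch, not Yardarm code: the Demo class and the single core library reference are assumptions made for the example, and the source is parsed from text purely for brevity.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

SyntaxTree tree = CSharpSyntaxTree.ParseText(
    "public class Demo { public string Name { get; set; } public int Count { get; set; } }");

CSharpCompilation compilation = CSharpCompilation.Create(
    "SemanticDemo",
    new[] { tree },
    new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

// A semantic model is tied to one tree within one compilation
SemanticModel model = compilation.GetSemanticModel(tree);

foreach (PropertyDeclarationSyntax property in
    tree.GetRoot().DescendantNodes().OfType<PropertyDeclarationSyntax>())
{
    // The syntax only knows the text "string" or "int"; the symbol knows what kind of type it is
    ITypeSymbol type = model.GetTypeInfo(property.Type).Type;
    Console.WriteLine($"{property.Identifier.Text}: reference type = {type?.IsReferenceType}");
}
```

With the core library referenced, this should report Name as a reference type and Count as a value type, which is exactly the distinction the extension method relies on.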
Extensibility and JSON Serialization
There has been a lot of upheaval in the .NET JSON space since the arrival of System.Text.Json. Newtonsoft.Json has been the go-to serializer for years and is very feature rich, but System.Text.Json is maintained by Microsoft and offers some significant performance benefits. As a result, I wanted Yardarm to have robust support for both of these options (or any other option someone may prefer).
Therefore, JSON serialization within Yardarm is provided as an extension to the core Yardarm implementation. When generating an SDK, simply add one of the two extensions, and it adds all necessary annotation attributes and wires itself up as the serializer for application/json and other related content types.
This also means that support for other formats, such as XML, could be added via extensions.
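As a loose illustration of why this works well as an extension, the only serializer-specific part of a generated property is often the attribute attached to it. The helper below, AddJsonName, is hypothetical and is not how the Yardarm extensions are actually implemented; it just shows the same property node being decorated for either serializer.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

// A serializer-agnostic property, as the core generator might produce it
PropertyDeclarationSyntax property =
    PropertyDeclaration(PredefinedType(Token(SyntaxKind.StringKeyword)), "FirstName")
        .AddModifiers(Token(SyntaxKind.PublicKeyword))
        .AddAccessorListAccessors(
            AccessorDeclaration(SyntaxKind.GetAccessorDeclaration)
                .WithSemicolonToken(Token(SyntaxKind.SemicolonToken)),
            AccessorDeclaration(SyntaxKind.SetAccessorDeclaration)
                .WithSemicolonToken(Token(SyntaxKind.SemicolonToken)));

// Each "extension" decorates the same node with its own attribute
string stj = AddJsonName(property, "first_name", "System.Text.Json.Serialization.JsonPropertyName")
    .NormalizeWhitespace().ToFullString();
string newtonsoft = AddJsonName(property, "first_name", "Newtonsoft.Json.JsonProperty")
    .NormalizeWhitespace().ToFullString();

Console.WriteLine(stj);
Console.WriteLine(newtonsoft);

// Hypothetical helper: attaches [attributeName("jsonName")] to the property
static PropertyDeclarationSyntax AddJsonName(
    PropertyDeclarationSyntax target, string jsonName, string attributeName)
{
    AttributeArgumentSyntax argument = AttributeArgument(
        LiteralExpression(SyntaxKind.StringLiteralExpression, Literal(jsonName)));

    return target.AddAttributeLists(
        AttributeList(SingletonSeparatedList(
            Attribute(
                ParseName(attributeName),
                AttributeArgumentList(SingletonSeparatedList(argument))))));
}
```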
Embedded Source Code
Visual Studio also offers a great set of features around debugging third-party DLL files. One such feature is SourceLink, which links the compiled code back to lines in the original source files on GitHub, GitLab, etc. However, since Yardarm generates code directly using Roslyn, how can we provide the source code with the SDK for debugging purposes?
The answer: we directly embed the source in the PDB file (optionally, of course). However, the C# source code generated doesn’t tend to be very legible. We’re generating the syntax tree without all the nice things like, you know, whitespace.
publicstringMyMethod(){returncodethatlookslikethis+isTERRIBLYhardtoread;}
Yardarm addresses this using another enricher, run only if source embedding is enabled, which applies formatting. It uses Formatter.FormatAsync to do the heavy lifting, which is (I believe) the formatter used internally by Visual Studio when you autoformat a file.
public class FormatCompilationEnricher : ICompilationEnricher
{
    private readonly YardarmGenerationSettings _settings;
    private readonly ILogger<FormatCompilationEnricher> _logger;

    public Type[] ExecuteAfter { get; } =
    {
        typeof(VersionAssemblyInfoEnricher),
        typeof(SyntaxTreeCompilationEnricher),
        typeof(DefaultTypeSerializersEnricher),
        typeof(OpenApiCompilationEnricher),
        typeof(ResourceFileCompilationEnricher)
    };

    public FormatCompilationEnricher(YardarmGenerationSettings settings,
        ILogger<FormatCompilationEnricher> logger)
    {
        ArgumentNullException.ThrowIfNull(settings);
        ArgumentNullException.ThrowIfNull(logger);

        _settings = settings;
        _logger = logger;
    }

    public async ValueTask<CSharpCompilation> EnrichAsync(CSharpCompilation target,
        CancellationToken cancellationToken = default)
    {
        if (!_settings.EmbedAllSources)
        {
            // Don't bother formatting if we're not embedding source
            return target;
        }

        var stopwatch = Stopwatch.StartNew();

        using var workspace = new AdhocWorkspace();
        var solution = workspace
            .AddSolution(
                SolutionInfo.Create(
                    SolutionId.CreateNewId(_settings.AssemblyName),
                    VersionStamp.Default));

        Project project =
            solution.AddProject(_settings.AssemblyName, _settings.AssemblyName + ".dll",
                LanguageNames.CSharp);
        workspace.TryApplyChanges(solution);

        // Exclude files with no path (won't be embedded)
        // We still format resource files, which are typically already formatted, because they may have
        // been mutated by other enrichers.
        IEnumerable<SyntaxTree> treesToBeFormatted = target.SyntaxTrees
            .Where(static p => p.FilePath != "" && p.HasCompilationUnitRoot);

        // Process formatting in parallel, this gives a slight perf boost
        object lockObj = new();
        await Parallel.ForEachAsync(treesToBeFormatted, cancellationToken,
            async (syntaxTree, localCt) =>
            {
                SyntaxNode root = await syntaxTree.GetRootAsync(localCt);

                Document document = project.AddDocument(Guid.NewGuid().ToString(), root);

                // Use the per-iteration token so formatting stops if the loop is cancelled
                document = await Formatter.FormatAsync(document, solution.Options, localCt);

                SyntaxNode? newRoot = await document.GetSyntaxRootAsync(localCt);
                if (newRoot is not null && newRoot != root)
                {
                    // CSharpCompilation is immutable, so serialize the swap of the shared variable
                    lock (lockObj)
                    {
                        target = target.ReplaceSyntaxTree(syntaxTree,
                            syntaxTree.WithRootAndOptions(newRoot, syntaxTree.Options));
                    }
                }
            });

        stopwatch.Stop();
        _logger.LogInformation("Sources formatted for embedding in {elapsed}ms", stopwatch.ElapsedMilliseconds);

        return target;
    }
}
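For experimenting with the formatter outside of Yardarm, a much smaller sketch is to format a single node with the synchronous Formatter.Format overload. This assumes the Microsoft.CodeAnalysis.CSharp.Workspaces NuGet package is referenced so the AdhocWorkspace can find the C# formatting services; the method being formatted is invented for the example.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Formatting;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

// Nodes built with SyntaxFactory carry no real whitespace between tokens
MethodDeclarationSyntax method =
    MethodDeclaration(PredefinedType(Token(SyntaxKind.StringKeyword)), "BuildUri")
        .AddModifiers(Token(SyntaxKind.PublicKeyword))
        .WithBody(Block(ReturnStatement(
            LiteralExpression(SyntaxKind.StringLiteralExpression, Literal("uri")))));

string unformatted = method.ToFullString();
Console.WriteLine(unformatted); // everything run together on one line

// The formatter needs a workspace to look up language services and options
using var workspace = new AdhocWorkspace();
SyntaxNode formatted = Formatter.Format(method, workspace);

string formattedText = formatted.ToFullString();
Console.WriteLine(formattedText);
```

Formatting whole documents through a project, as the enricher above does, is the heavier route, but it matches how the files will later appear when embedded in the PDB.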
Other Functionality
Yardarm has a LOT of other functionality. Too much to cover in this blog post, which focuses on the Roslyn aspects of Yardarm. However, I’ll try to list a few of the key points here.
- Allows targeting many variants of .NET, including .NET 6 and 7
  - .NET Framework 4.6.1 and later is supported via .NET Standard 2.0
- Generates DLLs, PDB debug files, XML documentation files, and reference assemblies
  - XML documentation is extracted from the documentation in the OpenAPI specification
- Directly generates a NuGet package which includes the above files, including support for multi-targeting
- Uses many modern compiler features
  - For example, when targeting .NET 6 it will use string interpolation handlers for more efficient string building
- Supports many great patterns and practices
  - Includes a built-in extension which supports DI registration using HttpClientFactory
  - Generated SDKs are asynchronous from top to bottom
  - Makes use of polymorphism to handle the complexities of requests, responses, and discriminated schemas in a way that (usually) won’t break over time as the specification changes
  - Interfaces are generated to make mocking for unit tests easy
- Available as a .NET Global Tool, a Docker image, or an MSBuild SDK
  - The MSBuild SDK approach is particularly cool: you can build SDKs as a project in your solution
- Automatically uses NuGet to download dependencies for compilation, such as Newtonsoft.Json or any other dependency defined by an extension
  - Yardarm even executes any Roslyn 4 source generators included in the dependencies
Future Work
Yardarm is in use in production, but is still a work in progress. Future plans include:
- Extensions for serializing/deserializing dates and times using NodaTime
- Refactor System.Text.Json polymorphism to use the new built-in support in .NET 7
- Support for link-level trimming when used with System.Text.Json
There are also doubtless API specifications in the wild with use cases which don’t work well today. I’d love to hear about them in Issues so Yardarm can continue to improve.
Conclusion
Hopefully Yardarm showcases the power and flexibility of Roslyn, above and beyond basic compilation and cool code refactoring in Visual Studio. Version 0.3.0 is available and ready for use, and I look forward to feedback from the .NET community on the project.