I was looking forward to the workshop "Designing APIs" with Irina Scurtu, but it got cancelled. So I took the opportunity to finally spend some time on performance optimisation and load testing. The workshop with Mark Rendle was an excellent substitute, and I learned a lot.
Day 1: 1 Billion Row Challenge
The first day was all about the 1 billion row challenge. In this challenge we must read 1 billion rows from a file, parse them, and see how long it takes. The straightforward approach took 3:46 minutes on my machine.
By the end of the day, we got that time down to 31 seconds. We used Span&lt;T&gt;, a memory-mapped file and various other optimisations to make that significant improvement.
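To give a flavour of the span-based approach: the sketch below (my own minimal example, not the workshop solution) slices a "station;temperature" row as a ReadOnlySpan&lt;byte&gt; and parses the value straight from the UTF-8 bytes, so no intermediate strings are allocated.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

class SpanParseDemo
{
    static void Main()
    {
        // UTF-8 bytes for one "station;temperature" row.
        ReadOnlySpan<byte> line = "Hamburg;12.3"u8;

        int sep = line.IndexOf((byte)';');
        ReadOnlySpan<byte> station = line[..sep];      // a view into the buffer, no string allocated
        ReadOnlySpan<byte> temp = line[(sep + 1)..];

        // Parse the number directly from the bytes, skipping string conversion.
        Utf8Parser.TryParse(temp, out double value, out _);

        Console.WriteLine($"{Encoding.UTF8.GetString(station)}: {value}");
    }
}
```

The same slicing works on a memory-mapped view of the file, which is what makes it possible to process the whole input without allocating a string per row.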
Day 2: Further optimisations and web performance testing
In the morning, we got rid of floats altogether and brought the runtime down to 9 seconds. Who would have thought that parsing a float out of a string takes that much time?
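The trick rests on the fact that the challenge's temperatures always have exactly one decimal digit, so they can be stored as integers in tenths of a degree. A minimal sketch of that idea (my own simplified version, assuming well-formed input):

```csharp
using System;

class FixedPointDemo
{
    // Parse "12.3" or "-12.3" into tenths of a degree (123 / -123)
    // with plain digit arithmetic instead of double.Parse.
    static int ParseTenths(ReadOnlySpan<byte> utf8)
    {
        int sign = 1, value = 0;
        foreach (byte b in utf8)
        {
            if (b == (byte)'-') sign = -1;
            else if (b != (byte)'.') value = value * 10 + (b - (byte)'0');
        }
        return sign * value;
    }

    static void Main()
    {
        Console.WriteLine(ParseTenths("-12.3"u8)); // -123, i.e. -12.3 degrees
    }
}
```

All aggregation can then happen in integer arithmetic, and the division by ten only needs to happen once when printing the result.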
After the break we finally got to the part I was most interested in: web performance. We explored various approaches, and I was impressed by the massive impact of defining a JsonSerializerContext and moving that work to compile time. I had hoped for a bit more content in this part, but after two days of performance optimisation my head was full.
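The JsonSerializerContext pattern looks roughly like this. The Artist type is my own example, but the attribute and context class are the real System.Text.Json source-generation API: the serialisation code is emitted at compile time, so no reflection is needed at runtime.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public record Artist(int Id, string Name);

// The source generator fills in this partial class at compile time
// with serialisation code for every type listed here.
[JsonSerializable(typeof(Artist))]
public partial class AppJsonContext : JsonSerializerContext { }

public static class Demo
{
    public static void Main()
    {
        var json = JsonSerializer.Serialize(
            new Artist(1, "Muse"), AppJsonContext.Default.Artist);
        System.Console.WriteLine(json); // {"Id":1,"Name":"Muse"}
    }
}
```

In an ASP.NET Core app the context can be plugged into the framework's JSON options so that all endpoints benefit from it.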
BenchmarkDotNet for micro-benchmarks
To measure the impact of our changes, we used BenchmarkDotNet. This is a great tool for micro-benchmarks, where we want to know whether method A, B or C is faster.
We made a console application with this line to run the different benchmarks we created:
```csharp
using System.Reflection;
using BenchmarkDotNet.Running;
using OneBRC.Benchmarks;

BenchmarkSwitcher.FromAssembly(Assembly.GetExecutingAssembly()).Run(args);
```
A benchmark can look like this:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

namespace MinimalBenchmark
{
    [DisassemblyDiagnoser]
    public class LoopBenchmarks
    {
        public static readonly List<int> Data = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

        [Benchmark(Baseline = true)]
        public int For()
        {
            var total = 0;
            for (int i = 0; i < Data.Count; i++)
            {
                total += Data[i];
            }
            return total;
        }

        [Benchmark]
        public int Foreach()
        {
            var total = 0;
            foreach (var value in Data)
            {
                total += value;
            }
            return total;
        }

        [Benchmark]
        public int While()
        {
            var total = 0;
            var counter = 0;
            while (counter < Data.Count)
            {
                total += Data[counter];
                counter++;
            }
            return total;
        }
    }
}
```
When we run the project with dotnet run -c Release, we get a nice output table (after BenchmarkDotNet has run our methods thousands of times):
Be aware that 1 nanosecond is 0.000000001 second.
NBomber for web performance tests
For the web performance testing we used NBomber, a tool that is free for personal use. We created scenarios that access various endpoints and then looked at the collected OpenTelemetry data to see how long our requests take.
```csharp
using NBomber.CSharp;

var client = new HttpClient { BaseAddress = new Uri("https://localhost:7042/") };
string[] artists = ["Muse", "Nirvana", "Tool", "U2"];

var controllerScenario = Scenario.Create("controller", async context =>
    {
        var i = Random.Shared.Next(4);
        var artist = artists[i];
        var response = await client.GetAsync($"/artists/query?name={artist}");
        return response.IsSuccessStatusCode ? Response.Ok() : Response.Fail();
    })
    .WithWarmUpDuration(TimeSpan.FromSeconds(10))
    .WithLoadSimulations(
        Simulation.Inject(rate: 1, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromSeconds(30))
    );

var minApiScenario = Scenario.Create("minapi", async context =>
    {
        var i = Random.Shared.Next(4);
        var artist = artists[i];
        var response = await client.GetAsync($"/min/artists/query?name={artist}");
        return response.IsSuccessStatusCode ? Response.Ok() : Response.Fail();
    })
    .WithWarmUpDuration(TimeSpan.FromSeconds(10))
    .WithLoadSimulations(
        Simulation.Inject(rate: 1, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromSeconds(30))
    );

NBomberRunner.RegisterScenarios(controllerScenario, minApiScenario).Run();
```
Other helpful tools
On SharpLab.io we can look behind the syntactic sugar of C# and see what code different language features are translated into before the compiler turns them into IL. This explains why a while loop is faster than a for loop:
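As a small taste of what the lowered view shows, a for loop is essentially rewritten by the compiler into a while loop with an explicit counter. This sketch (my own, simplified from what SharpLab displays) puts the two side by side:

```csharp
using System;
using System.Collections.Generic;

class LoweringDemo
{
    static readonly List<int> Data = new() { 1, 2, 3 };

    static void Main()
    {
        // Written as a for loop...
        var total = 0;
        for (int i = 0; i < Data.Count; i++) total += Data[i];

        // ...the compiler lowers it to roughly this while loop,
        // which is what SharpLab makes visible.
        var total2 = 0;
        int j = 0;
        while (j < Data.Count)
        {
            total2 += Data[j];
            j++;
        }

        Console.WriteLine(total == total2); // True
    }
}
```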
If you like the REPL in Python, you should get C# REPL:
```shell
dotnet tool install -g csharprepl
```
This gives us a REPL for C# running on .NET 8. We can use it to experiment with code and explore libraries while benefiting from syntax highlighting and code completion:
To view the OpenTelemetry data, Aspire Dashboard is a great help. We can use it to see how long a request took and in what parts it waited on external services.
Do not forget your tests
The exercises in this workshop showed the importance of having tests, even for the methods you are just experimenting with. It does not help much if your fastest method cannot deliver the correct result. And mistakes are easily made, because the simple code of our baseline is left behind as we move to more optimised versions of our algorithms.
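A simple way to guard against this is a test that checks every optimised variant against the baseline. This is a sketch with xUnit and my own method names, not the workshop's actual test suite:

```csharp
using Xunit;

public class SumTests
{
    private static readonly int[] Data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

    [Fact]
    public void Optimised_variants_match_the_baseline()
    {
        // The slow-but-simple baseline defines what "correct" means.
        var expected = Baseline(Data);
        Assert.Equal(expected, OptimisedV1(Data));
        Assert.Equal(expected, OptimisedV2(Data));
    }

    static int Baseline(int[] data)
    {
        var total = 0;
        foreach (var v in data) total += v;
        return total;
    }

    // Stand-ins for increasingly optimised versions of the same algorithm.
    static int OptimisedV1(int[] data)
    {
        var total = 0;
        for (int i = 0; i < data.Length; i++) total += data[i];
        return total;
    }

    static int OptimisedV2(int[] data)
    {
        var total = 0;
        var i = 0;
        while (i < data.Length) total += data[i++];
        return total;
    }
}
```

With such a test in place, a benchmark run that produces a wrong result gets caught before its timing numbers can mislead anyone.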
Conclusion
The workshop was a deep dive into performance optimisation. We saw a lot of helpful tactics to speed up our code. The exercises were realistic, and if I ever need to optimise an application, I now have a good set of starting points.
The only thing missing was how to structure your approach to performance optimisation. Without that, there is always the risk of running down a rabbit hole and going far deeper than we wanted. To prevent that, I like the approach Daniel Marbach presents in his talk “The performance loop—A practical guide to profiling and benchmarking”.