In an attempt to bring the performance of Fredis.net as close as possible to that of the Microsoft Open Tech version of Redis, I implemented three different versions of async message processing. These implementations differ greatly in performance. The three implementations are:
1. 100% async using F# computation expressions: all socket/stream reads and writes are async.
2. Hybrid ‘async at the borders’: the first socket read of an incoming message and the final write/flush of a reply are async; all other reads and writes are synchronous.
3. 100% async using SocketAsyncEventArgs, adapted to work with F# Async computation expressions.
The graphs below show the number of requests per second Fredis.net can process for the PING_INLINE, PING_BULK, GET, SET, INCR and MSET commands, for each type of async IO. The number of clients ranges from 1 to 1024. The data was generated by redis-benchmark running on the same machine as Fredis.net.
Surprisingly, to me at least, the hybrid async/sync option was faster than fully asynchronous SocketAsyncEventArgs (except for PING_INLINE, which I think is a special case, as it requires only three read/write ops). I suspect that the first async read pulls in more bytes than were asked for, so subsequent synchronous reads are fast: the data is already available, and sync reads do not pay the costs of async. Similarly, sync writes may be buffered but not sent until an async flush triggers the socket write op. Because the code is async ‘at the borders’, no thread is blocked while waiting for an incoming client message.
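To make the ‘async at the borders’ idea concrete, here is a minimal sketch. It is not the Fredis.net code: `readExactly`, `handleMessage` and the one-byte length prefix are hypothetical, chosen only to keep the example small, and a MemoryStream stands in for the socket stream.

```fsharp
open System.IO

// Hypothetical helper: synchronous read loop that fills the whole buffer.
let readExactly (s: Stream) (buf: byte[]) =
    let mutable got = 0
    while got < buf.Length do
        let r = s.Read(buf, got, buf.Length - got)
        if r = 0 then raise (EndOfStreamException())
        got <- got + r

// Hypothetical handler: echoes one message framed as [length byte][payload].
// Only the first read and the final flush are asynchronous.
let handleMessage (input: Stream) (output: Stream) : Async<int> =
    async {
        // Border 1: wait asynchronously for the first byte of the message,
        // so no thread is blocked while the client is idle.
        let header = Array.zeroCreate<byte> 1
        let! n = input.AsyncRead(header, 0, 1)
        if n = 0 then
            return 0 // the client closed the connection
        else
            // Interior: synchronous read, assuming the rest of the message
            // has already arrived by the time the header is available.
            let payload = Array.zeroCreate<byte> (int header.[0])
            readExactly input payload
            // Interior: synchronous writes of the reply.
            output.Write(header, 0, 1)
            output.Write(payload, 0, payload.Length)
            // Border 2: asynchronous flush of whatever has been written.
            do! output.FlushAsync() |> Async.AwaitTask
            return payload.Length
    }

// Usage with in-memory streams standing in for a socket.
let input = new MemoryStream([| 3uy; 65uy; 66uy; 67uy |])
let output = new MemoryStream()
let len = handleMessage input output |> Async.RunSynchronously
printfn "%d payload bytes echoed, %d bytes written" len output.Length
```

With a real network stream, the interior synchronous reads could still block if the rest of the message has not arrived yet, so the sketch relies on the same assumption as the paragraph above.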
Async function calls do more work than the corresponding sync calls because of their thread-hopping, continuation-calling nature. To quantify async overhead* I used BenchmarkDotNet and wrote a simple program that compares Stream.AsyncRead, which returns an F# Async, with Stream.Read. I also benchmarked Stream.ReadAsync, which returns a TPL Task, and C# async/await, because why not. This benchmark is not intended to measure the advantages of async IO; no IO is performed. An array of bytes was written to a MemoryStream, then the MemoryStream sync and async read functions were timed by BenchmarkDotNet. (The code is at the end of this article.)
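For readers unfamiliar with the three APIs, the sketch below shows how each one is called and how its result is obtained. It is an illustration only, not the benchmark itself; the names `data`, `dst` and `strm` are made up for the example.

```fsharp
open System.IO

let data = Array.init 16 byte
let dst = Array.zeroCreate<byte> 16
let strm = new MemoryStream(data)

// 1. Stream.Read: synchronous, returns the number of bytes read directly.
let n1 = strm.Read(dst, 0, dst.Length)

// 2. Stream.AsyncRead: returns an F# Async<int>, which only describes the
//    read. Nothing happens until the computation is started.
strm.Position <- 0L
let n2 = strm.AsyncRead(dst, 0, dst.Length) |> Async.RunSynchronously

// 3. Stream.ReadAsync: returns a TPL Task<int> that is already started.
strm.Position <- 0L
let n3 = strm.ReadAsync(dst, 0, dst.Length).Result

printfn "%d %d %d" n1 n2 n3
```

The stream position is reset before each read so that every call sees the same 16 bytes.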
F# benchmark results

    Type=BenchmarkSyncVsAsync  Mode=Throughput

        Method | ArraySize |     Median |    StdDev |
    ---------- |---------- |----------- |---------- |
          Read |       256 |  3.3425 ns | 0.0395 ns |
     AsyncRead |       256 | 15.6435 ns | 0.1872 ns |
     ReadAsync |       256 | 12.5468 ns | 0.2811 ns |
          Read |      1024 |  3.3362 ns | 0.0280 ns |
     AsyncRead |      1024 | 15.5780 ns | 0.1480 ns |
     ReadAsync |      1024 | 12.3052 ns | 0.0501 ns |
          Read |      4096 |  3.3496 ns | 0.0329 ns |
     AsyncRead |      4096 | 15.6718 ns | 0.1052 ns |
     ReadAsync |      4096 | 12.3786 ns | 0.0985 ns |
          Read |     16384 |  3.3665 ns | 0.0347 ns |
     AsyncRead |     16384 | 15.7203 ns | 0.5591 ns |
     ReadAsync |     16384 | 12.4000 ns | 0.3800 ns |
          Read |     65536 |  3.3617 ns | 0.0399 ns |
     AsyncRead |     65536 | 15.7170 ns | 0.1364 ns |
     ReadAsync |     65536 | 12.4067 ns | 0.0872 ns |

C# async/await stream read benchmark results

    Type=CsBenchmarkAsyncAwait  Mode=Throughput

          Method | ArraySize |     Median |    StdDev |
    ------------ |---------- |----------- |---------- |
     CsAsyncRead |       256 | 13.3116 ns | 0.2075 ns |
     CsAsyncRead |      1024 | 13.2593 ns | 1.6684 ns |
     CsAsyncRead |      4096 | 13.2188 ns | 0.1407 ns |
     CsAsyncRead |     16384 | 13.2381 ns | 0.1144 ns |
     CsAsyncRead |     65536 | 13.2687 ns | 0.2302 ns |
The benchmark shows that sync reads are roughly four to five times faster than async reads in an x64 application (about 3.3 ns versus 12 to 16 ns per call), which might explain why the hybrid async/sync approach is faster.
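The ratios can be checked with a few lines of F#. The snippet below uses only the median times for ArraySize = 256 copied from the tables above; the variable names are illustrative.

```fsharp
// Median times in nanoseconds for ArraySize = 256, copied from the tables above.
let syncRead = 3.3425

let results =
    [ "AsyncRead (F# Async)", 15.6435
      "ReadAsync (TPL Task)", 12.5468
      "CsAsyncRead (C#)", 13.3116 ]

// Cost of each async variant relative to the synchronous Read.
let ratios = results |> List.map (fun (name, ns) -> name, ns / syncRead)

for (name, ratio) in ratios do
    printfn "%s: %.1fx the cost of Read" name ratio
```

The other array sizes give almost identical ratios, since the medians barely change with ArraySize.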
Notes
BenchmarkDotNet system config output
    BenchmarkDotNet=v0.9.7.0
    OS=Microsoft Windows NT 6.2.9200.0
    Processor=Intel(R) Core(TM) i7-4960HQ CPU 2.60GHz, ProcessorCount=8
    Frequency=2533209 ticks, Resolution=394.7562 ns, Timer=TSC
    HostCLR=MS.NET 4.0.30319.42000, Arch=64-bit RELEASE [RyuJIT]
    JitModules=clrjit-v4.6.1080.0
F# Read vs AsyncRead vs ReadAsync benchmark code
    open System.IO
    open BenchmarkDotNet.Attributes
    open BenchmarkDotNet.Running

    type BenchmarkSyncVsAsync () =
        let memStrm : MemoryStream = new MemoryStream()
        let mutable dst : byte array = null

        [<Params(256, 1024, 4096, 16384, 65536)>]
        member val public ArraySize = 0 with get, set

        // Write ArraySize random bytes to the stream and allocate the destination buffer.
        // Note: the stream position is left at the end of the written data.
        [<Setup>]
        member this.Setup () =
            let arr = Array.zeroCreate<byte> this.ArraySize
            let rnd = System.Random()
            rnd.NextBytes arr
            memStrm.Write(arr, 0, this.ArraySize)
            dst <- Array.zeroCreate<byte> this.ArraySize

        // Synchronous Stream.Read.
        [<Benchmark>]
        member this.Read () =
            memStrm.Read(dst, 0, this.ArraySize)

        // F# Stream.AsyncRead; the method returns the Async<int> without starting it.
        [<Benchmark>]
        member this.AsyncRead () =
            async { return! memStrm.AsyncRead(dst, 0, this.ArraySize) }

        // TPL Stream.ReadAsync, waited on synchronously.
        [<Benchmark>]
        member this.ReadAsync () =
            let tsk = memStrm.ReadAsync(dst, 0, this.ArraySize)
            tsk.Wait()
            tsk.Result

    [<EntryPoint>]
    let Main args =
        BenchmarkRunner.Run<BenchmarkSyncVsAsync>() |> ignore
        0
C# async/await benchmark code
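    using System;
    using System.IO;
    using System.Threading.Tasks;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class CsBenchmarkAsyncAwait
    {
        [Params(256, 1024, 4096, 16384, 65536)]
        public int ArraySize { get; set; }

        private byte[] dst;
        private MemoryStream memStrm = new MemoryStream();

        // Write ArraySize random bytes to the stream and allocate the destination buffer.
        [Setup]
        public void Setup()
        {
            dst = new byte[ArraySize];
            var src = new byte[ArraySize];
            var rnd = new Random();
            rnd.NextBytes(src);
            memStrm.Write(src, 0, ArraySize);
        }

        // async/await wrapper around Stream.ReadAsync.
        // As listed here, CsAsyncRead below does not call this method.
        private async Task<int> ReadAsync()
        {
            var tsk = memStrm.ReadAsync(dst, 0, ArraySize);
            var numBytes = await tsk;
            return numBytes;
        }

        // Stream.ReadAsync, waited on synchronously.
        [Benchmark]
        public int CsAsyncRead()
        {
            var tsk = memStrm.ReadAsync(dst, 0, ArraySize);
            tsk.Wait();
            return tsk.Result;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<CsBenchmarkAsyncAwait>();
        }
    }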
*Disclaimer: computation expressions and async IO are wonderful. Just because I say they have a cost does not mean I am against their use.