Linux extreme performance H1 load generator

(gcannon.org)

30 points · MDA2AV · 2mo ago · 16 comments

12 comments · 4 top-level

Veserv · 2mo ago · 3 in thread

What is the point of making up claims of "extreme" performance without any accompanying benchmarks or comparisons?

It really should be shameful to use unqualified adjectives in headline claims without also providing the supporting evidence.

MDA2AV (OP) · 2mo ago

I agree, I'll try adding some. We use the tool on a benchmarking platform, so we run it hundreds of times daily and have run dozens of tests against pretty much every other load generator (that I know of). Numbers are always tied to the hardware you run on, though, and benchmarks provided by the maintainer are typically biased and won't match what you get.

I personally never care about the benchmarks presented; it's much better to use the tool and see for myself, so I didn't think much about having a table of values there, but I can understand how it may help.

raks619 · 2mo ago

did you scroll down?

ziml77 · 2mo ago

I did, and I still didn't see any numbers. Just a bunch of AI-generated text about why it's supposedly fast. It even says it records numbers multiple times, so why aren't any presented?

0x000xca0xfe · 2mo ago · 2 in thread

Interesting. I made something similar years ago, when io_uring wasn't around yet; it is just a couple of threads blocking on sendfile: https://github.com/evelance/sockbiter

Of course, it needs to pre-generate the file, and you need enough RAM for both running the server and caching the file, but it needs almost zero CPU during the test run and can probably produce even more load than this io_uring tool.
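A minimal sketch of the pre-generate-then-sendfile idea, in Python rather than whatever sockbiter itself uses. This is not sockbiter's code: the function names (`build_request_file`, `send_request_file`, `demo`) and the request shape are hypothetical, and a local socket pair stands in for a real server connection.

```python
import os
import socket
import tempfile


def build_request_file(path: str, host: str, count: int) -> int:
    """Write `count` identical GET requests to `path`; return bytes per request."""
    request = f"GET / HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(request)
    return len(request)


def send_request_file(sock: socket.socket, path: str) -> int:
    """Push the whole pre-generated file into the socket; return bytes sent.

    socket.sendfile() uses os.sendfile() where available, so the kernel copies
    from the page cache to the socket, and falls back to plain send() otherwise.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo(count: int = 50):
    """Send `count` requests over a local socket pair (a stand-in for a server)."""
    left, right = socket.socketpair()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "requests.bin")
        per_request = build_request_file(path, "localhost", count)
        sent = send_request_file(left, path)
    left.close()  # the peer sees EOF once it has drained the data
    received = b""
    while True:
        chunk = right.recv(65536)
        if not chunk:
            break
        received += chunk
    right.close()
    return sent, len(received), per_request * count
```

The demo keeps the payload small enough to fit in the socket buffer; a real run would need a reader draining responses concurrently.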

MDA2AV (OP) · 2mo ago

Very cool!

So I just tried your tool and it just hangs. I see you're sending close requests; is this configurable to keep-alive or, even better, nothing? In HTTP/1.1 it is better not to send keep-alive/close at all; never try to enforce it, as it is not mandatory.

A lot of servers just ignore the close and don't close the connection (like the one I am using), so this could be the issue I am having.

0x000xca0xfe · 2mo ago

Cool, thanks for trying it.

Try the -shutwr option if the server doesn't close the connection itself. I used it to test lots of exotic implementations, and there are weird things going on in overload situations and around connection management. NodeJS, for example, started dropping connections on localhost(!!) under high load.
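A rough sketch of what a write-side shutdown looks like from the client, assuming (from the option's name only) that -shutwr half-closes the socket after the last request. The toy server and all function names here are hypothetical; the server answers every request and ignores any close hint, ending the connection only when it sees EOF.

```python
import socket
import threading

RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"


def send_then_half_close(sock: socket.socket, payload: bytes) -> bytes:
    """Send everything, shut down the write side, then read until the peer closes."""
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)  # peer's recv() now returns b"" after the data
    chunks = []
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def _toy_server(conn: socket.socket) -> None:
    """Hypothetical server: replies per request, closes only on client EOF."""
    buf = b""
    while True:
        chunk = conn.recv(65536)
        if not chunk:  # the client half-closed its side
            break
        buf += chunk
        while b"\r\n\r\n" in buf:
            _, buf = buf.split(b"\r\n\r\n", 1)
            conn.sendall(RESPONSE)
    conn.close()


def demo() -> int:
    """Send three requests and return how many responses came back."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=_toy_server, args=(server,))
    worker.start()
    request = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"
    reply = send_then_half_close(client, request * 3)
    worker.join()
    client.close()
    return reply.count(b"HTTP/1.1 200 OK")
```

Without the `shutdown` call, this client would wait forever for a close that the toy server never initiates, which matches the hang described above.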

The tool was built for high values of keepalive requests; if the server is too fast, just use more requests, e.g. -n 1000000 or something similar. Unfortunately, some servers close keepalive connections after relatively few requests; nginx, for example, has a default of 1000.

This is just a simple tool I hacked together as a student to collect some data. I didn't spend any time making it more accessible or user-friendly, sorry.

bawolff · 2mo ago · 2 in thread

Really stupid question from someone who doesn't know much about io_uring: wouldn't doing all this I/O async make the latency measurements less accurate? How do you know when the I/O starts if you are submitting it async in batches of 2048?

tuetuopay · 2mo ago

The main difference with io_uring is that you're not blocking the thread, just as with O_NONBLOCK + epoll, but you don't have to rely on thread-level syscalls to do so: there's no expensive context switch to kernel mode. Using O_NONBLOCK + epoll is already async :)

In fact, in all cases, you don't know when the syscall actually starts execution, even with regular calls. The only thing you're sure of is that the kernel "knows" about the syscall you want. However, you have absolutely no indication of whether it has started to run.

The real question is: are the classical measures accurate? All we have is an upper bound on the time it took: I fired the write at t0 and finished reading the response at t1. This does not really change with io_uring. Batches will mostly change one fact: multiple measurements will share a t0, and possibly a t1 when multiple replies arrive at once.

Is it important? Yes and no. The most important thing in such benchmarks is for the added delay to be consistent between measurements, and to see when that consistency starts to break down. So it matters if you're chasing every µs in the stack, but less so if your goal is lowering the p99, which happens under heavy load. In this case, consistency between measurements is paramount in order to get histograms and such that make sense.
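To make the shared-t0 point concrete, here is a small simulation, not gcannon's code. It assumes a client that stamps t0 once per submitted batch and only notices replies when it reaps completions; `measure_batch` and the numbers are made up for illustration, and the "true" start and end times are values a real client could never observe.

```python
import bisect


def measure_batch(t0_us, true_spans_us, reap_times_us):
    """Return (true_latency, measured_latency) pairs in microseconds.

    t0_us:          when the whole batch was submitted (shared by every request).
    true_spans_us:  (start, end) of each request on the wire; unobservable.
    reap_times_us:  sorted times at which the client drains completions.
                    Assumes some reap happens at or after every `end`.
    """
    results = []
    for start, end in true_spans_us:
        # The reply is only noticed at the first reap at or after it arrived.
        t1_us = reap_times_us[bisect.bisect_left(reap_times_us, end)]
        results.append((end - start, t1_us - t0_us))
    return results


pairs = measure_batch(
    t0_us=0,
    true_spans_us=[(5, 100), (10, 120), (15, 260)],
    reap_times_us=[150, 300],
)
print(pairs)  # [(95, 150), (110, 150), (245, 300)]
```

The first two requests share both t0 and t1, so they report the same 150 µs even though their true latencies differ; each measured value is an upper bound on the true one.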

dijit · 2mo ago

It's not a stupid question.

Normally, when I have run latency calculations in the past, I ran them from the perspective of the caller, not the server.

In most cases this is over the network, a named pipe or sock file.

I guess it should be possible to run multiple runtimes inside a program that run independently.

qcoudeyr · 2mo ago · 1 in thread

From my benchmark, I will keep using oha (https://github.com/hatoo/oha). oha is more complete than gcannon and has a similar req/s rate while handling IPv6, HTTPS, etc.

G3nt0 · 2mo ago

oha is one of the slowest load generators; you should look into h2load if you need h2/h3 support. I just tried oha and it uses more CPU than the server I am testing, not to mention that the h2 and h3 results are just nonsense.
