eieio 11mo ago

So sometimes I don't test these projects that much but I did this time. Here are a few thoughts:

My biggest goal was "make sure that my bottleneck is serialization or syscalls for sending to the client." Those are both things I can parallelize really well, so I could (probably) scale my way out of them vertically in a pinch.

So I tried to pick an architecture that would make that true; I evaluated a ton of different options but eventually did some napkin math and decided that a 64-million uint64 array with a single mutex was probably ok[1].
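A minimal Go sketch of that layout, for illustration only. The `Board` type, its method names, and the square row-major layout are assumptions, not the actual server code. It shows why a 100x100 snapshot costs 100 contiguous row copies of 100 uint64s rather than 10,000 independent reads.

```go
package main

import (
	"fmt"
	"sync"
)

const snapSide = 100 // snapshots are 100x100 cells, per the post

// Board is a square grid of uint64 cells stored row-major in one slice,
// guarded by a single RWMutex. (Hypothetical type for illustration.)
type Board struct {
	mu    sync.RWMutex
	side  int
	cells []uint64
}

// NewBoard allocates side*side cells; side must be at least snapSide.
// A side of 8000 would give the 64 million cells mentioned in the post.
func NewBoard(side int) *Board {
	return &Board{side: side, cells: make([]uint64, side*side)}
}

// Set writes one cell under the write lock. x and y must be in range.
func (b *Board) Set(x, y int, v uint64) {
	b.mu.Lock()
	b.cells[y*b.side+x] = v
	b.mu.Unlock()
}

// Snapshot copies the snapSide x snapSide window whose top-left corner is
// (x0, y0), clamped so the window stays on the board. Each row is one
// contiguous copy of snapSide values.
func (b *Board) Snapshot(x0, y0 int) []uint64 {
	x0 = clamp(x0, 0, b.side-snapSide)
	y0 = clamp(y0, 0, b.side-snapSide)
	out := make([]uint64, snapSide*snapSide)

	b.mu.RLock()
	defer b.mu.RUnlock()
	for r := 0; r < snapSide; r++ {
		src := (y0+r)*b.side + x0
		copy(out[r*snapSide:(r+1)*snapSide], b.cells[src:src+snapSide])
	}
	return out
}

func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

func main() {
	b := NewBoard(1000) // small board for the demo
	b.Set(150, 250, 42)
	s := b.Snapshot(100, 200)
	fmt.Println(len(s), s[50*snapSide+50]) // prints: 10000 42
}
```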

To validate that I made a script that spins up ~600 bots, has 100 of them slam 1,000,000 moves through the server as fast as possible, and has the other 500 request lots of reads. This is NOT a perfect simulation of load, but it let me take profiles of my server under a reasonable amount of load and gave me a decent sense of my bottlenecks, whether changes were good for speed, etc.

I had a plan to move from a single RWMutex to a row-locking approach with 8,000 of them. I didn't want to do this because it's more complicated and I might mess it up. So instead I just measure the number of nanos that I hold my mutex for and send that to a loki instance. This was helpful during testing (at one point my read lock time went up 10x!) but more importantly gave me a plan for what to do if prod was slow - I can look at that metric and only tweak the mutex if it's actually a problem.
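One way to collect that metric, sketched in Go. `TimedRWMutex` and its methods are hypothetical names, and shipping the totals to Loki is left out because the post doesn't say how that was done. The wrapper only accumulates hold time and acquisition counts in atomic counters.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// TimedRWMutex wraps sync.RWMutex and records how long the lock is held.
// (Hypothetical type; the real server's instrumentation may differ.)
type TimedRWMutex struct {
	mu         sync.RWMutex
	writeStart time.Time // only touched while the write lock is held
	writeNanos atomic.Int64
	writeCount atomic.Int64
	readNanos  atomic.Int64
	readCount  atomic.Int64
}

// Lock takes the write lock and notes when it was acquired.
func (m *TimedRWMutex) Lock() {
	m.mu.Lock()
	m.writeStart = time.Now()
}

// Unlock releases the write lock and adds the hold time to the totals.
func (m *TimedRWMutex) Unlock() {
	held := time.Since(m.writeStart)
	m.mu.Unlock()
	m.writeNanos.Add(held.Nanoseconds())
	m.writeCount.Add(1)
}

// RLock takes the read lock and returns the acquisition time. Many readers
// can hold the lock at once, so each caller keeps its own start time and
// passes it back to RUnlock.
func (m *TimedRWMutex) RLock() time.Time {
	m.mu.RLock()
	return time.Now()
}

// RUnlock releases the read lock and adds the hold time to the totals.
func (m *TimedRWMutex) RUnlock(start time.Time) {
	held := time.Since(start)
	m.mu.RUnlock()
	m.readNanos.Add(held.Nanoseconds())
	m.readCount.Add(1)
}

// Stats returns cumulative hold time in nanoseconds and acquisition counts.
func (m *TimedRWMutex) Stats() (writeNanos, writeCount, readNanos, readCount int64) {
	return m.writeNanos.Load(), m.writeCount.Load(), m.readNanos.Load(), m.readCount.Load()
}

func main() {
	var m TimedRWMutex

	m.Lock()
	time.Sleep(time.Millisecond) // stand-in for applying a move
	m.Unlock()

	start := m.RLock()
	time.Sleep(time.Millisecond) // stand-in for copying a snapshot
	m.RUnlock(start)

	wn, wc, rn, rc := m.Stats()
	fmt.Printf("write: %d ns over %d locks, read: %d ns over %d locks\n", wn, wc, rn, rc)
}
```

Dividing the nanosecond totals by the counts over a reporting interval gives an average hold time, which is the kind of number that would show a 10x regression like the one mentioned above.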

I also took some free wins like using protobufs instead of JSON for websockets. I was worried about connection overhead so I moved to GET polling behind Cloudflare's cache for global resources instead of pushing them over websockets.
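A rough Go sketch of the polling side, under several assumptions: the `/global` path, the one-second TTL, and the `globalStateHandler` and `cacheControl` helpers are all made up for illustration, and Cloudflare has to be configured to treat the path as cacheable. The idea is that a short shared-cache lifetime lets the edge absorb most polls, so the origin sees roughly one request per TTL instead of one per client.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// cacheControl builds a Cache-Control value that lets browsers (max-age)
// and shared caches such as a CDN (s-maxage) reuse the response briefly.
func cacheControl(ttlSeconds int) string {
	return fmt.Sprintf("public, max-age=%d, s-maxage=%d", ttlSeconds, ttlSeconds)
}

// globalStateHandler serves the bytes returned by snapshot to GET requests
// with a short cache lifetime. snapshot is a hypothetical callback that
// returns the already-serialized global state.
func globalStateHandler(snapshot func() []byte, ttlSeconds int) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Cache-Control", cacheControl(ttlSeconds))
		w.Header().Set("Content-Type", "application/octet-stream")
		_, _ = w.Write(snapshot())
	}
}

func main() {
	h := globalStateHandler(func() []byte { return []byte("state") }, 1)

	// Exercise the handler in-process instead of opening a real listener.
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/global", nil))

	fmt.Println(rec.Code, rec.Header().Get("Cache-Control"))
}
```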

And then I got comfortable with the fact that I might miss something! There are plenty more measurements I could have taken (if there was money on the line I would have measured some things like "number of TCP connections sending 0 moves this server can support" but I was lazy) but...some of the joy of projects like this is the firefighting :). So I was just ready for that.

Oh and finally I consulted with some very talented systems/performance engineer friends and ran some numbers by them as a sanity check.

It looks like this was way more work than I needed to do! I think I could comfortably 25x the current load and my server would be ok. But I learned a lot and this should all make the next project faster to make :)

[1] I originally did my math wrong and modeled the 100x100 snapshots I send to clients as 10,000 reads from main memory instead of 100 copies of 100 uint64s, which led me down a very different path... I'm not used to thinking about this stuff!

phatskat 11mo ago

> To validate that I made a script that spins up ~600 bots

Funny, when I went there were just over 600 active players and things were running super smoothly, even on my mobile. Kudos!

Do you see this project and the things you’ve tried applying to other future projects?

eieio (OP) 11mo ago

Hah, yes, but for testing I removed all my rate limits so I pushed 1 million moves in 2 or 3 seconds, whereas now I think I rate limit people to like 3 or 4 moves a second (which is beyond what I can achieve on a trackpad going as fast as I can!) so the test isn't quite comparable!

I definitely learned a lot here. Most of my projects like this are basically just "give the internet access to my computer's memory but with rules." And now I think I've got a really good framework for doing that performantly in golang, which should make the next set of projects like this much quicker to implement.

I also just...know how to write Go now. Which I did not 6 weeks ago. So that's nice.

cheekyfleek 11mo ago

You ain't the only one who's removed the rate limits lol. Some of these queens are clearing a whole board in like 3s, must've written something to keep a piece selected. This is turning into a race to the godliest piece hackathon.

phatskat 11mo ago

Six weeks is pretty quick! Can I ask what editor you use (always curious), and what other languages you have a background in?

sebmellen 11mo ago

Makes sense that you’re a Jane Street alum. Damn cool stuff.
