What happens when you make a move in lichess.org?

(davidreis.me)

359 points | dreis_sw | 1y ago | 158 comments

MobileVet1y ago

I wish this discussed the timing arbitration of each move. Based on the packet information (if that is correct & complete) then the timing is done entirely on the clients. However, they show the time in seconds which can't be right so I am curious how accurate this packet schema is (or if those are float values).

Regardless, one thing I find maddening about chess.com is the time architecture of the game. I haven't seen the underlying code, but it feels like the SERVER is tracking the time. This completely neglects transport time & latency meaning that 1s to move isn't really a second. Playing on the mobile client is an exercise in frustration if you are playing timed games and down to the wire. Even when you aren't, your clock will jump on normal moves and it is most obvious during the opening.

This could also be due to general poor network code as well. The number of errors I get during puzzles is also frustrating. Do they really not retry a send automatically?? <breath>

Chess.com has the brand and the names... but dang, the tech feels SO rough to me.

bluecalm1y ago

Chess.com software might be the worst public facing software ever assembled. During their most popular weekly tournament (by the number of spectators) called Titled Tuesday where significant % of the world elite regularly competes they send links on a public chat to a 3rd site every 4 rounds. The reason is that there is a few minutes break and they failed implementing a clock on their side so they need 3rd party service for that.

This is one of the many, many things but imo it's the most telling. They can't even add a clock counting down the 6 minutes to their web client.

luisgvv1y ago

I can't believe this, but it makes sense now lol I think I heard a streamer say it was for kicking out the cheaters

pshc1y ago

> it feels like the SERVER is tracking the time

TBH this is what I expected for all online chess. How else to reconcile the two players' differing clocks and also prevent client-side cheating?

MobileVet1y ago

I guess my naive frustration comes from crazy fps games tracking things so precisely and yet somehow Chess.com can’t handle a turn based game?! Honestly.

I do recognize that fps games utilize predictive algorithms and planning to estimate future player positions but still, turn based networking with 100ms accuracy should be a solved problem

nightowl_games1y ago

Netcode dev here. Predicting the clock is a trivially solved problem. The client and server know the latency between each other, the server can offset the timestamp on the input from the client to compensate for this difference, and the client can offset its rendering of the clock data from the server. The same techniques used in regular online gaming would apply here. The only X factor here is the impact of the client lying about its latency to the server, perhaps that could have an impact, not sure.
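To make the idea concrete, here is a minimal sketch of server-side clock accounting with lag compensation. It is not how chess.com or Lichess actually do it; `applyMove`, `ClockState`, and the 500 ms cap are assumptions made up for illustration. The cap is one simple way to limit the damage from a client that lies about its latency.

```typescript
// Hypothetical sketch: the server charges a player for thinking time,
// minus a capped allowance for network latency.
interface ClockState {
  remainingMs: number; // time left on the player's clock
}

// Assumed cap, so an untrusted client cannot claim unlimited lag.
const MAX_LAG_COMP_MS = 500;

function applyMove(
  clock: ClockState,
  serverElapsedMs: number, // server-measured time between relaying the opponent's move and receiving the reply
  reportedLagMs: number, // client-reported round-trip estimate (untrusted)
): ClockState {
  // Never compensate more than the cap, and never more than the time that actually passed.
  const compensation = Math.max(
    0,
    Math.min(reportedLagMs, MAX_LAG_COMP_MS, serverElapsedMs),
  );
  const chargedMs = serverElapsedMs - compensation;
  return { remainingMs: Math.max(0, clock.remainingMs - chargedMs) };
}
```

With these assumptions, a move that took 400 ms on the server's clock from a client reporting 150 ms of lag is charged 250 ms. A client claiming 5 seconds of lag still gets at most 500 ms back.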

palata1y ago

> How else to reconcile the two players' differing clocks and also prevent client-side cheating?

Is there a point in preventing cheating, really? I can just make a bot...

MichaelZuo1y ago

It hasn’t been done client side in any pvp game I’ve heard of.

stevage1y ago

I'm pretty sure freechess.org did.

bongodongobob1y ago

Track the two clients pings? What client side cheating prevention would you need to do in chess? Afaik you can't cheat by clipping through walls or jumping around on the map.

pengowray1y ago

> they show the time in seconds which can't be right

Seems right.

If you export/download games from lichess, they use the .pgn (Portable Game Notation) format, which is a standard plain-text format circa 1993, used by pretty much everyone for describing a chess game.

Lichess follows the specification to the letter, and as it only technically allows one-second accuracy, lichess only record moves with one-second accuracy. It seems insane, but that's how they do it.

Chess.com also exports PGN files, but they add a decimal place, allowing subsecond accuracy. No one has a problem with this. There is no software which cannot handle this. But Lichess refuses to "break" the spec.

lichess PGN export example:

> 1. d3 { [%eval -0.15] [%clk 0:01:00] } 1... g6 { [%eval 0.04] [%clk 0:01:00] }

Chess.com PGN export example:

> 1. d4 {[%clk 0:02:58.6]} 1... b6 {[%clk 0:02:59.2]}
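For anyone who wants to compare the two exports, a small parser along these lines handles both forms. This is a sketch: `parseClk` is a name invented here, and it assumes the `[%clk h:mm:ss]` shape shown in the two examples above, with an optional decimal part on the seconds.

```typescript
// Parse a PGN comment containing [%clk h:mm:ss] or [%clk h:mm:ss.f]
// and return the remaining clock time in milliseconds, or null if absent.
function parseClk(comment: string): number | null {
  const match = /\[%clk (\d+):(\d{2}):(\d{2}(?:\.\d+)?)\]/.exec(comment);
  if (match === null) return null;
  const hours = Number(match[1]);
  const minutes = Number(match[2]);
  const seconds = Number(match[3]); // may carry a fractional part
  return Math.round((hours * 3600 + minutes * 60 + seconds) * 1000);
}
```

Both `{ [%eval -0.15] [%clk 0:01:00] }` and `{[%clk 0:02:58.6]}` go through the same code path, so the decimal place is not a burden for a consumer written this way.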

kibwen1y ago

> lichess only record moves with one-second accuracy

According to this blog post, this doesn't appear to be the case since at least 2017:

https://lichess.org/@/lichess/blog/a-better-game-clock-histo...

"Move times are now stored and displayed with a precision of one tenth of a second. The precision even goes up to one hundredth of a second, for positions where the player had less than 10 seconds left on their clock."

Scene_Cast21y ago

What I'd love is for my pre-moves to be sent to the server immediately so I don't time out when I pre-moved.

fbernier1y ago

What's interesting about this is chess.com allows you to stack as many pre-moves as you like but they each cost 0.1s, whereas on lichess you can only have one pre-move which is technically free but maybe not because of delay.

y-curious1y ago

The worst part is they call it an intentional choice. "First off, premoves take 0.1 seconds. That is what has been preferred and agreed upon by most professional players we have consulted on the topic. They prefer .1 to .0 for premove. This is also what other chess servers do."[1]

It's super annoying and the reason I only play blitz+ on chesscom.

[1]https://www.chess.com/forum/view/help-support/mate-in-one-qu...

KolmogorovComp1y ago

That would introduce other issues, I think. Since premoves are cancellable/changeable, what happens if you change one at the very last moment but, due to delay, the change does not reach the server in time?

ycombinete1y ago

This is how it works on Lichess

bongodongobob1y ago

I can't play bullet on chess.com for this reason. Lost way too many games on "time" even though I had a second or two on the clock. Incredibly frustrating.

mkagenius1y ago

Vladimir Kramnik agrees with your observations about chesscom.

tkahlrt1y ago

Yes, he had timing problems in an online tournament on chess.com (against a Mexican GM in the same room) where his computer did not have all Windows updates and/or the timezone was wrong.

chess.com confirmed the issue.

chongli1y ago

I'm surprised to see anyone bring him up here!

nih1y ago

Interesting

galkk1y ago

So essentially lichess chose the StackOverflow approach - (rather) beefy servers, instead of "treating them like cattle".

Interesting that they accumulate and periodically store game state. Unfortunately it is not very clear, where they store ongoing game state - in redis or on server itself. Also cost breakdown doesn't have server for redis, only for DB.

BTW, their github has better architectural picture, than overly simplified one in the article: https://raw.githubusercontent.com/lichess-org/lila/master/pu.... Unfortunately, I'm afraid, drawing something like that during interview may not land a job at faang =(

Note that they have cost per game fairly low: $0.00027, 3,671 games per dollar.

Their cost breakdown, for ones who are curious https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk...

p.s. I'm not saying that Lichess's approach is the best or faang is the worst. Remember, lichess had 10 hours outage exactly because of the architecture chosen (single datacenter dependency). https://lichess.org/@/Lichess/blog/post-mortem-of-our-longes... . And outages like that are exactly the reasons why multi-datacenter and multi-region architectures are drilled down into faang engineers.

My point is that there are cases when this approach is legit, but a typical interview is laser-focused on different things, and most probably won't appreciate the "old style" approach to the problem. I'm sure that if Thibault ever decides to land in faang he will do neither whiteboard coding nor system design.

winrid1y ago

The downtime here is mostly OVH's fault. They're not known for fast support on hardware failures, that's why they're cheap. If they had this architecture on AWS EC2 and could just spin up a new AMI, then they'd only have a few minutes downtime, and the same simple architecture.

juujian1y ago

I remember Meta having a few outages of their own. And outlook as well. So I'm not sure what to think now. But sure, on paper FAANG is redundant and hence better.

xmprt1y ago

In my experience, issues scale exponentially with scale. So handling 10x the traffic might mean 100x the potential issues. Redundancy helps with that so when something inevitably fails, the architecture is able to automatically recover and the end user doesn't see any degradation. So what works for lichess wouldn't work for Meta.

benediktwerner1y ago

Redis runs on the main server, where lila runs, as indicated in the diagram you linked. And moves are buffered in lila. Redis is only used for pub-sub.

jeanlucas1y ago

Roughly 3600 games per dollar? I have over 30k games... Time to pay up

epolanski1y ago

> Unfortunately, I'm afraid, drawing something like that during interview may not land a job at faang =(

Yet another reason to be skeptical of the quality of hiring in faang if anything.

immibis1y ago

Why feel anything about it at all? You work at FAANG: be glad for the money or quit if there isn't any. You don't work at FAANG: bad hiring makes it easier for you to get hired and make money.

perihelions1y ago

- "While these moves could be calculated client-side, providing them server-side ensures consistency - especially for complex or esoteric chess variants - and optimizes performance on clients with limited processing capabilities or energy restrictions."

Just a wild guess: might be intended to lower the implementation barrier for new open-source software clients on new platforms, and/or preempt them from implementing subtle logic bugs that only show up much later.

The rules of chess are a bit tedious to implement, and you can easily get tired and code an edge-case bug that's almost invisible. Lichess itself did this—it once had a logic error that affected a very tiny number (exactly 7) of games,

https://github.com/lichess-org/database/issues/23 ("Before 2015: Some games with illegal moves were recorded")

(I apologize I couldn't find the specific patch that fixed this)

xmprt1y ago

For those curious about the illegal move, it seems like it's allowing queen side castling through the king side rook (or vice versa). eg. if this is the first rank, R _ _ R K _ _ _, then you could make the move O-O-O and end up with _ _ _ R K _ _ _

Naturally, it's not possible to view this move anymore, but this game (https://lichess.org/XDQeUk6j#48) has everything up until the last legal move right before the illegal castling happened.

ARandumGuy1y ago

I can see why that only appeared in 7 games. It's pretty rare to see a rook in between a king and another rook that are otherwise legally able to castle. Even rarer for someone to get into that position and actually try to castle.

Also that linked game is pretty entertaining. It's not a good game, but it can be fun watching lower ranked players make moves that you'd never see in higher level games. Like, who plays Bb5+ against the Scandinavian? Amazing stuff.

complexworld1y ago

Wouldn't the bug with queen side castling end up with _ _ K R _ _ _ _?

adamisom1y ago

Wow it just ate the rook huh?

pfedak1y ago

This looks like the relevant fix: https://github.com/lichess-org/scalachess/pull/154

(the broken code checked that the only pieces on the king's path to its new position were kings and rooks of the appropriate color)
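A rough reconstruction of the difference, based only on the description above and not on the actual scalachess code. The function names and the array-of-strings rank are made up for illustration, and only path occupancy is modeled (no castling rights or check detection).

```typescript
// A rank is 8 squares, index 0 = a-file. "_" is empty, "K" and "R" are own pieces.

// Broken idea as described: allow kings and rooks of the right color on the king's path.
function buggyKingPathCheck(rank: string[], kingFrom: number, kingTo: number): boolean {
  const lo = Math.min(kingFrom, kingTo);
  const hi = Math.max(kingFrom, kingTo);
  for (let i = lo; i <= hi; i++) {
    const piece = rank[i];
    if (piece !== "_" && piece !== "K" && piece !== "R") return false;
  }
  return true;
}

// Stricter idea: every square the king or rook crosses or lands on must be empty,
// except the squares of the castling king and the castling rook themselves.
function canCastlePath(rank: string[], kingFrom: number, rookFrom: number): boolean {
  const queenside = rookFrom < kingFrom;
  const kingTo = queenside ? 2 : 6; // c-file or g-file
  const rookTo = queenside ? 3 : 5; // d-file or f-file
  const lo = Math.min(kingFrom, rookFrom, kingTo, rookTo);
  const hi = Math.max(kingFrom, rookFrom, kingTo, rookTo);
  for (let i = lo; i <= hi; i++) {
    if (i === kingFrom || i === rookFrom) continue;
    if (rank[i] !== "_") return false;
  }
  return true;
}
```

On the rank `R _ _ R K _ _ _` from the earlier comment, the permissive check accepts queenside castling because the only thing in the king's way is a friendly rook, while the stricter check rejects it.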

ARandumGuy1y ago

Another wild guess: Lichess could be pre-calculating and caching the legal moves for the most common chess positions. While pre-calculating every possible legal move for every position would be impossible, you could pre-calculate the most common openings and endgames, which could cover a lot of real-world positions. This cache could easily be larger than practical for the client, but a server could hold onto it no problem. This could save on the net processing time, compared to the client determining all legal moves for every position.

Sesse__1y ago

Given that a good chess move generator will work in way less than a microsecond (TBH, probably even less than taking a DRAM lookup for a large hash table), and most chess positions have never been seen before, having a cache sounds counterproductive.

epcoa1y ago

> and/or preempt them from implementing subtle logic bugs that only show up much later.

Validating a submitted move is distinct from listing valid moves. I assumed the server would need to validate regardless of providing a list to the client.

perihelions1y ago

It's still duplicated work, and clients are likely to get it wrong and create more work for both devs.

benediktwerner1y ago

From what I remember, one of the main reasons was also to avoid bloating the JS on the game page. That page is kept especially slim to maximize performance and load times for low-powered devices.

ngcc_hk1y ago

Great!

A somewhat surprising consideration … is that even common in these days of overly fancy websites?

hyperhopper1y ago

I wish the article explained how it dealt with message loss from the at-most-once redis pub/sub channel

benatkin1y ago

Indeed, it does deal with the message loss. I was momentarily confused because in my many thousands of bullet chess games on Lichess I haven't had much of any message loss that can be attributed to Lichess's servers (but plenty when my Internet connection is down or unstable).

I will have to take a look, because whatever it's doing, it works very well!

crabmusket1y ago

The at-most-once delivery could be an issue if lichess's backend services (lila or lila-ws) crash. Presumably this is a rare enough occurrence that message loss is more of a theoretical concern.
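One generic way to live with at-most-once delivery is to number events and let the receiver notice gaps. The sketch below is an assumption about how such a scheme could look, not a description of what lila or lila-ws do; `EventStream`, the `v` field, and `requestResync` are hypothetical names.

```typescript
// Hypothetical sketch: detect dropped messages via consecutive version numbers.
interface VersionedEvent {
  v: number; // monotonically increasing version assigned by the sender
  payload: string;
}

class EventStream {
  readonly applied: string[] = [];
  private lastVersion: number;
  private readonly requestResync: (fromVersion: number) => void;

  constructor(startVersion: number, requestResync: (fromVersion: number) => void) {
    this.lastVersion = startVersion;
    this.requestResync = requestResync;
  }

  receive(event: VersionedEvent): void {
    if (event.v <= this.lastVersion) return; // duplicate or stale, ignore
    if (event.v > this.lastVersion + 1) {
      // Gap detected: something was lost, so ask for everything after lastVersion.
      this.requestResync(this.lastVersion + 1);
      return;
    }
    this.applied.push(event.payload);
    this.lastVersion = event.v;
  }
}
```

The pub/sub layer stays fire-and-forget, and recovery only costs something in the rare case where a gap actually appears.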

MathMonkeyMan1y ago

I have no idea, but the in-house pub/sub tech at a previous job used [PGM][1] together with some hand-written brokers and a client library. The overall delivery guarantee is at-most-once, but in over ten years and across tens of thousands of machines in multiple datacenters, they never saw a single dropped message. Not sure how they measured that, but I was told the measurements were accurate.

Well, except for that one major outage where everything shit the bed due to some misconfiguration of IP multicast in the datacenters, or so I was told.

So, maybe if your mission isn't life critical, you can just wrongfully assume exactly-once delivery.

[1]: https://en.wikipedia.org/wiki/Pragmatic_General_Multicast

DylanSp1y ago

I was hoping for that too, that's the kind of interesting architectural question I wanted this article to answer.

d4rti1y ago

I suspect the “l” parameter is for observed latency as the client displays observed latency from the server.

lxgr1y ago

Lichess also compensates for latency to some extent.

To do that, the server needs some measure of “how long does the client think the player actually took to make a move”, to later subtract latency not attributable to actual thinking from the clock.

zxilly1y ago

I wonder why this protocol needs an ack? A WebSocket wrapped in TLS should be perfectly capable of guaranteeing the integrity of the message.

parl_match1y ago

That just means that the message hit the TLS terminator. It doesn't mean that the backend logic received the state change.

andai1y ago

You can verify this with ten lines of code and clumsy (a tool for simulating packet loss).

I tried this and not all the messages I sent arrived.

enneff1y ago

What do you mean? If you open a web socket connection it should behave like a normal TCP connection. All sent data guaranteed to be delivered complete and in order, unless the connection fails.

augusto-moura1y ago

Maybe authorization, illegal moves? Don't know the full protocol to know how they handle edge cases. They might just return a NACK

enneff1y ago

So that the client knows the message has been delivered and handled by the server, which can make the UI indicate the state of the connection.
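A minimal sketch of what an application-level ack can look like on the client. The lila client code quoted elsewhere in this thread calls `this.ackable.register(t, msg.d)` and says it adds an ack ID `d.a`; everything else here (`AckRegistry`, `onAck`, `resendPending`) is a hypothetical illustration, not lila's implementation.

```typescript
// Hypothetical sketch: tag outgoing messages with an ack ID and keep them
// until the server confirms it has handled them.
type Payload = Record<string, unknown>;

class AckRegistry {
  private nextId = 1;
  private readonly pending = new Map<number, { t: string; d: Payload }>();
  private readonly send: (t: string, d: Payload) => void;

  constructor(send: (t: string, d: Payload) => void) {
    this.send = send;
  }

  // Send a message that expects an ack; the ID travels in d.a.
  register(t: string, d: Payload): void {
    const a = this.nextId++;
    const tagged: Payload = { ...d, a };
    this.pending.set(a, { t, d: tagged });
    this.send(t, tagged);
  }

  // Called when the server echoes the ack ID back.
  onAck(a: number): void {
    this.pending.delete(a);
  }

  // Called on a timer or after a reconnect: resend whatever is still unconfirmed.
  resendPending(): void {
    this.pending.forEach(({ t, d }) => this.send(t, d));
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

The transport can deliver every byte in order and the move can still be lost if the connection drops before the backend processes it, which is the case the pending list covers.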

burgerquizz1y ago

How would you protect your websocket server? I am building a game, but when I put the domain behind (free plan) cloudflare, I get added latency (3x slower) on the players' events.

Saw CF had some paying solution, but was wondering about a free solution

NathanFlurry1y ago

I've been managing game servers that get attacked on a daily basis for almost a decade. I've tried Cloudflare a few times (on their business plan) and seen poor results every time.

Cloudflare has a lower latency product called Argo Smart Routing [1]. When we tried Argo in 2020, we still saw 10+ ms increased latency across the board, which is unacceptable for competitive multiplayer games. That said, Discord still uses (or used to use) Argo for voice, so there are certainly less latency-sensitive games where it would work well.

The other issue with sockets over Cloudflare (circa 2020 on business plan) is they get terminated liberally with the assumption you have a reconnection mechanism in place. I'd imagine this is acceptable for traditional WebSocket use cases, but not for games.

Services like OVH & Vultr also advertise "DDoS protection for games," but I've found these to be pretty useless in practice. We can only measure traffic that reaches our game servers, so I have no way of knowing if they're actually helping at all.

Your best bet is getting familiar with iptables and fine-tuning rules to match your game's traffic patterns. Thankfully, LLMs are pretty good at generating these rules for you nowadays if you're not already familiar with these tools. Make sure to set up something like node-exporter to be able to monitor attacks and understand where things go wrong. There have been a few other posts on HN in the past that go into more depth about game server DoS mitigation [2] [3].

I built something in the same vein for my startup (Apache 2.0 OSS, steal our code!) [4] that runs a series of load balancers in front of game servers in order to act like a mini-Cloudflare. In addition to the basics I already listed, we also have logic under the hood that (a) dynamically routes traffic to load balancers and (b) autoscales hardware based on traffic in order to absorb attacks. We're rolling out a dynamic bot attack & mitigation mechanism soon to handle more complex patterns.

[1] https://www.cloudflare.com/application-services/products/arg...

[2] https://news.ycombinator.com/item?id=35771466

[3] https://news.ycombinator.com/item?id=28675094

[4] https://github.com/rivet-gg/rivet

immibis1y ago

As I understand, the separation between Lila and Lila-ws is primarily for fault isolation rather than independent scaling. Maybe independent scaling becomes useful if websocket overhead exceeds what one machine can handle.

jackcviers31y ago

And scalachess is written in scala, to piggyback off a post earlier this month that claimed the language is dead. The project is very successful and has been around and maintained for years.

valenterry1y ago

If all the Rust people knew how nice Scala 3 as a language is... they would be surprised.

What still isn't great is the ecosystem and the build-tooling compared to Rust (part of it because of the JVM). But just language-wise, it basically has all the goodies of Rust and much more. Ofc. it's easier for Scala to have that because it does not have to balance against zero-overhead abstraction like Rust does.

Still, Scala was hyped at some point (and I find it wasn't justified). But now, the language is actually one if not the best of very-high-level-languages that is used in production and not just academic. It's kind of sad to see, that it does not receive more traction, but it does not have the marketing budget of, say, golang.

ackfoobar1y ago

I think the incompatibilities burned a lot of the good will. I'm very fluent in Scala 2, but I will avoid Scala if I can, mostly to stay away from purely functional programmers.

> all the goodies of Rust

Does it prevent me from using a non-thread-safe object in multiple threads? Or storing a given object which is no longer valid after the call ends?

Does it have a unified error handling culture? In Scala some prefer exceptions (with or without `using CanThrow`), some prefer the `Either` (`Result`) type.

Does it have named destructuring?

kriiuuu1y ago

https://bleep.build is a very promising tool for building Scala projects. I like it more than I like cargo

huins1y ago

> - l: Probably some length?

I don't understand why the author didn't just look this up in the source code. Lichess is open source and we can see exactly what this field is here, it's the average lag:

https://github.com/lichess-org/lila/blob/45b5f0cfbbf6c045ad7...

  send = (t: string, d: any, o: any = {}, noRetry = false): void => {
    const msg: Partial<MsgOut> = { t };
    if (d !== undefined) {
      if (o.withLag) d.l = Math.round(this.averageLag);
      if (o.millis >= 0) d.s = Math.round(o.millis * 0.1).toString(36);
      msg.d = d;
    }
    if (o.ackable) {
      msg.d = msg.d || {}; // can't ack message without data
      this.ackable.register(t, msg.d); // adds d.a, the ack ID we expect to get back
    }

    const message = JSON.stringify(msg);
    ...

Which is calculated from how long the server takes to respond to ping messages that the client sends:

  private schedulePing = (delay: number): void => {
    clearTimeout(this.pingSchedule);
    this.pingSchedule = setTimeout(this.pingNow, delay);
  };

  private pingNow = (): void => {
    clearTimeout(this.pingSchedule);
    clearTimeout(this.connectSchedule);
    const pingData =
      this.options.isAuth && this.pongCount % 10 == 2
        ? JSON.stringify({
            t: 'p',
            l: Math.round(0.1 * this.averageLag),
          })
        : 'null';
    try {
      this.ws!.send(pingData);
      this.lastPingTime = performance.now();
    } catch (e) {
      this.debug(e, true);
    }
    this.scheduleConnect();
  };

  private computePingDelay = (): number => this.options.pingDelay + (this.options.idle ? 1000 : 0);

  private pong = (): void => {
    clearTimeout(this.connectSchedule);
    this.schedulePing(this.computePingDelay());
    const currentLag = Math.min(performance.now() - this.lastPingTime, 10000);
    this.pongCount++;

    // Average first 4 pings, then switch to decaying average.
    const mix = this.pongCount > 4 ? 0.1 : 1 / this.pongCount;
    this.averageLag += mix * (currentLag - this.averageLag);

    pubsub.emit('socket.lag', this.averageLag);
    this.updateStats(currentLag);
  };
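To see what those numbers do in practice, here is the same arithmetic pulled out into standalone functions. This is a sketch: the function names are invented, and starting `averageLag` at 0 is an assumption (the first sample overwrites it anyway because the first mix factor is 1).

```typescript
// Replays the averaging from pong(): plain mean over the first 4 samples,
// then an exponentially decaying average with weight 0.1.
function averageLagAfter(samplesMs: number[]): number {
  let averageLag = 0; // assumed initial value
  let pongCount = 0;
  for (const raw of samplesMs) {
    const currentLag = Math.min(raw, 10000); // same 10 s clamp as the quoted code
    pongCount++;
    const mix = pongCount > 4 ? 0.1 : 1 / pongCount;
    averageLag += mix * (currentLag - averageLag);
  }
  return averageLag;
}

// Same encoding as the d.s field in send(): centiseconds, written in base 36.
function encodeCentis(millis: number): string {
  return Math.round(millis * 0.1).toString(36);
}

// Inverse of encodeCentis, back to milliseconds (10 ms resolution).
function decodeCentis(encoded: string): number {
  return parseInt(encoded, 36) * 10;
}
```

Four pongs at 100 ms followed by one spike to 200 ms moves the average only to about 110 ms, so a single slow ping barely changes the reported lag. A 1234 ms move time is sent as `3f`, which decodes back to 1230 ms.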

stevage1y ago

To be fair, the author already put tons of work into this post. Don't begrudge them for not doing even more.

huins1y ago

I don't begrudge the author, I'm just surprised given the otherwise high quality of the analysis, including him looking at other parts of the source code.

ruereed1y ago

what actually happens when i make a move is someone takes my piece

evrydayhustling1y ago

It seems shocking to me that the server enumerates and transmits all legal next-moves. I get that there could be chess variants with server side information, but the article also says it might be good for constrained clients. Is it really cheaper to read moves off a serialized interface than to compute them client side??

jdthedisciple1y ago

pretty sure computing moves is in NP so probably yep

evrydayhustling1y ago

Nope, finite number of pieces and finite number of viable moves to check on each. Not sure what you're thinking of, but the entire concept of complexity class only applies if there is some axis of scaling (n-size chess board?).

bobmcnamara1y ago

nit: fen only encodes board state, not game state

Edit: also includes move count but not repetition.

xrisk1y ago

How is the game state not just the board state? Move history doesn’t matter in chess (FEN encodes the 50 move rule)

andrewaylett1y ago

Per Wikipedia, it doesn't encode the threefold repetition rule.

https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notati...

anamexis1y ago

Indeed, the 50 move rule, as well as castling rights, whose move it is, and whether any pawns are currently eligible for en passant.
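Splitting a FEN string into its six space-separated fields shows exactly what is and isn't carried. A small sketch, with `parseFen` and the field names chosen here for illustration:

```typescript
// The six FEN fields. Note there is no field for previously seen positions,
// which is why threefold repetition cannot be decided from a FEN alone.
interface FenFields {
  placement: string; // piece placement, rank 8 down to rank 1
  sideToMove: "w" | "b";
  castling: string; // e.g. "KQkq" or "-"
  enPassant: string; // target square or "-"
  halfmoveClock: number; // half-moves since last capture or pawn move (50-move rule)
  fullmoveNumber: number;
}

function parseFen(fen: string): FenFields {
  const parts = fen.trim().split(/\s+/);
  if (parts.length !== 6) throw new Error(`expected 6 FEN fields, got ${parts.length}`);
  const [placement, side, castling, enPassant, half, full] =
    parts as [string, string, string, string, string, string];
  if (side !== "w" && side !== "b") throw new Error(`bad side to move: ${side}`);
  const halfmoveClock = Number(half);
  const fullmoveNumber = Number(full);
  if (!Number.isInteger(halfmoveClock) || !Number.isInteger(fullmoveNumber)) {
    throw new Error("move counters must be integers");
  }
  return { placement, sideToMove: side, castling, enPassant, halfmoveClock, fullmoveNumber };
}
```

Anything a server needs beyond these fields, such as repetition history or per-move timing, has to be stored next to the FEN rather than in it.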

kzrdude1y ago

Unfortunately move history does matter

michaelmarkell1y ago

Timing of moves

shironandon1y ago

what happens to those websocket connections when the API is updated or redeployed?

paxys1y ago

It's pretty easy to build auto reconnect capability in the client. The server will drop all its connections and go out of rotation, and the client will start a new connection and find the new one. If the switch happens fast enough then the user shouldn't even notice.
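A bare-bones version of that reconnect loop, as a sketch. The names, the 500 ms base delay, and the 10 s ceiling are all assumptions, and the transport is passed in as a callback so the logic does not depend on any particular socket library.

```typescript
// Exponential backoff: 500 ms, 1 s, 2 s, ... capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 10000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

interface ConnectionHandlers {
  onOpen: () => void;
  onClose: () => void;
}

// `open` starts one connection attempt and wires the given handlers to it.
// `schedule` defaults to setTimeout and is injectable for testing.
function keepConnected(
  open: (handlers: ConnectionHandlers) => void,
  schedule: (fn: () => void, ms: number) => void = (fn, ms) => {
    setTimeout(fn, ms);
  },
): void {
  let attempt = 0;
  const connect = (): void => {
    open({
      onOpen: () => {
        attempt = 0; // a successful connection resets the backoff
      },
      onClose: () => {
        schedule(connect, reconnectDelayMs(attempt));
        attempt++;
      },
    });
  };
  connect();
}
```

With a browser WebSocket, `open` would create the socket and assign the two handlers to its `onopen` and `onclose` properties. During a deploy the old server closes the socket, the client retries after a short delay, and the load balancer routes the retry to a fresh instance.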

conover1y ago

Along with the reconnect solution already mentioned, you can also decouple your Websocket and business logic layers using something like Pushpin: https://pushpin.org/. This allows you to deploy your business logic layer without disconnecting/reconnecting clients.

zazaulola1y ago

It is to be expected that LLM will make a decision on its own if it suspects any changes to the API. In any case, there is no time to fix the code during the game.

VoidWhisperer1y ago

They weren't talking about an LLM here

sam0x171y ago

20 years later I still think "female lich" whenever I see the word lichess, even though I know it's li chess.

krisoft1y ago

One day, if I find the time for the pun, I really want to sculpt a chess set where the black pieces are all undead necromancer wizards and the white pieces are all Asian fruits with rough skin. That way we can have a game of lychees vs liches on lichess.

AlienRobot1y ago

When you promote a pawn to queen that's actually the lichess.

Suppafly1y ago

makes me think of the Asian fruit.

Keyframe1y ago

there are more of us then!

blastro1y ago

lichess is one of the best sites on the internet. very happy to contribute my $5/mo

hilux1y ago

Hello, fellow Patron!

Even though nowadays I hardly have time to play, I'm still happy to support such a delightfully honorable and usable(!) open-source project.

dankwizard1y ago

People love mentioning that they donate to LiChess.

It's a weird trend. Altruism truly does not exist

(I donated btw) (Probably more than you) (But who's counting)

trod1231y ago

If you consider this to be true, you would seem to have a rather low standard.

There are many aspects in which they are not the best.

dibyadarshan1y ago

Like?

Ad-free, compute intensive, non-CRUD, massively scaled, complex cheat moderation, infinite puzzles/analysis, educational (studies/tactics/openings explorer), etc. All this for free. I'm curious what's the best website in your opinion

ilrwbwrkhv1y ago

Beautiful architecture. Startups and companies like Netflix should learn from this instead of cargo culting microservices.

enneff1y ago

And what exactly do you think lila, lila-ws, and redis are if not microservices (or as they should be called, “services”)? Lichess could easily be implemented as a single monolithic process but it is not.

immibis1y ago

They are services, but not micro. lila-ws spun off of Lila for a good reason (fault isolation) and not because "let's make everything a service". And they don't follow any standard microservice pattern - a reverse proxy isn't a microservice.

ajkjk1y ago

What? Do you have some reason to think Netflix's architecture is deficient?

paxys1y ago

Because the top 5 comments on HN always say so, so it must be true.

ilrwbwrkhv1y ago

Overly complicated with microservices. Can be made 10x simpler.

2 more replies

j / k navigate · click thread line to collapse

158 comments

MobileVet1y ago

This could also be due to general poor network code as well. The number of errors I get during puzzles is also frustrating. Do they really not retry a send automatically?? <breath>

Chess.com has the brand and the names... but dang, the tech feels SO rough to me.

bluecalm1y ago

This is one of the many, many things but imo it's the most telling. They can't even add a clock counting down the 6 minutes to their web client.

luisgvv1y ago

I can't believe this, but it makes sense now lol I think I heard a streamer say it was for kicking out the cheaters

1 more reply

pshc1y ago

> it feels like the SERVER is tracking the time

TBH this is what I expected for all online chess. How else to reconcile the two players' differing clocks and also prevent client-side cheating?

MobileVet1y ago

I guess my naive frustration comes from crazy fps games tracking things so precisely and yet somehow Chess.com can’t handle a turn based game?! Honestly.

I do recognize that fps games utilize predictive algorithms and planning to estimate future player positions but still, turn based networking with 100ms accuracy should be a solved problem

2 more replies

nightowl_games1y ago

1 more reply

palata1y ago

> How else to reconcile the two players' differing clocks and also prevent client-side cheating?

Is there a point in preventing cheating, really? I can just make a bot...

MichaelZuo1y ago

It hasn’t been done client side in any pvp game I’ve heard of.

stevage1y ago

I'm pretty sure freechess.org did.

1 more reply

bongodongobob1y ago

Track the two clients pings? What client side cheating prevention would you need to do in chess? Afaik you can't cheat by clipping through walls or jumping around on the map.

2 more replies

pengowray1y ago

> they show the time in seconds which can't be right

Seems right.

Lichess follows the specification to the letter, and as it only technically allows one-second accuracy, lichess only record moves with one-second accuracy. It seems insane, but that's how they do it.

lichess PGN export example:

> 1. d3 { [%eval -0.15] [%clk 0:01:00] } 1... g6 { [%eval 0.04] [%clk 0:01:00] }

Chess.com PGN export example:

> 1. d4 {[%clk 0:02:58.6]} 1... b6 {[%clk 0:02:59.2]}

kibwen1y ago

> lichess only record moves with one-second accuracy

According to this blog post, this doesn't appear to be the case since at least 2017:

https://lichess.org/@/lichess/blog/a-better-game-clock-histo...

1 more reply

Scene_Cast21y ago

What I'd love is for my pre-moves to be sent to the server immediately so I don't time out when I pre-moved.

fbernier1y ago

y-curious1y ago

It's super annoying and the reason I only play blitz+ on chesscom.

[1]https://www.chess.com/forum/view/help-support/mate-in-one-qu...

1 more reply

KolmogorovComp1y ago

That would introduce other issues I think. Since premove are cancellable/changeable, what happen if you changed at the very last moment but due to delay it did not reached the server in time?

ycombinete1y ago

This is how it works on Lichess

bongodongobob1y ago

I can't play bullet on chess.com for this reason. Lost way too many games on "time" even though I had a second or two on the clock. Incredibly frustrating.

mkagenius1y ago

Vladimir Kramnik agrees with your observations about chesscom.

tkahlrt1y ago

Yes, he had timing problems in an online tournament on chess.com (against a Mexican GM in the same room) where his computer did not have all Windows updates and/or the timezone was wrong.

chess.com confirmed the issue.

chongli1y ago

I'm surprised to see anyone bring him up here!

1 more reply

nih1y ago

Interesting

galkk1y ago

So essentially lichess chose StackOverflow approach - (rather) beefy servers, instead of "treating them like a cattle".

Note that they have cost per game fairly low: $0.00027, 3,671 games per dollar.

Their cost breakdown, for ones who are curious https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk...

winrid1y ago

juujian1y ago

I remember Meta having a few outages of their own. And outlook as well. So I'm not sure what to think now. But sure, on paper FAANG is redundant and hence better.

xmprt1y ago

benediktwerner1y ago

Redis runs on the main server, where lila runs, as indicated in the diagram you linked. And moves are buffered in lila. Redis is only used for pub-sub.

jeanlucas1y ago

Roughly 3600 games per dollar? I have over 30k games... Time to pay up

epolanski1y ago

> Unfortunately, I'm afraid, drawing something like that during interview may not land a job at faang =(

Yet another reason to be skeptical of the quality of hiring in faang if anything.

immibis1y ago

Why feel anything about it at all? You work at FAANG: be glad for the money or quit if there isn't any. You don't work at FAANG: bad hiring makes it easier for you to get hired and make money.

2 more replies

perihelions1y ago

https://github.com/lichess-org/database/issues/23 ("Before 2015: Some games with illegal moves were recorded")

(I apologize I couldn't find the specific patch that fixed this)

xmprt1y ago

Naturally, it's not possible to view this move anymore, but this game (https://lichess.org/XDQeUk6j#48) has everything up until the last legal move right before the illegal castling happened.

ARandumGuy1y ago

complexworld1y ago

Wouldn't the bug with queen side castling end up with _ _ K R _ _ _ _?

adamisom1y ago

Wow it just ate the rook huh?

1 more reply

pfedak1y ago

This looks like the relevant fix: https://github.com/lichess-org/scalachess/pull/154

(the broken code checked that the only pieces on the king's path to its new position were kings and rooks of the appropriate color)
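
To make the difference concrete, here is a hypothetical TypeScript reconstruction of the two checks on a single back rank. It is not the scalachess code (which is Scala); the types and the names `pathOkLenient` and `pathOkStrict` are invented for illustration, and the stricter rule shown is only one plausible reading of the fix.

```typescript
type Color = "white" | "black";
type Piece = { role: "king" | "rook" | "other"; color: Color };
type Rank = (Piece | null)[]; // one back rank, index 0 = a-file ... 7 = h-file

// Lenient check, as described above: any friendly king or rook on the path is tolerated.
function pathOkLenient(rank: Rank, path: number[], color: Color): boolean {
  return path.every((i) => {
    const p = rank[i];
    return p == null || (p.color === color && (p.role === "king" || p.role === "rook"));
  });
}

// Stricter check: only the castling king and the castling rook may stand on the path.
function pathOkStrict(rank: Rank, path: number[], kingFrom: number, rookFrom: number): boolean {
  return path.every((i) => i === kingFrom || i === rookFrom || rank[i] == null);
}

const R: Piece = { role: "rook", color: "white" };
const K: Piece = { role: "king", color: "white" };
// White queenside castling with an extra white rook on d1: R . . R K . . .
const rank: Rank = [R, null, null, R, K, null, null, null];
const path = [1, 2, 3]; // b1, c1, d1 must be free for O-O-O

console.log(pathOkLenient(rank, path, "white")); // true  (the extra rook is waved through)
console.log(pathOkStrict(rank, path, 4, 0));     // false (d1 is occupied by a third piece)
```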

ARandumGuy1y ago

Sesse__1y ago

epcoa1y ago

> and/or preempt them from implementing subtle logic bugs that only show up much later.

Validating a submitted move is distinct from listing valid moves. I assumed the server would need to validate regardless of providing a list to the client.

perihelions1y ago

It's still duplicated work, and clients are likely to get it wrong and create more work for both devs.

benediktwerner1y ago

From what I remember, one of the main reasons was also to avoid bloating the JS on the game page. That page is kept especially slim to maximize performance and load times for low-powered devices.

ngcc_hk1y ago

Great!

A somewhat surprising consideration... is that even common in these days of overfancy websites?

hyperhopper1y ago

I wish the article explained how it dealt with message loss from the at-most-once redis pub/sub channel

benatkin1y ago

I will have to take a look, because whatever it's doing, it works very well!

crabmusket1y ago

The at-most-once delivery could be an issue if lichess's backend services (lila or lila-ws) crash. Presumably this is a rare enough occurrence that message loss is more of a theoretical concern.

MathMonkeyMan1y ago

Well, except for that one major outage where everything shit the bed due to some misconfiguration of IP multicast in the datacenters, or so I was told.

So, maybe if your mission isn't life critical, you can just wrongfully assume exactly-once delivery.

[1]: https://en.wikipedia.org/wiki/Pragmatic_General_Multicast

DylanSp1y ago

I was hoping for that too, that's the kind of interesting architectural question I wanted this article to answer.

d4rti1y ago

I suspect the “l” parameter is for observed latency as the client displays observed latency from the server.

lxgr1y ago

Lichess also compensates for latency to some extent.

zxilly1y ago

I wonder why this protocol needs an ack. A WebSocket wrapped in TLS should be perfectly capable of guaranteeing the integrity of the message.

parl_match1y ago

That just means that the message hit the TLS terminator. It doesn't mean that the backend logic received the state change.

andai1y ago

You can verify this with ten lines of code and clumsy (a tool for simulating packet loss).

I tried this and not all the messages I sent arrived.

enneff1y ago

What do you mean? If you open a web socket connection it should behave like a normal TCP connection. All sent data guaranteed to be delivered complete and in order, unless the connection fails.

2 more replies

augusto-moura1y ago

Maybe authorization, illegal moves? Don't know the full protocol to know how they handle edge cases. They might just return a NACK

enneff1y ago

So that the client knows the message has been delivered and handled by the server, which can make the UI indicate the state of the connection.
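
A rough TypeScript sketch of what application-level acking can look like on the client: each outgoing message gets an ID, stays in a pending set until the server echoes that ID back, and anything still pending can be resent or surfaced in the UI. The class `AckRegistry`, its methods, and the example payload are hypothetical; the only detail borrowed from lila is that the ack ID travels in the `a` field of the message data.

```typescript
type Outgoing = { t: string; d: Record<string, unknown> };

class AckRegistry {
  private nextId = 1;
  private pending = new Map<number, Outgoing>();

  // Tag the message data with an ack ID and remember it until the server confirms.
  register(t: string, d: Record<string, unknown>): number {
    const id = this.nextId++;
    d["a"] = id;
    this.pending.set(id, { t, d });
    return id;
  }

  // Call when the server echoes an ack ID back.
  onServerAck(id: number): void {
    this.pending.delete(id);
  }

  // Messages the server has not confirmed yet, e.g. to resend after a reconnect.
  unacked(): Outgoing[] {
    return Array.from(this.pending.values());
  }
}

const acks = new AckRegistry();
const moveAck = acks.register("move", { u: "e2e4" }); // illustrative payload
console.log(acks.unacked().length); // 1
acks.onServerAck(moveAck);
console.log(acks.unacked().length); // 0
```

TCP and TLS only tell the sender that bytes reached the peer's socket; a scheme like this tells the client that the application behind it actually processed the move.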

burgerquizz1y ago

How would you protect your WebSocket server? I am building a game, but when I put the domain behind Cloudflare (free plan), latency on player events gets about 3x worse.

I saw CF has a paid solution, but was wondering about a free one.

NathanFlurry1y ago

I've been managing game servers that get attacked on a daily basis for almost a decade. I've tried Cloudflare a few times (on their business plan) and seen poor results every time.

[1] https://www.cloudflare.com/application-services/products/arg...

[2] https://news.ycombinator.com/item?id=35771466

[3] https://news.ycombinator.com/item?id=28675094

[4] https://github.com/rivet-gg/rivet

immibis1y ago

jackcviers31y ago

And scalachess is written in scala, to piggyback off a post earlier this month that claimed the language is dead. The project is very successful and has been around and maintained for years.

valenterry1y ago

If all the Rust people knew how nice Scala 3 as a language is... they would be surprised.

ackfoobar1y ago

I think the incompatibilities burned a lot of the good will. I'm very fluent in Scala 2, but I will avoid Scala if I can, mostly to stay away from purely functional programmers.

> all the goodies of Rust

Does it prevent me from using a non-thread-safe object in multiple threads? Or storing a given object which is no longer valid after the call ends?

Does it have a unified error handling culture? In Scala some prefer exceptions (with or without `using CanThrow`), some prefer the `Either` (`Result`) type.

Does it have named destructuring?

1 more reply

kriiuuu1y ago

https://bleep.build is a very promising tool for building Scala projects. I like it more than I like cargo

1 more reply

huins1y ago

> - l: Probably some length?

I don't understand why the author didn't just look this up in the source code. Lichess is open source and we can see exactly what this field is here, it's the average lag:

https://github.com/lichess-org/lila/blob/45b5f0cfbbf6c045ad7...

  send = (t: string, d: any, o: any = {}, noRetry = false): void => {
    const msg: Partial<MsgOut> = { t };
    if (d !== undefined) {
      if (o.withLag) d.l = Math.round(this.averageLag);
      if (o.millis >= 0) d.s = Math.round(o.millis * 0.1).toString(36);
      msg.d = d;
    }
    if (o.ackable) {
      msg.d = msg.d || {}; // can't ack message without data
      this.ackable.register(t, msg.d); // adds d.a, the ack ID we expect to get back
    }

    const message = JSON.stringify(msg);
    ...

Which is calculated from how long the server takes to respond to ping messages that the client sends:

  private schedulePing = (delay: number): void => {
    clearTimeout(this.pingSchedule);
    this.pingSchedule = setTimeout(this.pingNow, delay);
  };

  private pingNow = (): void => {
    clearTimeout(this.pingSchedule);
    clearTimeout(this.connectSchedule);
    const pingData =
      this.options.isAuth && this.pongCount % 10 == 2
        ? JSON.stringify({
            t: 'p',
            l: Math.round(0.1 * this.averageLag),
          })
        : 'null';
    try {
      this.ws!.send(pingData);
      this.lastPingTime = performance.now();
    } catch (e) {
      this.debug(e, true);
    }
    this.scheduleConnect();
  };

  private computePingDelay = (): number => this.options.pingDelay + (this.options.idle ? 1000 : 0);

  private pong = (): void => {
    clearTimeout(this.connectSchedule);
    this.schedulePing(this.computePingDelay());
    const currentLag = Math.min(performance.now() - this.lastPingTime, 10000);
    this.pongCount++;

    // Average first 4 pings, then switch to decaying average.
    const mix = this.pongCount > 4 ? 0.1 : 1 / this.pongCount;
    this.averageLag += mix * (currentLag - this.averageLag);

    pubsub.emit('socket.lag', this.averageLag);
    this.updateStats(currentLag);
  };
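
To see what those two numeric rules do in practice, here is a standalone TypeScript sketch that reimplements the averaging from `pong` and the `s` encoding from `send` as pure functions. The function names are mine, `decodeMillis` is a hypothetical inverse added only for illustration, and I am assuming `averageLag` starts at 0 (the first sample overwrites it anyway because the mix is 1).

```typescript
// Mixing rule from the pong handler, as a pure function.
// pongCount is the count after incrementing, as in the original.
function nextAverageLag(averageLag: number, currentLag: number, pongCount: number): number {
  const mix = pongCount > 4 ? 0.1 : 1 / pongCount;
  return averageLag + mix * (currentLag - averageLag);
}

// Run a list of round-trip samples (ms) through the rule, capping each at 10 s.
function averageLagOver(samples: number[]): number {
  let avg = 0; // assumed initial value
  samples.forEach((lag, i) => {
    avg = nextAverageLag(avg, Math.min(lag, 10000), i + 1);
  });
  return avg;
}

// The `s` field: milliseconds -> centiseconds -> base-36 string.
function encodeMillis(millis: number): string {
  return Math.round(millis * 0.1).toString(36);
}

// Hypothetical inverse, for illustration only (resolution is 10 ms).
function decodeMillis(s: string): number {
  return parseInt(s, 36) * 10;
}

console.log(averageLagOver([100, 200, 300, 400]));       // about 250, the plain mean
console.log(averageLagOver([100, 200, 300, 400, 1000])); // about 325, the spike only counts for 10%
console.log(encodeMillis(1500), decodeMillis("46"));     // "46" 1500
```

So the first four pongs give an ordinary running mean, and after that a single slow round trip only nudges the reported lag.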

stevage1y ago

To be fair, the author already put tons of work into this post. Don't begrudge them for not doing even more.

huins1y ago

I don't begrudge the author, I'm just surprised given the otherwise high quality of the analysis, including him looking at other parts of the source code.

ruereed1y ago

what actually happens when i make a move is someone takes my piece

evrydayhustling1y ago

jdthedisciple1y ago

pretty sure computing moves is in NP so probably yep

evrydayhustling1y ago

1 more reply

bobmcnamara1y ago

nit: FEN only encodes board state, not game state

Edit: also includes move count but not repetition.

xrisk1y ago

How is the game state not just the board state? Move history doesn’t matter in chess (FEN encodes the 50 move rule)

andrewaylett1y ago

Per Wikipedia, it doesn't encode the threefold repetition rule.

https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notati...

anamexis1y ago

Indeed, the 50 move rule, as well as castling rights, whose move it is, and whether any pawns are currently eligible for en passant.
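
For reference, a small TypeScript sketch that splits a FEN string into its six space-separated fields. The `FenFields` interface and `parseFen` function are names I made up; the field order is the standard one. Note that nothing in the record refers to earlier positions, which is why threefold repetition cannot be decided from FEN alone.

```typescript
interface FenFields {
  placement: string;      // piece placement, rank 8 down to rank 1
  sideToMove: "w" | "b";
  castling: string;       // e.g. "KQkq", or "-" when no rights remain
  enPassant: string;      // en passant target square, or "-"
  halfmoveClock: number;  // plies since the last capture or pawn move (50-move rule)
  fullmoveNumber: number; // starts at 1, incremented after Black moves
}

function parseFen(fen: string): FenFields {
  const parts = fen.trim().split(/\s+/);
  if (parts.length !== 6) {
    throw new Error(`expected 6 FEN fields, got ${parts.length}`);
  }
  const [placement, side, castling, enPassant, half, full] =
    parts as [string, string, string, string, string, string];
  if (side !== "w" && side !== "b") {
    throw new Error(`invalid side to move: ${side}`);
  }
  return {
    placement,
    sideToMove: side,
    castling,
    enPassant,
    halfmoveClock: Number(half),
    fullmoveNumber: Number(full),
  };
}

// Position after 1. e4 from the starting position.
const afterE4 = parseFen("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1");
console.log(afterE4.sideToMove, afterE4.castling, afterE4.enPassant, afterE4.halfmoveClock);
// b KQkq e3 0
```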

kzrdude1y ago

Unfortunately move history does matter

michaelmarkell1y ago

Timing of moves

shironandon1y ago

what happens to those websocket connections when the API is updated or redeployed?

paxys1y ago

conover1y ago

zazaulola1y ago

It is to be expected that LLM will make a decision on its own if it suspects any changes to the API. In any case, there is no time to fix the code during the game.

VoidWhisperer1y ago

They weren't talking about an LLM here

sam0x171y ago

20 years later I still think "female lich" whenever I see the word lichess, even though I know it's li chess.

krisoft1y ago

AlienRobot1y ago

When you promote a pawn to queen that's actually the lichess.

Suppafly1y ago

makes me think of the Asian fruit.

Keyframe1y ago

there are more of us then!

blastro1y ago