I've always said that with infinite money we could get 100% uptime, but no one has infinite money. Trading firms, though, are about as close to infinite money as I can imagine.
I work with a major one and, to be honest, it was obvious from day one that they were incompetent. They employ a huge number of engineers and are unable to deliver basic features at any reasonable pace. Not even remotely close to it, either (as in: you ask them to do something, they say yes, the execs say yes, you get a deadline, the date comes... deployment difficulties, the environment isn't working, and the runaround goes on and on forever).
I remember the CEO got on a call with us at the start and was patting himself on the back, saying they had no downtime... because they were able to do maintenance when the markets were shut (and I have heard very bad things about how that goes). But it is a 24/7 world now, and our service is up 24/7, so of course this led to massive issues over time because of the very different expectations around delivery and quality. Our execs were impressed; our engineers said this was a bad sign. And, of course, it transpired that they were total amateurs (to be clear, this is one of the biggest exchanges in the world) and were unable to deliver.
To come back to my original statement: there is a company of 16 people total that is, from the customers' point of view, delivering features faster. It is difficult to overstate how insane that is.
It depends what you mean by easy. Even if you are using a slow chain, you still have to compete for finite block space, you still have to work out how to do risk and matching fast, etc.
Chains built for exchange use are easier to operate, which is why they don't require thousands of engineers. But the actual technical capability of the system significantly exceeds that of tradfi exchanges. For example, the risk function runs in real time on-chain rather than relying on EoD settlement. This significantly changes the possible feature set. Once you have built it, it is very easy... so the question is why do big exchanges rely so heavily on EoD processes? The answer: they are bad at engineering.
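To make the real-time vs. EoD distinction concrete, here is a minimal sketch assuming a simple leverage cap (total exposure must stay at or below collateral * max_leverage). Account, RealTimeRisk, try_accept and end_of_day_check are hypothetical names for illustration, not how any particular chain or exchange actually implements its risk engine.

```python
from dataclasses import dataclass


@dataclass
class Account:
    collateral: int    # hypothetical: collateral in integer currency units
    exposure: int = 0  # notional already committed by accepted orders


class RealTimeRisk:
    """Hypothetical pre-trade check: every order is evaluated against the
    account's current exposure before it is accepted."""

    def __init__(self, max_leverage: int):
        self.max_leverage = max_leverage

    def try_accept(self, account: Account, notional: int) -> bool:
        limit = account.collateral * self.max_leverage
        if account.exposure + notional > limit:
            return False  # rejected at submission; the limit is never breached
        account.exposure += notional
        return True


def end_of_day_check(account: Account, fills: list[int], max_leverage: int) -> bool:
    # Batch model: every fill was already accepted during the session; the
    # limit is only compared against total exposure after the close.
    account.exposure += sum(fills)
    return account.exposure <= account.collateral * max_leverage


orders = [300, 300, 200]

live = Account(collateral=100)
risk = RealTimeRisk(max_leverage=5)  # limit = 500
accepted = [risk.try_accept(live, n) for n in orders]
print(accepted, live.exposure)  # [True, False, True] 500

batch = Account(collateral=100)
print(end_of_day_check(batch, orders, max_leverage=5), batch.exposure)  # False 800
```

With the same order flow, the real-time version rejects the second order the moment it arrives, while the batch version accepts everything and only discovers the 800 vs. 500 breach after the close.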
If done right, it would be a completely separate system. Separate IP addresses and all.
I imagine you'd have to use shadow execution, where you roll out a full second copy, run every transaction through both, and compare the results. Then, only after a certain time, you switch traffic to the new infra and tear down the old.
But you would need a ton of extra hardware (more than double) and a lot of ways to keep data in sync. And of course if you put an LLM or other non-deterministic system in there, that's a whole other can of worms.
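A minimal sketch of the shadow-execution idea, assuming both systems can be modeled as deterministic functions from a transaction to a result. ShadowRunner, handle, ready_to_cut_over and the fee functions are hypothetical names for illustration; a real rollout would also have to mirror state, isolate the shadow's side effects, and preserve ordering, none of which this handles.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ShadowRunner:
    """Hypothetical helper: serve every transaction from the primary system,
    replay it through the shadow copy, and record any disagreement."""
    primary: Callable[[Any], Any]
    shadow: Callable[[Any], Any]
    total: int = 0
    mismatches: list = field(default_factory=list)

    def handle(self, tx: Any) -> Any:
        result = self.primary(tx)  # the client only ever sees this result
        self.total += 1
        try:
            shadow_result = self.shadow(tx)
        except Exception as exc:  # a crash in the shadow must not affect clients
            shadow_result = exc
        if shadow_result != result:
            self.mismatches.append((tx, result, shadow_result))
        return result

    def ready_to_cut_over(self, min_transactions: int) -> bool:
        # Only switch traffic after enough volume with zero disagreements.
        return self.total >= min_transactions and not self.mismatches


# Hypothetical old/new fee calculators; integer arithmetic keeps them deterministic.
def old_fee(tx: dict) -> int:
    return tx["qty"] * tx["price"] * 10 // 10_000


def new_fee(tx: dict) -> int:
    return tx["qty"] * tx["price"] // 1_000


def buggy_fee(tx: dict) -> int:
    return tx["qty"] * tx["price"] // 100


txs = [{"qty": 5, "price": 2_000}, {"qty": 1, "price": 999}, {"qty": 3, "price": 12_345}]

good = ShadowRunner(old_fee, new_fee)
bad = ShadowRunner(old_fee, buggy_fee)
for tx in txs:
    good.handle(tx)
    bad.handle(tx)

print(good.ready_to_cut_over(3), len(good.mismatches))  # True 0
print(bad.ready_to_cut_over(3), len(bad.mismatches))    # False 3
```

The exact-equality comparison only works because both sides are deterministic; with an LLM or any other non-deterministic component in the path, you would need a looser notion of "same result", which is exactly the can of worms mentioned above.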
Like I said, a fun problem to solve. :)
I couldn’t do it. I like infra and all, but it’s just not my cup of tea. It’s definitely true that, from a trading POV, the trade must be executed. It must settle. It must work. Otherwise capital flight will be huge.