I believe this behavior is changing in the 2024 edition: https://doc.rust-lang.org/edition-guide/rust-2024/temporary-...
Past tense: the 2024 edition stabilized in Rust 1.85 (and has been the default edition for `cargo new` ever since).
Their writing is so good, always a fun and enlightening read.
This tier approach makes a lot of sense to mitigate the scaling limit per corrosion node. Can you share how much data you wind up tracking in each tier in practice?
How large is the entry for each application -> [regions] table? Does the constraint of running this on every node mean that this creates a global limit on the number of applications? It also seems like the region-level database would impose a regional limit on the number of Fly Machines too?
So is this a case of wanting to deliver a differentiating feature before the technical maturity is there and validated? That's an acceptable strategy if you're building a lesser product, but if you're selling Public Cloud, maybe having a better strategy than waiting for problems to crop up makes more sense. Consul, missing watchdogs, certificate expiry, CRDT backfilling nullable columns: sure, in a normal case these are not very unexpected or to-be-ashamed-of problems, but for a product that claims to be Public Cloud you want to think of these things and address them before day 1. Cert expiry, for example: you should be giving your users tools so a cert can never expire, not fixing it for your own stuff after the fact! (Most CAs offer an API to automate all this; there's no excuse.)
I don't mean to be dismissive or disrespectful; the problem is challenging and the work is great. I'm merely thinking of the loss of customer trust: people are never going to trust a newcomer that has issues like this, and for that reason "move fast, break things, and fix what you find" isn't a good fit for this kind of product.
The "decision that long predates Corrosion" is precisely the point I was trying to make: was it made too soon, before understanding the ramifications and/or having a validated technical solution ready? IOW, maybe the feature requiring the problem's solution could have come later? (I don't know much about fly.io and its features, so apologies if some of this is unclear or wrongly assumes things.)
Huge pet peeve. At least this one has a date somewhere (at the bottom, "last updated Oct 22, 2025").
Is this a typo? Why does it backfill values for a nullable column?
https://github.com/vlcn-io/cr-sqlite/blob/891fe9e0190dd20917...
To ensure every instance arrives at the same “working set” picture, we use cr-sqlite, the CRDT SQLite extension.
Cool to see cr-sqlite used in production!
vlcn-io/cr-sqlite definitely built by someone who doesn't understand the fundamentals of the space
> As of cr-sqlite 0.15, the CRDT for an existing row being updated is this: (1) Biggest col_version wins
col_version is definitely something, but it isn't a logical timestamp!
--
https://github.com/superfly/corrosion/blob/main/doc/crdts.md
> Crsqlite specifically uses a "lamport timestamp" which, if you squint at from a distance, could be most concisely boiled down to a monotonically increasing counter.
lamport clocks can be boiled down to monotonically-increasing counters _per physical node in the system_, not per logical row/entity in the data model
so if you want to do conflict resolution based on logical (lamport) clocks you need to evaluate/resolve concurrent modifications according to site-specific logical clocks and their histories -- not just raw integers
which 100% vlcn.io does not do
> destroyed comes before started and so started is "bigger"
eep. good luck!
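The per-node vs. per-row distinction above can be sketched in a few lines. This is an illustrative toy, not cr-sqlite's or Corrosion's actual code; the names (`LamportClock`, `tick`, `observe`) are mine:

```python
# Toy sketch of a Lamport clock: a counter kept PER NODE, advanced on
# every local event and bumped past any timestamp seen from a peer.
class LamportClock:
    def __init__(self):
        self.counter = 0

    def tick(self):
        # Local event: advance our own counter.
        self.counter += 1
        return self.counter

    def observe(self, remote_ts):
        # On receiving a message, jump past the remote timestamp so the
        # happened-before relation is preserved.
        self.counter = max(self.counter, remote_ts) + 1
        return self.counter

# Two nodes making concurrent edits to the same row:
a, b = LamportClock(), LamportClock()
a.tick()             # node A writes once: its ts is 1
b.tick(); b.tick()   # node B writes twice: its ts is 2
# A naive "biggest col_version wins" rule would let B's edit win purely
# because B happened to write more often; the raw integers are not
# comparable across nodes until messages have actually been exchanged.
a.observe(b.counter)  # only now does A's clock reflect B's history
assert a.counter == 3
```

The point of the sketch: comparing the bare counters of two nodes that haven't communicated tells you nothing about causality, which is exactly the "destroyed vs. started" hazard described above.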
But they have to. Physically, no solution can be instantaneous; that's not how the speed of light or relativity works. Even two events right next to each other cannot find out about each other instantaneously. So the question becomes "how long can I wait for this information?" And that's the part that I feel isn't answered: e.g., if the app dies, the TCP connections die, and in theory that information travels as quickly as anything else you send.

It's not reliably detectable, but conceivably you could have an eBPF program monitoring death and notifying the proxies. That's the part that's really not explained in the article: why you need to maintain an eventually consistent view of the connectivity. I get why that could be useful, but noticing app connectivity death seems wrong, considering I believe you're more tracking machine and cluster health, right? I.e., not noticing that one app instance goes down, but noticing that all app instances on a given machine are gone, and consensus deciding globally where the new app instance will be as quickly as possible?
Did you ever consider envoy xDS?
There are a lot of really cool things in envoy like outlier detection, circuit breakers, load shedding, etc…
What we (think we) know won't work is a topologically centralized database that uses distributed consensus algorithms to synchronize. Running consensus transcontinentally is very painful, and keeping the servers central (so that update proposals are local and the protocol can run quickly) subjects large portions of the network to partition risk. The natural response (what I think a lot of people do, in fact) is just to run multiple consensus clusters, but our UX includes a global namespace for customer workloads.
I was thinking I'll just have to bite the bullet and migrate to PostgreSQL, but perhaps rqlite can work.
This blog is not impressive for an infra company.
Makes you think, that's all.
> in case people don't read all the way to the end, the important takeaway is "you simply can't afford to do instant global state distribution"
This is what people saw as the key takeaway. If that takeaway is news to you then I don’t know what you are doing writing distributed systems.
While this message may not be what was intended it was what was broadcast.
it would be super cool to learn more about how the world's largest gossip systems work :)
We're actually keeping the global Corrosion cluster! We're just stripping most of the data out of it.
We are probably past the size of the entirety of fly.io, for reference, and maintenance is very painful. It works because we are doing really strange things with Consul (batch txn cross-cluster updates of static entries) on really, really big servers (4 Gbps+ filesystems, 1 TB of memory, hundreds of big, fast cores, etc.).
Nice.
and I think the intended webfont is loaded, because the font is clearly weird-ish and non-standard, and the text is invisible for a good 2 seconds at first while it loads :)
In what sense do you think we need specialty routers?
How would you deploy Postgres to address these problems?