Understanding Round Robin DNS

(blog.hyperknot.com)

394 points · hyperknot · 1y ago · 123 comments

98 comments · 35 top-level

__turbobrew__1y ago· 11 in thread

DNS load balancing has some really nasty edge cases. I have had to deal with golang HTTP2 clients using RR DNS and it has caused issues.

Golang HTTP2 clients will reuse the first server they can connect to over and over and the DNS is never re-resolved. This can lead to issues where clients will not discover new servers which are added to the pool.

A particularly pathological case is when all serving backends go down: the clients will all pin to the first backend that comes back up, and they will not move off it. As other servers come up, few clients will connect to them, since they are already connected to the first server that came back.

A similar issue happens with grpc-go. The grpc DNS resolver will only re-resolve when the connection to a backend is broken. Similarly grpc clients can all gang onto a host and never move off. There are suggestions that on the server side you can set `MAX_CONNECTION_AGE` which will periodically disconnect clients after a while which causes the client to re-resolve the DNS.

I really wish there was a better standard solution for service discovery. I guess the best you can do is implement a request based load balancer with a virtual IP and have the load balancer perform health checks. But you are still kicking the can down the road as you are just pushing down the problem to the system which implements virtual IPs. I guess you assume that the routing system is relatively static compared to the backends and that is where the benefits come in.

I'm curious how people do this on bare metal. I know AWS/GCP/etc. have their internal load balancers, but I'm curious what the secret sauce is. Any suggestions for blog posts or white papers?

fotta1y ago

> Golang HTTP2 clients will reuse the first server they can connect to over and over and the DNS is never re-resolved.

I’m not a DNS expert but shouldn’t it re-resolve when the TTL expires?

__turbobrew__1y ago

You nerd sniped me. The guts of how http2 deals with this in golang is in transport.go : https://github.com/golang/go/blob/master/src/net/http/transp...

If I’m reading the code right, round trips (HTTP requests) go through queueForIdleConn, which picks up any pre-existing connection to a host. The only times these connections are cleaned up (in HTTP2) are when keepalives are turned off and the connection has been idle for too long, OR the connection breaks in some way, OR the max number of connections is hit and LRU cache evictions take place.

Furthermore, the golang dnsclient doesn’t even expose record TTLs to callers so how could the HTTP2 transport know when an entry is stale? https://github.com/golang/go/blob/master/src/net/dnsclient_u...

toast01y ago

It should, but like the sibling, I haven't seen what Go does. I've seen it happen elsewhere. Exchange used to cache any answer it got until it restarted. Java has had that behavior from time to time if you're not careful as well.

Querying DNS can be expensive, so it makes sense to build a cache to avoid querying again when you don't need to, but typical APIs for name resolution such as gethostbyname / getaddrinfo don't return the TTL, so people just assume forever is a good TTL. Especially for a persistent (http) connection, it kind of makes sense to never query DNS again while you already have a working connection that you made with that name, and if it's TLS, it's quite possible that you don't check if the certificate has expired while you're connected or if you do a session resumption.

But innocent things like this add up to make operating services tricky. Many times, if you start refusing connections, clients figure it out, but sometimes the caches still don't get cleared.

fotta1y ago

> but typical APIs for name resolution such as gethostbyname / getaddrinfo don't return the TTL

Oh wow I didn’t know this but I looked it up and you’re right. Interesting.

hypeatei1y ago

I've seen DNS only be refreshed when restarting on embedded devices I work with too. They use a proprietary HTTP library...

loevborg1y ago

I don't know about Golang but I swear I've seen this before as well - clients holding on to an old IP address without ever re-resolving the domain name. It makes me wary of using DNS for load balancing or blue-green deployments. I feel like I can't trust DNS clients.

wink1y ago

It's been 8-10 years but when I was serving tracking pixels we were astonished how long we still got requests from residential IPs for whole hostnames we had deprecated. That means I would not trust DNS caching anyway. I'm not talking days here, but months, with a TTL set to mere days.

ignoramous1y ago

Some reasons to connect to the same IP: TCP Fast Open, TLS session resumption, connection pools, residual censorship.

kkielhofner1y ago

TTL isn't universally respected. Consider the following path:

Your machine -> Local router -> Configured upstream DNS Server (ISP/CF/Quad8/etc) -> ? -> Authoritative DNS Server

Any one of those layers can override/mess with/cache in a variety of ways including TTL. This is why Cloudflare and a variety of other providers use IP anycast. They accepted DNS for what it is and worked around it.

Not only is the IP always the IP, the "global" BGP routing table actually universally and consistently updates much faster than DNS. Then whatever routers, machines, etc downstream from that don't matter.

__turbobrew__1y ago

I read through the golang code once due to coming across this issue with kubernetes clients which use the standard golang http client under the hood.

I would need to re-read the code to refresh my memory.

pvtmert1y ago

not an expert but overall; unless connection closes for any reason, resolution does not happen.

also, java historically had -1 ttl (eg: infinite) by default. causing a lot of headaches with ephemeral/container services.

latchkey1y ago· 10 in thread

  > "It's an amazingly simple and elegant solution that avoids using Load Balancers."

When a server is down, you have a globally distributed / cached IP address that you can't prevent people from hitting.

https://www.cloudflare.com/learning/dns/glossary/round-robin...

toast01y ago

Skipping an unnecessary intermediary is worth considering.

Load balancing isn't without cost, and load balancers subtly (or unsubtly) messing up connections is an issue. I've also used providers where their load balancers had worse availability than our hosts.

If you control the clients, it's reasonable to call the platform DNS API to get a list of IPs and shuffle and iterate through them in an appropriate way. Even better if you have a few stably allocated IPs you can distribute in client binaries for when DNS is broken; but DNS is often not broken, and it's nice to use for operational changes without having to push new configuration/binaries every time you update the cluster.

If your clients are browsers, default behavior is ok; they usually use IPs in order, which can be problematic [1], but otherwise, they have good retry behavior: on connection refused they try another IP right away, in case of timeout, they try at least a few different IPs. It's not ideal, and I'd use a load balancer for browsers, at least to serve the initial page load if feasible, and maybe DNS RR and semi-smart client logic in JS for websockets/etc; but DNS RR is workable for a whole site too.

If your clients are not browsers and not controlled by you, best of luck?

I will 100% admit that sometimes you have to assume someone built their DNS caching resolver to interpret the TTL field as a number of days, rather than number of seconds. And that clients behind those resolvers will have trouble when you update DNS. But if your load balancer is behind a DNS name, when it needs to change addresses, you'll deal with that then, and you won't have any experience with it.

[1] One of the RFCs suggests that OS APIs should sort responses by prefix match, which might make sense if IP prefixes were hierarchical, as a proxy to get to a least-network-distance server. But in the real world, numerically adjacent /24s are often not network adjacent, and if your servers have widely disparate addresses, you may see traffic from some client IPs gravitate towards numerically similar server IPs.

ignoramous1y ago

> you control the clients, it's reasonable to call the platform dns api to get a list of ips and shuffle and iterate through in an appropriate way. Even better if you have a few stable allocated IPs you can distribute in client binaries for when DNS is broken

You know, not many apps do this but in particular WhatsApp does! Was it you?

toast01y ago

Not my idea, but I supported it. Originally, client build scripts resolved the service names at build time, and that worked ok because our hosts tended to have a lot of longevity, and DNS tends to work, but things got a little better when we were more intentional about selecting the servers to be in the list, and keep track of which ones were in the list, so retirements could be managed a bit better. And I pushed until we got agreement on a set of FB load balancer IPs to include as well.

ectospheno1y ago

> I will 100% admit that sometimes you have to assume someone built their DNS caching resolver to interpret the TTL field as a number of days, rather than number of seconds.

I’ve run a min ttl of 3600 on my home network for over a year. No one has complained yet.

toast01y ago

That's only because there's no way for service operators to effectively complain when your clients continue to hit service IPs for 55 minutes after they should have stopped. And if there was, we'd first yell at all the people who continue to hit service IPs for weeks and months after a change... by the time we get to complaining about one home using an hour TTL, it's not a big deal.

wongarsu1y ago

All clients tested in the article behaved correctly and chose one of the reachable servers instead.

Of course somebody will inevitably misconfigure their local DNS or use a bad client. Either you accept an outage for people with broken setups or you reassign the IP to a different server in the same DC.

latchkey1y ago

If you know all of your clients, then you don't even need DNS. But, you don't know all of your clients. Nor do you always know your upstream DNS provider.

Design for failure. Don't fabricate failure.

zamadatix1y ago

Why would knowing your clients change whether or not you want to use DNS? Even when you control all of the clients you'll almost always want to keep using DNS.

A large number of services successfully achieve their failure tolerances via these kinds of DNS methods. That doesn't mean all services would or that it's always the best answer, it just means it's a path you can consider when designing for the needs of a system.

arrty881y ago

The standard today is to use a relatively low TTL and to health check the members of the pool from the dns server.

latchkey1y ago

That's like saying there are traffic rules in Saigon.

The exact implementation of TTL is a suggestion.

jgrahamc1y ago· 9 in thread

Hmm. I've asked the authoritative DNS team to explain what's happening here. I'll let HN know when I get an authoritative answer. It's been a few years since I looked at the code and a whole bunch of people keep changing it :-)

My suspicion is that this is to do with the fact that we want to keep affinity between the client IP and a backend server (which OP mentions in their blog). And the question is "do you break that affinity if the backend server goes down?" But I'll reply to my own comment when I know more.

delusional1y ago

> I'll let HN know when I get an authoritative answer

Please remember to include a TTL so I know how long I can cache that answer.

jgrahamc1y ago

Thank you for appreciating my lame joke.

mlhpdx1y ago

So many sins have been committed in the name of session affinity.

jgrahamc1y ago

Looks like this has nothing to do with session affinity. I was wrong. Apparently, this is a difference between our paid and free plans. Getting the details, and finding out why there's a difference, and will post.

_cenw1y ago

Well, CEO said there is none, get on it engineering :)

jgrahamc1y ago

Update: change is rolling out to do zero downtime failover on free accounts.

hyperknotOP1y ago

Great news, thanks for the amazing turnaround time!

tiffanyh1y ago

And follow-up as well.

egberts11y ago

Please ignore the hidden master server, carry on.

metadat1y ago· 7 in thread

> This allows you to share the load between multiple servers, as well as to automatically detect which servers are offline and choose the online ones.

To [hesitantly] clarify a pedantry regarding "DNS automatic offline detection":

Out of the box, RR-DNS is only good for load balancing.

Nothing automatic happens on the availability state detection front unless you build smarts into the client. TFA introduction does sort of mention this, but it took me several re-reads of the intro to get their meaning (which to be fair could be a PEBKAC). Then I read the rest of TFA, which is all about the smarts.

If the 1/N server record selected by your browser ends up being unavailable, no automatic recovery / retry occurs at the protocol level.

p.s. "Related fun": Don't forget about Java's DNS TTL [1] and `.equals()` [2] behaviors.

[1] https://stackoverflow.com/questions/1256556/how-to-make-java...

[2] https://news.ycombinator.com/item?id=21765788 (5y ago, 168 comments)

encoderer1y ago

We accomplish this on Route53 by having it pull servers out of the dns response if they are not healthy, and serving all responses with a very low ttl. A few clients out there ignore ttl but it’s pretty rare.

ChocolateGod1y ago

I once achieved something similar with PowerDNS, where you can use Lua rules to health-check a pool of servers and return only the healthy servers in the DNS record, but I found odd occurrences of clients not respecting the TTL on DNS records and caching too long.

tetha1y ago

You usually do this with servers that should be rock-solid and stateless. HAProxy, Traefik, F5. That way, you can pull the DNS record for maintenance 24 - 48 hours in advance. If something overrides DNS TTLs that much, there is probably some reason.

d_k_f1y ago

Honest question to somebody who seems to have a bit of knowledge about this in the real world: several (German, if relevant) providers default to a TTL of ~4 hours. Lovely if everything is more or less finally set up, but usually our first step is to decrease pretty much everything down to 60 seconds so we can change things around in emergencies.

On average, does this really matter/make sense?

stackskipton1y ago

Lower TTLs is cheap insurance so you can move hostnames around.

However, you should understand that not ALL clients will respect those TTLs. Some resolvers enforce a minimum TTL threshold (if TTL < threshold, then TTL = threshold), which is common with some ISPs. There may also be cases where browsers and operating systems ignore TTLs or fudge them.

toast01y ago

From experience, 90%+ of traffic will respect your TTLs or something close. So on average, it definitely does make a difference. There's always going to be a long tail of stragglers though.

Personally, my default for names that are likely to change often is 5 minutes; 1 minute is OK too, but it might drive a lot more DNS traffic.
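A back-of-the-envelope for the DNS-traffic side of that trade-off, assuming each of $R$ caching resolvers with active clients re-queries once per TTL expiry (real resolvers vary, e.g. some prefetch):

```latex
Q_{\text{day}} \approx R \cdot \frac{86400\ \text{s}}{\text{TTL}}
\qquad
\text{TTL} = 300\ \text{s} \Rightarrow Q_{\text{day}} \approx 288R,
\qquad
\text{TTL} = 60\ \text{s} \Rightarrow Q_{\text{day}} \approx 1440R
```

So going from 5 minutes to 1 minute means roughly 5x the queries at the authoritative servers.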

rrdnsd1y ago

Shameless plug: a FOSS project to provide failover for RR-DNS and it's being funded by NLnet https://codeberg.org/FedericoCeratto/rrdnsd

tetha1y ago· 5 in thread

> As you can see, all clients correctly detect it and choose an alternative server.

This is the nasty key point. The reliability is decided client-side.

For example, systemd-resolved at times enacted maximum technical correctness by always returning the lowest IP address. After all, DNS-RR is not well-defined, so always returning the lowest IPs is not wrong. It got changed after some riots, but as far as I know, Debian 11 is stuck with that behavior, or was for a long time.

Or, I deal with many applications with shitty or no retry behavior. They go "Oh no, I have one connection refused, gotta cancel everything, shutdown, never try again". So now 20% - 30% of all requests die in a fire.

It's an acceptable solution if you have nothing else. As the article notices, if you have quality HTTP clients with a few retries configured on them (like browsers), DNS-RR is fine to find an actual load balancer with health checks and everything, which can provide a 100% success rate.

But DNS-RR is no loadbalancer and loadbalancers are better.

aarmenaa1y ago

True. On the other hand, if you control the clients and can guarantee their behavior then DNS load balancing is highly effective. A place I used to work had internal DNS servers with hundreds of millions of records with 60 second TTLs for a bespoke internal routing system that connected incoming connections from customers with the correct resources inside our network. It was actually excellent. Changing routing was as simple as doing a DDNS update, and with NOTIFY to push changes to all child servers the average delay was less than 60 seconds for full effect. This made it easy to write more complicated tools, and I wrote a control panel that could take components from a single server to a whole data center out of service at the click of a button.

There were definitely some warts in that system but as those sorts of systems go it was fast, easy to introspect, and relatively bulletproof.

nerdile1y ago

It's putting reliability in the hands of the client, or whatever random caching DNS resolver they're sitting behind.

It also puts failover in those same hands. If one of your regions goes down, do you want the traffic to spread evenly to your other regions? Or pile on to the next nearest neighbor? If you care what happens, then you want to retain control of your traffic management and not cede it to others.

latchkey1y ago

> It's an acceptable solution if you have nothing else.

I'd argue it isn't acceptable at all in this day and age and that there are other solutions one should pick today long before you get to the "nothing else" choice.

toast01y ago

Anycast is nice, but it's not something you can do yourself well unless you have large scale. You need to have a large number of PoPs, and direct connectivity to many/most transit providers, or you'll get weird routing.

You also need to find yourself some IP ranges. And learn BGP and find providers where you can use it.

DNS round robin works as long as you can manage to find two boxes to run your stuff on, and it scales pretty high too. When I was at WhatsApp, we used DNS round robin until we moved into Facebook's hosting where it was infeasible due to servers not having public addresses. Yes, mostly not browsers, but not completely browserless.

latchkey1y ago

Back in 2013, that might have been the best solution for you. But there were still plenty of headlines... https://www.wamda.com/2013/11/whatsapp-goes-down

We're talking about today.

The reason why I said Anycast is cause the vast majority of people trying to solve the need for having multiple servers in multiple locations, will just use CF or any one of the various anycast based CDN providers available today.

nielsole1y ago· 4 in thread

> Curl also works correctly. First time it might not, but if you run the command twice, it always corrects to the nearest server.

I always assumed curl was stateless between invocations. What's going on here?

barrkel1y ago

My hypothesis: he's running on macOS and he's seeing the same behavior from Safari as from curl because they're both using OS-provided name resolution which is doing the lowest-latency selection.

Firefox and Chrome use DNS over HTTPS by default I believe, which may mean they use a different name resolution path.

The above is entirely conjecture on my part, but the guess is heavily informed by the surprise of curl's behavior.

plagiat0r1y ago

But this does not make sense. How is the macOS resolver supposed to test the latency of A (address) records? A browser uses the address to actually make a TCP connection on port 443 and can measure latency there (or on UDP/443 when using HTTP/3/QUIC).

But the operating system resolver only speaks with DNS servers. It does not make HTTPS connections to calculate latency, which is what picking "the closest server" would require. Also, DNS has no way to tell what port you will be using; maybe the service is on 8443 or something.

For geo DNS I've built a custom backend for PowerDNS with geo DNS capabilities and health checks to quickly remove a broken VPS from the DNS responses.

barrkel1y ago

If I had to hypothesize further, I'd say that macOS may let its DNS resolver cache interact with its TCP stack. It's not inconceivable that the TCP handshake is used to make a rough estimate of network latency.

hyperknotOP1y ago

Correct. I'm on macOS and I tried turning off DoH in Firefox and then it worked like Safari.

stackskipton1y ago· 4 in thread

As SRE, I get a chuckle out of this article and some of the responses. Devs mess this up constantly.

DNS has one job. Hostname -> IP. Nothing further. You can mess with it on server side like checking to see if HTTP server is up before delivering the IP but once IP is given, the client takes over and DNS can do nothing further so behavior will be wildly inconsistent IME.

Assuming DNS RR is standard where Hostname returns multiple IPs, then it's only useful for load balancing in similar latency datacenters. If you want fancy stuff like geographic load balancing or health checks, you need fancy DNS server but at end of day, you should only return single IP so client will target the endpoint you want them to connect to.

plagiat0r1y ago

I've implemented a custom PowerDNS backend that combines health checks, weighted probabilistic round robin, and geo DNS, and it works excellently for building an auto-healing CDN.

It was specifically built for multi-DC, multi-cloud, or hybrid operations spread across separate continents, with geo DNS, health checks, and failover at the DNS level at the same time. When all US servers in the WRR pool are down, or the DC is down, it automatically starts answering with the next-closest WRR pool (Canada). WRR pools are dynamic and auto-healing, with constant HTTP health checks.

It is also dirt cheap, like 100x cheaper than acquiring provider-independent IP address space, running and operating anycast, and having 24/7 NOC teams on that anycast constantly adjusting BGP communities, etc. And it's not like anycast and BGP solve anything when one server is down but the others work: you can't stop announcing a whole prefix if you run 200 machines and only one or two are down.

TTL I'm using is 30 seconds.

I never shared this backend with the world; you can't test it or purchase it. But maybe some day I'll launch a route53 competitor ;)

lysace1y ago

I've never ever come up with a scenario where RR DNS is useful in the goal of achieving high availability. I'm similarly mystified.

What can be useful: dynamically adjusting DNS responses depending on what DC is up. But at this point shouldn't you be doing something via BGP instead? (This is where my knowledge breaks down.)

stackskipton1y ago

Yea, Anycast IP like what Cloudflare does is the best.

If you want cheaper load balancing and are ok with some downtime while DNS reconfigures, DNS system that returns IP based on which Datacenter is up works. Examples of this are Route53, Azure Traffic Manager and I assume Google has solution, I just don't know what it is.

lysace1y ago

Worked on implementing a distributed-consensus driven DNS thing like 15 years ago. We had 3 DCs around the world for a very compute-intense but not very stateful service. It actually just worked without any meaningful testing on the first single DC outage. In retrospect I'm amazed.

teddyh1y ago· 3 in thread

One of the early proposed solutions for this was the SRV DNS record, which was similar to the MX record, but for every service, not just e-mail. With MX and SRV records, you can specify a list of servers with associated priority for clients to try. SRV also had an extra “weight” parameter to facilitate load balancing. However, SRV did not want the political fight of effectively hijacking every standard protocol to force all clients of every protocol to also check SRV records, so they specified that SRV should only be used by a client if the standard for that protocol explicitly specifies the use of SRV records. This technically prohibited HTTP clients from using SRV. Also, when the HTTP/2 (and later) HTTP standards were being written, bogus arguments from Google (and others) prevented the new HTTP protocols from specifying SRV. SRV seems to be effectively dead for new development, only used by some older standards.

The new solution for load balancing seems to be the new HTTPS and SVCB DNS records. As I understand it, they are standardized by people wanting to add extra parameters to the DNS in order to jump-start the TLS1.3 handshake, thereby making fewer roundtrips. (The SVCB record type is the same as HTTPS, but generalized like SRV.) The HTTPS and SVCB DNS record types both have the priority parameter from the SRV and MX record types, but HTTPS/SVCB lack the weight parameter from SRV. The standards have been published, and support seems to have been added in some browsers, but not all have enabled it. We will see what browsers will actually do in the near future.

jsheard1y ago

> The new solution for load balancing seems to be the new HTTPS and SVCB DNS records. As I understand it, they are standardized by people wanting to add extra parameters to the DNS in order to jump-start the TLS1.3 handshake, thereby making fewer roundtrips.

The other big advantage of the HTTPS record is that it allows for proper CNAME-like delegation at the domain apex, rather than requiring CNAME flattening hacks that can cause routing issues on CDNs which use GeoDNS in addition to or instead of anycast. If you've ever seen a platform recommend using a www subdomain instead of an apex domain, that's why, and it's part of why Akamai pushed for HTTPS records to be standardized since they use GeoDNS.

teddyh1y ago

Oh yes¹. This is an advantage shared by all of MX, SRV and HTTPS/SVCB, though.

1. <https://news.ycombinator.com/item?id=38420555>

jcgl1y ago

I wish so badly for proper adoption of SRV or other MX-style records that could be used for HTTP. Their lack is especially painful when dealing with the fact that people commonly want to host websites at their domain apex.

However, using MX-style records safely can be tricky if you can’t rely on DNSSEC.

unilynx1y ago· 2 in thread

> So what happens when one of the servers is offline? Say I stop the US server:

> service nginx stop

But that's not how you should test this. A client will see the connection being refused, and go on to the next IP. But in practice, a server may not respond at all, or accept the connection and then go silent.

Now you're dependent on client timeouts, and round robin DNS will suddenly look a whole lot less attractive to increase reliability.

globular-toast1y ago

Yes, this can be tested by just unplugging or turning off a machine/VM with that IP address. Stopping a service is a planned action that you could handle by updating your DNS first.

Joe_Cool1y ago

Yeah, SIGSTOP or just an ip/nftables DROP rule would be a much more realistic test.

jgrahamc1y ago· 2 in thread

Hey. This is Cloudflare's CTO. We've rolled out a change to all free accounts in Cloudflare to bring them into line with paid accounts. The problem you are talking about here has been fixed and we should be doing Zero Downtime Failover for all account types. Can you retest it?

PS Thanks for writing this up. Glad we were able to change this behaviour for everyone.

hyperknotOP1y ago

Retested it, works brilliantly! I'll update the article accordingly.

Thanks for bringing it to the Free accounts, great outcome!

jgrahamc1y ago

Nice. Glad we got this fixed.

cybice1y ago· 2 in thread

Cloudflare results with worker as a reverse proxy can be much better.

easylion1y ago

But won't it add an additional hop hence additional latency to every single request ?

rodcodes1y ago

Nah, because the Cloudflare Workers run at closest edge location and are real fast.

The real solution with Cloudflare is to use their Load Balancing (https://developers.cloudflare.com/load-balancing) which is a paid feature.

freitasm1y ago· 1 in thread

Interesting. The author starts by discussing DNS round robin but then briefly touches on Cloudflare Load Balancing.

I use this feature, and there are options to control Affinity, Geolocation and others. I don't see this discussed in the article, so I'm not sure why Cloudflare load balancing is mentioned if the author does not test the whole thing.

Their Cloudflare wishlist includes "Offline servers should be detected."

This is also interesting because when creating a Cloudflare load balancing configuration, you create monitors, and if one is down, Cloudflare will automatically switch to other origin servers.

These screenshots show what I see on my Load Balancing configuration options:

https://cdn.geekzone.co.nz/imagessubs/62250c035c074a1ee6e986...

https://cdn.geekzone.co.nz/imagessubs/04654d4cdda2d6d1976f86...

hyperknotOP1y ago

I briefly mention that I don't go into L7 Load Balancing because it'd be cost prohibitive for my use case (millions of requests).

Also, the article is about DNS-RR, not the L7 solution.

zamalek1y ago· 1 in thread

Take a look at SRV records instead - they are very intentionally designed for this, and behave vaguely similarly to MX. Creating a DNS server (or a CoreDNS/whatever module) that dynamically updates weights based on backend metrics has been a pending pet project of mine for some time now.

jeroenhd1y ago

Until the HTTP spec gets updated to include SRV records, using SRV records for HTTP(S) is technically spec-incompliant and practically useless.

However, as is common with web tech, the old SRV record has been reinvented as the SVCB record with a smidge of DANE for good measure.

V__1y ago· 1 in thread

This seems like a nice solution for zero-downtime updates. Clone the server, add the specified IP, deny access to the main one, upgrade, and turn the cloned server off.

nrnrjrjrj1y ago

Those exact words (aka blue green deployment) apply to loadbalancers too and they can do it better. They can even do health checks and slowly ramp traffic to the new server and back off if things go bad for an automated rollback.

meindnoch1y ago· 1 in thread

So half of your content is served from another server? Sounds like a recipe for inconsistent states.

ChocolateGod1y ago

You can easily use something like an object store or shared database to keep data consistent.

edm0nd1y ago

The dark remix version of this is fast flux hosting and what a lot of the bulletproof hosting providers use.

https://unit42.paloaltonetworks.com/fast-flux-101/

realchaika1y ago

May be worth mentioning that Zero Downtime Failover is a Pro-or-higher feature, I believe; that's how it was documented before as well, back when the "protect your origin server" docs were split by plan level. So you may see different behavior/retries.

solatic1y ago

Multiple A records is not for load balancing, a key component of which is full control over registering new targets and deregistering old targets in order to shift traffic. Because DNS responses are cached, you can't reliably use DNS to quickly shift traffic to new IP addresses, or use DNS to remove traffic from old IP addresses.

As OP clearly shows, it's also not useful for geographically routing traffic to the nearest endpoint. Clients are dumb and may do things against their interest, the user will suffer for it, and you will get the complaints. Use a DNS provider with proper georouting if this is important to you.

The only genuinely valid reason for multiple A records is redundancy. If you have a physical NIC, guess what, those fail sometimes. If you get a virtual IP address from a cloud provider, guess what, those abstractions leak sometimes. Setting up multiple servers with multiple NICs per server, and multiple A records pointing at those NICs, is one of those things you do when your use case requires a stratospherically high reliability SLA. At that point you are systematically working through every last single point of failure in your hot path.
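To make the caching point from the first paragraph concrete: a resolver cache that honors TTLs keeps handing out the old address until its entry expires, even after the record has changed upstream. The following minimal Python sketch injects the clock so the example is deterministic. The hostname and 192.0.2.x addresses are placeholders. A real lookup path can stack several such caches (browser, OS, home router, ISP resolver), each with its own copy and expiry.

```python
import time


class TTLCache:
    """Minimal resolver-side cache that honors each answer's TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (expires_at, addresses)

    def lookup(self, name, resolve):
        """Return addresses for `name`; `resolve(name)` -> (addresses, ttl)."""
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: upstream changes are invisible
        addresses, ttl = resolve(name)
        self._entries[name] = (now + ttl, addresses)
        return addresses


# Simulated authoritative data and clock.
zone = {"app.example.com": (["192.0.2.1"], 300)}
now = [0.0]
cache = TTLCache(clock=lambda: now[0])

print(cache.lookup("app.example.com", zone.get))  # ['192.0.2.1']
zone["app.example.com"] = (["192.0.2.2"], 300)    # operator moves the service
now[0] = 299.0
print(cache.lookup("app.example.com", zone.get))  # still ['192.0.2.1']
now[0] = 300.0
print(cache.lookup("app.example.com", zone.get))  # ['192.0.2.2']
```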

neuroelectron1y ago

We used to do this at Amazon in the 00s for onsite hosts. At the time, round robin DNS was the fastest way to load balance, since even the dedicated load balancers of the time added a few milliseconds of latency. A lot of the decisions didn't make sense to me and seemed to be grandfathered in from the 90s.

We had a dedicated DNS host and various other dedicated hosts for services related to order fulfillment. A batch job would be downloaded in the morning to the order server (app) and split up amongst the symbol scanners, which ran basic terminals. To keep latency as low as possible, the scanners used DNS round robin. I'm not sure how much that helped, because the Wi-Fi was by far the biggest bottleneck, simply due to interference, reflection and so on.

With this setup, an outage would have no effect on the throughput of the warehouse, since the batch job was all handled locally. As we moved toward same-day shipping, this was of course no longer a good solution. We moved to redundant dedicated fiber with cellular data backup, and then to almost completely remote servers for everything but app servers. What we were left with was million-dollar HVAC cooling a quarter rack of hardware, plus a bunch of redundant onsite tech workers.

hypeatei1y ago

The browser behavior is really nice; good to know that it falls back quickly and smoothly. Round robin DNS has always been referred to as a "poor man's load balancer", and it seems to live up to that.

> Curl also works correctly. First time it might not, but if you run the command twice, it always corrects to the nearest server.

This took two tries for me, which raises the question of how curl keeps track of RTT (round-trip time). Interesting.
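One plausible way a client or system resolver could learn which server is nearest is to time the TCP handshake to each address and prefer the fastest one next time. This Python sketch illustrates that idea only. It is not curl's implementation, and it leaves out where the timings would be stored between runs. The 1-second timeout is an arbitrary choice.

```python
import socket
import time


def rank_by_connect_time(endpoints, timeout=1.0):
    """Return reachable (host, port) pairs, fastest TCP handshake first.

    Unreachable endpoints are dropped. A client that remembered these
    timings could go straight to the fastest server on its next run.
    """
    timings = []
    for host, port in endpoints:
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                elapsed = time.perf_counter() - start
        except OSError:
            continue  # refused or timed out: treat as unreachable
        timings.append((elapsed, (host, port)))
    timings.sort()
    return [endpoint for _, endpoint in timings]
```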

mlhpdx1y ago

Interesting topic for me; I've been looking at anycast IP services and latency-based DNS resolvers as well. I even made a repo[1] for anyone interested in a quick start for setting up AWS Global Accelerator.

[1] https://github.com/mlhpdx/cloudformation-examples/tree/maste...

why-el1y ago

Hm, I thought Happy Eyeballs (HE) was mainly concerned with IPv6 issues and falling back to IPv4. I didn't realize this was also the RFC where something was finally said about round-robin specifically, but according to the article, it is.

Is it true then that before HE, most round-robin implementations simply cycled and no one considered latency? That's a very surprising finding.
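The core of Happy Eyeballs (RFC 8305) is connection racing. The client starts with the first address, and if that attempt hasn't connected within a short delay (250 ms is the RFC's recommended default), it starts the next one in parallel and keeps whichever succeeds first. Here is a Python asyncio sketch of just that staggering logic, using simulated connection attempts. It leaves out the RFC's address-family interleaving and resolution steps. For real sockets, asyncio's `loop.create_connection` accepts a `happy_eyeballs_delay` argument (Python 3.8+).

```python
import asyncio


async def race(attempts, delay=0.25):
    """Race connection attempts Happy-Eyeballs style.

    `attempts` are zero-argument coroutine functions in preference order.
    Each one starts `delay` seconds after the previous one, or immediately
    if an earlier attempt fails. The first success wins; the rest are
    cancelled.
    """
    queue = iter(attempts)
    running = set()
    errors = []

    def start_next():
        attempt = next(queue, None)
        if attempt is None:
            return False
        running.add(asyncio.create_task(attempt()))
        return True

    more = start_next()
    try:
        while running:
            done, _ = await asyncio.wait(
                running,
                timeout=delay if more else None,
                return_when=asyncio.FIRST_COMPLETED,
            )
            if not done:  # nothing finished within `delay`: add another racer
                more = start_next()
                continue
            for task in done:
                running.discard(task)
                if task.exception() is None:
                    return task.result()
                errors.append(task.exception())
            more = start_next()  # a failure lets the next address start now
        raise OSError(f"all connection attempts failed: {errors!r}")
    finally:
        for task in running:
            task.cancel()
        await asyncio.gather(*running, return_exceptions=True)


def simulated(name, latency, ok=True):
    """Fake connection attempt: succeed (or fail) after `latency` seconds."""
    async def attempt():
        await asyncio.sleep(latency)
        if not ok:
            raise ConnectionRefusedError(name)
        return name
    return attempt


if __name__ == "__main__":
    # The first address is a black hole; the second answers after 20 ms.
    winner = asyncio.run(race([simulated("198.51.100.7", 10.0),
                               simulated("203.0.113.9", 0.02)]))
    print(winner)  # 203.0.113.9, after roughly 0.27 s instead of 10 s
```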

LikeBeans1y ago

Another way to solve for clients that stick with an IP after resolving is to use a combination of DNS RR and Anycast (if you have control over the physical infra). That means you resolve with RR to an IP in the regional data center and then use Anycast for local delivery. That way if the server goes down these clients can continue to operate.

chasil1y ago

I actually use round robin into a set of ssh servers.

There is never a delay if one of them is down.

I am using a closed-source client (Bluezone Rocket), but I'm assuming that it pulled a lot of code from PuTTY as it uses the PPK format.
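Whether a down server costs anything depends on how it is down. Below is a minimal Python sketch of the client-side fallback that makes multiple A records useful: try each address in turn and keep the first one that connects. If a dead address answers with a TCP reset, the attempt fails almost immediately. If packets are silently dropped, each dead address costs the full timeout. CPython's `socket.create_connection` already loops over the `getaddrinfo` results for a hostname in this way. The explicit version just makes the loop visible, and the 2-second timeout is an arbitrary choice.

```python
import socket


def connect_any(endpoints, timeout=2.0):
    """Try (host, port) endpoints in order and return the first connected socket.

    A dead address costs up to `timeout` seconds if packets are silently
    dropped, but almost nothing if the host answers with a TCP reset.
    """
    last_error = None
    for host, port in endpoints:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # note the failure and try the next address
    raise last_error or OSError("no endpoints to try")


# Usage (placeholder addresses):
# sock = connect_any([("192.0.2.10", 22), ("192.0.2.11", 22)])
```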

jkrauska1y ago

Check out what happens when you use IPv6 addresses. RFC 6724 is awkward about ordering with IPv6.

How your OS sorts DNS responses also comes into play. It also depends on how your browser makes DNS requests.
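RFC 6724 destination address selection sorts candidate addresses by a list of rules. One of those rules prefers the destination that shares the longest prefix with the client's source address. The Python sketch below implements just that comparison, and it shows the side effect on round robin: for a given source address, the result doesn't depend on the order the DNS server sent, so the rotation is undone. The addresses come from the 2001:db8::/32 documentation range. Whether and how a given OS applies this rule varies.

```python
import ipaddress


def common_prefix_len(a, b):
    """Leading bits shared by two addresses of the same family."""
    a, b = ipaddress.ip_address(a), ipaddress.ip_address(b)
    if a.version != b.version:
        raise ValueError("addresses must be the same family")
    return a.max_prefixlen - (int(a) ^ int(b)).bit_length()


def sort_by_prefix_match(source, destinations):
    """Longest shared prefix with `source` first; ties keep the DNS order."""
    return sorted(destinations, key=lambda d: -common_prefix_len(source, d))


source = "2001:db8:1::10"
answer = ["2001:db8:2::1", "2001:db8:1::1", "2001:db8:ff::1"]
rotated = answer[1:] + answer[:1]  # what the next round-robin answer might be
print(sort_by_prefix_match(source, answer))
print(sort_by_prefix_match(source, rotated))  # same order both times
```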

bar000n1y ago

hey! so i have a video CDN made of 4 bare-metal servers. 2 are newer and more powerful, so i give each of them 2 of the 6 IP addresses that DNS returns for the A record. but across a very diverse pool of devices (proprietary set-top boxes, smart TVs, iOS and Android mobile clients, web browsers, etc.) i still get ~40% of traffic on the older servers instead of the expected 33%, since only 2 of the 6 resolved addresses belong to them. why?
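The 33% expectation rests on an assumption worth making explicit: each client picks one of the six addresses uniformly and independently. Below is a small Python sketch of that model. `ip1` to `ip6` and the server names are placeholders for the setup described. If observed traffic diverges from this model, one of its assumptions is failing. Possible causes include OS address sorting, resolver caching shared by many clients, or long-lived connections of unequal size. The comment alone doesn't say which of these applies.

```python
import random
from collections import Counter

# Placeholder layout matching the description: six A records, the two
# newer servers own two addresses each, the two older servers one each.
ADDRESS_OWNER = {
    "ip1": "new-a", "ip2": "new-a",
    "ip3": "new-b", "ip4": "new-b",
    "ip5": "old-a",
    "ip6": "old-b",
}


def expected_share(owner_map):
    """Traffic share per server if every client picks an address uniformly."""
    counts = Counter(owner_map.values())
    total = len(owner_map)
    return {server: n / total for server, n in counts.items()}


def simulate(owner_map, clients, rng):
    """Let each simulated client pick one address uniformly at random."""
    addresses = list(owner_map)
    picks = Counter(owner_map[rng.choice(addresses)] for _ in range(clients))
    return {server: n / clients for server, n in picks.items()}


shares = expected_share(ADDRESS_OWNER)
print(round(shares["old-a"] + shares["old-b"], 3))  # 0.333
sim = simulate(ADDRESS_OWNER, 100_000, random.Random(42))
print(round(sim["old-a"] + sim["old-b"], 3))  # close to 0.333
```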

urbandw311er1y ago

What a great article! It’s often easy to forget just how flexible and self-correcting the “official” network protocols are. Thanks to the author for putting in the legwork.

backtoyoujim1y ago

"I wrote a decoder in Perl. Everything must be in Perl."

preach on.

rebelde1y ago

I have used round robin for years.

Wish I could add instructions like:

- random choice #round robin, like now

- first response # usually connects to closest server

- weights (1.0.0.1:40%; 2.0.0.2:60%)

- failover: (quick | never)

- etc: naming countries, continents
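Plain A/AAAA answers have no field for weights like those (SRV records do), so any weighting has to be applied by whatever builds or consumes the answer. Below is a Python sketch of one deterministic option, usually called smooth weighted round robin. It uses the example addresses above, with 40%/60% expressed as the integer weights 2 and 3. This is an illustration only, not a feature of any particular DNS server.

```python
from itertools import islice


def smooth_weighted_rr(weights):
    """Yield peers forever, interleaved in proportion to their weights.

    Each round, every peer's score grows by its weight; the peer with the
    highest score is picked and its score is reduced by the total weight.
    """
    scores = {peer: 0 for peer in weights}
    total = sum(weights.values())
    while True:
        for peer, weight in weights.items():
            scores[peer] += weight
        chosen = max(scores, key=scores.get)
        scores[chosen] -= total
        yield chosen


# 40% / 60% from the wishlist, expressed as integer weights 2 and 3.
order = smooth_weighted_rr({"1.0.0.1": 2, "2.0.0.2": 3})
print(list(islice(order, 5)))
# ['2.0.0.2', '1.0.0.1', '2.0.0.2', '1.0.0.1', '2.0.0.2']
```

A DNS server could, for instance, put the chosen address first in each answer. Clients that don't simply take the first address would still blunt the effect.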

tiahura1y ago

Back in the day DNS consumed a lot more oxygen: BIND, double-reverse MX records, Windows DNS, etc. What happened? Did the cloud make all of that go away?

specto1y ago

Chrome and Firefox use the OS DNS resolver by default, and in most OSes that resolver caches answers as well.

easylion1y ago

Did you try running a simple curl loop in bash instead of checking manually? The data and statistics would become much clearer. I'm asking because I want to understand how to ensure my clients reach the nearest edge data center.
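Here is a sketch of such a loop, written in Python rather than bash so it can also count the results. It assumes each server identifies itself in a response header. `X-Served-By` and the URL are placeholders. Each `urlopen` call opens a fresh connection, but name resolution still goes through the OS resolver and its cache, so this measures what that resolver hands out over time.

```python
import urllib.request
from collections import Counter


def served_by(url, header="X-Served-By"):
    """Fetch `url` once and return the server name from a response header."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.headers.get(header, "unknown")


def tally(fetch, n):
    """Call `fetch()` n times and count how often each answer came back."""
    return Counter(fetch() for _ in range(n))


# Usage (hypothetical URL; replace with your own round-robin hostname):
# print(tally(lambda: served_by("https://www.example.com/"), 50))
```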

kawsper1y ago

37signals/Basecamp wrote about something similar on their blog. They saw traffic switch almost immediately (https://signalvnoise.com/posts/3857-when-disaster-strikes), and in the comments they hinted that it was just a DNS update with low TTLs.

egberts11y ago

round robin ≠ load balancer

but please do continue reading on…

easylion1y ago

https://www.cloudflare.com/en-gb/learning/cdn/glossary/anyca...
