https://soundcloud.com/ryan-flowers-916961339/dns-to-the-tun...
That's weird to me. I have been working in sysadmin/DevOps for over a decade, but it did not take me very long to learn that DNS outages cause massive problems.
Love this.
So many sites down... and unfortunately not one of them is Twitter
> So many websites are down, are AWS servers down or something?
> Amazon web services is down which is affecting a lot of company web sites and services. Not sure what is going on.
> Miss us? @aldotcom and a whole bunch of other folks have been knocked off the internet by what appears to be an AWS attack/system failure. We'll be back.
Basically, soft-invalidate your local DNS cache, but bring it back from the cache graveyard if DNS is down.
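A rough sketch of that "cache graveyard" idea, in case it helps. The class name, the `resolve` callable and its `(address, ttl)` return shape are all made up for illustration; a real resolver would also cap how long stale data may be served.

```python
import time


class StaleOkCache:
    """Cache that keeps expired entries around as a fallback (hypothetical sketch)."""

    def __init__(self, resolve, clock=time.monotonic):
        # resolve: callable(name) -> (address, ttl_seconds); assumed to raise OSError on failure
        self._resolve = resolve
        self._clock = clock
        self._entries = {}  # name -> (address, expires_at)

    def lookup(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh, no upstream query needed
        try:
            address, ttl = self._resolve(name)
        except OSError:
            if entry is not None:
                return entry[0]  # upstream is down: serve the expired answer
            raise  # nothing cached at all, so the failure is passed on
        self._entries[name] = (address, now + ttl)
        return address
```

The entry is only "soft" expired: a fresh answer always wins when upstream is reachable, and the old one is used only when the lookup itself fails.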
From what I observed here, it was more internal DNS related: Newegg was serving an opaque “DNS failure” error page from Akamai’s front-end which is likely because their infra was failing to resolve names internally.
Please keep comments like this off HN
Reference: #11.453a2f17.1393u44848484.3aee33433
At the bottom of a very bland-looking error page.
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
Both Bitwarden and 1Password are great.
GOV.UK for example uses both aws and gcp for DNS
But then, the cached values from AWS take a while to clear; the TTL never seems to be applied properly. It always feels like the worst case in such a scenario is you can point everyone at the right thing within 24 hours.
What I'm a bit surprised / unsure of is what happens when I run "dig ns gov.uk". The results are:
gov.uk. 21559 IN NS ns1.surfnet.nl.
gov.uk. 21559 IN NS auth50.ns.de.uu.net.
gov.uk. 21559 IN NS ns3.ja.net.
gov.uk. 21559 IN NS ns2.ja.net.
gov.uk. 21559 IN NS ns0.ja.net.
gov.uk. 21559 IN NS auth00.ns.de.uu.net.
gov.uk. 21559 IN NS ns4.ja.net.
Who are ja.net, uu.net and surfnet.nl?
EDIT: I see that ja.net, i.e. jisc.ac.uk, "manages the second level domain .gov.uk" -- https://www.jisc.ac.uk/domain-registry . I imagine that uu.net and surfnet.nl are there for redundancy.
Problems start when you want to make frequent changes easily and introduce complex software to manage DNS zones (and complexity usually comes with bugs).
The whole reason it takes a domain 24h to fully work with DNS is that the information propagates to other DNS servers, which is what keeps DNS from being a centralized service.
Relatively short TTLs are ubiquitous these days though.
https://namebase.io is a "registrar" for it.
https://learn.namebase.io/starting-from-zero/how-to-get-a-na...
This is so convoluted it actually makes the whole thing a non-starter
What’s the single point of failure?
I wonder how much they spend on multi-AZ redundant architectures...
Whatever multi-home means, why can't there just be one service provider that does that? And are we sure that these service providers aren't already doing that as best we might hope for? (For instance, Amazon already has multiple zones, etc.)
I suppose the one thing this can't protect against is some sort of political (broadly defined) threat related to the company itself.
Many of these outages are due to pushing broken artifacts or configuration to production.
A single provider can pretty easily offer geographic or network topological redundancy, but administrative and/or technological independence is pretty hard to achieve in a single company.
Using multiple providers for fancy DNS, like only providing IPs that pass healthchecks or geotargeting users to datacenters, gets pretty hard, because the different providers have similar capabilities but no uniform interface, so you've either got to do it manually, or you have to build out your own abstraction that is probably limiting.
If possible, insourcing DNS makes the most sense to me, because if you can't keep your service online, it's not the worst if your DNS is offline; and if you can keep your service online, you probably won't mess up your DNS too badly.
Specifically, 1.1.1.1 provided bad addresses (as opposed to no addresses), and removing 1.1.1.1 fixed my problem. By then it had returned a bunch of bad addresses and I had to flush my DNS cache.
Server: 1.1.1.1
Address: 1.1.1.1#53
Non-authoritative answer:
Name: newegg.com
Address: 23.35.185.6
vs
Server: 8.8.8.8
Address: 8.8.8.8#53
Non-authoritative answer:
Name: newegg.com
Address: 104.80.92.252
104.80.92.252 is newegg.com
23.35.185.6 is a server that provides an error message.
So 1.1.1.1 lied. The proper response would be to reply "I don't recognize that domain". Instead it said, "yeah, I know that, it's here..."
Newegg was not down, and when I got macOS to forget what it had cached from 1.1.1.1, I was able to use newegg.com fine.
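For anyone who wants to repeat that comparison across more resolvers, here is a small sketch that groups resolvers by the answer they gave. The function names are made up, and the input is just the two answers pasted above, not a live query. One caveat: CDN-hosted names often return different addresses to different resolvers on purpose, so a disagreement is a hint to look closer, not proof that one resolver is wrong.

```python
def group_by_answer(answers):
    """Group resolvers by the set of addresses they returned.

    answers: dict mapping resolver IP -> iterable of addresses it returned.
    Returns a dict mapping frozenset(addresses) -> sorted list of resolvers.
    """
    groups = {}
    for resolver, addresses in answers.items():
        groups.setdefault(frozenset(addresses), []).append(resolver)
    return {key: sorted(value) for key, value in groups.items()}


def resolvers_disagree(answers):
    """True when at least two resolvers returned different address sets."""
    return len(group_by_answer(answers)) > 1


# Answers copied from the nslookup output in the comment above.
observed = {
    "1.1.1.1": ["23.35.185.6"],
    "8.8.8.8": ["104.80.92.252"],
}
print(resolvers_disagree(observed))  # True
```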
(a component of my consulting work is reporting to financial regulators for institutions)
As CTO of a bank, I wasn’t aware of this. So either we wasted a ton of money and time constantly upgrading redundancy and business continuity technologies to satisfy our regulators… or this statement could be mistaken.
So if everyone searched "is google down" and visited the link on downdetector that was returned in the search, that would add to the downdetector count for that site.
Downdetector doesn't actually know if the site is up or down.
Downdetector only reports an issue if a significant number of users are impacted. To that end, Downdetector calculates a baseline volume of typical problem reports for each service monitored, based on the average number of reports for that given time of day over the last year. Downdetector’s incident detection system compares the current number of problem reports to this baseline and only reports an issue if the current volume significantly exceeds the typical volume of reports.
https://www.speedtest.net/insights/blog/how-downdetector-wor...
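The baseline comparison described in that quote could look roughly like this. It is only a sketch of the general idea: the function names, the `factor` multiplier and the `min_reports` floor are invented here, and the quoted post does not say what Downdetector's actual thresholds are.

```python
from statistics import mean


def baseline(history):
    """Typical report count for one time-of-day slot, averaged over past days."""
    return mean(history)


def is_incident(current_reports, history, factor=3.0, min_reports=10):
    """Flag an issue only when current volume clearly exceeds the typical volume.

    factor and min_reports are made-up thresholds for illustration.
    """
    typical = baseline(history)
    return current_reports >= min_reports and current_reports > factor * typical
```

For example, with a history of `[4, 6, 5, 5]` reports for this time of day, 7 reports would not be flagged but 500 would. It also shows why a burst of curious visitors filing reports can move the number without the site being down.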
e.g. dig @1.1.1.1 www.nvidia.com +trace
... various things from the root ...
www.nvidia.com. 7200 IN CNAME www.nvidia.com.edgekey.net.
;; Received 83 bytes from 208.94.148.13#53(ns5.dnsmadeeasy.com) in 35 ms
So the main DNS is fine, but it'll never get an A record because the last link in the chain is toast -- edgekey being Akamai in this case, but all CDNs do this so they can route traffic. Normally, this is a good thing so they can shift traffic within 30 seconds on their side. Unfortunately, it also means it would take nvidia two hours (the 7200-second TTL on that CNAME) to point away from Akamai.
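To make the "two hours" arithmetic explicit, here is a tiny sketch that pulls the TTL out of the CNAME line from the trace. `parse_record` is a made-up helper that assumes the usual five-field dig answer layout (name, TTL, class, type, data).

```python
def parse_record(line):
    """Split one dig answer line into its five whitespace-separated fields."""
    name, ttl, rclass, rtype, rdata = line.split(None, 4)
    return {"name": name, "ttl": int(ttl), "class": rclass, "type": rtype, "rdata": rdata}


record = parse_record("www.nvidia.com. 7200 IN CNAME www.nvidia.com.edgekey.net.")
print(record["ttl"] / 3600)  # 2.0 (hours a resolver may keep serving the old CNAME)
```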
So for example:
The apex domain for nvidia resolved fine:
dig @1.1.1.1 nvidia.com => status: NOERROR, Nameservers are ns6.dnsmadeeasy.com
But the website didn't: dig @1.1.1.1 www.nvidia.com => status: SERVFAIL
The nameserver for www.nvidia.com resolved to the Akamai nameserver, which had a problem:
dig @1.1.1.1 www.nvidia.com NS => CNAME e33907.a.akamaiedge.net.
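A toy model of why the apex answered while www did not. Everything here is illustrative: the record table, the 192.0.2.x addresses (documentation range) and the `failing_zones` parameter are made up, and only the www -> edgekey.net CNAME is taken from the trace quoted elsewhere in the thread.

```python
def resolve(name, records, failing_zones=(), max_hops=8):
    """Follow CNAMEs in a toy record table until an A record is found.

    records: dict name -> ("CNAME", target) or ("A", address)
    failing_zones: name suffixes whose nameservers are treated as unreachable
    Returns (status, value).
    """
    for _ in range(max_hops):
        if any(name.endswith(zone) for zone in failing_zones):
            return ("SERVFAIL", name)  # the zone's servers cannot be reached
        if name not in records:
            return ("NXDOMAIN", name)
        rtype, value = records[name]
        if rtype == "A":
            return ("NOERROR", value)
        name = value  # CNAME: keep following the chain
    return ("SERVFAIL", name)  # chain too long or looping


# Toy records, not real zone data.
records = {
    "nvidia.com.": ("A", "192.0.2.10"),
    "www.nvidia.com.": ("CNAME", "www.nvidia.com.edgekey.net."),
    "www.nvidia.com.edgekey.net.": ("A", "192.0.2.20"),
}

print(resolve("nvidia.com.", records, failing_zones=("edgekey.net.",)))
print(resolve("www.nvidia.com.", records, failing_zones=("edgekey.net.",)))
```

The first lookup never touches the failing zone, so it succeeds; the second is handed off to it by the CNAME and fails, even though nvidia's own nameservers answered correctly.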
Services like Akamai use short TTLs for their edge services for a variety of reasons, not least because if one of their edge servers goes offline (for planned or unplanned reasons) it lets them sub in a new one and have it receive traffic immediately, rather than have a bunch of clients continue trying to talk to a dead node. So sure, you can increase those TTLs to trade 'what if the DNS server goes down?' risk with 'what if the edge server goes down?' risk...
But keeping the edge servers up and running is probably a lot harder - they need to scale more to handle traffic load, they have to actually handle client data, TLS termination, much more complex configuration.... so if I'm placing bets on which of those things is more likely to die on me, it's the edge node, not the DNS server.
Clearly a big one.
Multiple websites including DraftKings, Airbnb, FedEx, Delta and others appear to be experiencing issues.
https://www.bloomberg.com/news/articles/2021-07-22/multiple-...
That makes me think that whatever the fix was, it had to wait for some one-hour cache to expire before it took effect. I'm very interested to find out what the cache issue was, more so than what the original bug was.
This time I think /r/sysadmin pegged the issue first, great sub.
It’s just a completely random DNS outage, nothing more.
archive.org seems to indicate there was never anything there...
EDIT: So HN can't even take a joke after this? [0]
Either way, the joke is now on the HNers in that thread.
How am I going to sell my AMC stock...