I'll be happy to answer any questions you have regarding our service either here or through our support channels.
I feel for the operational folks at Zerigo - dealing with this type of outage is hard. The best thing they can do at this point is get back on Twitter and talk to their customers - the last post was 4 hours ago - that's a lifetime when your system is critical.
Great product, great support, great team, great price.
Recommended.
* How much DNS traffic can it handle for one customer? DNS services that charge much more than DNSimple usually cap this.
* 99% availability (3.65 days of downtime per year) sounds like a bit too much downtime for a production DNS SLA; maybe a much higher availability could be offered once a secondary DNS is introduced. Any timeline for adding a secondary DNS?
* We don't have a timeline, but we have been working on it. We'd like to roll out support for secondary NOTIFY and AXFR, but perhaps we can find a short term solution without that.
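For reference, the yearly downtime budget behind an availability figure is (1 - availability) x hours per year. A minimal Python sketch, assuming a 365-day year (leap years ignored):

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a 365-day year


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the yearly downtime, in hours, permitted by an availability SLA."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Print the downtime budget for a few common SLA levels.
    for pct in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% -> {hours:.2f} h/year ({hours / 24:.2f} days)")
```

With 8,760 hours in a year, 99% allows 87.6 hours (3.65 days) of downtime and 99.9% allows 8.76 hours.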
However, what I would like to know is - have you guys implemented any procedures to mitigate any negative effects a DDoS may have on your services? (Assuming your service gets DDoS'd like Zerigo) The last thing I want is more down time and to switch to another provider once again.
Even more impressive, we have directed a couple of very non-technical customers to them, and they have all been able to get up and running in no time.
That said, I don't know anything about Zerigo and I have no opinion about them.
We'll post a more extensive post-mortem once we have a better understanding of what happened, but our main goal at the moment has been to ensure systems remain stable and responsive.
I can't claim that SlickDNS is invulnerable to DDoS attacks, but FWIW it does run tinydns name servers, which have good performance and excellent security. So if you're impacted by the Zerigo outage, feel free to check out SlickDNS. There's a 30-day free trial with all plans, and record updates are pushed through to all the name servers in under 5 seconds.
The REST API is in final testing, and should be released later this week. It will ship with libraries for Python, Ruby and PHP.
I ended up hacking something together to firewall any IPs which sent more than 1000 requests in a short period of time.
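The comment doesn't say how the hack worked, but one common way to express "more than 1000 requests in a short period" is a per-IP sliding window. A minimal sketch: the class name is hypothetical, the 1000-request limit comes from the comment, and the 10-second window is an assumption. It only decides which IPs to block; actually adding firewall rules is left out.

```python
from __future__ import annotations

from collections import defaultdict, deque


class SlidingWindowCounter:
    """Flag source IPs that send more than `limit` requests within `window` seconds.

    Hypothetical helper; the default window length is an assumption.
    """

    def __init__(self, limit: int = 1000, window: float = 10.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: dict[str, deque[float]] = defaultdict(deque)
        self.blocked: set[str] = set()

    def record(self, ip: str, now: float) -> bool:
        """Record one request from `ip` at time `now`; return True if `ip` should be blocked."""
        if ip in self.blocked:
            return True
        hits = self._hits[ip]
        hits.append(now)
        # Discard timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) > self.limit:
            self.blocked.add(ip)
            del self._hits[ip]  # stop tracking an IP once it is blocked
            return True
        return False
```

Since DNS runs over UDP and source addresses can be spoofed, blocking by source IP is only a partial defence.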
I can give you one tip to get you started: if you're running named, you can enable logging of every query with something like:
    logging {
        channel query_logging {
            // roll the file at 100 MB, keeping 3 old versions
            file "/var/log/named/querylog" versions 3 size 100M;
            print-time yes; // timestamp log entries
        };
        category queries {
            query_logging;
        };
    };

I'd say the main reason to use a DNS hosting service is to consolidate your DNS management for all of your domains regardless of registrars or server hosting providers. E.g., I personally have domains registered with 5 registrars and use two server providers. And because they specialize in DNS, DNS hosting providers should have superior interfaces, APIs and support for DNS hosting compared to generalist hosting providers.
The SlickDNS interface has two features in particular that I haven't seen in any other DNS hosting service: automatic management of "alias domains" and mapping IP addresses to named servers. See https://www.slickdns.com/features/ for details.
Looks like they are close to getting NOTIFY and IXFR (incremental AXFR) working. It's an interesting approach nonetheless.
What am I missing here?
I have a 100 Mb/s internet connection. Scale that up to 10,000 connections like mine (10,000 × 100 Mb/s = 1 Tb/s), and you have saturated even the fastest internet connections.
Mitigating a DDoS is not easy. Heck, it's damn near impossible, considering that DNS DDoS attacks are done via UDP, which allows you to spoof the source IP address. Even if you do block the IP addresses of all the attackers, your upstream provider is still impacted by the packets trying to reach your server. Most upstream ISPs will blackhole your server IP to diminish the impact on their network.
64.27.57.25 manage.zerigo.com
64.27.57.8 dns.zerigo.com
Source: https://twitter.com/coldclimate/status/227369346891132928
The only problem seems to be keeping them in sync. Seems like you'd have to poll the primary (using whatever API it exposes) to update the secondary.
Mostly thinking out loud, surely someone more experienced could provide better guidance?
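For what it's worth, this is roughly how standard secondaries already work: they poll the primary's SOA serial and pull the zone when the primary's serial is newer, comparing serials with RFC 1982 serial-number arithmetic so that 32-bit wraparound is handled. A minimal sketch; `sync_once` and the three callables it takes are hypothetical stand-ins for whatever provider APIs you would wrap.

```python
from typing import Callable

SERIAL_BITS = 32
HALF = 1 << (SERIAL_BITS - 1)  # 2**31


def serial_gt(s1: int, s2: int) -> bool:
    """RFC 1982 comparison: True if serial s1 is newer than s2.

    Undefined by the RFC when the serials differ by exactly 2**31; returns False then.
    """
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def sync_once(get_primary_serial: Callable[[], int],
              get_secondary_serial: Callable[[], int],
              copy_zone: Callable[[], None]) -> bool:
    """One polling pass (hypothetical API wrappers); return True if the zone was copied."""
    if serial_gt(get_primary_serial(), get_secondary_serial()):
        copy_zone()
        return True
    return False
```

Calling `sync_once` on a timer mimics the SOA refresh interval a real secondary uses.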
There is another challenge in that we're pushing the envelope a bit by offering features that rely on more than just a DNS record (for example ALIAS and POOL records). These are useful features for some people, but if you're using these types of features then they won't be portable to secondary providers.
Shit happens, but 99.9% (roughly 8.8 hours of downtime a year) is completely unacceptable for a DNS provider.
Which really sucks; DDoS attacks are really hard to combat, and Zerigo are an awesome company.