Everything is back up. We're waiting for Cloudflare's RCA and will follow up with additional Render context right after.
------------
(Render CEO) While Cloudflare investigates the issue on their end, we're also working on ways to bypass Cloudflare.
Really sorry about this, folks. We'll keep https://status.render.com updated and will post an RCA once things calm down.
Cloudflare have declared an incident at https://www.cloudflarestatus.com/incidents/2xffnv666yd7.
In case you're wondering, we use Cloudflare to keep Render's network up during DDoS attacks. Both Render and our customers are often targeted. We've already started building a product that lets customers bypass Cloudflare altogether, and I expect we'll see more demand for it after today's incident.
We really like Render but are running into issues with Cloudflare blocking requests that are incorrectly flagged as malicious (our service passes code blocks over HTTP, similar to Replit).
Not to mention our site was down for way too long this evening…
We’d consider staying if we can bypass Cloudflare altogether. Render has been stable otherwise.
Really happy to hear this. Thank you.
Who spends resources (money?) on running those? What is the incentive?
"Memcrashed - Major amplification attacks from UDP port 11211" - https://blog.cloudflare.com/memcrashed-major-amplification-a...
The people harmed by them are too small to fix it, and the people big enough make more money selling DDoS mitigation.
From what I understand you can avoid many DDoS attacks just by going IPv6 only, because DDoS mainly depends on unpatched shitmachines from the old days.
A Raspberry Pi can generate enough traffic to overload an otherwise unprotected service. It doesn't cost much, if anything, to launch a brute force attack.
There have been posts on here about malicious browser extensions, infected IoT devices, and malware in mobile apps that give someone the means to launch an utterly brutal attack. Imagine if I had a service that could handle 10k rps. Now imagine 600k Android devices from all across the world send one request per second each [0].
[0] https://www.trendmicro.com/vinfo/pl/security/news/mobile-saf...
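To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 10k rps capacity and the 600k devices at one request per second are the figures from the comment above; the function name and parameters are made up for illustration.

```python
def overload_factor(capacity_rps: int, devices: int,
                    requests_per_device_per_sec: float = 1.0) -> float:
    """Ratio of incoming attack traffic to what the service can handle."""
    attack_rps = devices * requests_per_device_per_sec
    return attack_rps / capacity_rps


# Figures from the comment: 10k rps capacity, 600k devices at 1 request/sec each.
factor = overload_factor(capacity_rps=10_000, devices=600_000)
print(f"Attack traffic is {factor:.0f}x capacity")  # prints "Attack traffic is 60x capacity"
```

So even at a trivially low per-device rate, the service would be facing about 60 times the traffic it can serve.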
Why they do it... well:
Competition suppression
Vindictive nastiness
Fun
Just because you can (the world is your sandbox)
Other reasons that might not occur to you but are very real for the attacker...
We know developers don't actually care who's at fault and will move off of Render if we're down, period. Even before the incident, we'd started working on a project to eliminate the SPOF with Cloudflare, and now it's only a matter of time before we ship it.
The stance I take is this: it's a fine line between oversharing and passing blame in outages like this. I'm happy that a line like that, when shared by Render, means it was just oversharing (I love your product!), but it's easy to see how the same line from a less admirable company could be read as "Nah man, it's not on us, we didn't do anything wrong." The critical difference is whether, if Cloudflare was the cause, you explain how you're working to avoid that cause in the future. That leads nicely to where pointing at Cloudflare (or any upstream provider) generally feels more agreeable: the retro.
To be clear: I have no intention of leaving Render, even if y'all weren't planning to alleviate this SPOF. I fully grok the difficult engineering required to nuke SPOFs like Cloudflare or AWS, and a bit of downtime here and there is a price I'm fine with paying.
I would still blame them for not having back up generators though.
However, a failure to plan for emergencies is different from other kinds of failure.
The bigger issue you're alluding to is that of supply-chain reliability in SaaS products: when AWS goes down, multiple other (seemingly unrelated) services go down. But saying it's the downstream service's fault is pointless, because if you were to do it yourself you'd be using the same upstream provider, and be dealing with their outage yourself.
In that example, Slack, as a bigger customer of AWS, would have a much bigger say, and a more direct line to AWS engineers, than you would.
It's turtles all the way down, and in the midst of an outage I totally empathize with the off-the-cuff thinking that oversharing is better than undersharing, but after the fog of war clears you can even retro language like that and come to a different conclusion. What value do my customers, even if they're highly technical, gain by knowing it's Render's fault that MyCoolService was down? Are they going to go open support tickets with Render? I'd bet Render very reasonably wouldn't appreciate that, and they're not going to have a better trunk to their support than I do.
It would be the same as requiring them to use Postgres instead of MS SQL as a backend.
Maybe time to consider multiple CDN providers as an abstraction like you consider AWS/GCP as an abstraction.
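As a rough illustration of what treating CDN providers as an abstraction could look like, here is a minimal Python sketch. Everything in it is hypothetical: the `EdgeProvider` type, the provider names, and the injected health probes are assumptions for the sake of the example, not anything Render or Cloudflare actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class EdgeProvider:
    name: str                       # hypothetical label for a CDN / DDoS-protection provider
    is_healthy: Callable[[], bool]  # injected health probe; a real one is assumed to exist


def pick_provider(providers: Sequence[EdgeProvider]) -> Optional[EdgeProvider]:
    """Return the first provider, in priority order, whose health probe passes."""
    for provider in providers:
        try:
            if provider.is_healthy():
                return provider
        except Exception:
            # A probe that raises is treated the same as an unhealthy provider.
            continue
    return None


# Simulated outage: the primary probe fails, so traffic would go to the backup.
providers = [
    EdgeProvider("primary-cdn", is_healthy=lambda: False),
    EdgeProvider("backup-cdn", is_healthy=lambda: True),
]
chosen = pick_provider(providers)
print(chosen.name if chosen else "no healthy provider")  # prints "backup-cdn"
```

In practice the hard part is presumably not the selection logic but everything around it: the actual switch would have to happen at the DNS or routing layer, and both providers would need equivalent configuration.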
Yes. While Cloudflare is generally rock solid, we can't let this happen again.
As a Render customer, it's affecting us too. Hope Cloudflare fixes this ASAP.
Best of luck, Render & Cloudflare teams!
EDIT: it's back! yay