Fastly Outage

(fastly.com)

1255 points · pcr0 · 5y ago · 694 comments

lpmitchell5y ago· 55 in thread

This seems to be impacting a number of huge sites, including the UK government website[0].

[0] https://www.gov.uk/

https://m.media-amazon.com/

https://pages.github.com/

https://www.paypal.com/

https://stackoverflow.com/

https://nytimes.com/

Edit:

Fastly's incident report status page: https://status.fastly.com/incidents/vpk0ssybt3bj

caseymarquis5y ago

Fastly Engineer 1: Seems like a common error message. Can you check stackoverflow to see if there's an easy fix?

Fastly Engineer 2: I have some very bad news...

lucasverra5y ago

But https://news.ycombinator.com/ is UP! :) Prepare those HN servers for massive influx in 3...2..1..

maest5y ago

Amusingly, the Stackoverflow 503 page has a typo:

  Error 503 Service Unavailable
  Service Unavailable
  
  Guru *Mediation*:
  Details: cache-lon4236-LON 1623146049 854282175
  
  Varnish cache server

c-fe5y ago

also https://www.reddit.com (at least in Netherlands)

edit: 12:05 up again for me, no images or custom fonts loading though ... and down again 1 minute later

edit: 13:01 reliably up again for me

kevincox5y ago

> potential impact to performance

So it is a "performance" issue when all pages give a 503.

threeseed5y ago

I wonder why Amazon is not using Cloudfront for their own website.

antihero5y ago

I wonder why amazon.co.uk uses Fastly and not CloudFront?

lkbm5y ago

Good thing we use Cloudfront and Cloudflare where I work.

> Statuspage Automation updated third-party component Spreedly Core from Operational to Major Outage.

> Statuspage Automation updated third-party component Filestack API from Operational to Degraded Performance.

Oh, right. :-D

Don't get me wrong, I love the proliferation of APIs and easily-integrated services over the past 20 years. We're all one interdependent family, for better and for worse.

huijzer5y ago

CSS/Javascript at https://github.com/ appears to be down as well making GitHub quite unusable.

weird-eye-issue5y ago

Yikes seeing just a "connection failure" on Paypal is something else.

edit: PayPal looks to be back up at least in US East but when I turn off my VPN and access from Asia I get "Fastly error: unknown domain: www.paypal.com."

Now I'm seeing a 503

1ncorrect5y ago

> Monitoring The issue has been identified and a fix has been applied. Customers may experience increased origin load as global services return. Posted 4 minutes ago. Jun 08, 2021 - 10:57 UTC

Looks to be working again my end.

thrdbndndn5y ago

Interestingly, Twitter only has its emoji SVGs down.

doublerabbit5y ago

https://deb.debian.org is down too which borked my installation.

benrbray5y ago

https://hackage.haskell.org/ down as well

jakub_g5y ago

https://www.bbc.com/news/technology-57399628

"A number of leading media websites are currently not working, including the Guardian, Financial Times, Independent and the New York Times."

toaway5y ago

also https://www.theguardian.com/

aliasEli5y ago

What's far worse than half of the internet being down was that Hacker News also had problems. If I waited long enough on a comments page I got an error message. I don't quite understand what happened there. The communication between my system and HN must have been working otherwise I would never have gotten an error message, so it must have been some internal HN problem. But since HN should only need its own internal "database" to generate comment pages, I don't understand why it should be impacted by the Fastly problems.

jzer0cool5y ago

I could not tell from the Fastly status page. What caused the fault? Could anyone point to any past stories of a similar nature, other than DDoS?

zhan_eg5y ago

Bitwarden is also down (the Web Vault, not the website).

stevenwliao5y ago

Seems to affect Target ( https://www.target.com/ ) and Reddit ( https://www.reddit.com/ ) as well.

tandav5y ago

https://docs.python.org/ https://www.last.fm/

abhiminator5y ago

PayPal seems to be working for me at the moment. Rest of the sites are 503s.

rvz5y ago

Centralising everything™ and the whole internet goes down because of that.

Simran-B5y ago

Pantheon was also affected: https://status.pantheon.io/incidents/n9sngt0q0mct

jb19915y ago

For what it's worth, I'm also having these problems with cnn.com, reddit and many others; however, when I switch away from WiFi to my cell provider's network, they work fine.

tus895y ago

Paypal back, off fastly

dan-robertson5y ago

Is there anything these big sites could do in this situation, or must they choose between running and maintaining all of their own infra or relying on a single CDN?

zarker5y ago

Spotify is behaving strangely as well https://www.spotify.com/

mondaygreens5y ago

Quora and reddit too

black_puppydog5y ago

All of these work from here in Grenoble, France...

EmptyStatement5y ago

https://repo.maven.apache.org/ is down too

amiga-workbench5y ago

https://www.theverge.com/ seems to be down too

conradfr5y ago

https://getbootstrap.com/ as well

cellover5y ago

Isn't looking at those links like staring at a road accident instead of just passing by?

zthxxx5y ago

https://docs.npmjs.com down as well

alvis5y ago

https://www.reddit.com as well

xmdx5y ago

Terraform having issues and rubygems down too

max23_5y ago

also https://www.speedtest.net/

sammygreen5y ago

Seems to be every site that runs varnish...

dreamer75y ago

Firebase hosting has been affected as well

playpause5y ago

also https://www.ft.com/

rottc0dd5y ago

SSO and github are back online now

maelito5y ago

nature.com

linuxfan20215y ago

You would think that the UK GOVERNMENT would have their own private CDN or something...

keithnz5y ago

twitch also, lots of other minor ish websites

brador5y ago

Searchable offline backup of stack anyone?

toaway5y ago

www.gov.uk & bbc are back

dpacmittal5y ago

elastic.co down as well

treeshateorcs5y ago

developer.spotify.com

magicturtle5y ago

reddit down as well

ramshanker5y ago

twitch.tv Too.

Hani13375y ago

etsy.com too

madeofpalk5y ago

> [0] https://www.gov.uk/

Just checked, thank god the NHS vaccine site is still available - vaccines just got rolled out for under 30s today.

PaywallBuster5y ago

and Imgix

collyw5y ago

Click the new tab. Lots of posts about sites being down. All flagged.

austinjp5y ago· 12 in thread

Yeah so it's been mentioned in the comments already, but to everyone in Fastly right now: I feel for you. Something like this must be insanely stressful, and not just during the outage. There will be (should be) a massive post-mortem. People will be losing sleep over this for days, weeks, months.

Edit: There seems to be a major empathy outage in this thread. Disgusted but not surprised, unfortunately.

strictfp5y ago

Meh. Losing sleep sounds like an over-reaction. No system is foolproof. Of course Fastly should do what they can to prevent downtime, but it's still expected that they will go down.

I would blame anyone who claimed otherwise or couldn't deal with it while not having a fallback.

H8crilA5y ago

Imagine losing sleep over a corporate problem where you're just the next Joe Engineer, to be fired the second you're not needed. Have some perspective people.

throwaway77475y ago

I have worked for one of their competitors (I'm not saying which) for quite a while. I've indirectly caused multiple outages that were maybe 1% this bad before, that didn't make the news only due to luck. Code that I owned (but did not write) was once a key cause of a severe outage that did make the news, and it would have been worse if I weren't coincidentally halfway through replacing that code with something more modern. I also had to do some very rapid work on internal failsafes around the time of the infamous Mirai botnet, to minimize service degradation in case it was pointed at us.

It sucks. Working on CDN reliability is like working on wastewater management: the public forgets you exist until something breaks, when they start asking why you weren't doing your job. Fortunately, internal people at least seem to get it -- I hope it is the same at Fastly.

mnordhoff5y ago

They shouldn't lose sleep over it, though.

yvan5y ago

Well, not much, I mean all our competitors are also using Fastly. I would be more worried if we were the only one using Fastly and everybody else was fine. But as we are all in the same boat, we lose the same :-)

dm3195y ago

Empathy is hard to find around here, maybe someone needs to study it. Is it a feature of people in tech? Don't remember much being on slashdot either.

willejs5y ago

#HugOps

southerntofu5y ago

I feel for the Fastly workers, whom managers are probably harassing right now to get things back online. I certainly don't feel any sympathy for Fastly administrators/managers who make business out of exploiting other people.

mothsonasloth5y ago

Call me old fashioned but the latest trend of showing "empathy" for a serious incident, then proceeding to dance around the aftermath of it, whilst people give themselves a pat on back in a retro/post-mortem, isn't the way to do it.

People need to be blamed, and responsibility for actions taken (without covering asses)

jtdev5y ago

Our fathers and mothers put man on the moon… we build shitty software that helps the technocrats sell more junk to the masses.

colesantiago5y ago

Well, while engineers are getting paid $100K/yr to post #HugOps, I know someone in HFT and their dashboard uses the Fastly service, so this has had a huge impact on them for sure.

Flag and downvote all you want, you know this is true.

colesantiago5y ago

Not my problem. Fastly should work as intended.

The fault is theirs and they have said that they have failover, this worldwide outage caused by them just goes to show you that Fastly does not actually have a failover system in place.

> "Fastly’s network has built-in redundancies and automatic failover routing to ensure optimal performance and uptime." - status.fastly.com

Even their status page was down. Very embarrassing, Fastly did not work as advertised and misled its customers.

Edit: Offended flaggers circling around silencing misled Fastly customers. How pathetic.

mrzool5y ago· 5 in thread

Why is this a link to the Fastly homepage, where absolutely no information is provided?

This is the page that should be linked:

https://status.fastly.com

jmvoodoo5y ago

Oddly their homepage rendering an error was a more accurate description of the problem than "investigating potential impact to performance with our CDN"

Haydos585x25y ago

This is the link you want I think https://status.fastly.com/incidents/vpk0ssybt3bj

scolvin5y ago

Because even their homepage is down intermittently/for some people.

Silhouette5y ago

To save everyone else hitting the site as well:

As of 10:44UTC, this status page has just updated to say the issue has been identified and a fix is being implemented.

lucasverra5y ago

it is starting to show several Degraded Performance tags

barosl5y ago· 5 in thread

I didn't know so many sites were depending on Fastly. Stack Overflow, GitHub, reddit, .... Even pip is unavailable. My development workflow is completely janked up. It is a bit scary that we are putting too many eggs in one basket.

liveoneggs5y ago

fastly gives free service to things like pip. It's actually very nice.

kypro5y ago

You would think sites like Github and key government sites would at least have a fallback at the ready. It's reasonable to use a CDN like Fastly, but having a single point of failure seems silly if you're the BBC or Gov UK. Although, it does seem the BBC managed to get back up and running pretty quickly, so perhaps they were prepared for this.

samhh5y ago

Hackage (Haskell) is down as well: http://hackage.haskell.org

Zyansheep5y ago

Must... decentralize... internet...

joshenders5y ago

Blame site operators that are single homing and not loadbalancing CDNs

creamyhorror5y ago· 5 in thread

basically the internet is down

reddit, stackoverflow, github, paypal, pypi, twitter, twitch, NYT, CNN, BBC, the Guardian...

edit: wow, even Amazon.com relies on Fastly for some of its edge caches!

iso16315y ago

https://www.washingtonpost.com/technology/2020/04/06/your-in...

“This basic architecture is 50 years old, and everyone is online,” Cerf noted in a video interview over Google Hangouts, with a mix of triumph and wonder in his voice. “And the thing is not collapsing.”

The Internet, born as a Pentagon project during the chillier years of the Cold War, has taken such a central role in 21st Century civilian society, culture and business that few pause any longer to appreciate its wonders — except perhaps, as in the past few weeks, when it becomes even more central to our lives.

aunetx5y ago

Opened my browser, and my three major web pages: github, gitlab.gnome.org and old.reddit.com... They are all down.

busymom05y ago

> stackoverflow

How will they troubleshoot the error messages now?

secondcoming5y ago

BBC is still up at least in the UK

3np5y ago

debian's main apt repo mirror affected as well

csmattryder5y ago· 4 in thread

Here's the status page incident for this.

https://status.fastly.com/incidents/vpk0ssybt3bj

algo_cheese5y ago

> We're currently investigating potential impact to performance with our CDN services.

Guys, you are offline with a 503 error, this is a little more than "potential impact to performance".

parksy5y ago

No issues reported for Perth Australia. Strange because reddit, zip pay, fastly itself, and probably a bunch of other sites are down.

Doesn't seem the status page is automatically updated or perhaps whatever event or polling is used is also broken.

JrProgrammer5y ago

> This incident affects: North America (Ashburn (BWI), Ashburn (DCA)).

How come we are affected by this in the Netherlands?

nebulous15y ago

Currently only listing a small issue in NA

iso16315y ago· 3 in thread

https://easydns.com/blog/2020/07/20/turns-out-half-the-inter...

The whole idea of the internet was a distributed network impervious to most attacks.

The reality is that a single failure can knock out 90% of the services people use.

fagnerbrack5y ago

The internet still works, only the websites are returning the wrong response

emptyparadise5y ago

There are ten websites left on the internet and they're all hosted by four or so megacorps. Isn't it great?

nabla95y ago

The Web (World Wide Web), built atop the Internet, is not impervious.

ps. "The Internet was built to survive attacks" is not true. It's a myth made popular by Robert Cringely in the early 1990s. The Arpanet was simply a protocol for mainframes used by computer scientists to connect. The Internet is relatively resilient against attacks, but that was not the "whole idea". It was not in the design at all.

Bob Taylor: “In February of 1966 I initiated the ARPAnet project. I was Director of ARPA's Information Processing Techniques Office (IPTO) from late '65 to late '69. There were only two people involved in the decision to launch the ARPAnet: my boss, the Director of ARPA Charles Herzfeld, and me. The creation of the ARPAnet was not motivated by considerations of war. The ARPAnet was created to enable folks with common interests to connect with one another through interactive computing even when widely separated by geography”.

Vint Cerf says the same about the invention of the TCP/IP transport protocol.

ClearAndPresent5y ago· 3 in thread

What conclusions can we draw about concentrating web content in a few CDNs?

threeseed5y ago

In HTML/CSS you should be able to specify a fallback source if the first returns a non-200.

Or that companies need to have better DNS strategies.
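
HTML and CSS have no built-in way to do this, but the "try the next source on a non-200" idea can be sketched in client code. The following is a minimal Python illustration, not anything a browser implements; `fetch_with_fallback`, `urllib_fetch`, and the example hostnames are all hypothetical names chosen for this sketch.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a URL and returns (status_code, body). It is injected so
# the fallback logic can be exercised without touching the network.
Fetcher = Callable[[str], Tuple[int, bytes]]


def fetch_with_fallback(urls: Iterable[str], fetch: Fetcher) -> Optional[Tuple[str, bytes]]:
    """Try each URL in order; return (url, body) for the first 200 response."""
    for url in urls:
        try:
            status, body = fetch(url)
        except OSError:
            # Connection-level failure (DNS, refused, timeout): try the next source.
            continue
        if status == 200:
            return url, body
    return None  # every source failed


def urllib_fetch(url: str, timeout: float = 3.0) -> Tuple[int, bytes]:
    """A real fetcher built on the standard library."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; report the status code instead.
        return err.code, b""
```

Calling `fetch_with_fallback(["https://cdn.example.net/app.css", "https://origin.example.net/app.css"], urllib_fetch)` would try the CDN copy first and the origin copy second. The obvious cost is that the fallback origin must be able to absorb the traffic the CDN normally handles.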

npteljes5y ago

That sometimes they fail but the world goes on.

itsbits5y ago

We had that experience when Cloudflare was down for some time last year. We have now set up a small static server of our own as a backup in case this happens again, although so far we haven't had to use it.

threeseed5y ago· 3 in thread

Shopify's CDN is down.

Which is causing $15+ million in lost product sales for every hour of outage.

Not to mention the loss of any new customers.

dspillett5y ago

StackOverflow and all the StackExchange family of sites are down. I suspect the lost productivity from that will be more costly over the whole economy than potential lost sales via Shopify. People can go back to Shopify, so those transactions are not definitely blocked forever; any time "lost" due to reference resources being unavailable can't so easily be claimed back.

john373865y ago

Here is a lesson for Shopify's talented staff to learn. Don't put all your eggs in the same nest. I'm sure they can build something better than that. Hopefully, they will learn from this outage.

grumple5y ago

Does Shopify do that much when the US is asleep?

csomar5y ago· 3 in thread

So I'm wondering where in the "hundreds of servers around the world" did they exactly go wrong.

This happened with Cloudflare before too. I think we are a little too dependent on these services.

0xbkt5y ago

It is a meaningless premise when you actually have SPoFs baked deep inside the system.

fagnerbrack5y ago

In Software Engineering we call it "coupling"

jfny5y ago

Yeah seriously. Time to rebuild the architecture from the ground up.

lysp5y ago· 3 in thread

This incident affects: Europe (Amsterdam (AMS), Dublin (DUB), Frankfurt (FRA), Frankfurt (HHN), London (LCY)), North America (Ashburn (BWI), Ashburn (DCA), Ashburn (IAD), Ashburn (WDC), Atlanta (FTY), Atlanta (PDK), Boston (BOS), Chicago (ORD), Dallas (DAL), Los Angeles (LAX)), and Asia/Pacific (Hong Kong (HKG), Tokyo (HND), Tokyo (TYO), Singapore (QPG)).

tendencydriven5y ago

Their status page is now saying every location has degraded performance.

kiwijamo5y ago

Affecting Auckland (AKL) which is not on the list so I can only assume it's affecting more locations than they're letting on.

Banana6995y ago

+= North Africa (Egypt, Cairo)

Stackoverflow.com, reddit, Quora down. (and probably more, those are the ones I tested)

dkarp5y ago· 3 in thread

Before the "Error 503 Service Unavailable" messages appeared, there were a few minutes where the error was a single line:

    connection failure

Not sure if that provides anyone here with more insight into what might have caused this!

stordoff5y ago

I got that, then a 'Fastly unknown domain' error (on Reddit), then the 503s on multiple sites (I also had an API I use return a 502 then a 500 error, but I don't know what the full response was as it was just a quickly thrown together script I was using).

Edit: and now "I/O error" on Reddit.

q3k5y ago

I also saw a glimpse of 'I/O error'. That sounds fun.

SileNce5k5y ago

It was `connection failure` for me.

optiomal_isgood5y ago· 2 in thread

Amazon.com was completely broken here (Europe) and they're back. I was watching where the assets were loaded from, and they switched from EU to NA as a failover. Homework well done.

00deadbeef5y ago

I was surprised to learn Amazon don't use their own CDN

abluecloud5y ago

Still getting broken assets from the UK.

omk5y ago· 2 in thread

This outage made me realize that github is served over a single IP address (A record) for my point of origin (India). Stackoverflow has 4 A records listed, but all of these belong to fastly.

The internet is designed for redundancy. Wonder why these companies don't have a fail over network. Makes me wonder if cost is factor considering their already massive infra. But a single point of failure ... <confused>.

raphaelj5y ago

> The internet is designed for redundancy. Wonder why these companies don't have a fail over network. Makes me wonder if cost is factor considering their already massive infra. But a single point of failure ..

Well, the Internet was indeed designed for redundancy, and it worked as intended. At no point in time did it fail to make you reach the server it was supposed to make you talk to.

What are failing are all the application protocols that are running on top of the network.

kayfox5y ago

Github's DNS likely will serve up a different IP for github when there is an outage. I can't talk about the details but GitHub and the rest of Microsoft use a global load balancing system that works through DNS.

Haydos585x25y ago· 2 in thread

Such a huge number of sites. It seems like it's mostly US based sites and Australians are okay. Sending good vibes to whatever poor person is on support right now.

nineteen9995y ago

I'm in Australia and there are heaps of sites down for me.

paimoe5y ago

In Perth, reddit is down. So is Blackboard files for uni

alexchamberlain5y ago· 2 in thread

Stupid question: why didn't sites "just" fail over to their actual servers to handle the traffic, albeit slowly? I guess they won't be sized to handle the load in a lot of cases, and Fastly was responding, so DNS fail over didn't work?

altacc5y ago

Probably a different answer for each site. I'm not a DNS expert but I think you're right on both counts. Having failover also requires a duplicate CDN architecture at the fallback location, which is an increase of costs in time, money & maintenance for relatively little benefit. Often there's a fair amount of background integration with a CDN, and each function slightly differently, so it's not simply plug & play.

abluecloud5y ago

yeah. the dns was up. the problem was the servers weren't able to proxy the traffic. Also, as you say, you'll probably end up bringing down the upstream servers if you just fail open (and not even sure that'd be a possibility with fastly in its "down" state that we saw).

pimterry5y ago· 2 in thread

This is one of the things that excites me about IPFS: in a world of decentralized data storage, yes self-hosting and control over your data is nice and all, but serious resilience to most random infrastructure outages is a much bigger deal.

It's still early days, but I'm hopeful that it can provide a real solution to today's CDN centralization.

jokoon5y ago

Agree, but currently, ipfs would serve as a fallback, since it's about files. Decentralized/distributed generally has slower network performance.

Unless most nodes are high performance, I guess?

Personally I think a distributed database system, where entries are being made redundant in something like a blockchain+dht, would be a good start?

Decentralizing the internet works if it financially makes sense for platforms to build such tools.

jfny5y ago

I'm pretty sure you can serve hundreds if not thousands of users from a single Raspberry Pi

unfunco5y ago· 2 in thread

Amazon being down surely points to something other than Fastly being the cause?

austinjp5y ago

I just had a look at amazon.co.uk and most assets fail to load, the browser debug console is full of 503 errors. Picking one at random, it's fastly:

    $ nslookup images-eu.ssl-images-amazon.com

    Server:  127.0.0.53
    Address: 127.0.0.53#53

    Non-authoritative answer:
    images-eu.ssl-images-amazon.com canonical name = m.media-amazon.com.
    m.media-amazon.com canonical name = media.amazon.map.fastly.net.
    Name: media.amazon.map.fastly.net
    Address: 199.232.177.16
    Name: media.amazon.map.fastly.net
    Address: 2a04:4e42:1d::272
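
The same check can be automated by reading the CNAME chain out of `nslookup` output like the above. This is a rough sketch that only parses text already captured from the tool; `cname_chain` and `served_by` are hypothetical helper names, and the `fastly.net` suffix is taken from the output shown here.

```python
import re
from typing import List

# Matches lines such as:
#   m.media-amazon.com canonical name = media.amazon.map.fastly.net.
CNAME_LINE = re.compile(r"^\s*\S+\s+canonical name = (\S+)\s*$")


def cname_chain(nslookup_output: str) -> List[str]:
    """Return the CNAME targets in the order nslookup printed them."""
    chain = []
    for line in nslookup_output.splitlines():
        match = CNAME_LINE.match(line)
        if match:
            # Drop the trailing root dot that nslookup prints.
            chain.append(match.group(1).rstrip("."))
    return chain


def served_by(nslookup_output: str, suffix: str = "fastly.net") -> bool:
    """True if any CNAME target is the suffix itself or a subdomain of it."""
    return any(
        target == suffix or target.endswith("." + suffix)
        for target in cname_chain(nslookup_output)
    )
```

For the output above, `cname_chain` would yield `m.media-amazon.com` followed by `media.amazon.map.fastly.net`, so `served_by` reports a match for `fastly.net`.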

mpitt5y ago

Amazon.com uses Fastly https://www.streamingmediablog.com/2020/05/fastly-amazon-hom...

sleepyshift5y ago· 2 in thread

Looks like this has taken out Reddit at least.

spyke1125y ago

Is it also hitting Github? I'm not getting any css when loading Github.

algo_cheese5y ago

And a large part of GitLab

gansai5y ago· 2 in thread

Wouldn't websites have alternate CDNs managing their traffic? Why should they have a single point of failure?

I was assuming there are couple of services like Fastly and companies might have architected keeping in mind the alternatives too, I guess.

raimondious5y ago

Normally you configure your A record to point at the CDN, as the CDN is the thing that gives you multiple points of failure (caches all over the world). Hard to have a fallback to that. Running multiple CDNs would be extremely expensive. CDN caches are kept useful by traffic running through them, so it's hard to have a backup for that too.

ImpactStrafe5y ago

Because interacting and switching between cdns can be very complicated and/or costly

It should be planned for, especially by major tech organizations like reddit, or Amazon, etc.

But I won't fault news organizations, who already don't have boatloads of money for not having fail over cdns

evouga5y ago· 2 in thread

Since Fastly’s own website is currently down:

What is fastly? Why are a huge number of web sites dependent on them? They are some kind of web host for companies that don’t want to run their own servers/data centers?

ImpactStrafe5y ago

Fastly is a Content Distribution Network (CDN).

Basically the closer the server serving the webpage is to the end user the faster it is for the end user to see and interact with.

But running servers all over the world 1) isn't efficient 2) costs a lot of money.

So a few companies (fastly, cloud flare, akamai) figured, hey, why don't we build a bunch of small data centers all over the world and then provide a distributed way to serve web traffic from it.

It originally was brought about for services like Netflix, but has expanded greatly.

You still host your servers, but a copy of the webpage/media is given to the CDN to serve to customers.

ceejayoz5y ago

I'm particularly intrigued as to why Amazon.com uses them.

They literally have their own directly competing CDN product. You'd think they'd be dogfooding it.

ysavir5y ago· 2 in thread

Tangential question, but with services like these, is there a known way to handle failure gracefully? Some way to automatically bypass these services if they are known to be down?

efficax5y ago

You have to have two separate CDNs and use DNS to fail over. The problem is that means paying for a CDN that just sits dormant for the 99.999% of the time that your primary is up.

Alternatively you could use DNS to fail over to the content you host, instead of another CDN. But in many cases that would be the same as an outage since the CDN exists to reduce the impact of all those requests on your infra

richardwhiuk5y ago

Have two different CDN partners, own your own DNS, and then withdraw one of the CDNs if they are down. Suspect that's what Amazon have done.
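
The decision logic behind "withdraw one of the CDNs" can be sketched as a small state machine. This is an illustrative sketch only: the threshold of three failed probes, the immediate fail-back on recovery, and the `cdn-a.example.net` / `cdn-b.example.net` targets are assumptions, and actually publishing the chosen CNAME depends on your DNS provider's API, which is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverState:
    active: str        # CNAME target currently published in DNS
    failures: int = 0  # consecutive failed health probes of the primary


def next_state(
    state: FailoverState,
    primary: str,
    secondary: str,
    primary_healthy: bool,
    threshold: int = 3,
) -> FailoverState:
    """Decide which CDN the DNS record should point at after one probe."""
    if primary_healthy:
        # Assumed policy: fail back as soon as the primary answers again.
        return FailoverState(active=primary, failures=0)
    failures = state.failures + 1
    if failures >= threshold:
        # Enough consecutive failures: withdraw the primary.
        return FailoverState(active=secondary, failures=failures)
    # Not enough evidence yet; keep whatever is currently published.
    return FailoverState(active=state.active, failures=failures)
```

How quickly clients follow the switch is bounded by the TTL on the record, so this approach only helps if the TTL is kept short.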

john373865y ago· 2 in thread

It should be resolved soon. From the Fastly status page:

The issue has been identified and a fix is being implemented. Posted 1 minute ago. Jun 08, 2021 - 10:44 UTC

abluecloud5y ago

Wonder if all the caches will have been wiped, causing knock on issues

grumple5y ago

Phew!

That time to find the issue is always the stressful part. < 1 hour is pretty good for weird stuff, and fortunately the east coast of the US is barely online this early (sorry Europe!).

atymic5y ago· 1 in thread

This has got to be even bigger than when cloudflare went offline, in terms of big companies affected. Clearly they have way more F500 customers than CF.

Good luck to the on call engineers!

yxhuvud5y ago

The funny part is that it isn't uncommon for sites to depend on both cloudflare and fastly in one way or another, due to buying services from saas companies that also depend on them.

k_5y ago· 1 in thread

Update: The issue has been identified and a fix is being implemented. Posted Jun 08, 2021 - 10:44 UTC

Seems like this is being resolved; curious to see the details afterwards

(from https://status.fastly.com/incidents/vpk0ssybt3bj)

optiomal_isgood5y ago

Reddit, Stack Overflow, Spotify, all back for me. Good job Fastly engineers!

jujodi5y ago· 1 in thread

Would be fascinating if Fastly is not able to use GitHub, Travis, Terraform, pip, etc. to deploy their fix

nraval17295y ago

Interesting thought. I had not thought about this before. If there is a cyclic dependency (not saying there is at the moment) how would things play out? Do you just ssh into your own servers to deploy the fix?

aero-glide25y ago· 1 in thread

isitdownrightnow.com is down

roachpepe5y ago

Thanks for the best laughs in a while friend - that's pure irony right there!

Jamie99125y ago· 1 in thread

Yep, seems like:

Reddit, BBC News, Twitch.tv, Twitter emoji CDN?

are all down 503 service error

another-dave5y ago

Ah didn't cop that Twitter emoji issue was related! Thought an ad-blocker was stepping up its filters aggressively :)

Stack Overflow, The Guardian, Gov.uk too as some other biggish names getting hit.

kypro5y ago· 1 in thread

Some people are claiming online that this is a cyber attack. I contract for the UK Gov and I'm hearing reports that traffic is going through the roof right now.

Anyone know if there is any legitimacy to this?

fr2null5y ago

The fastly monitoring/status page says: "Customers may experience increased origin load as global services return". Which sounds like the increased traffic is to be expected.

[1] status.fastly.com

ZoomStop5y ago· 1 in thread

The outage has already been added to the Fastly Wikipedia page

abhiminator5y ago

Holy smokes these Wikipedia writers are quick! I'm sometimes impressed by how fast a page on a super recent happening gets populated with all of the currently known details.

choult5y ago· 1 in thread

My money is on an expired internal certificate or CA.

hugh-avherald5y ago

Fastly has scheduled maintenance to retire some TLS certs next week.

devops0005y ago· 1 in thread

BTC/USD is down too.

creamyhorror5y ago

Perfect time for the crypto whales to dump massively and cause an absolute panic.

permb5y ago

Made my alpine linux docker builds fail as well (varnish) - but shouldn’t it use a mirror when the primary download site is gone?

    fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKIN...
    fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/...
    ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.12/main: temporary error (try again later)

oneeyedpigeon5y ago

Good marketing for Fastly! I had no idea so much of the internet relied on it...

sjaak5y ago

Perhaps Fastly is simply taking their commitment to reducing CO2 seriously? Three hurrays for the climate!

snookdebook5y ago

I gave it about 10 tries, and it seems a very small percentage of transactions do go through.

A decent number of tries are rejected right at the Varnish front door:

    < HTTP/2 503
    < server: Varnish
    < retry-after: 0
    < date: Tue, 08 Jun 2021 10:11:41 GMT
    < x-varnish: 271470009
    < via: 1.1 varnish
    < fastly-debug-path: (D cache-bma1666-BMA 1623147101)
    < fastly-debug-ttl: (M cache-bma1666-BMA - - -)
    < content-length: 450
    <
    Service Unavailable
    Guru Mediation:
    Details: cache-bma1666-BMA 1623147101 271470009

Many more reach some backend system that just dumps "connection failure":

    < HTTP/2 502
    < content-type: text/plain; charset=utf-8
    < content-length: 18
    <
    connection failure

And a tiny few do get through:

    < HTTP/2 200
    < content-type: text/html; charset=UTF-8
    < cache-control: max-age=0, must-revalidate
    < date: Tue, 08 Jun 2021 10:11:43 GMT
    < via: 1.1 varnish
    < vary: accept-encoding
    < set-cookie: ...snip...
    < server: snooserv
    < content-length: 275036
    <
    <!doctype html><html>...snip...
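
To put rough numbers on "a very small percentage", the status lines in a saved verbose transcript like the ones above can be tallied. This is a small sketch that assumes the transcript uses curl-style `< HTTP/...` response lines; `tally_statuses` is a hypothetical helper name.

```python
import re
from collections import Counter

# A response status line in a verbose transcript looks like "< HTTP/2 503".
STATUS_LINE = re.compile(r"^< HTTP/\S+ (\d{3})", re.MULTILINE)


def tally_statuses(transcript: str) -> Counter:
    """Count how many responses in the transcript returned each status code."""
    return Counter(int(code) for code in STATUS_LINE.findall(transcript))
```

Dividing the count for 200 by the total number of responses gives the share of requests that got through during the sampling window.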

DoreenMichele5y ago

I'm having intermittent Reddit issues, as one more data point.

I'm grateful for HN. I rebooted my computer. I thought it was my device and then saw this on my phone while rebooting.

monkeydust5y ago

Just occurring to me how CDNs are a major point of failure now for the internet

cph-w5y ago

I did not realise Fastly adoption was so widespread. Can anyone more enlightened tell me why, or point to some resource on which use cases Fastly is better suited to than other CDNs such as Cloudflare?

simonbarker875y ago

how will their devs fix it if stackoverflow has gone down?!

modshatereality5y ago

This post is suspiciously ranked much lower than it should be (1216 points, 9 hours ago), lower than posts with < 100 points.

optiomal_isgood5y ago

FWIW, Fastly ~8 hours ago (3am UTC) reported another incident: https://status.fastly.com/incidents/1glxxb8sf2zv and deployed a fix—either the fix made it worse or wasn't sufficient to mitigate the problem.

marmot7775y ago

I think the honorable thing would be for them to have a statement easily findable.

So many companies sweep this sort of things under the rug if it’s only customer data that’s been breached. If they can’t sweep they have a high priced PR agency do the communicating.

I do not trust companies who handle things this way.

tommoor5y ago

Hands up if you're also here after being woken up by downtime alerts on the west coast

i3865y ago

Anyone want to talk about half the internet going out because one provider couldn’t keep their service up, instead of SO jokes and feels for the engineers? The entire internet is like a house of cards, from the protocol to the economic model.

fagnerbrack5y ago

https://dashboard.stripe.com/ is down

https://github.com/ is defaced

fullstackwife5y ago

No mention of outage on https://status.cloud.google.com/, and I wonder why, because apparently this is a GCP problem.

mschuster915y ago

Ah yes, the wonders of centralized internet infrastructure.

Let's use a handful of providers for everything, they said. It will be cheaper, they said. It will be easier to manage, they said.

And it was cheaper, until downtimes began to affect more and more sites when central SPOFs got hit.

And I wonder how much of that need for these centralized SPOFs actually comes from the sheer absurd amount of bloat, ads, code and assets that sites these days "have" to deliver to the customer. I 'member times when pages had 100kb total size, loaded in an instant and were perfectly usable.

sergiomattei5y ago

Yikes, seems like a massive outage.

EDIT: Hexdocs is down, elixir-lang.org is down

angled5y ago

None of the ES/NQ/RTY/YM futures contracts took kindly to the outage! This could have had a much wider financial impact. Most seem to have recovered now.

asicsp5y ago

hypnoscripto5y ago

Looks like fastly.com uses fastly…

mcintyre19945y ago

Do they have an official status page? Googling gets https://docs.fastly.com/en/guides/fastlys-network-status which is 503

Edit: Elsewhere in the comments: https://status.fastly.com/incidents/vpk0ssybt3bj

1 more reply

devops0005y ago

Hacker News is the only one UP!

willvarfar5y ago

https://www.bbc.com/news/technology-57399628 is rendering and reporting on the story, but BBC itself was down at the start of the outage, with the same 503 varnish error message.

Presumably the BBC has some kind of fallback in place.

The journalists ought to interview their own techies :)

jchandra5y ago

https://www.greenhouse.io/ down as well.

hestefisk5y ago

The Guardian summarised this as well: https://www.theguardian.com/technology/2021/jun/08/massive-i...

perino5y ago

Anything hosted on Firebase seems to be down

easytiger5y ago

I will NEVER understand why people put so much trust in single provider solutions for anything critical.

vfclists5y ago

What happens when there is excessive centralization.

I thought that one of the principles behind the Internet is to be able to reroute around failures, but neither these service providers nor their clients ever seem to learn.

I guess in their mind that only applies to packet routing not services. SMH

MrGilbert5y ago

Interestingly, https://www.fastly.com/ works for me, whereas https://fastly.com/ doesn't.

1 more reply

Omnious585y ago

I was wondering why my Tidal app just stopped mid song and won't connect, after much googling and absolutely no help or even notifications from Tidal explaining there's an issue it seems this outage is the culprit. Bugger.

diveanon5y ago

Time to develop CDN for CDNs.

It seems like a pattern that CDNs have overly centralized the web and led to issues like this.

Maybe it's time to build a CDN that distributes your static assets to multiple CDNs and has a set of fallback states for service outages.

tfar5y ago

https://flutter.dev/ and https://fastlane.tools/ as well.

Dobbs5y ago

I got a push notification from the CNN app telling me a bunch of the internet was down due to a cloud provider. I clicked the link only for the app to open to a 503. In hindsight not surprising, but quite amusing.

misnome5y ago

pypi.org, but not https://status.python.org/ - I'm impressed that they actually hosted the status page differently!

1 more reply

lopatin5y ago

Their status page keeps claiming that my region, Chicago (ORD), is either Degraded Performance, or Operational. But clearly it's down. Is fuzzing metrics like this how they hit their SLA targets?

abhiminator5y ago

Looks like they're currently applying a fix.

https://status.fastly.com/incidents/vpk0ssybt3bj

montag5y ago

It's funny, I searched Twitter for "Ebay down" and the top result was an Ebay tweet with some not coincidentally broken Twitter emoji SVGs (as another person mentioned)...

theginger5y ago

GitHub? I had some issues, checked the service status page said no issues, but images were returning a 503. Maybe they host their service status page elsewhere including using fastly.

1 more reply

monkeydust5y ago

Pretty bad www.gov.uk is down as more services move to digital.

2 more replies

plasma5y ago

I briefly saw an output error about "domain not found" when hitting fastly.com, wonder if some list of domains has hit a limit/flushed/etc.

1 more reply

fareesh5y ago

How does one design a system that has a redundancy for when the CDN goes down? Paying for more than one CDN is probably too expensive isn't it?

grumple5y ago

Good job Fastly for getting the issue identified and resolved so quickly. < 1 hour to identify, <13 minutes to fix (assuming status is accurate).

an0n4u5y ago

numpy docs, too. i think it's cloudflare related as well. at least, I keep seeing some cloudflare errors interpolated with the 503 varnish error.

2 more replies

MyOnePiece5y ago

Quick question: if the CDNs are down, why can't traffic be routed to the central web servers the company owns?

I thought CDNs had fallback configured?

_kyran5y ago

Those of you that work in DevOps, SRE or are CTOs.

What kinds of things do you put in place to manage these kinds of centralised issues that are beyond your control?

1 more reply

devops0005y ago

Heroku is down https://dashboard.heroku.com/

1 more reply

JCWasmx865y ago

>The issue has been identified and a fix has been applied. Customers may experience increased origin load as global services return.

Is fixed

Nilef5y ago

Ironically, even this Outage page is out for me

ur-whale5y ago

Wow, talk about a brutal SPOF, most of the things I had planned to work with today are broken: reddit, github, stack overflow.

taosx5y ago

I̶n̶ ̶r̶o̶m̶a̶n̶i̶a̶ ̶e̶v̶e̶r̶y̶t̶h̶i̶n̶g̶ ̶s̶e̶e̶m̶s̶ ̶b̶a̶c̶k̶ ̶t̶o̶ ̶n̶o̶r̶m̶a̶l̶.̶.̶.̶?̶

Edit: nope, just worked for 2-3 requests (10 secs)

anotheryou5y ago

Looks fixed: https://downdetector.com/

jl65y ago

Worrying that this is impacting so many dev toolchains and services, which will hinder the ability to respond to the issue.

timvisee5y ago

This seems to be a bigger issue. BGP failure?

1 more reply

_kyran5y ago

Things seem to have come back online in Australia, although not sure if that's just sites switching over their DNS?

LightG5y ago

"The internet will just route around a local / centralised problem ... like water around an object"

Obligatory LOL ...

2 more replies

graphman5y ago

Firebase Dynamic Links is affected too. Checking the IP looks like they are using Fastly which is quite surprising.

taurath5y ago

I’ve noticed lots of social media content is tied to this - Reddit and Twitter images and some videos, for one.

loriverkutya5y ago

The issue has been identified and a fix is being implemented. Posted 3 minutes ago. Jun 08, 2021 - 10:44 UTC

ilaksh5y ago

Let's make all of the main internet sites dependent upon one central private service. Great idea guys.

artembugara5y ago

Seems like another single point of failure. What is a solution to not be affected by such an outage?

toong5y ago

It is time to remove that "100% uptime guarantee" claim from the website :grimacing:

classicflavour5y ago

My work's website is down too, and so are the regular sites I use to escape work boredom

gansai5y ago

Fastly is back now. (The issue has been identified and a fix is being implemented.)

pattyj5y ago

It would be interesting to see estimations on the man-hour cost of this outage.

mothershesha5y ago

Got the same here (Australia)

johnstonnorth5y ago

rubygems.org affected too

vincentmarle5y ago

Well I know where to go next time if I were to be a Russian hacker

clawphantom5y ago

Twitch isn’t working and not responding and also the web dashboard

luke2m5y ago

When this happens to cloudflare, it will be even more impactful.

colesantiago5y ago

Looks like Fastly did not work as advertised, very misleading.

reuben_scratton5y ago

I'm sure it's just a coincidence that today is Patch Tuesday.

:-|

zwirbl5y ago

Spotify is also hit, though it still works without images

ddtaylor5y ago

Someone must have 51% attacked the Pied Piper blockchain!

vlan1215y ago

Damn, I thought I could blame myself or the provider..

ronyfadel5y ago

Ten Percent Happier is down, and now my day is ruined.

1 more reply

fsnowdin5y ago

just had my own site down because of this. glad to see it wasn't my fault lol but good luck to the Fastly people on fixing the issue.

clawphantom5y ago

Twitch isn’t responding and also the web dashboard

8K832d7tNmiQ5y ago

That explains why I couldn't access reddit

navanchauhan5y ago

No wonder, The Verge and NYT are down too.

rich_sasha5y ago

www.python.org down as well, with the shortest of messages: 'connection failure'. Probably related?

1 more reply

NewLogic5y ago

Even amazon.com styling is borked for me

dilawar5y ago

I think reddit in India is down as well.

JosephK5y ago

Extremely long call, but what are the chances this turns out connected to the raids on organised crime using the An0m app that started today?

john373865y ago

It's probably a DDoS attack.

dragosbulugean5y ago

And all Webflow sites it seems...

alixaxel5y ago

Indeed, part of GitHub (.io) too.

ur-whale5y ago

Looks like HN is working ;-)

jfny5y ago

Do companies really not run test suites / do manual testing before deploying to production?

timetosleep5y ago

Seems to be back online

rvz5y ago

Basically everything is broken. "Centralising Everything" huh

dragosbulugean5y ago

All Webflow sites?

mlnj5y ago

StackOverflow too.

schappim5y ago

Parts of Shopify

ur-whale5y ago

Looks like an SRE team rolled out buggy software.

1 more reply

rottc0dd5y ago

github is back online. SSO too.

raylus5y ago

Whew, DevOps fire alarms are going off!

raylus5y ago

github.com is pretty broken

schappim5y ago

SMH.com.au

heavydust5y ago

the problem has been fixed

heavydust5y ago

reddit.com is affected too

alexannic5y ago

cnn.com is down as well.

cwen5y ago

A real-world Chaos experiment!

cdev5y ago

it seems to be up now

magicturtle5y ago

reddit down as well

Metacelsus5y ago

I first noticed that xkcd was down. Then I went to post about it on reddit . . . also down! Good thing HN is up.

nindalf5y ago

Taken out xkcd as well.

2 more replies

pts_5y ago

Are these sites on the same cloud or CDN?

1 more reply

colesantiago5y ago

Also, why has this been allowed to happen? Billions of dollars lost because of this one company?

I don't understand this.

ramraj075y ago

For a moment I thought all of Western internet was cut off from India. Says how siloed my browsing habits are!

raphaelj5y ago

Couldn't be happier I moved https://noisycamp.com to BunnyCDN.com.

TheRealDunkirk5y ago

Every other comment about what's down in this thread -- as if we needed dozens of site-by-site accountings of this outage in the first place -- is a bitch about reddit. Why is reddit so important to this crowd? The specific topics I used to read the site for (half a dozen years ago) have all been overrun by "bucket people," there is literally never an answer to any question I find a google link to there, and the site's design is actively user-hostile. Seriously: what's keeping that place afloat? Porn, I suppose.

3 more replies

huijzer5y ago

CSS/Javascript at https://github.com/ appears to be down as well making GitHub quite unusable.

1 more reply

weird-eye-issue5y ago

Yikes seeing just a "connection failure" on Paypal is something else.

edit: PayPal looks to be back up, at least in US East, but when I turn off my VPN and access from Asia I get "Fastly error: unknown domain: www.paypal.com."

Now I'm seeing a 503

1ncorrect5y ago

> Monitoring The issue has been identified and a fix has been applied. Customers may experience increased origin load as global services return. Posted 4 minutes ago. Jun 08, 2021 - 10:57 UTC

Looks to be working again my end.

thrdbndndn5y ago

Interestingly, Twitter only has its emoji SVGs down.

1 more reply

doublerabbit5y ago

https://deb.debian.org is down too which borked my installation.

benrbray5y ago

https://hackage.haskell.org/ down as well

1 more reply

jakub_g5y ago

https://www.bbc.com/news/technology-57399628

"A number of leading media websites are currently not working, including the Guardian, Financial Times, Independent and the New York Times."

2 more replies

toaway5y ago

also https://www.theguardian.com/

aliasEli5y ago

jzer0cool5y ago

I could not tell from the fastly status page. What caused the fault? Could anyone point to any past stories which may be of similar nature other than DDos?

1 more reply

zhan_eg5y ago

Bitwarden is also down (the Web Vault, not the website).

2 more replies

stevenwliao5y ago

Seems to affect Target ( https://www.target.com/ ) and Reddit ( https://www.reddit.com/ ) as well.

tandav5y ago

https://docs.python.org/ https://www.last.fm/

abhiminator5y ago

PayPal seems to be working for me at the moment. Rest of the sites are 503s.

rvz5y ago

Centralising everything™ and the whole internet goes down because of that.

2 more replies

Simran-B5y ago

Pantheon was also affected: https://status.pantheon.io/incidents/n9sngt0q0mct

jb19915y ago

For what it's worth, I'm also having these problems with cnn.com, reddit and many others; however, when I switch away from WiFi to use my cell provider network, they work fine.

1 more reply

tus895y ago

Paypal back, off fastly

1 more reply

dan-robertson5y ago

Is there anything these big sites could do in this situation, or must they choose between running and maintaining all of their own infra or relying on a single CDN?

2 more replies

zarker5y ago

Spotify is behaving strangely as well https://www.spotify.com/

mondaygreens5y ago

Quora and reddit too

black_puppydog5y ago

All of these work from here in Grenoble, France...

7 more replies

EmptyStatement5y ago

https://repo.maven.apache.org/ is down too

amiga-workbench5y ago

https://www.theverge.com/ seems to be down too

conradfr5y ago

https://getbootstrap.com/ as well

cellover5y ago

Is looking at those links like staring at a road accident instead of just passing by?

zthxxx5y ago

https://docs.npmjs.com down as well

alvis5y ago

https://www.reddit.com as well

xmdx5y ago

Terraform having issues and rubygems down too

2 more replies

max23_5y ago

also https://www.speedtest.net/

sammygreen5y ago

Seems to be every site that runs varnish...

2 more replies

dreamer75y ago

Firebase hosting has been affected as well

playpause5y ago

also https://www.ft.com/

rottc0dd5y ago

SSO and github are back online now

maelito5y ago

nature.com

linuxfan20215y ago

You would think that the UK GOVERNMENT would have their own private CDN or something...

1 more reply

keithnz5y ago

twitch also, lots of other minor ish websites

brador5y ago

Searchable offline backup of stack anyone?

toaway5y ago

www.gov.uk & bbc are back

dpacmittal5y ago

elastic.co down as well

treeshateorcs5y ago

developer.spotify.com

magicturtle5y ago

reddit down as well

1 more reply

ramshanker5y ago

twitch.tv Too.

Hani13375y ago

etsy.com too

madeofpalk5y ago

> [0] https://www.gov.uk/

Just checked, thank god the NHS vaccine site is still available - vaccines just got rolled out for under 30s today.

1 more reply

PaywallBuster5y ago

and Imgix

collyw5y ago

Click the new tab. Lots of posts about sites being down. All flagged.

2 more replies

austinjp5y ago· 12 in thread

Edit: There seems to be a major empathy outage in this thread. Disgusted but not surprised, unfortunately.

strictfp5y ago

Meh. Losing sleep sounds like an over-reaction. No system is foolproof. Of course Fastly should do what they can to prevent downtime, but it's still expected that they will go down.

I would blame anyone who claimed otherwise or couldn't deal with it while not having a fallback.

2 more replies

H8crilA5y ago

Imagine losing sleep over a corporate problem where you're just the next Joe Engineer, to be fired the second you're not needed. Have some perspective people.

1 more reply

throwaway77475y ago

mnordhoff5y ago

They shouldn't lose sleep over it, though.

1 more reply

yvan5y ago

1 more reply

dm3195y ago

Empathy is hard to find around here, maybe someone needs to study it. Is it a feature of people in tech? Don't remember much being on slashdot either.

1 more reply

willejs5y ago

#HugOps

southerntofu5y ago

mothsonasloth5y ago

People need to be blamed, and responsibility for actions taken (without covering asses)

13 more replies

jtdev5y ago

Our fathers and mothers put man on the moon… we build shitty software that helps the technocrats sell more junk to the masses.

colesantiago5y ago

Well, while engineers are getting paid $100K/yr to post #HugOps, I know someone in HFT and their dashboard uses the Fastly service, so this has had a huge impact on them for sure.

Flag and downvote all you want, you know this is true.

2 more replies

colesantiago5y ago

Not my problem. Fastly should work as intended.

The fault is theirs and they have said that they have failover, this worldwide outage caused by them just goes to show you that Fastly does not actually have a failover system in place.

> "Fastly’s network has built-in redundancies and automatic failover routing to ensure optimal performance and uptime." - status.fastly.com

Even their status page was down. Very embarrassing. Fastly did not work as advertised and misled its customers.

Edit: Offended flaggers circling around silencing misled Fastly customers. How pathetic.

2 more replies

mrzool5y ago· 5 in thread

Why is this a link to the Fastly homepage, where absolutely no information is provided?

This is the page that should be linked:

https://status.fastly.com

jmvoodoo5y ago

Oddly their homepage rendering an error was a more accurate description of the problem than "investigating potential impact to performance with our CDN"

2 more replies

Haydos585x25y ago

This is the link you want I think https://status.fastly.com/incidents/vpk0ssybt3bj

scolvin5y ago

Because even their homepage is down intermittently/for some people.

Silhouette5y ago

To save everyone else hitting the site as well:

As of 10:44UTC, this status page has just updated to say the issue has been identified and a fix is being implemented.

lucasverra5y ago

it is starting to show several Degraded Performance tags

barosl5y ago· 5 in thread

liveoneggs5y ago

fastly gives free service to things like pip. It's actually very nice.

1 more reply

kypro5y ago

1 more reply

samhh5y ago

Hackage (Haskell) is down as well: http://hackage.haskell.org

1 more reply

Zyansheep5y ago

Must... decentralize... internet...

joshenders5y ago

Blame site operators that are single homing and not loadbalancing CDNs

1 more reply

creamyhorror5y ago· 5 in thread

basically the internet is down

reddit, stackoverflow, github, paypal, pypi, twitter, twitch, NYT, CNN, BBC, the Guardian...

edit: wow, even Amazon.com relies on Fastly for some of its edge caches!

iso16315y ago

https://www.washingtonpost.com/technology/2020/04/06/your-in...

aunetx5y ago

Opened my browser, and my three main web pages: github, gitlab.gnome.org and old.reddit.com... They are all down.

1 more reply

busymom05y ago

> stackoverflow

How will they troubleshoot the error messages now?

1 more reply

secondcoming5y ago

BBC is still up at least in the UK

4 more replies

3np5y ago

debian's main apt repo mirror affected as well

csmattryder5y ago· 4 in thread

Here's the status page incident for this.

https://status.fastly.com/incidents/vpk0ssybt3bj

algo_cheese5y ago

> We're currently investigating potential impact to performance with our CDN services.

Guys, you are offline with a 503 error, this is a little more than "potential impact to performance".

3 more replies

parksy5y ago

No issues reported for Perth Australia. Strange because reddit, zip pay, fastly itself, and probably a bunch of other sites are down.

Doesn't seem the status page is automatically updated or perhaps whatever event or polling is used is also broken.

1 more reply

JrProgrammer5y ago

> This incident affects: North America (Ashburn (BWI), Ashburn (DCA)).

How come we are affected by this in the Netherlands?

3 more replies

nebulous15y ago

Currently only listing a small issue in NA

iso16315y ago· 3 in thread

https://easydns.com/blog/2020/07/20/turns-out-half-the-inter...

The whole idea of the internet was a distributed network impervious to most attacks.

The reality is that a single failure can knock out 90% of the services people use.

fagnerbrack5y ago

The internet still works, only the websites are returning the wrong response

4 more replies

emptyparadise5y ago

There are ten websites left on the internet and they're all hosted by four or so megacorps. Isn't it great?

2 more replies

nabla95y ago

The Web (World Wide Web), built atop the Internet, is not impervious.

Vint Cerf says the same about the invention of the TCP/IP transport protocol.

1 more reply

ClearAndPresent5y ago· 3 in thread

What conclusions can we draw about concentrating web content in a few CDNs?

threeseed5y ago

In HTML/CSS you should be able to specify a fallback source if the first returns a non-200.

Or that companies need to have better DNS strategies.
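The fallback idea above (try a second source when the first does not return 200) can be sketched outside the browser. This is a minimal illustration only: `first_working_source` is a hypothetical helper, the `fetch` callable is supplied by the caller, and the hostnames in the usage note are placeholders.

```python
from typing import Callable, Optional, Sequence, Tuple


def first_working_source(
    sources: Sequence[str],
    fetch: Callable[[str], Tuple[int, bytes]],
) -> Optional[Tuple[str, bytes]]:
    """Try each source in order; return (url, body) for the first 200."""
    for url in sources:
        try:
            status, body = fetch(url)
        except OSError:
            # Connection-level failure: treat it like a bad status and move on.
            continue
        if status == 200:
            return url, body
    # Every source failed or returned a non-200 status.
    return None
```

With sources such as `["https://cdn-a.example/app.css", "https://cdn-b.example/app.css"]`, a 503 from the first host falls through to the second. The cost is extra latency on every failed attempt, so a real client would probably want short timeouts.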

4 more replies

npteljes5y ago

That sometimes they fail but the world goes on.

itsbits5y ago

We had that experience when Cloudflare was down for some time last year. We have now set up a small static server of our own as a backup in case this happens again, although so far we haven't had to use it.

threeseed5y ago· 3 in thread

Shopify's CDN is down.

Which is causing $15+ million in lost product sales for every hour of outage.

Not to mention the loss of any new customers.

dspillett5y ago

3 more replies

john373865y ago

Here is a lesson for Shopify's talented staff: don't put all your eggs in one basket. I'm sure they can build something better than that. Hopefully, they will learn from this outage.

grumple5y ago

Does Shopify do that much when the US is asleep?

csomar5y ago· 3 in thread

So I'm wondering where in the "hundreds of servers around the world" did they exactly go wrong.

This happened with Cloudflare before too. I think we are a little too dependent on these services.

0xbkt5y ago

It is a meaningless premise when you actually have SPoFs baked deep inside the system.

1 more reply

fagnerbrack5y ago

In Software Engineering we call it "coupling"

jfny5y ago

Yeah, seriously. Time to rebuild the architecture from the ground up.

lysp5y ago· 3 in thread

tendencydriven5y ago

Their status page is now saying every location has degraded performance.

kiwijamo5y ago

Affecting Auckland (AKL) which is not on the list so I can only assume it's affecting more locations than they're letting on.

Banana6995y ago

+= North Africa (Egypt, Cairo)

Stackoverflow.com, reddit, Quora down. (and probably more, those are the ones I tested)

dkarp5y ago· 3 in thread

Before the "Error 503 Service Unavailable" messages appeared, there were a few minutes where the error was a single line:

    connection failure

Not sure if that provides anyone here with more insight into what might have caused this!

stordoff5y ago

Edit: and now "I/O error" on Reddit.

q3k5y ago

I also saw a glimpse of 'I/O error'. That sounds fun.

SileNce5k5y ago

It was `connection failure` for me.

optiomal_isgood5y ago· 2 in thread

Amazon.com was completely broken here (Europe) and they're back, I was observing from where the assets were loaded from and they switched from EU to NA as a failover. Homework well done.

00deadbeef5y ago

I was surprised to learn Amazon don't use their own CDN

2 more replies

abluecloud5y ago

Still getting broken assets from the UK.

1 more reply

omk5y ago· 2 in thread

This outage made me realize that github is served over a single IP address (A record) for my point of origin (India). Stackoverflow has 4 A record listing, but all of these belong to fastly.

raphaelj5y ago

Well, the Internet was indeed designed for redundancy, and it worked as intended. At no point in time did it fail to make you reach the server it was supposed to make you talk to.

What are failing are all the application protocols that are running on top of the network.

kayfox5y ago

1 more reply

Haydos585x25y ago· 2 in thread

Such a huge number of sites. It seems like it's mostly US based sites and Australians are okay. Sending good vibes to whatever poor person is on support right now.

nineteen9995y ago

I'm in Australia and there are heaps of sites down for me.

2 more replies

paimoe5y ago

In Perth, reddit is down. So is Blackboard files for uni

alexchamberlain5y ago· 2 in thread

altacc5y ago

abluecloud5y ago

pimterry5y ago· 2 in thread

It's still early days, but I'm hopeful that it can provide a real solution to today's CDN centralization.

jokoon5y ago

Agree, but currently, ipfs would serve as a fallback, since it's about files. Decentralized/distributed generally has slower network performance.

Unless most nodes are high performance, I guess?

Personally I think a distributed database system, where entries are being made redundant in something like a blockchain+dht, would be a good start?

Decentralizing the internet works if it financially makes sense for platforms to build such tools.

1 more reply

jfny5y ago

I'm pretty sure you can serve hundreds if not thousands of users from a single Raspberry Pi

1 more reply

unfunco5y ago· 2 in thread

Amazon being down surely points to something other than Fastly being the cause?

austinjp5y ago

I just had a look at amazon.co.uk and most assets fail to load, the browser debug console is full of 503 errors. Picking one at random, it's fastly:

    $ nslookup images-eu.ssl-images-amazon.com

    Server:  127.0.0.53
    Address: 127.0.0.53#53

    Non-authoritative answer:
    images-eu.ssl-images-amazon.com canonical name = m.media-amazon.com.
    m.media-amazon.com canonical name = media.amazon.map.fastly.net.
    Name: media.amazon.map.fastly.net
    Address: 199.232.177.16
    Name: media.amazon.map.fastly.net
    Address: 2a04:4e42:1d::272
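The same check can be scripted. The sketch below assumes, based only on the lookup above, that a hostname fronted by Fastly resolves through a name ending in `.fastly.net`; that will not hold for every setup. `served_by_fastly` is a made-up helper name, and whether `socket.gethostbyname_ex` exposes the CNAME target depends on the platform resolver.

```python
import socket
from typing import Callable, List, Tuple

# Same return shape as socket.gethostbyname_ex:
# (primary hostname, alias list, IPv4 address list).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def served_by_fastly(hostname: str,
                     resolve: Resolver = socket.gethostbyname_ex) -> bool:
    """Return True if any resolved name for `hostname` is under fastly.net."""
    canonical, aliases, _addresses = resolve(hostname)
    names = [canonical, *aliases]
    # Assumption: Fastly-fronted hosts resolve through a *.fastly.net name.
    return any(
        name.rstrip(".").lower().endswith(".fastly.net") for name in names
    )
```

For example, `served_by_fastly("images-eu.ssl-images-amazon.com")` would have returned `True` for the lookup shown above, provided the resolver reports the canonical name.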

mpitt5y ago

Amazon.com uses Fastly https://www.streamingmediablog.com/2020/05/fastly-amazon-hom...

1 more reply

sleepyshift5y ago· 2 in thread

Looks like this has taken out Reddit at least.

spyke1125y ago

Is it also hitting Github? I'm not getting any css when loading Github.

1 more reply

algo_cheese5y ago

And a large part of GitLab

gansai5y ago· 2 in thread

Wouldn't websites have alternate CDNs managing their traffic? Why should they have a single point of failure?

I was assuming there are couple of services like Fastly and companies might have architected keeping in mind the alternatives too, I guess.

raimondious5y ago

ImpactStrafe5y ago

Because interacting and switching between cdns can be very complicated and/or costly

It should be planned for, especially by major tech organizations like reddit, or Amazon, etc.

But I won't fault news organizations, who already don't have boatloads of money for not having fail over cdns

evouga5y ago· 2 in thread

Since Fastly’s own website is currently down:

What is fastly? Why are a huge number of web sites dependent on them? They are some kind of web host for companies that don’t want to run their own servers/data centers?

ImpactStrafe5y ago

Fastly is a Content Distribution Network (CDN).

Basically the closer the server serving the webpage is to the end user the faster it is for the end user to see and interact with.

But running servers all over the world 1) isn't efficient 2) costs a lot of money.

So a few companies (fastly, cloud flare, akamai) figured, hey, why don't we build a bunch of small data centers all over the world and then provide a distributed way to serve web traffic from it.

It originally was brought about for services like Netflix, but has expanded greatly.

You still host your servers, but a copy of the webpage/media is given to the CDN to serve to customers.
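The "copy given to the CDN" part can be pictured as a cache sitting in front of the origin server. The sketch below is a toy model, not how Fastly or any other CDN is actually implemented: `EdgeCache`, its TTL policy and the `fetch_origin` callable are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy edge node: serve cached copies, go to the origin on a miss."""

    def __init__(self, fetch_origin: Callable[[str], bytes],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_origin = fetch_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (time the copy was stored, response body)
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, path: str) -> bytes:
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # hit: the origin is not contacted
        body = self._fetch_origin(path)  # miss or expired copy
        self._store[path] = (now, body)
        return body
```

In this model the origin only sees traffic on misses and expiries, which is also why a cold or wiped cache shows up as increased origin load.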

1 more reply

ceejayoz5y ago

I'm particularly intrigued as to why Amazon.com uses them.

They literally have their own directly competing CDN product. You'd think they'd be dogfooding it.

ysavir5y ago· 2 in thread

Tangential question, but with services like these, is there a known way to handle failure gracefully? Some way to automatically bypass these services if they are known to be down?

efficax5y ago

You have to have two separate CDNs and use DNS to fail over. The problem is that this means paying for a CDN that just sits dormant for the 99.999% of the time that your primary is up.

richardwhiuk5y ago

Have two different CDN partners, own your own DNS, and then withdraw one of the CDNs if they are down. Suspect that's what Amazon have done.
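The "withdraw one of the CDNs if they are down" step can be sketched as a small decision function. This is only an illustration under assumptions: the hostnames are made up, the health map would come from your own monitoring, and actually publishing the result depends on whatever API your DNS provider offers, which is not shown.

```python
from typing import Dict, List


def choose_dns_targets(cdn_hostnames: List[str], healthy: Dict[str, bool]) -> List[str]:
    """Return the CDN hostnames that should stay in DNS, in priority order.

    A CDN with no health data is treated as unhealthy. If every CDN looks
    unhealthy, all of them are kept rather than publishing an empty record,
    since a broken monitor should not take the site offline by itself.
    """
    alive = [host for host in cdn_hostnames if healthy.get(host, False)]
    return alive if alive else list(cdn_hostnames)


# Hypothetical hostnames; in practice these would be your CDN-provided targets.
cdns = ["primary-cdn.example.net", "backup-cdn.example.net"]
targets = choose_dns_targets(cdns, {"primary-cdn.example.net": False,
                                    "backup-cdn.example.net": True})
# targets now lists only the backup CDN
```

Resolvers cache the old answer for the record's TTL, so a scheme like this only switches over quickly if the records are served with a short TTL.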

john373865y ago· 2 in thread

It should be resolved soon. From the Fastly status page:

The issue has been identified and a fix is being implemented. Posted 1 minute ago. Jun 08, 2021 - 10:44 UTC

abluecloud5y ago

Wonder if all the caches will have been wiped, causing knock-on issues.

grumple5y ago

Phew!

That time to find the issue is always the stressful part. < 1 hour is pretty good for weird stuff, and fortunately the east coast of the US is barely online this early (sorry Europe!).

atymic5y ago· 1 in thread

This has got to be even bigger than when cloudflare went offline, in terms of big companies affected. Clearly they have way more F500 customers than CF.

Good luck to the on call engineers!

yxhuvud5y ago

The funny part is that it isn't uncommon for sites to depend on both cloudflare and fastly in one way or another, due to buying services from saas companies that also depend on them.

k_5y ago· 1 in thread

Update: The issue has been identified and a fix is being implemented. Posted Jun 08, 2021 - 10:44 UTC

Seems like this is being resolved; curious to see the details afterwards

(from https://status.fastly.com/incidents/vpk0ssybt3bj)

optiomal_isgood5y ago

Reddit, Stack Overflow, Spotify, all back for me. Good job Fastly engineers!

jujodi5y ago· 1 in thread

Would be fascinating if Fastly were not able to use GitHub, Travis, Terraform, pip, etc. to deploy their fix.

nraval17295y ago

aero-glide25y ago· 1 in thread

isitdownrightnow.com is down

roachpepe5y ago

Thanks for the best laughs in a while friend - that's pure irony right there!

Jamie99125y ago· 1 in thread

Yep, seems like:

Reddit, BBC News, Twitch.tv, Twitter emoji CDN?

are all down with a 503 service error

another-dave5y ago

Ah didn't cop that Twitter emoji issue was related! Thought an ad-blocker was stepping up its filters aggressively :)

Stack Overflow, The Guardian, Gov.uk too as some other biggish names getting hit.

kypro5y ago· 1 in thread

Some people are claiming online that this is a cyber attack. I contract for the UK Gov and I'm hearing reports that traffic is going through the roof right now.

Anyone know if there is any legitimacy to this?

fr2null5y ago

The Fastly monitoring/status page [1] says: "Customers may experience increased origin load as global services return", which suggests the increased traffic is to be expected.

[1] status.fastly.com

ZoomStop5y ago· 1 in thread

The outage has already been added to the Fastly Wikipedia page

abhiminator5y ago

Holy smokes these Wikipedia writers are quick! I'm sometimes impressed by how fast a page on a super recent happening gets populated with all of the currently known details.

choult5y ago· 1 in thread

My money is on an expired internal certificate or CA.

hugh-avherald5y ago

Fastly has scheduled maintenance to retire some TLS certs next week.

devops0005y ago· 1 in thread

BTC/USD is down too.

creamyhorror5y ago

Perfect time for the crypto whales to dump massively and cause an absolute panic.

permb5y ago

Made my alpine linux docker builds fail as well (varnish) - but shouldn’t it use a mirror when the primary download site is gone?

oneeyedpigeon5y ago

Good marketing for Fastly! I had no idea so much of the internet relied on it...

sjaak5y ago

Perhaps Fastly is simply taking their commitment to reducing CO2 seriously? Three hurrays for the climate!

snookdebook5y ago

I gave it about 10 tries, and it seems a very small percentage of transactions do go through.

A decent number of tries are rejected right at the Varnish front door:

Many more reach some backend system that just dumps "connection failure":

  < HTTP/2 502
  < content-type: text/plain; charset=utf-8
  < content-length: 18
  <
  connection failure

And a tiny few do get through:
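A tally like the one described in this comment could be reproduced with a short script. The sketch below is not the commenter's method; `http_status` and `tally_statuses` are hypothetical helper names, and the URL you pass in is up to you.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for one GET request, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for error statuses such as 502/503; the code is on the exception.
        return err.code


def tally_statuses(fetch_status: Callable[[], int], attempts: int = 10) -> Counter:
    """Call fetch_status repeatedly and count how often each status code appears."""
    counts: Counter = Counter()
    for _ in range(attempts):
        counts[fetch_status()] += 1
    return counts
```

Usage would look like `tally_statuses(lambda: http_status("https://example.com/"), attempts=10)`; network-level failures (no HTTP response at all) are deliberately left to raise rather than being counted.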

DoreenMichele5y ago

I'm having intermittent Reddit issues, as one more data point.

I'm grateful for HN. I rebooted my computer. I thought it was my device and then saw this on my phone while rebooting.

monkeydust5y ago

Just occurring to me how CDNs are a major point of failure now for the internet

cph-w5y ago

I did not realise Fastly adoption was so widespread. Can anyone more enlightened tell me why, or point to a resource on the use cases where Fastly is superior to other CDNs such as Cloudflare?

simonbarker875y ago

How will their devs fix it if Stack Overflow has gone down?!

modshatereality5y ago

This post is suspiciously ranked much lower than it should be (1216 points, 9 hours ago), lower than posts with < 100 points.

optiomal_isgood5y ago

marmot7775y ago

I think the honorable thing would be for them to have a statement easily findable.

So many companies sweep this sort of thing under the rug if it’s only customer data that’s been breached. If they can’t sweep it, they have a high-priced PR agency do the communicating.

I do not trust companies who handle things this way.

tommoor5y ago

Hands up if you're also here after being woken up by downtime alerts on the west coast

i3865y ago

fagnerbrack5y ago

https://dashboard.stripe.com/ is down

https://github.com/ is defaced

fullstackwife5y ago

No mention of outage on https://status.cloud.google.com/, and I wonder why, because apparently this is a GCP problem.

mschuster915y ago

Ah yes, the wonders of centralized internet infrastructure.

Let's use a handful of providers for everything, they said. It will be cheaper, they said. It will be easier to manage, they said.

And it was cheaper, until downtimes began to affect more and more sites when central SPOFs got hit.

sergiomattei5y ago

Yikes, seems like a massive outage.

EDIT: Hexdocs is down, elixir-lang.org is down

angled5y ago

None of the ES/NQ/RTY/YM futures contracts took kindly to the outage! This could have had a much wider financial impact. Most seem to have recovered now.

asicsp5y ago

hypnoscripto5y ago

Looks like fastly.com uses fastly…

mcintyre19945y ago

Do they have an official status page? Googling gets https://docs.fastly.com/en/guides/fastlys-network-status which is 503

Edit: Elsewhere in the comments: https://status.fastly.com/incidents/vpk0ssybt3bj

devops0005y ago

Hacker News is the only one UP!

willvarfar5y ago

https://www.bbc.com/news/technology-57399628 is rendering and reporting on the story, but BBC itself was down at the start of the outage, with the same 503 varnish error message.

Presumably the BBC has some kind of fallback in place.

The journalists ought to interview their own techies :)

jchandra5y ago

https://www.greenhouse.io/ down as well.

hestefisk5y ago

The Guardian summarised this as well: https://www.theguardian.com/technology/2021/jun/08/massive-i...

perino5y ago

Anything hosted on Firebase seems to be down

easytiger5y ago

I will NEVER understand why people put so much trust in single provider solutions for anything critical.

vfclists5y ago

What happens when there is excessive centralization.

I thought that one of the principles behind the Internet is to be able to reroute around failures, but neither these service providers nor their clients ever seem to learn.

I guess in their mind that only applies to packet routing not services. SMH

MrGilbert5y ago

Interestingly, https://www.fastly.com/ works for me, whereas https://fastly.com/ doesn't.

Omnious585y ago

diveanon5y ago

Time to develop a CDN for CDNs.

It seems like a pattern that CDNs have overly centralized the web and led to issues like this.

Maybe it's time to build a CDN that distributes your static assets to multiple CDNs and has a set of fallback states for service outages.
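One simple reading of "a set of fallback states" is a client that tries the same asset on several CDNs in order. The sketch below assumes the asset has already been published to each CDN; `fetch_with_fallback` and the example URLs are hypothetical, and the `fetch` callable stands in for whatever HTTP client you use.

```python
from typing import Callable, Iterable, Optional


def fetch_with_fallback(urls: Iterable[str], fetch: Callable[[str], bytes]) -> bytes:
    """Try each URL in order and return the first successful response body.

    `fetch` is expected to raise an exception when a CDN is unreachable or
    returns an error; the next URL is then tried.
    """
    last_error: Optional[Exception] = None
    for url in urls:
        try:
            return fetch(url)
        except Exception as err:  # broad on purpose: any failure means "try the next CDN"
            last_error = err
    raise RuntimeError("all CDN mirrors failed") from last_error
```

The trade-off is latency: a client only reaches the second CDN after the first attempt fails or times out, so short timeouts matter for this approach.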

tfar5y ago

https://flutter.dev/ and https://fastlane.tools/ as well.

Dobbs5y ago

misnome5y ago

pypi.org, but not https://status.python.org/ - I'm impressed that they actually hosted the status page differently!

lopatin5y ago

Their status page keeps claiming that my region, Chicago (ORD), is either Degraded Performance, or Operational. But clearly it's down. Is fuzzing metrics like this how they hit their SLA targets?

abhiminator5y ago

Looks like they're currently applying a fix.

https://status.fastly.com/incidents/vpk0ssybt3bj

montag5y ago

It's funny, I searched Twitter for "Ebay down" and the top result was an Ebay tweet with some not coincidentally broken Twitter emoji SVGs (as another person mentioned)...

theginger5y ago

GitHub? I had some issues, checked the service status page said no issues, but images were returning a 503. Maybe they host their service status page elsewhere including using fastly.

monkeydust5y ago

Pretty bad www.gov.uk is down as more services move to digital.

plasma5y ago

I briefly saw an output error about "domain not found" when hitting fastly.com, wonder if some list of domains has hit a limit/flushed/etc.

fareesh5y ago

How does one design a system that has a redundancy for when the CDN goes down? Paying for more than one CDN is probably too expensive isn't it?

grumple5y ago

Good job Fastly for getting the issue identified and resolved so quickly. < 1 hour to identify, <13 minutes to fix (assuming status is accurate).

an0n4u5y ago

NumPy docs, too. I think it's Cloudflare related as well. At least, I keep seeing some Cloudflare errors interspersed with the 503 Varnish error.

MyOnePiece5y ago

Quick question: if the CDNs are down, why can't traffic be routed to the central web servers the company owns?

I thought CDNs had fallback configured?

_kyran5y ago

Those of you that work in DevOps, SRE or are CTOs.

What kinds of things do you put in place to manage these kinds of centralised issues that are beyond your control?

devops0005y ago

Heroku is down https://dashboard.heroku.com/

JCWasmx865y ago

>The issue has been identified and a fix has been applied. Customers may experience increased origin load as global services return.

Is fixed

Nilef5y ago

Ironically, even this Outage page is out for me

ur-whale5y ago

Wow, talk about a brutal SPOF, most of the things I had planned to work with today are broken: reddit, github, stack overflow.

taosx5y ago

I̶n̶ ̶r̶o̶m̶a̶n̶i̶a̶ ̶e̶v̶e̶r̶y̶t̶h̶i̶n̶g̶ ̶s̶e̶e̶m̶s̶ ̶b̶a̶c̶k̶ ̶t̶o̶ ̶n̶o̶r̶m̶a̶l̶.̶.̶.̶?̶

Edit: nope, just worked for 2-3 requests (10 secs)

anotheryou5y ago

Looks fixed: https://downdetector.com/

jl65y ago

Worrying that this is impacting so many dev toolchains and services, which will hinder the ability to respond to the issue.

timvisee5y ago

This seems to be a bigger issue. BGP failure?

_kyran5y ago

Things seem to have come back online in Australia, although not sure if that's just sites switching over their DNS?

LightG5y ago

"The internet will just route around a local / centralised problem ... like water around an object"

Obligatory LOL ...

graphman5y ago

Firebase Dynamic Links is affected too. Checking the IP looks like they are using Fastly which is quite surprising.

taurath5y ago

I’ve noticed lots of social media content is tied to this - Reddit and Twitter images and some videos, for one.

loriverkutya5y ago

The issue has been identified and a fix is being implemented. Posted 3 minutes ago. Jun 08, 2021 - 10:44 UTC

ilaksh5y ago

Let's make all of the main internet sites dependent upon one central private service. Great idea guys.

artembugara5y ago

Seems like another single point of failure. What is a solution to not be affected by such an outage?

toong5y ago

It is time to remove that "100% uptime guarantee" claim from the website :grimacing:

classicflavour5y ago

My work's website is down too, and so are the regular sites I use to escape work boredom.

gansai5y ago

Fastly is back now. (The issue has been identified and a fix is being implemented.)

pattyj5y ago

It would be interesting to see estimations on the man-hour cost of this outage.

mothershesha5y ago

Got the same here (Australia)

johnstonnorth5y ago

rubygems.org affected too

vincentmarle5y ago

Well I know where to go next time if I were to be a Russian hacker

clawphantom5y ago

Twitch isn’t working or responding, and neither is the web dashboard.

luke2m5y ago

When this happens to cloudflare, it will be even more impactful.

colesantiago5y ago

Looks like Fastly did not work as advertised, very misleading.

reuben_scratton5y ago

I'm sure it's just a coincidence that today is Patch Tuesday.

:-|

zwirbl5y ago

Spotify is also hit, though it still works without images

ddtaylor5y ago

Someone must have 51% attacked the Pied Piper blockchain!

vlan1215y ago

Damn, I thought I could blame myself or the provider..

ronyfadel5y ago

Ten Percent Happier is down, and now my day is ruined.

fsnowdin5y ago

just had my own site down because of this. glad to see it wasn't my fault lol but good luck to the Fastly people on fixing the issue.

clawphantom5y ago

Twitch isn’t responding, and neither is the web dashboard.

8K832d7tNmiQ5y ago

That explains why I couldn't access reddit

navanchauhan5y ago

No wonder, The Verge and NYT are down too.

rich_sasha5y ago

www.python.org down as well, with the shortest of messages: 'connection failure'. Probably related?

NewLogic5y ago

Even amazon.com styling is borked for me

dilawar5y ago

I think reddit in India is down as well.

JosephK5y ago

Extremely long call, but what are the chances this turns out connected to the raids on organised crime using the An0m app that started today?

john373865y ago

It's probably a DDoS attack.

dragosbulugean5y ago

And all Webflow sites it seems...

alixaxel5y ago

Indeed, part of GitHub (.io) too.

ur-whale5y ago

Looks like HN is working ;-)

jfny5y ago

Do companies really not run test suites / do manual testing before deploying to production?

timetosleep5y ago

Seems to be back online

rvz5y ago

Basically everything is broken. "Centralising Everything" huh

dragosbulugean5y ago

All Webflow sites?

mlnj5y ago

StackOverflow too.

schappim5y ago

Parts of Shopify

ur-whale5y ago

Looks like an SRE team rolled out buggy software.

rottc0dd5y ago

github is back online. SSO too.

raylus5y ago

Whew, DevOps fire alarms are going off!

raylus5y ago

github.com is pretty broken

schappim5y ago

SMH.com.au

heavydust5y ago

the problem has been fixed

heavydust5y ago

reddit.com is affected too

alexannic5y ago

cnn.com is down as well.

cwen5y ago

A real-world Chaos experiment!

cdev5y ago

it seems to be up now

magicturtle5y ago

Reddit is down as well

Metacelsus5y ago

I first noticed that xkcd was down. Then I went to post about it on reddit . . . also down! Good thing HN is up.

nindalf5y ago

Taken out xkcd as well.

pts_5y ago

Are these sites on the same cloud or CDN?

colesantiago5y ago

Also, why has this been allowed to happen? Billions of dollars lost because of this one company?

I don't understand this.

ramraj075y ago

For a moment I thought all of Western internet was cut off from India. Says how siloed my browsing habits are!

raphaelj5y ago

Couldn't be happier I moved https://noisycamp.com to BunnyCDN.com.

TheRealDunkirk5y ago
