Facebook, Instagram go down around the world in an apparent outage

(usatoday.com)

610 points · ajiang · 7y ago · 331 comments

linsomniac7y ago· 30 in thread

Could this be related to the storm?

I was out shoveling, and came back in to my phone blowing up. Our systems at IronMountain (formerly Fortrust) in Denver all rebooted at once. These are all on redundant power: each system's redundant power supplies connect to different circuits entering the cabinet, and those two circuits are fed from 3 PDUs (two separate, one shared). Each of those is supposed to be fed by a separate UPS and generator. Last status update I had says that they are running off generators, but they've been shockingly tight-lipped about it.

Don't get me wrong, it was hi-LAR-ious to call into their NOC and have them pretend that I was the only one having problems. "Can you tell me if there is a major data center outage going on?" "We are trying to gather information, we are making a bunch of client phone calls, we will know after we make those calls." "... Why are you making a bunch of client calls if you aren't having an outage?"

jyriand7y ago

So, a storm in Denver stops me from using Messenger in Estonia? I wonder where the butterfly flapped its wings.

ljm7y ago

Pretty sure it doesn't apply to Facebook, but Amazon's cheapest AWS tiers are around there. Same with Virginia.

iamgopal7y ago

https://en.m.wikipedia.org/wiki/Butterfly_effect

erobbins7y ago

No, FB datacenters are geographically diverse.

They do run quarterly 'storms' where a datacenter is shut down to test failover and resiliency. I have no idea if today is one of those days, since I left last year.

johannes12343217y ago

Theoretically a real shutdown might go in a different way than previous tests or simulations. For instance in a test you might cut the connection completely, while in the real case only some power circuits go down or whatever.

For instance GitHub's relatively recent shutdown was due to a fail-over heartbeat not going as expected.

linsomniac7y ago

Test failures are all well and good, but don't always match reality. In this case, the design of the power infrastructure was solid, and their plans include running monthly generator testing and quarterly "disconnect from the grid" testing. But apparently something about this failure of both of the incoming power lines caused failures in multiple UPSes. Still waiting on the after-action review.

chipperyman5737y ago

Interesting. Out of curiosity, how hard is it to turn the datacenter "back on" in case they discover there's a problem with the failover?

bertil7y ago

It is an internal software problem.

unethical_ban7y ago

> "We are trying to gather information, we are making a bunch of client phone calls, we will know after we make those calls."

I think that is a yes, and he's getting ahead of it by saying "Yes, and we have no idea why or ETA, so let us do our job".

Granted, they should have a status page.

opportune7y ago

Last time I dealt with a cloud provider outage the status page was unresponsive during the outage because the status page had some kind of dependency on the resources that were down...

linsomniac7y ago

Sure, I understand the "so let us do our job". I've been on the other side of that.

On the other hand, I need information to be able to do my job: Is this only our cabinet having problems and I need to start rolling to the datacenter (in the middle of a giant blizzard)? Is this possibly some sort of problem with our own power infrastructure? Is something on fire (an EPO triggered by fire could cause this)? Did the roof cave in under the weight of the snow we are getting? Is the power stabilized or is there some indication that power might be up and down?

In short, I need answers to: Do I need to gracefully take down my site to prevent lost transactions and database corruption? Do I need to switch to our backup site?

For context: All of our servers powering off at once and then back on shouldn't be possible. It should require the failure of at least 3 independent pieces of equipment (except at the breaker panel or in our cabinet where it could be only two failures). It is extremely unusual for this to happen, first time it's happened for me and I've been in that facility since 2004.

So, yes, I respect that you need to do your job. But I also need to do my job.

Plus, I'm pretty sure the guy answering the trouble line, his job WAS talking with the customers. The people working the problem likely didn't include him. This is a huge data center run by a ginormous company. I don't think I was taking him away from twisting a wrench. :-)
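
The "at least 3 independent pieces of equipment" reasoning above can be made concrete with a rough sketch. The probabilities are made-up illustrative numbers, not measurements from any facility, and the function names are hypothetical. The point is only that stacking independent components helps a lot, while a single common-cause event (say, one disturbance on the incoming lines that upsets several UPSes at once) can dominate the total.

```python
def p_all_fail_independent(p_each: float, n: int) -> float:
    """Chance that all n components fail, assuming the failures are independent."""
    return p_each ** n


def p_all_fail_with_common_cause(p_each: float, n: int, p_common: float) -> float:
    """Chance that all n components fail when one shared event can take out all of them.

    Either the common-cause event happens (probability p_common), or it does not
    and every component still fails on its own.
    """
    return p_common + (1.0 - p_common) * p_all_fail_independent(p_each, n)


# Illustrative, assumed numbers: 1% per component, 0.1% for a shared event.
independent = p_all_fail_independent(0.01, 3)
with_common = p_all_fail_with_common_cause(0.01, 3, 0.001)
print(f"independent only: {independent:.2e}")
print(f"with common cause: {with_common:.2e}")
```

With those assumed inputs, three independent failures come out around one in a million, while even a 0.1% common-cause chance pushes the total to roughly one in a thousand.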

rhizome7y ago

I wouldn't be surprised if they think a status page would open up liability for not putting it up soon enough, or for too long, or for some text that turned out to be wrong or unnecessary.

rconti7y ago

"The storm"? It's sunny in the Bay Area for the first time in I don't know how long. I imagine it's nice in other parts of the world as well, other than where this localized "storm" is.

kornish7y ago

Denver is getting slammed right now — power surges everywhere.

mises7y ago

You missed the bomb cyclone that's happening right now in half the country?

ninju7y ago

Colorado weather: Bomb cyclone brings wild winds, big impact as blizzard whips state

https://www.denverpost.com/2019/03/13/colorado-weather-bomb-...

jonstokes7y ago

My first reaction to "could this be related to the storm," was "oh no, now this QAnon stuff has spread to HN."

komali27y ago

Do they do this to get around any 99.9% uptime agreements?

Implicated7y ago

Probably a combination of that and to curtail the "I just spoke to Brad in Customer Service who confirmed _the whole datacenter is offline_" type posts.

But that's my presumption, I don't actually know anything and don't want to imply I do.

ljm7y ago

It's easy to be cynical but it's optimistic expectation management.

It might be resolved, it has to get worse before you escalate it further. They might not know the full facts. Might be worse than it really is. How do you know? You can't judge that because your personal rendering of Facebook failed. You have load balancers and CDNs and A/B testers all getting in the way of delivering data to your machine.

It's too easy to draw a conclusion from the client-side armchair and the provider is absolutely not going to make false promises, for the worse or for the better.

You want to hope that Facebook, in this case, acts on more complete information.

6nf7y ago

0.1% is still almost an hour per month
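
For reference, the arithmetic behind a figure like this, as a minimal sketch. It assumes a 30-day month, and the function name is hypothetical. Under that assumption a 99.9% target leaves roughly 43 minutes of allowed downtime per month, so "almost an hour" is a loose upper reading.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Minutes of downtime allowed in a period at a given availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - availability_pct) / 100.0


for pct in (99.0, 99.5, 99.9, 99.99):
    print(f"{pct}% over 30 days -> {downtime_budget_minutes(pct):.1f} minutes")
```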

SilverSurfer9727y ago

That's the trust issue with current agreements we are solving. If an API is down the bound agreement is enforced instantly with our platform, no lies, no call, no pain. We are actually onboarding companies to try it out! https://stacktical.com

rubicon337y ago

Classic response for any kind of service provider:

Deny, deny, deny, obfuscate, deny, then blame someone else (usually, YOU).

jorblumesea7y ago

A company as large and sophisticated as FB has data centers and cloud services in multiple countries, and in the US, probably colocated data centers. Certainly nothing localized to where you are.

TallGuyShort7y ago

If the outage is at all infrastructure-related, the root cause was something that at some point was local and cascaded. Unless someone git pushed to a repo used by both companies and it's taken all day to get it git revert'd, their redundancy obviously didn't work, did it? There's effectively a category-2 hurricane moving from the Rockies through the mid-west right now.

ct5207y ago

Awesome we use iron mountain for escrow

taurath7y ago

“Outage, what outage?” is a sort of laughable response, but all too common with tons of providers.

rad_gruchalski7y ago

Reminds me when I was contacting Deutsche Telekom last year regarding an outage in Monschau area. "We have no problems". In fact, the whole exchange was down, press got a sniff of it when people could not contact emergency services anymore: https://www.aachener-nachrichten.de/lokales/eifel/netzstoeru...

bradhe7y ago

I'm looking at you, AWS...

ct5207y ago

Man that reminds me of that century link cluster f

wybiral7y ago· 12 in thread

Can you imagine if Twitter and Google went down at the same time?

People would be reactivating their Facebook accounts and having to sift through conspiracy theory posts about Hillary Clinton still just to figure out what was going on.

Edit: The points on this post keep going up and down every time I check these comments. Yes, it was sarcasm, I was joking, but I was trying to point out that most people rely on a small set of services. "Cloud" has centralized things a lot.

bouncycastle7y ago

Whenever I hear that some service is down, I immediately go to that service to confirm. Then I repeatedly hit reload if it doesn't work to see if it comes back up. I guess many people do the same and that may contribute to the problem...

tyingq7y ago

Gmail's exponential backoff, with its visible countdown to the next retry, is a nice idea. Probably reduces that compounding wave of customer reloads.
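
A minimal sketch of the general idea, not Gmail's actual implementation: each retry waits longer than the last, up to a cap, and optional jitter spreads clients out so they don't all retry at the same instant. The function names, default values, and countdown text are assumptions made up for illustration.

```python
import math
import random
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, jitter: float = 0.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return the wait (in seconds) before each retry.

    The delay grows as base * factor**attempt and is clamped to cap. With
    jitter in (0, 1], each delay is drawn uniformly from
    [delay * (1 - jitter), delay] so clients do not retry in lockstep.
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        if jitter > 0.0:
            delay = rng.uniform(delay * (1.0 - jitter), delay)
        delays.append(delay)
    return delays


def countdown_message(seconds_left: float) -> str:
    """Text a client could show the user instead of inviting a manual reload."""
    return f"Retrying in {math.ceil(seconds_left)}s"


print(backoff_delays(8))
print(countdown_message(4.2))
```

Showing the countdown matters as much as the backoff itself: a user who can see that a retry is already scheduled has less reason to hammer the reload button.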

spectre2567y ago

Years ago I worked at a large online casual gaming company whose name ended in -ynga. Our web tier was split into two: one for serving static content required to load the HTML, Flash app, assets, etc. The other was for actual communication regarding actions taken in game.

Whenever we had any sort of issue we could generally get a good idea of what was happening by looking at changes in traffic in those two web tiers.

If people couldn't play for most reasons, game action traffic would drop to near zero, but the static asset tier traffic would usually at least triple.

So yeah, there are a lot of F5 buttons being hit out there when pages don't load.

hiei7y ago

Gmail and other Google products went down last night. Close though. Thankfully not on Twitter or FB.

kyrra7y ago

I don't believe Gmail was ever fully down. For me, I was just having problems with attachments. I also noticed app icons in the play store failing to load.

hiccuphippo7y ago

My company's tech support sent us an email to tell us our email was down. Fun times we live in.

sundvor7y ago

People would be calling emergency hotlines..?

Wait, people are doing this already: https://twitter.com/SA_SES/status/1105969450698694656

pennaMan7y ago

I'm more worried of all of it going down at the same time.

celticninja7y ago

When it all goes down at the same time you should be worried. Not because of the lack of Twitter or FB or Gmail, but for what it means if it's all down.

josteink7y ago

> Can you imagine if Twitter and Google went down at the same time?

Google sure, but what people in the real world care about Twitter?

Twitter could be down for days and only the technocracy would notice.

corobo7y ago

Twitter is where we complain that Google is down

chriswarbo7y ago

The US president?

It also seems popular with journalists and media companies (e.g. TV shows asking viewers to "tweet us your questions")

snazz7y ago· 12 in thread

I’ve seen many systems go down over the last few days worldwide. Aside from the possibility of a mega-DDoS attack (which Facebook denies), all of these organizations have fairly diverse tech stacks to my knowledge. Google’s issue (supposedly) had to do with their Blobstore API, we don’t know what happened with Facebook, and many other, smaller services have had issues as well, including three intranet services at my workplace.

This leaves me wondering what software all these places have in common. The application layers are all different, the databases are all different, the containerization and provisioning systems are different, but I imagine that all these systems rely on two things: the global Internet backbone, and maybe the Linux kernel.

Have there been major security vulnerabilities patched lately in the Linux kernel that could have had unintended consequences?

str33t_punk7y ago

Both companies are massive and have tons of developers. It becomes almost impossible to look at the system as a whole with the amount of changes coming through. And you get scenarios where small failures cascade through the stack, wreaking havoc. Oftentimes it's just one config change.

It's telling that one of the hottest areas of distributed systems research these days is the boring topic of configuration management. Google, Microsoft, etc. are paying researchers top dollar to figure out how to prevent massive outages through novel techniques. It is one of the harder problems to solve and requires massive investment in tooling, refactoring, etc.

snazz7y ago

You’re undeniably right about not looking at Facebook or Google as one whole system, but there have also been what seems like an unprecedented number of strange little outages (see the ones mentioned by https://news.ycombinator.com/item?id=19382418) that aren’t huge companies. My workplace had some of their own today that I haven’t heard an incident report about (it’s a pretty large company and I’m not in IT).

hideo7y ago

>>Google, Microsoft, etc are paying researchers top dollar to figure out how to prevent massive outages through novel techniques

Curious what makes you think this. Are there specific job postings in either company that are focused on this?

shafte7y ago

The best explanation is coincidence, I think. I have direct knowledge of two of the incidents in the past few weeks, and they have completely unrelated causes.

Sometimes you just get unlucky!

smacktoward7y ago

"Once is happenstance. Twice is coincidence. The third time it’s enemy action."

-- Ian Fleming (in Goldfinger)

z3t47y ago

If it can happen, it will happen.

snazz7y ago

That’s certainly possible. We’re probably still too early to tell, but the innate conspiracy theorist slash pattern-matching part of my brain wants to find a probable connection.

taneq7y ago

Maybe we're in the first week of a rogue AI's hard takeoff. ;)

nubslayer27y ago

>all of these organizations have fairly diverse tech stacks to my knowledge

>This leaves me wondering what software all these places have in common.

dunno what systems you're talking about, but seems likely they are mostly x86 systems and maybe even mostly using Intel hardware and microcode

those systems can-be/are rooted and more, to my knowledge

felix2467y ago

It could also just be coincidence.

madrox7y ago

> This leaves me wondering what software all these places have in common.

Cisco or Arista

happythought7y ago

FANGs all use white box hardware with “merchant silicon”, meaning they buy the chipsets directly from Broadcom, Mellanox, etc. and build their own devices. However, they do all have Broadcom and Mellanox in common, and Cisco, Juniper, and Arista do too.

nabla97y ago· 12 in thread

Serious question: Was any value lost? (this may appear sarcastic)

Facebook obviously loses some ad revenue and Facebook customers may lose sales. But do Facebook/Instagram users suffer? But how does losing social media for several hours affect the quality of life of users?

nimir7y ago

I am not a big fan of social media either, but you would be surprised ... For example here in Sudan (East Africa) the country has been under continuous protests for over 2 months now (53 dead, 4k+ detained, 500+ injured) with strong censorship from the regime & silence from the international community. So Facebook, WhatsApp & Twitter are the only media left for the people to fight for freedom -> every Thursday is the main protest of the week, and this Wednesday night the outage might affect this, as thousands around Sudan won't know about the meeting points for tomorrow!!!

Actually the government did block all social media for over a month but that was fixable with vpn. (Follow hashtag #SudanUprising on twitter to learn more)

nabla97y ago

Interesting point in general, but...

What I asked was what is the effect of sporadic interruptions of a few hours. I mean, if Facebook had 30% availability, would I lose anything valuable from the experience? Is it that we are just used to it and want it to be there always?

The value of 99.5% availability for __users__ is not clear to me. Instant messaging is the exception to this.

brudgers7y ago

I know parents who keep in touch with their children via Messenger. In part because it works in more places: Messenger works wherever there's internet over wifi not just cell service. People rely on Facebook for non-trivial reasons whether or not I (or someone else) think it's a good idea or not.

sampleinajar7y ago

It might seem pithy, but my wife has a small internet based business and uses facebook as a login for one of the sites she sells on. So, today instead of being able to autofill labels directly for shipping, she had to hand type addresses in for all shipping labels for products sold on that site.

OccamsMirror7y ago

I reached out to an old acquaintance that could be a great help to my company. I reached out over Facebook. Now that contact can not respond and may have not even seen my message. I have no other way of contacting this person. This affects my business.

I hate Facebook, but to deny its value is pretty naive.

TallGuyShort7y ago

FB has effectively replaced all other text messaging for several of my social circles. It's nice when you have groups that kinda change over time, otherwise group-texts always end up with numbers you don't necessarily have in your phone, etc.

hrrsn7y ago

I can live without FB and Instagram, but Messenger being down is a whole different story.

alanbernstein7y ago

Do you think of Messenger as separate from Facebook?

vokep7y ago

Potentially much value was gained (or stopped hemorrhaging)

anigbrowl7y ago

For hours, not especially - it's annoying but no worse than a power cut. There could even be benefits.

On the other hand, if someone were to sabotage the platform and prove/convincingly argue that they induced the failure, at minimum it would do significant damage to the tech sector and at maximum cause public panic.

This is a hypothetical, not speculation on the cause of this outage.

ZeroCool2u7y ago

Obviously this could be argued differently from a shareholder perspective, but I would say otherwise no. Interestingly, this might be one of the only times where a large outage could be claimed to be adding value. Again, not for FB, but for users, sure.

gukov7y ago

After a few hours of not being able to use an app people might start realizing how addicted they are to it. "I was bored initially but then realized talking to people in real life still works."

40acres7y ago· 10 in thread

I'm interviewing for a Production Engineer role at Facebook on Monday, thanks for providing relevant "do you have any questions for us" content.

mtw7y ago

Good question is why oh why switch WhatsApp to Facebook tech when it was running perfectly ok on its own. Never crashed.

duado7y ago

So that engineers can be moved between product groups while carrying relevant knowledge and experience with them.

macintux7y ago

Speaking as an Erlang developer I approve of this question.

mietek7y ago

The answer should be clear at this point. WhatsApp provided too much privacy and not enough monetisation opportunities.

packetslave7y ago

Good luck! If you’re interviewing in MPK make sure your recruiter takes you to the barbecue shack for lunch

erobbins7y ago

But be prepared to stand in line for longer than it takes you to actually eat.

B-Con7y ago

When I interviewed for SRE at Google, they'd had a non-trivial cross product outage days before. Good conversation starter, but I couldn't get many details out of them.

t3rabytes7y ago

You mean this week when they took out most of their cloud and public products for a few hours? /s

taneq7y ago

You think they might have a couple more vacancies going now? :P

HelloFellowDevs7y ago

Funnily enough, same for me too! New Grad this Friday. Good luck!

jedberg7y ago· 10 in thread

So yesterday Google had a major (and out of character) outage across its apps, and today Facebook has a major (and also out of character) outage across its apps.

I can't wait to see the RCA for both of these and if they're related.

godzillabrennus7y ago

Private post mortem: The NSA middleware we are required to run (that took time to deploy to each of our social partners) is breaking something so let’s revert.

Public post mortem:

Entirely believable technical cause.

samstave7y ago

Alternate Post Mortem: Cyber WWIII's first public skirmishes become visible...

(Ignore Stuxnet, Ignore DUQU)

popz417y ago

I imagine the NSA uses an optical tap device. These devices create identical copies and require no power or management.

drbenway7y ago

Interesting that Facebook's cavalrylogger is still being successfully injected despite there being nothing but a blank page. Also interesting that cavalrylogger has a function that lets you bind key-presses to callbacks. Even more interesting is that cavalrylogger seems to come prepackaged with any Facebook like button! Cheers for the keylogger, Facebook.

https://stackoverflow.com/questions/4188605/what-is-cavalryl...

anticensor7y ago

Alternative post mortem (blind): Massive power outage

Yet another alternative: Third World War has just started, and this was the first battle.

betolink7y ago

I don't think it's the NSA this time. For one, they don't have to do deep packet analysis or install any MITM device, since they get the whole info in bulk. Maybe it's just a 400-pound hacker.

eqdw7y ago

Remember when youtube was down for like six hours a few months ago and we still haven't heard why?

mikemotherwell7y ago

Yeh why was that? I kept wondering when we'd hear. I've got bets to collect on!

crb0027y ago

Curious if they did it due to a kernel exploit being used by a nation state bad enough that it was worth YOLO patching.

arisAlexis7y ago

there are actively people downvoting such comments. I guess that's suspicious too.

cronix7y ago· 8 in thread

It looks like something much larger is going on. If you look at the front page of https://downdetector.com/ you'll see most major sites/backbones are having issues (Verizon/ATT/Sprint/CenturyLink/TMobile/Comcast/Level3/etc).

ceejayoz7y ago

That site relies on user reports.

I strongly suspect users are reporting "my Internet is having troubles" because their FB, Messenger, etc. isn't working right.

For example, in the comments of the T-Mobile outage page, there's stuff like "Haven't been able to upload anything to social media all day" and "Cannot send pictures through whatsapp and fb messenger".

cronix7y ago

That's true, but it is a good indicator. Here's a better map from Akamai: https://www.akamai.com/us/en/resources/visualizing-akamai/re...

Also, check out the "Attacks" tab. That one really lights up. Like seriously lights up. Something is going on... all over. US, China, Russia, EU...

jnothing7y ago

Yeah in Turkey people thought government is slowing down/blocking the whole internet

mclightning7y ago

Yes, I had that exact problem yesterday. I couldn't upload pictures/videos/gifs on WhatsApp and Instagram.

I thought maybe my ISP had blocked a port that these services may be transferring their multimedia on.

/Sweden

onychomys7y ago

That's a fair point, but those folks probably aren't reporting AWS problems, and it's showing a spike too.

wybiral7y ago

I believe that too but some of the services seem unrelated (like Flickr and Capital One).

Now, another interpretation is that the reports are simply false...

wybiral7y ago

Ironically https://outage.report/ is down too.

Edit: it's back now (8:37 PM UTC)

blang7y ago

HN was down for a minute.

earless17y ago· 7 in thread

What manner of failure would cause such globally deployed and distributed systems to go down like this? I'm very interested to read up on this when they release details of the failure.

rwultsch7y ago

Short duration: network, bad software deploy.

Long duration: db. If you break data, it takes a while to unbreak.

Source: Me. My career has been spent managing db's for internet scale sites.

dsfyu404ed7y ago

I work for a smaller but comparably large platform. "If everything is down check the DB" is at the top of one of our internal monitoring websites in red.

Screw ups related to data loss are rare (I've been here years and haven't seen one with the DBs that the stuff I work with uses) but failures at this scale tend to cascade a little ways and it takes time to dig out of the hole. They probably have the problem solved but they have to spend a bunch of time synchronizing things and verifying the fix before they press the big red "go live" button.

cheeze7y ago

Nothing worse than that sinking feeling of "oh fuck, we have to backfill a lot of data."

fixermark7y ago

I have no inside knowledge of this one, but broadly speaking, these sorts of failures can be caused by a change thought innocent at the time to the core software that is then widely deployed using automated systems. If the core's tests didn't catch a real issue in production (and for whatever reason, the rollout happens faster than the regular small-release verification process can catch the error), things can go sour in a way that's expensive to un-sour.

Amazon once pushed a seemingly-innocuous change to their internal DNS that caused all the routers between and within datacenters to drop their IP tables on the floor. They had to re-establish the entire network by hand---datacenter heads calling each other up and reading IP address ranges over the phone to be hand-entered into lookup tables. Cost a fortune in lost sales for the time the whole site was inaccessible.

str33t_punk7y ago

As someone who works at a large company in the networking space, you would be surprised that minor changes to configuration can cause catastrophic failures that are really challenging to come back from

Network failures are usually really bad when your system is globally deployed and distributed -- often times you can't even communicate with your machines to deliver fixes :p

phoe-krk7y ago

An expired certificate, for instance.

https://www.thesslstore.com/blog/expired-certificate-ericsso...
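
Expiry is one of the easier failure modes to watch for ahead of time. A minimal sketch, assuming you already have the certificate's `notAfter` string in the format Python's `ssl` module reports it (for example from the dict returned by `SSLSocket.getpeercert()`); the function names and the 30-day threshold are hypothetical choices for illustration.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a certificate's notAfter time (negative if expired).

    not_after uses the format the ssl module reports, e.g.
    "Jan  5 09:34:43 2018 GMT". now is a Unix timestamp, defaulting to the
    current time.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400.0


def needs_renewal(not_after: str, warn_days: float = 30.0,
                  now: Optional[float] = None) -> bool:
    """True when the certificate expires within warn_days (assumed threshold)."""
    return days_until_expiry(not_after, now) <= warn_days


# The example date is long past, so this reports a negative number of days.
print(days_until_expiry("Jan  5 09:34:43 2018 GMT"))
```

Running a check like this on a schedule and alerting well before the threshold is one way to avoid learning about an expired certificate from an outage.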

jankassens7y ago

Here's one example https://rachelbythebay.com/w/2019/01/20/quiet/

BucketSort7y ago· 7 in thread

Google and FB having successive outages? Is this just a coincidence or is there some shared infrastructure that would explain this?

js27y ago

Verizon also had SMS issues yesterday:

https://www.businessinsider.com/verizon-outage-on-east-coast...

_t2kx7y ago

Both companies have their own private data centers/infrastructure.

frostyj7y ago

^^^ I noticed this too. GCP has been under all 50 shades of outages for the past few days. Feel like I might need to rush back to my house and start digging a bunker.

FakeComments7y ago

Seems like when GMail and O365 experienced major outages on the same day.

aviv7y ago

Likely a firmware issue with the NSA intercept devices.

9999px7y ago

Are you basing that on anything or just BS'ing?

farisjarrah7y ago

The Bay Area Peninsula has been having strong winds and heavy rains for the past few months. The last 3 days, there have been major power outages across the area. Redwood City had power outages 2 days in a row and Pacifica lost power to a good chunk of the city for like 7 hours last night. It wouldn't surprise me if all these major tech outages that have been happening this week are related to poor Bay Area infrastructure.

Implicated7y ago· 6 in thread

Facebook's own status dashboard (https://developers.facebook.com/status/dashboard/) showed no issues or outage just 30 min ago.

I run a messenger bot platform - the webhooks stopped being delivered _hours_ ago... nothing on their status page until it had been down for hours.

Their current issue...

"We are currently experiencing issues that may cause some API requests to take longer or fail unexpectedly. We are investigating the issue and working on a resolution."

What? lmao

pbhjpbhj7y ago

I'm pretty sure businesses use status pages to divert attention from support resources, they never seem to give useful information about outages and half the time don't even mention the outage.

yalok7y ago

that page is down now as well

buboard7y ago

you get what you pay for

knicholes7y ago

If you're implying that Facebook is "free," I'd strongly disagree. Data is the currency of today, and they're swimming in it.

henrikschroder7y ago

I can't even reach that status page right now...

augbog7y ago

when I click I see a blank screen. why

CodeWriter237y ago· 4 in thread

This is bigger than Facebook.

https://imgur.com/a/gePwi0i

https://www.akamai.com/us/en/resources/visualizing-akamai/re...

ce47y ago

The heavy traffic is due to sports events - champions league last-16 matchday live streaming: Bayern Munich vs. Liverpool, FC Barcelona vs Olympique Lyon. The heatmap matches the clubs' home countries UK, Germany, France & Spain quite well.

Don't conflate that with FB/Insta problems.

why-el7y ago

Indeed. The games are streamed like crazy! Lots of streams are in HD.

ceejayoz7y ago

That chart doesn’t support your assertion. Akamai’s traffic and attack charts usually look like that, and the attack chart even says it’s currently low.

dr1ggins7y ago

That traffic map seems to line up with time zones

dstola7y ago· 3 in thread

The only things I can think of that would cause an outage of this scale are either a T1 center outage or (conspiracy hat on) a major hack and everyone is rush patching.

Would be interesting to read the post mortem if there is any regardless

snazz7y ago

If rush patching were going on, we’d likely see some hints in commit messages of open source projects, like the Linux kernel commits that were tipoffs to Meltdown and Spectre.

Edit: Has anyone seen anything of this sort in any of the projects they follow?

ajflores16047y ago

Did we ever get a post mortem for the global Youtube outage from late last year?

augbog7y ago

No but everyone is 99% sure it was related to killing Google+ which was announced not too long before and everyone who has used YouTube way back when knows they had to make a Google+ identity at one point to link em. HMMMMM....

subcosmos7y ago· 3 in thread

Something fun happening in Germany? https://www.akamai.com/us/en/resources/visualizing-akamai/re...

And Level3 traffic going to Argentina? https://twitter.com/bgpstream/status/1105819050968580096

And Great Britain going to Cambodia? https://bgpstream.com/event/197968

subcosmos7y ago

Hrmmm .... fun in Venezuela too ...

https://twitter.com/bgpmon/status/1104919654441467904

Must be because of that Dam blowing up

https://www.newsweek.com/sen-marco-rubio-blames-power-outage...

ceejayoz7y ago

> Something fun happening in Germany?

Major sporting events in the EU.

BGP fuckups appear to happen regularly, based on the tweet history of that account.

hokumguru7y ago

As a layperson, can you explain what exactly we're looking at here?

rdtsc7y ago· 3 in thread

Google then Facebook and Instagram?

My hunch is that it's the end of Q1 and people are trying to release code changes so they can pad their Q1 performance reviews "designed and delivered feature X on time in Q1".

cheeze7y ago

Yeah I'm gonna go with about a 0% chance of this.

unfunco7y ago

"and brought down a service used by billions and losing potentially millions in revenue."

toephu27y ago

Year end performance reviews are the most important..those were already done at most companies.

FabHK7y ago· 2 in thread

Let's see whether we have a spike in the birth rate in 9 months.

(Oh, turns out the Great Blackout Baby Boom was a myth:

https://www.snopes.com/fact-check/from-here-to-maternity/ )

Arbalest7y ago

What if there is a fall in birthrate because people can't organise risky hookups without their preferred communication platform?

anitil7y ago

Or a fall in STDs for the same reason.

kartan7y ago· 2 in thread

"This usually means we're making an improvement to the database your account is stored on. While this process won't affect your account, you temporarily won't be able to access the site." https://www.facebook.com/help/134401680031995

I guess that this is all that I will get. Facebook is never down, it is just making improvements (like restarting the services to make them work again).

jfaat7y ago

We've always been at way with Eastasia

subcosmos7y ago

and autocorrect - it's doubleplus bad

1 more reply

agosnell7y ago· 2 in thread

If you use their API and haven't seen it yet, their issue is listed here on their status page:

https://developers.facebook.com/status/issues/55989644784543...

_56597y ago

Looks like the status page is down for me as well.

yorwba7y ago

Increased Error Rates Created by Gary Fitzpatrick · · Facebook Team — Today at 10:32 AM

Current State: Investigating

Description: We are currently experiencing issues that may cause some API requests to take longer or fail unexpectedly. We are investigating the issue and working on a resolution.

Start Time: 2 hours ago

Last Update: about an hour ago

Updates: There are currently no updates for this issue.

1 more reply

wybiral7y ago· 2 in thread

Look at how many service providers have increased incidents reported here: https://downdetector.com/

My bet is that people are having problems with FB/Insta and immediately assuming that the whole internet is messed up.

_bxg17y ago

Especially if they can't sign in via OAuth. To an average user who signs into Spotify with their Facebook account, "I can't sign into Spotify" means Spotify is down, not Facebook.

ashtonbaker7y ago

Looks like that site is having a problem logging reports! They all go to zero at around 17:00.

zomg7y ago· 2 in thread

All my funny cat videos and memes are loading fine. Did you try rebooting your router 62 times?

All joking aside, is this news? :/

blhack7y ago

One of the largest internet companies in the world having a massive, global outage?

Yes, that seems like appropriate news for "hacker news", a website where people discuss technology news.

1 more reply

pbhjpbhj7y ago

Do you want to explain why you believe it's not [HN worthy] news?

casper3457y ago· 1 in thread

The real storm is realizing through Facebook OAuth you cannot access your affiliate accounts. Caution to move your accounts away from Facebook

Edit: Or have other methods than just relying on Facebook authentication

rmujica7y ago

realized that when trying to play Pokémon Go today, guess I'll break my weekly challenge...

btown7y ago· 1 in thread

I've also seen issues uploading images to Whatsapp in the past half hour. I wonder if there's anything to do with the Google Cloud Storage outage that took down Gmail yesterday?

_t2kx7y ago

Facebook doesn't use GCS (at least they didn't in 2015), they have their own infrastructure/data centers.

JakeWesorick7y ago· 1 in thread

https://www.facebook.com/platform/api-status/ still returns "Facebook Platform is Healthy", but you can't even load https://developers.facebook.com/status/dashboard/. Why have status pages if they are so susceptible to going down themselves?

ceejayoz7y ago

I remember an S3 outage a number of years ago where AWS discovered that their status page was hosted on S3. Whoops!

I believe this is why Github's status page is now on its own domain; so a github.com DNS outage won't take it down.

1 more reply

tinyhouse7y ago· 1 in thread

BTW, many apps are affected by this. I cannot log in to any app that uses Facebook for authentication.

cheeze7y ago

This is one of the many many reasons I don't use fb for auth for any other websites.

goblin897y ago· 1 in thread

Coincidentally, just watched The Social Network, the plot of which includes that quote by Mark:

> Let me tell you the difference between Facebook and everybody else. We don't crash ever! If the servers are down for even a day, our entire reputation is irreversibly destroyed. <…>

> Even a few people leaving would reverberate through the entire user base. The users are interconnected. That is the whole point. College kids are online because their friends are online, and if one domino goes, the other dominos go.

ninth_ant7y ago

You’re quoting a movie, which played fast and loose with facts to tell a story (as movies do).

In real life, Facebook had significant issues with uptime in the early years.

1 more reply

minimaxir7y ago· 1 in thread

Minor update: https://twitter.com/facebook/status/1105907126424109056?s=21

> We're focused on working to resolve the issue as soon as possible, but can confirm that the issue is not related to a DDoS attack.

KennyFromIT7y ago

There's something ironic about a social media company being forced to rely on a competitor to facilitate communication between them and their users.

1 more reply

benatkin7y ago· 1 in thread

I got my github two factor auth SMS two hours late. Fortunately it was just my old laptop. I wonder if it was related. Good reminder to set up an authenticator app on my new phone so I don’t have to rely on SMS!

0xffff27y ago

Which you should do anyway because relying on SMS is only slightly more secure than not having 2FA enabled at all...

1 more reply

nodesocket7y ago· 1 in thread

And yet Facebook's stock is still up on the day (+.74%)?

You'd think being down for hours would be negative news and revenue impacting.

derwiki7y ago

2 hours of downtime in a quarter is 0.09% of downtime -- probably very little effect on their monetization products.
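The arithmetic behind that figure can be checked with a small sketch. It assumes a 90-day quarter (an assumption, not something stated in the comment), and `downtime_percent` is a hypothetical helper name used only for illustration.

```python
def downtime_percent(outage_hours: float, period_days: float) -> float:
    """Share of a period spent down, as a percentage."""
    period_hours = period_days * 24
    return 100 * outage_hours / period_hours


# Assumption: a quarter is roughly 90 days (2160 hours).
share = downtime_percent(outage_hours=2, period_days=90)
print(f"{share:.2f}%")  # prints 0.09%
```

A longer outage scales linearly: under the same assumption, 12 hours down would be about 0.56% of the quarter.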

1 more reply

bluedino7y ago· 1 in thread

Instagram works for me on my iPhone, but the comments are all missing. I kind of like it that way.

jjulius7y ago

You're already logged in so it's just going to show you old content. I'd be surprised if you're able to post anything, or if any "like" you give during the outage is saved. Same thing is happening with both Instagram and FB apps on Android for me.

1 more reply

tinyhouse7y ago· 1 in thread

From Google I got an error clicking on instagram and Quora links today.

JohnJamesRambo7y ago

At least you are free of Quora now!

arisAlexis7y ago· 1 in thread

yesterday Google and today Facebook. My conspirator says it's the Chinese government showcasing.

mdkdog7y ago

Meanwhile in Russia they are talking about disconnecting their network from the rest of the world. Some test gone south?... Maybe... Someone has a traceroute handy?

mizeandmen7y ago

A bunch of FB employees near Pacific Catch were talking about how FB was hacked

jedberg7y ago

So yesterday Google had a major (and out of character) outage, and today Facebook has a major (and also out of character) outage.

I can't wait to see the RCA for both of these and if they're related.

Dangeranger7y ago

Ironically as I've been checking this post HN has been experiencing errors loading new comments.

llamataboot7y ago

Instagram seems to load the feed here fine (EU), but doesn't allow you to log in from any device or post anything new. FB is totally fine if you are logged in for reading, but also can't log in if logged out.

VPN to US, insta can login, but still not post.

Distributed services are weird man!

markstos7y ago

I see they've made some progress putting Instagram on the same infrastructure as Facebook.

hnruss7y ago

Did they move too fast and break too many things?

evolvedcleaning7y ago

Doesn’t their world class team make such a long outage quite unlikely? How hard would it be to devote ample resources to a cover story for the “incident report”? Is the timing relative to the plethora of indictments relevant at all? Is it reasonable to think this may be related to shredding of data and/or code, or even cooperation to turn over data to the government in a secret deal?

johnchristopher7y ago

> The team at Jefferies remains reasonably positive, and in the firm's top growth stock calls for the week we found four tech stocks that are offering more aggressive accounts good entry points. Carl Court / Getty Images

What's that weird tagline about ?

akulbe7y ago

In other news, productivity everywhere skyrocketed!

alien20037y ago

It's hard to believe that people simply can not live without fresh instagraphies

revskill7y ago

What's the cause of outage then ? Disk, memory, CPU, network bandwidth,... ?

armortech7y ago

Whatever happens right now at Facebook is less important than the fact they will never say what affected them. Of course nobody would tell 'hey, outage right now due to 0day / mistake' but...

aviral7y ago

https://downdetector.com/status/facebook

entwife7y ago

It's an experiment to see whether productivity improves when people aren't able to access FB and Instagram to slack off.

boshomi7y ago

the issue is also in EU: Is Facebook down? Messenger, site, app and Instagram hit by issues[1]

[1]https://www.manchestereveningnews.co.uk/news/uk-news/faceboo...

indigochill7y ago

First thought when I heard the news was BGP hijacking (ignoring whether accidental or deliberate). Doesn't the symptom fit other known cases like the Telegram incident in Iran last year, just at a larger scale?

Admittedly networking is not my strength, so perfectly happy for someone to shoot down this hypothesis.

JDiculous7y ago

Is Facebook actually working for anyone?

js27y ago

I haven’t been able to post anything on Facebook, neither a new post to my wall nor add a comment on a friend’s post, since mid-morning US/Eastern and this is still the case. In addition I can’t login to the site - I am able to access the site only where I’m already logged-in.

letientai2997y ago

This is the first time I've experienced this. Also note that current sessions on messenger.com still work: we can still send and receive messages, but can't upload any images or send stickers. Looking forward to a post-mortem analysis of this.

aboutruby7y ago

Hacker News might just have a "Major Internet Services" status board.

bradynapier7y ago

Perhaps relevant that npm has been having issues although they only recently caught and fixed them. Scoped Private npm packages were getting cloudflare 503 errors

Endy7y ago

Since it went down for PC and not mobile, I wondered whether it was some kind of audience testing as part of moving to an app-only platform.

winrid7y ago

Ironic: https://imgur.com/a/M2SzqIc

boshomi7y ago

see also: Facebook, Instagram down: Social media sites not working for many, FB doing 'required maintenance'

[1] https://www.abc15.com/news/national/facebook-down-social-med...

llamataboot7y ago

Whatsapp now down across much of Europe it looks like. Cannot send/receive messages.

evesprini7y ago

Argentina: Whatsapp works for text, but any type of media takes very long to send.

red_admiral7y ago

FB has been down for me for around an hour, but has just come back up again.

peepX7y ago

I'm surprised there wasn't a national crisis alert sent out

yeutterg7y ago

Can confirm. Also, API integrations, such as Buffer, are not working.

nathan9297y ago

...terrifying API developers everywhere...

allannienhuis7y ago

seems to be more than just messenger login; All of facebook is super flakey this morning.

deca6cda37d07y ago

Unfortunately the AI software used to censor fake news from facebook has decided, again, to censor facebook :D

abbiya7y ago

somebody please wipe off everything from all these big corps

pragmaticalien87y ago

must be a super 0 day.

nilskidoo7y ago

Affecting more lives than any terrorist act.

jak927y ago

  ... and nothing of value was lost.

shrthnd7y ago

I didn't notice.

njn7y ago

Good! Keep them down.

miloignis7y ago

My fiance's uncle sent something today saying that because of a school shooting in Brazil, they were blocking all images and video shared to social networks like "WhatsApp, Instagram, Facebook and other social networks". I haven't been able to verify this myself or from any other sources, but I wonder if either people are misinterpreting the FB outage or if Brazil is blocking content and it's having weird ripple effects.


linsomniac7y ago· 30 in thread

Could this be related to the storm?

jyriand7y ago

So, a storm in Denver stops me from using Messenger in Estonia? I wonder where the butterfly flapped its wings.

ljm7y ago

Pretty sure it doesn't apply to Facebook, but Amazon's cheapest AWS tiers are around there. Same with Virginia.

2 more replies

iamgopal7y ago

https://en.m.wikipedia.org/wiki/Butterfly_effect

erobbins7y ago

No, FB datacenters are geographically diverse.

They do run quarterly 'storms' where a datacenter is shut down to test failover and resiliency. I have no idea if today is one of those days, since I left last year.

johannes12343217y ago

For instance GitHub's relatively recent shutdown was due to a fail-over heartbeat not going as expected.

linsomniac7y ago

chipperyman5737y ago

Interesting. Out of curiosity, how hard is it to turn the datacenter "back on" in case they discover there's a problem with the failover?

1 more reply

bertil7y ago

It is an internal software problem.

unethical_ban7y ago

> "We are trying to gather information, we are making a bunch of client phone calls, we will know after we make those calls."

I think that is a yes, and he's getting ahead of it by saying "Yes and we have no idea why or ETA so let us do our job".

Granted, they should have a status page.

opportune7y ago

Last time I dealt with a cloud provider outage the status page was unresponsive during the outage because the status page had some kind of dependency on the resources that were down...

1 more reply

linsomniac7y ago

Sure, I understand the "so let us do our job". I've been on the other side of that.

In short, I need answers to: Do I need to gracefully take down my site to prevent lost transactions and database corruption? Do I need to switch to our backup site?

So, yes, I respect that you need to do your job. But I also need to do my job.

rhizome7y ago

I wouldn't be surprised if they think a status page would open up liability for not putting it up soon enough, or for too long, or for some text that turned out to be wrong or unnecessary.

rconti7y ago

"The storm"? It's sunny in the Bay Area for the first time in I don't know how long. I imagine it's nice in other parts of the world as well, other than where this localized "storm" is.

kornish7y ago

Denver is getting slammed right now — power surges everywhere.

mises7y ago

You missed the bomb cyclone that's happening right now in half the country?

1 more reply

ninju7y ago

Colorado weather: Bomb cyclone brings wild winds, big impact as blizzard whips state

https://www.denverpost.com/2019/03/13/colorado-weather-bomb-...

jonstokes7y ago

My first reaction to "could this be related to the storm," was "oh no, now this QAnon stuff has spread to HN."

komali27y ago

Do they do this to get around any 99.9% uptime agreements?

Implicated7y ago

Probably a combination of that and to curtail the "I just spoke to Brad in Customer Service who confirmed _the whole datacenter is offline_" type posts.

But that's my presumption, I don't actually know anything and don't want to imply I do.

ljm7y ago

It's easy to be cynical but it's optimistic expectation management.

It's too easy to draw a conclusion from the client-side armchair and the provider is absolutely not going to make false promises, for the worse or for the better.

You want to hope that Facebook, in this case, acts on more complete information.

6nf7y ago

0.1% is still almost an hour per month
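For reference, the downtime budget implied by an uptime target can be worked out as below. This is a rough sketch that assumes a 30-day month; `allowed_downtime_minutes` is a hypothetical helper, and real SLAs define their own measurement windows and exclusions.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 30) -> float:
    """Minutes of downtime permitted in a period at a given availability target."""
    period_minutes = period_days * 24 * 60
    return (1 - availability_percent / 100) * period_minutes


# Assumption: a 30-day month (43,200 minutes).
for target in (99.0, 99.9, 99.99):
    print(f"{target}% uptime allows {allowed_downtime_minutes(target):.1f} min/month")
```

At 99.9% the budget comes to about 43 minutes per 30-day month, which is the "almost an hour" mentioned above.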

SilverSurfer9727y ago

2 more replies

rubicon337y ago

Classic response for any kind of service provider:

Deny, deny, deny, obfuscate, deny, then blame someone else (usually, YOU).

jorblumesea7y ago

A company as large and sophisticated as FB has data centers and cloud services in multiple countries, and in the US, probably colocated data centers. Certainly nothing localized to where you are.

TallGuyShort7y ago

ct5207y ago

Awesome we use iron mountain for escrow

taurath7y ago

“Outage, what outage?” Is a sort of laughable response but all too common with tons of providers.

rad_gruchalski7y ago

bradhe7y ago

I'm looking at you, AWS...

2 more replies

ct5207y ago

Man that reminds me of that century link cluster f

wybiral7y ago· 12 in thread

Can you imagine if Twitter and Google went down at the same time?

People would be reactivating their Facebook accounts and having to sift through conspiracy theory posts about Hillary Clinton still just to figure out what was going on.

bouncycastle7y ago

tyingq7y ago

Gmail's exponential back off, with its visible countdown to next retry, is a nice idea. Probably reduces that compounding wave of customer reloads.
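A minimal sketch of the general technique, for anyone unfamiliar with it. This is not Gmail's actual implementation; the function name, the base delay, the cap, and the use of random jitter are all assumptions chosen for illustration.

```python
import random


def backoff_delays(base=1.0, cap=60.0, attempts=6, jitter=True, rng=random):
    """Yield the wait in seconds before each successive retry."""
    for attempt in range(attempts):
        # Double the ceiling after every failed attempt, but never exceed the cap.
        ceiling = min(cap, base * 2 ** attempt)
        # With jitter, pick a random wait up to the ceiling so that many
        # clients do not all retry at the same instant.
        yield rng.uniform(0, ceiling) if jitter else ceiling


# Without jitter the schedule is deterministic and easy to inspect.
print(list(backoff_delays(jitter=False)))
```

A client would sleep for each yielded delay before retrying, and could show the remaining seconds to the user as a countdown. Spreading retries out this way is what dampens the reload wave: each failure makes the next attempt later rather than immediate.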

spectre2567y ago

Whenever we had any sort of issue we could generally get a good idea of what was happening by looking at changes in traffic in those two web tiers.

If people couldn't play for most reasons, game action traffic would drop to near zero, but the static asset tier traffic would usually at least triple.

So yeah, there are a lot of F5 buttons being hit out there when pages don't load.

hiei7y ago

Gmail and other Google products went down last night. Close though. Thankfully not on Twitter or FB.

kyrra7y ago

I don't believe Gmail was ever fully down. For me, I was just having problems with attachments. I also noticed app icons in the play store failing to load.

3 more replies

hiccuphippo7y ago

My company's tech support sent us an email to tell us our email was down. Fun times we live in.

1 more reply

sundvor7y ago

People would be calling emergency hotlines..?

Wait, people are doing this already: https://twitter.com/SA_SES/status/1105969450698694656

pennaMan7y ago

I'm more worried of all of it going down at the same time.

celticninja7y ago

When it all goes down at the same time you should be worried. Not because of the lack of Twitter or FB or Gmail but for what it means if it's all down.

josteink7y ago

> Can you imagine if Twitter and Google went down at the same time?

Google sure, but what people in the real world care about twitter?

Twitter could be down for days and only the technocracy would notice.

corobo7y ago

Twitter is where we complain that Google is down

chriswarbo7y ago

The US president?

It also seems popular with journalists and media companies (e.g. TV shows asking viewers to "tweet us your questions")

snazz7y ago· 12 in thread

Have there been major security vulnerabilities patched lately in the Linux kernel that could have had unintended consequences?

str33t_punk7y ago

snazz7y ago

hideo7y ago

>>Google, Microsoft, etc are paying researchers top dollar to figure out how to prevent massive outages through novel techniques

Curious what makes you think this. Are there specific job postings in either company that are focused on this?

1 more reply

shafte7y ago

The best explanation is coincidence, I think. I have direct knowledge of two of the incidents in the past few weeks, and they have completely unrelated causes.

Sometimes you just get unlucky!

smacktoward7y ago

"Once is happenstance. Twice is coincidence. The third time it’s enemy action."

-- Ian Fleming (in Goldfinger)

z3t47y ago

If it can happen, it will happen.

snazz7y ago

That’s certainly possible. We’re probably still too early to tell, but the innate conspiracy theorist slash pattern-matching part of my brain wants to find a probable connection.

taneq7y ago

Maybe we're in the first week of a rogue AI's hard takeoff. ;)

nubslayer27y ago

>all of these organizations have fairly diverse tech stacks to my knowledge

>This leaves me wondering what software all these places have in common.

dunno what systems you're talking about, but seems likely they are mostly x86 systems and maybe even mostly using Intel hardware and microcode

those systems can-be/are rooted and more, to my knowledge

felix2467y ago

It could also just be coincidence.

madrox7y ago

> This leaves me wondering what software all these places have in common.

Cisco or Arista

happythought7y ago

1 more reply

nabla97y ago· 12 in thread

Serious question: Was any value lost? (this may appear sarcastic)

nimir7y ago

Actually the government did block all social media for over a month but that was fixable with vpn. (Follow hashtag #SudanUprising on twitter to learn more)

nabla97y ago

Interesting point in general, but...

The value of 99.5% availability for __users__ is not clear to me. Instant messaging is an exception to this.

brudgers7y ago

sampleinajar7y ago

OccamsMirror7y ago

I hate Facebook, but to deny its value is pretty naive.

TallGuyShort7y ago

hrrsn7y ago

I can live without FB and Instagram, but Messenger being down is a whole different story.

alanbernstein7y ago

Do you think of Messenger as separate from Facebook?

1 more reply

vokep7y ago

Potentially much value was gained (or stopped hemorrhaging)

anigbrowl7y ago

For hours, not especially - it's annoying but no worse than a power cut. There could even be benefits.

This is a hypothetical, not speculation on the cause of this outage.

ZeroCool2u7y ago

gukov7y ago

After a few hours of not being able to use an app people might start realizing how addicted they are to it. "I was bored initially but then realized talking to people in real life still works."

40acres7y ago· 10 in thread

I'm interviewing for a Production Engineer role at Facebook on Monday, thanks for providing relevant "do you have any questions for us" content.

mtw7y ago

Good question is why oh why switch WhatsApp to Facebook tech when it was running perfectly ok on its own. Never crashed.

duado7y ago

So that engineers can be moved between product groups while carrying relevant knowledge and experience with them.

2 more replies

macintux7y ago

Speaking as an Erlang developer I approve of this question.

mietek7y ago

The answer should be clear at this point. WhatsApp provided too much privacy and not enough monetisation opportunities.

packetslave7y ago

Good luck! If you’re interviewing in MPK make sure your recruiter takes you to the barbecue shack for lunch

erobbins7y ago

But be prepared to stand in line for longer than it takes you to actually eat.

1 more reply

B-Con7y ago

When I interviewed for SRE at Google, they'd had a non-trivial cross product outage days before. Good conversation starter, but I couldn't get many details out of them.

t3rabytes7y ago

You mean this week when they took out most of their cloud and public products for a few hours? /s

taneq7y ago

You think they might have a couple more vacancies going now? :P

HelloFellowDevs7y ago

Funnily enough, same for me too! New Grad this Friday. Good luck!

jedberg7y ago· 10 in thread

So yesterday Google had a major (and out of character) outage across its apps, and today Facebook has a major (and also out of character) outage across its apps.

I can't wait to see the RCA for both of these and if they're related.

godzillabrennus7y ago

Private post mortem: The NSA middleware we are required to run (that took time to deploy to each of our social partners) is breaking something so let’s revert.

Public post mortem:

Entirely believable technical cause.

samstave7y ago

Alternate Post Mortem: Cyber WWIII's first public skirmishes become visible...

(Ignore Stuxnet, Ignore DUQU)

2 more replies

popz417y ago

I imagine the NSA uses an optical tap device. These devices create identical copies and require no power or management.

4 more replies

drbenway7y ago

https://stackoverflow.com/questions/4188605/what-is-cavalryl...

anticensor7y ago

Alternative post mortem (blind): Massive power outage

Yet another alternative: Third World War has just started, and this was the first battle.

betolink7y ago

I don't think it's the NSA this time. For one, they don't have to do deep packet analysis or install any MITM device since they get the whole info in bulk. Maybe it's just a 400-pound hacker.

2 more replies

eqdw7y ago

Remember when youtube was down for like six hours a few months ago and we still haven't heard why?

mikemotherwell7y ago

Yeh why was that? I kept wondering when we'd hear. I've got bets to collect on!

crb0027y ago

Curious if they did it due to a kernel exploit being used by a nation state bad enough that it was worth YOLO patching.

arisAlexis7y ago

there are actively people downvoting such comments. I guess that's suspicious too.

cronix7y ago· 8 in thread

ceejayoz7y ago

That site relies on user reports.

I strongly suspect users are reporting "my Internet is having troubles" because their FB, Messenger, etc. isn't working right.

cronix7y ago

That's true, but it is a good indicator. Here's a better map from Akamai: https://www.akamai.com/us/en/resources/visualizing-akamai/re...

Also, check out the "Attacks" tab. That one really lights up. Like seriously lights up. Something is going on... all over. US, China, Russia, EU...

1 more reply

jnothing7y ago

Yeah in Turkey people thought government is slowing down/blocking the whole internet

1 more reply

mclightning7y ago

Yes, I had that exact problem yesterday. I couldn't upload pictures/videos/gifs on whatsapp and instagram.

I thought maybe my ISP blocked a port which these services may be transferring their multimedia on.

/Sweden

onychomys7y ago

That's a fair point, but those folks probably aren't reporting AWS problems, and it's showing a spike too.

1 more reply

wybiral7y ago

I believe that too but some of the services seem unrelated (like Flickr and Capital One).

Now, another interpretation is that the reports are simply false...

wybiral7y ago

Ironically https://outage.report/ is down too.

Edit: it's back now (8:37 PM UTC)

blang7y ago

HN was down for a minute.

earless17y ago· 7 in thread

What manner of failure would cause such globally deployed and distributed systems to go down like this? I'm very interested to read up on this when they release details of the failure.

rwultsch7y ago

Short duration: network, bad software deploy. Long duration: db. If you break data, it takes a while to unbreak.

Source: Me. My career has been spent managing db's for internet scale sites.

dsfyu404ed7y ago

I work for a smaller but comparably large platform. "If everything is down check the DB" is at the top of one of our internal monitoring websites in red.

1 more reply

cheeze7y ago

Nothing worse than that sinking feeling of "oh fuck, we have to backfill a lot of data."

1 more reply

fixermark7y ago

str33t_punk7y ago

Network failures are usually really bad when your system is globally deployed and distributed -- often times you can't even communicate with your machines to deliver fixes :p

phoe-krk7y ago

An expired certificate, for instance.

https://www.thesslstore.com/blog/expired-certificate-ericsso...

jankassens7y ago

Here's one example https://rachelbythebay.com/w/2019/01/20/quiet/

BucketSort7y ago· 7 in thread

Google and FB having successive outages? Is this just a coincidence or is there some shared infrastructure that would explain this?

js27y ago

Verizon also had SMS issues yesterday:

https://www.businessinsider.com/verizon-outage-on-east-coast...

_t2kx7y ago

Both companies have their own private data centers/infrastructure.

frostyj7y ago

^^^ I noticed this too. GCP has been under all 50 shades of outages for the past few days. Feel I might need to rush back to my house and start digging a bunker

FakeComments7y ago

Seems like when GMail and O365 experienced major outages on the same day.

aviv7y ago

Likely a firmware issue with the NSA intercept devices.

9999px7y ago

Are you basing that on anything or just BS'ing?

farisjarrah7y ago

Implicated7y ago· 6 in thread

Facebook's own status dashboard (https://developers.facebook.com/status/dashboard/) showed no issues or outage just 30 min ago.

I run a messenger bot platform - the webhooks stopped being delivered _hours_ ago... nothing on their status page until it had been down for hours.

Their current issue...

"We are currently experiencing issues that may cause some API requests to take longer or fail unexpectedly. We are investigating the issue and working on a resolution."

What? lmao

pbhjpbhj7y ago

I'm pretty sure businesses use status pages to divert attention from support resources, they never seem to give useful information about outages and half the time don't even mention the outage.

yalok7y ago

that page is down now as well

buboard7y ago

you get what you pay for

knicholes7y ago

If you're implying that Facebook is "free," I'd strongly disagree. Data is the currency of today, and they're swimming in it.

1 more reply

henrikschroder7y ago

I can't even reach that status page right now...

augbog7y ago

when I click I see a blank screen. why

CodeWriter237y ago· 4 in thread

This is bigger than Facebook.

https://imgur.com/a/gePwi0i

https://www.akamai.com/us/en/resources/visualizing-akamai/re...

ce47y ago

Don't conflate that with fb/insta problems.

why-el7y ago

Indeed. The games are streamed like crazy! Lots of streams are in HD.

1 more reply

ceejayoz7y ago

That chart doesn’t support your assertion. Akamai’s traffic and attack charts usually look like that, and the attack chart even says it’s currently low.

dr1ggins7y ago

That traffic map seems to line up with time zones

dstola7y ago· 3 in thread

The only things that I can think of that would cause this scale of being down is either a T1 center outage or (conspiracy hat on) a major hack and everyone is rush patching

Would be interesting to read the post mortem if there is any regardless

snazz7y ago

If rush patching were going on, we’d likely see some hints in commit messages of open source projects, like the Linux kernel commits that were tipoffs to Meltdown and Spectre.

Edit: Has anyone seen anything of this sort in any of the projects they follow?

ajflores16047y ago

Did we ever get a post mortem for the global Youtube outage from late last year?

augbog7y ago

subcosmos7y ago· 3 in thread

Something fun happening in Germany? https://www.akamai.com/us/en/resources/visualizing-akamai/re...

And Level3 traffic going to Argentina? https://twitter.com/bgpstream/status/1105819050968580096

And GreatBritain going to cambodia? https://bgpstream.com/event/197968

subcosmos7y ago

Hrmmm .... fun in Venezuela too ...

https://twitter.com/bgpmon/status/1104919654441467904

Must be because of that Dam blowing up

https://www.newsweek.com/sen-marco-rubio-blames-power-outage...

ceejayoz7y ago

> Something fun happening in Germany?

Major sporting events in the EU.

BGP fuckups appear to happen regularly, based on the tweet history of that account.

hokumguru7y ago

As a layperson, can you explain what exactly we're looking at here?

1 more reply

rdtsc7y ago· 3 in thread

Google then Facebook and Instagram?

My hunch is that it's the end of Q1 and people are trying to release code changes so they can pad their Q1 performance reviews "designed and delivered feature X on time in Q1".

cheeze7y ago

Yeah I'm gonna go with about a 0% chance of this.

unfunco7y ago

"and brought down a service used by billions and losing potentially millions in revenue."

3 more replies

toephu27y ago

Year end performance reviews are the most important..those were already done at most companies.

1 more reply

FabHK7y ago· 2 in thread

Let's see whether we have a spike in the birth rate in 9 months.

(Oh, turns out the Great Blackout Baby Boom was a myth:

https://www.snopes.com/fact-check/from-here-to-maternity/ )

Arbalest7y ago

What if there is a fall in birthrate because people can't organise risky hookups without their preferred communication platform?

anitil7y ago

Or a fall in STDs for the same reason.

kartan7y ago· 2 in thread

I guess that this is all that I will get. Facebook is never down, it is just making improvements (like restarting the services to make them work again).

jfaat7y ago

We've always been at way with Eastasia

subcosmos7y ago

and autocorrect - it's doubleplus bad

agosnell7y ago· 2 in thread

If you use their API and haven't seen it yet, their issue is listed here on their status page:

https://developers.facebook.com/status/issues/55989644784543...

_56597y ago

Looks like the status page is down for me as well.

yorwba7y ago

Increased Error Rates

Created by Gary Fitzpatrick · Facebook Team · Today at 10:32 AM

Current State: Investigating

Description: We are currently experiencing issues that may cause some API requests to take longer or fail unexpectedly. We are investigating the issue and working on a resolution.

Start Time: 2 hours ago

Last Update: about an hour ago

Updates: There are currently no updates for this issue.

wybiral7y ago· 2 in thread

Look at how many service providers have increased incidents reported here: https://downdetector.com/

My bet is that people are having problems with FB/Insta and immediately assuming that the whole internet is messed up.

_bxg17y ago

Especially if they can't sign in via OAuth. To an average user who signs into Spotify with their Facebook account, "I can't sign into Spotify" means Spotify is down, not Facebook.

ashtonbaker7y ago

Looks like that site is having a problem logging reports! They all go to zero at around 17:00.

zomg7y ago· 2 in thread

All my funny cat videos and memes are loading fine. Did you try rebooting your router 62 times?

All joking aside, is this news? :/

blhack7y ago

One of the largest internet companies in the world having a massive, global outage?

Yes, that seems like appropriate news for "hacker news", a website where people discuss technology news.

pbhjpbhj7y ago

Do you want to explain why you believe it's not [HN worthy] news?

casper3457y ago· 1 in thread

The real storm is realizing that you cannot access your affiliated accounts through Facebook OAuth. Take it as a caution to move your accounts away from Facebook.

Edit: Or have other methods than just relying on Facebook authentication

rmujica7y ago

Realized that when trying to play Pokémon Go today. Guess I'll break my weekly challenge...

btown7y ago· 1 in thread

I've also seen issues uploading images to WhatsApp in the past half hour. I wonder if it has anything to do with the Google Cloud Storage outage that took down Gmail yesterday?

_t2kx7y ago

Facebook doesn't use GCS (at least they didn't in 2015); they have their own infrastructure/data centers.

JakeWesorick7y ago· 1 in thread

ceejayoz7y ago

I remember an S3 outage a number of years ago where AWS discovered that their status page was hosted on S3. Whoops!

I believe this is why GitHub's status page is now on its own domain, so a github.com DNS outage won't take it down.

tinyhouse7y ago· 1 in thread

BTW, many apps are affected by this. I cannot log in to any app that uses Facebook for authentication.

cheeze7y ago

This is one of the many many reasons I don't use fb for auth for any other websites.

goblin897y ago· 1 in thread

Coincidentally, just watched The Social Network, the plot of which includes that quote by Mark:

> Let me tell you the difference between Facebook and everybody else. We don't crash ever! If the servers are down for even a day, our entire reputation is irreversibly destroyed. <…>

ninth_ant7y ago

You’re quoting a movie, which played fast and loose with facts to tell a story (as movies do).

In real life, Facebook had significant issues with uptime in the early years.

minimaxir7y ago· 1 in thread

Minor update: https://twitter.com/facebook/status/1105907126424109056?s=21

> We're focused on working to resolve the issue as soon as possible, but can confirm that the issue is not related to a DDoS attack.

KennyFromIT7y ago

There's something ironic about a social media company being forced to rely on a competitor to facilitate communication between them and their users.

benatkin7y ago· 1 in thread

0xffff27y ago

Which you should do anyway because relying on SMS is only slightly more secure than not having 2FA enabled at all...

nodesocket7y ago· 1 in thread

And yet Facebook's stock is still up on the day (+0.74%)?

You'd think being down for hours would be negative, revenue-impacting news.

derwiki7y ago

2 hours of downtime in a quarter is 0.09% of downtime -- probably very little effect on their monetization products.
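
For anyone who wants to check that figure, here is a minimal sketch of the arithmetic. It assumes a 90-day quarter (my assumption; real quarters run 90 to 92 days), and `downtime_percent` is just an illustrative helper name, not anything Facebook publishes.

```python
# Outage time as a share of a quarter.
HOURS_PER_DAY = 24
DAYS_PER_QUARTER = 90  # assumption: real quarters run 90 to 92 days


def downtime_percent(outage_hours: float, days: int = DAYS_PER_QUARTER) -> float:
    """Return outage time as a percentage of the whole period."""
    total_hours = days * HOURS_PER_DAY  # 2160 hours for a 90-day quarter
    return 100.0 * outage_hours / total_hours


if __name__ == "__main__":
    # 2 hours out of 2160 hours
    print(f"{downtime_percent(2):.2f}% downtime")  # prints "0.09% downtime"
```

This only measures wall-clock share; it says nothing about whether the lost hours fell in a peak traffic window.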

bluedino7y ago· 1 in thread

Instagram works for me on my iPhone, but the comments are all missing. I kind of like it that way.

jjulius7y ago

tinyhouse7y ago· 1 in thread

I got an error clicking on Instagram and Quora links from Google today.

JohnJamesRambo7y ago

At least you are free of Quora now!

arisAlexis7y ago· 1 in thread

Yesterday Google and today Facebook. My inner conspiracy theorist says it's the Chinese government showcasing.

mdkdog7y ago

Meanwhile in Russia they are talking about disconnecting their network from the rest of the world. Some test gone south?... Maybe... Someone has a traceroute handy?

mizeandmen7y ago

A bunch of FB employees near Pacific Catch were talking about how FB was hacked.

jedberg7y ago

So yesterday Google had a major (and out of character) outage, and today Facebook has a major (and also out of character) outage.

I can't wait to see the RCA for both of these and if they're related.

Dangeranger7y ago

Ironically as I've been checking this post HN has been experiencing errors loading new comments.

llamataboot7y ago

Over a VPN to the US, I can log in to Insta, but still cannot post.

Distributed services are weird man!

markstos7y ago

I see they've made some progress putting Instagram on the same infrastructure as Facebook.

hnruss7y ago

Did they move too fast and break too many things?

evolvedcleaning7y ago

johnchristopher7y ago

What's that weird tagline about?

akulbe7y ago

In other news, productivity everywhere skyrocketed!

alien20037y ago

It's hard to believe that people simply cannot live without fresh instagraphies.

revskill7y ago

What's the cause of the outage then? Disk, memory, CPU, network bandwidth, ...?

armortech7y ago

Whatever happens right now at Facebook is less important than the fact they will never say what affected them. Of course nobody would tell 'hey, outage right now due to 0day / mistake' but...

aviral7y ago

https://downdetector.com/status/facebook

entwife7y ago

It's an experiment to see whether productivity improves when people aren't able to access FB and Instagram to slack off.

boshomi7y ago

The issue is also in the EU: Is Facebook down? Messenger, site, app and Instagram hit by issues[1]

[1]https://www.manchestereveningnews.co.uk/news/uk-news/faceboo...

indigochill7y ago

Admittedly networking is not my strength, so perfectly happy for someone to shoot down this hypothesis.

JDiculous7y ago

Is Facebook actually working for anyone?

js27y ago

letientai2997y ago

aboutruby7y ago

Hacker News might just have a "Major Internet Services" status board.

bradynapier7y ago

Perhaps relevant: npm has been having issues, although they only recently caught and fixed them. Scoped private npm packages were getting Cloudflare 503 errors.

Endy7y ago

Since it went down for PC and not mobile, I wondered whether it was some kind of audience testing as part of a move to an app-only platform.

winrid7y ago

Ironic: https://imgur.com/a/M2SzqIc

boshomi7y ago

See also: Facebook, Instagram down: Social media sites not working for many, FB doing 'required maintenance' [1]

[1] https://www.abc15.com/news/national/facebook-down-social-med...

llamataboot7y ago

WhatsApp is now down across much of Europe, it looks like. Cannot send/receive messages.

evesprini7y ago

Argentina: WhatsApp works for text, but any type of media takes very long to send.

red_admiral7y ago

FB has been down for me for around an hour, but has just come back up again.

peepX7y ago