If a service has a big enough user base, an outage can almost seem like a pass for everyone: “Ooh, couldn’t do that this morning, GITHUB was down.” Whereas if my small GitLab server were down, my bosses could say “Well, why aren’t we on GitHub like everyone else? It’s up now.”
When Google Calendar was down a while ago, I looked at random people’s responses on Twitter, and many were along the lines of “Whew, I get out of some meetings!”
Perhaps there is some accountability for large-scale outages like these, but it really feels like the consensus is often “shit happens.” That’s totally reasonable; it just seems like outages hurt big companies a lot less.
My counter to that would be that even though Firefox was and is backed by a big company, it's easy to switch away from. A lot of services like Outlook, Chrome, or Gmail are just downright colossal at this point. Changing from Gmail isn't easy because of all the other services and logins associated with it.
Not to mention, G Suite, GitHub, Outlook, and such are all very corporate. Higher-ups often make the decision about when to use these, so you can't just say "this has kind of failed us a couple of times, let's try something else".
So, all in all, I still stand by my point: entrenched big companies probably get more of a pass.
You can keep the Gmail account and start using a new one somewhere else. You start registering for new services with your new account and gradually migrate old services to it if they let you. You end up with Gmail for social logins (which hopefully you can migrate too) and little else.
I do have a Gmail account. I use it to log in to Google when I work for a customer that uses Google Cloud, when I have to upload videos to YouTube (I log out afterward), and for Google Play.
Thinking specifically of cloud infrastructure, if you run your systems on infrastructure that is also used by your largest partners and customers, then your outages will tend to coincide, so (a) nobody cares that you're down because they're down too, and (b) even if they did care, they're busy enough worrying about their own systems that they have less time to come at you.
An example would be a company specializing in Shopify "apps" running their systems on GCP because that's where Shopify runs... if GCP goes down, then Shopify itself is likely down as well, and you'll get more mileage out of the "sorry, GCP is down" justification (if it's even needed).
When I was younger I really wanted one thing to do everything. When chat was integrated into Gmail it was like “oh boy, one less app!” And then similar feelings when it connected to SMS... now it’s just “great, everything is getting more connected and homogeneous and I seem to have fewer and fewer options.”
I think it's more "momentum". With a smaller company that affects a small number of users, cutting your losses is quite feasible. On the other hand, with Outlook you've got your whole org using it, your servers running on it, your customers integrated with it, custom rules set up - a cut and run is just not simple.
Honestly, though, I really don't get why rollback isn't more common for these larger pieces of software. Just add some logic to detect multiple crashes, check whether a rollback would break anything (i.e. dependencies), and then roll back. If you take so much power away from the average user in your OS, you really ought to have some amazing automated procedure for these scenarios.
I thought they also did limited rollouts? I think I remember the Windows 10 Anniversary Update bricking a bunch of machines in the limited rollout, and they just pushed it out anyway?
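The crash-detect-then-roll-back logic described above can be sketched in a few lines. Everything here is hypothetical: `Package`, `should_roll_back`, `roll_back`, and the three-crash threshold are illustrative names and values, not anything Windows or Office exposes, and the dependency check is reduced to "does any other component pin the current version".

```python
from dataclasses import dataclass, field, replace
from typing import Optional

CRASH_THRESHOLD = 3  # assumed: how many crashes since the update count as "multiple"


@dataclass(frozen=True)
class Package:
    """Hypothetical record of an installed program and the version it replaced."""
    name: str
    version: str
    previous_version: Optional[str] = None
    # Other installed components that require exactly the current version.
    pinned_by: frozenset = field(default_factory=frozenset)


def should_roll_back(pkg: Package, crashes_since_update: int) -> bool:
    """Roll back only after repeated crashes, and only if nothing depends on the new version."""
    if pkg.previous_version is None:
        return False  # nothing to go back to
    if crashes_since_update < CRASH_THRESHOLD:
        return False  # a single crash may have nothing to do with the update
    return not pkg.pinned_by  # the dependency check


def roll_back(pkg: Package) -> Package:
    """Return the package as it looks after reverting to the previous version."""
    if pkg.previous_version is None:
        raise ValueError(f"{pkg.name} has no previous version to restore")
    return replace(pkg, version=pkg.previous_version, previous_version=None)
```

The threshold keeps a one-off crash from triggering a downgrade, and the pin check is where a real implementation would have to consult its actual dependency data.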
The complete lack of control as to when an update is performed is astounding - I know there are some settings to stop this, but apparently there are cases where that setting reverts back to auto or you need to switch it back on in order to manually update. I remember the last straw for MS and myself was accidentally clicking update just before a meeting whilst on very low battery, with no way to back out and no charger on hand. It didn't end well.
These days if something is "Windows only", it goes in a VM along with other software I don't trust.
I can't think of any small service I gave up on over outages any more or less readily than I would any other service, regardless of company size....
The complexity of a product probably makes folks less likely to 'walk away', but I'm not convinced they walk away more or less readily from any given company based on size alone.
At least in the days of yore MS distinguished itself with support for that kind of client, famously sending service technicians in helicopters out to remote installations.
It would be interesting to hear, in the new era that they're in, if that kind of thing continues.
Subhead: Are we better off?
Also, unless it's happening every week there's probably an element of, "well it doesn't make me regret the overall choice to use this service, so there's no sense being upset about it".
It will take you 3 years to unwind your Microsoft business. Few people with that level of rage over poor Windows/Office software can maintain it that long. Normally, companies yell at the IT staff, unless it's a really bad problem; then they yell at the TAM or account manager, who takes some abuse, wears an appropriately sad/chastened face, and talks about the Microsoft release ring framework and how most problems are your fault because you don't follow a similar model.
In my recent experience with O365, these issues are usually either an update problem, where some regression in Windows 10 combines with an Office issue, or changes to Microsoft's authentication infrastructure. Microsoft has a unique understanding and methodology for making things good enough to persist -- they don't need to sell the product anymore, just make it not suck too much.
The first issue happens because no human can keep a product with something like 50 different active releases working while the company prioritizes pushing out slop. The second issue is usually tied to a product release or revision combined with new infrastructure (especially if it's in new IP space).
To figure out what the actual message is, you first have to figure out whether you are seeing an HRESULT or an NTSTATUS [1]. In this case, the leading 0xC would make it an invalid HRESULT, so it's definitely an NTSTATUS. Then you just look up the code in the bottom two bytes [2].
In this case it's just error code 5: "STATUS_ACCESS_VIOLATION", which is a memory access error. That typically means a bug where the program is accidentally trying to read or write memory outside of what it's allowed to access.
So it's definitely an application error, not something the user did or some sort of system issue.
There are also "facility" codes, which can give you more information about where the issue is happening, but this code just uses the default.
I suppose this is better than when they would print the decimal value instead of the hex. "WTF is error code 3221225477?"
[1] https://docs.microsoft.com/en-us/openspecs/windows_protocols...
[2] https://docs.microsoft.com/en-us/openspecs/windows_protocols...
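The decoding steps above can be sketched in Python, assuming the bit layouts documented in Microsoft's MS-ERREF specification: an HRESULT has S, R, C, N, and X bits plus an 11-bit facility, an NTSTATUS has a 2-bit severity, C and N bits, and a 12-bit facility, and both keep a 16-bit code in the low bits. The helper names are made up for illustration. `normalize` also covers the decimal and signed forms that some tools print instead of hex.

```python
SEVERITIES = ("Success", "Informational", "Warning", "Error")


def normalize(value: int) -> int:
    """Treat unsigned, decimal, or signed 32-bit forms of a code as the same value."""
    return value & 0xFFFFFFFF


def is_valid_hresult(value: int) -> bool:
    """Per MS-ERREF, the R bit (bit 30) must be 0 whenever the N bit (bit 28) is clear."""
    value = normalize(value)
    r_bit = (value >> 30) & 1
    n_bit = (value >> 28) & 1
    return r_bit == 0 or n_bit == 1


def decode_ntstatus(value: int) -> dict:
    """Split an NTSTATUS into its documented fields."""
    value = normalize(value)
    return {
        "severity": SEVERITIES[value >> 30],   # top two bits
        "customer": bool(value & (1 << 29)),   # C bit: customer-defined code
        "n_bit": (value >> 28) & 1,            # reserved, must be 0
        "facility": (value >> 16) & 0xFFF,     # 12-bit facility
        "code": value & 0xFFFF,                # the "bottom two bytes"
    }


if __name__ == "__main__":
    # All three inputs normalize to 0xC0000005.
    for raw in (0xC0000005, 3221225477, -1073741819):
        status = normalize(raw)
        print(f"{raw} -> 0x{status:08X}", is_valid_hresult(status), decode_ntstatus(status))
```

For 0xC0000005 this gives severity Error, facility 0 (the default), and code 5, and `is_valid_hresult` rejects it because the R bit is set while N is clear.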
SYS!012345
SYS!876543
...and they were supposed to go look that up in some reference book to see what it meant.
The question was why they weren't outputting something human readable like "Unable to boot from non-system floppy". And the answer to that question was that, as an international company, they couldn't expect everyone to understand English.
So why not output multiple different languages? Well, the code to output the message has to fit in the floppy's boot sector, so the space is very tight.
And so there was discussion about whether it makes sense to give up having only 99% of users understand it, and instead have 0% understanding.
I think the killer argument was the person who suggested outputting a message something like
______
| |__| |
| () | X
|______|
which would just barely fit, and I think is almost universally understandable. Of course, the inscrutable message eventually stuck anyway.
I suppose the reasoning was the same: not all users can understand English. But in this day and age a pictogram is in fact harder to search for, because people describe it differently.
On the other hand, this is the OS/2 project, which Microsoft did as a political hedge rather than out of a true desire to invest in that particular future. I don't fault them for not overthinking their decisions.
I tracked this down with Process Explorer the other day. What I determined is that Outlook is trying to load its DLLs, but the permissions on the file system prevent the DLLs from being read by the user.
I reset the permissions for all subfolders in C:\Program Data\Microsoft Common and c:\Program Files\Microsoft\Office and it started working again for me.
ACCESS_VIOLATION would be more like dereferencing a null pointer, but it's been a number of years since I had to troubleshoot this far in on Windows.
I suppose it's possible they didn't check for an error from the open-file syscall and then tried to use the uninitialized file handle for the DLL, which caused the segfault.
If it had just told me that the name could not be resolved, I would have figured out that I needed the full domain suffix, because ipconfig wasn't showing it.
This issue goes way back in Microsoft's history, always getting on my nerves as a power user. But at the same time, it's probably the best OS for any other type of user.
> In this case it's just error code 5: "STATUS_ACCESS_VIOLATION", which is a memory access error, which typically means it's a bug where the program accidentally is trying to read outside of its memory space.
What would be a (more) user-friendly error message for this though, given the quite technical nature of the error?
"Error: 0xC0000005 - STATUS_ACCESS_VIOLATION - (Error, Microsoft defined, unreserved) Error: This is an internal error that prevented normal operation of the application. Please contact support at X."
That conveys all of the information in the code, tells the user what they should do, and makes clear it wasn't their fault. Which code would be the most useful for the user is a different story.
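A message in the format proposed above can be generated from the raw status code. This is a minimal sketch: `KNOWN_NAMES` is a hypothetical lookup table holding only the code discussed in this thread (a real tool would need the full list from Microsoft's documentation), `friendly_message` is an illustrative name, and the support contact is a placeholder.

```python
KNOWN_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}  # hypothetical, deliberately tiny table
SEVERITIES = ("Success", "Informational", "Warning", "Error")


def friendly_message(status: int, support_contact: str) -> str:
    """Spell out the code, its decoded flags, and what the user should do next."""
    status &= 0xFFFFFFFF  # accept signed or unsigned forms of the same code
    name = KNOWN_NAMES.get(status, "UNKNOWN_STATUS")
    severity = SEVERITIES[status >> 30]  # top two bits
    origin = "customer defined" if status & (1 << 29) else "Microsoft defined"
    reserved = "reserved" if status & (1 << 28) else "unreserved"
    return (
        f"Error: 0x{status:08X} - {name} - ({severity}, {origin}, {reserved}) "
        "This is an internal error that prevented normal operation of the "
        "application. It was not caused by anything you did. "
        f"Please contact support at {support_contact}."
    )
```

For example, `friendly_message(0xC0000005, "support@example.com")` starts with `Error: 0xC0000005 - STATUS_ACCESS_VIOLATION - (Error, Microsoft defined, unreserved)`.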
NPM does something like this[1].
[1] https://stackoverflow.com/questions/22445371/npm-install-err...
I believe the actual error would be "access violation"; "status" is only the prefix. But I don't think that text would help normal users any more than the code does.
As in: "Who is that General Protection, what has he done to my computer and why is that his fault"?
"General Protection Fault" were the words actually presented to the "normal" users before. It surely didn't help them to read that.
They'll point you in a direction that is irrelevant to the error. I just wish the references were better.
This is what happens when you rip out well-tested IT infrastructure that has a reasonable update distribution process and entrust the cloud to update your software for you.
I've never been clear on what CtR (Click-to-Run) describes. The first CtR Office was Home & Student 2010, which was a purchase.
A huge difference for businesses is that Windows Update allows you to approve or deny updates and cache them on a local server with WSUS, whereas CtR leaves each machine to work out with Microsoft on its own what to install, mostly cutting your IT staff out of the loop.
I've seen about ten major Office 365 outages since my last major Exchange upgrade. I don't know how the belief that cloud services have better uptime persists, unless your IT staff were really, really bad at their jobs. My house is more stable than an MS datacenter. And by and large, I actually like Microsoft.
Or, you can enjoy the quiet while email is down :)
The web client in its current form is functional, in that I can write and receive emails, but the experience is greatly diminished. Don't mistake me for liking the Outlook client (I am a very modest email user, no more than 400-some emails per day plus a Tetris-style calendar), but this load crashes the client on a daily basis, it takes a good 30 seconds after startup before the client becomes responsive (with multiple local profile rebuilds), and it sometimes just crashes randomly when reading certain HTML-formatted emails.
Outlook is an exercise in frustration in general that unfortunately my entire company has built its foundation upon. The core ideas are great, but the execution on all fronts is awful. I'm actually almost disappointed when I open the mobile app and it __hasn't completely changed some UI/UX element__, as daily changes are the norm for me.
(All this being said, I'm shocked that Gmail somehow took Outlook's unusualness as a challenge and designed a worse UI/UX...)
At this point, if the OS wasn't an issue, I wouldn't use the full client, ever (for all of the reasons others have provided). The only gremlin I've encountered, routinely, is a situation where the rich editor doesn't see a "space" until I've typed a character after it. It's mildly distracting at worst, and far less of a problem than typing and having the thing just sit there like it'll get to my input as soon as the spirit moves it.
On that note, really, all of Microsoft's web products have been pleasant to use. They've become good enough that I rarely end up in the LibreOffice equivalents. The "Teams" client for Linux (in Preview; I run the Insiders version) integrates well -- clicking links from my calendar opens meetings. Some things are a little off, but are a few settings away from being ideal for me. It's lacking some of the features of its more developed Windows client, but it isn't as bad as the mac Lync client was back when I last had an opportunity to experience Microsoft's conferencing platform on non-Windows.
> You can't sign in here with a personal account. Use your work or school account instead.
Of course it works if you instead go to https://www.outlook.com/
However, Outlook is very much actively developed, and as a Premium user I receive new features now and then, like UI refreshes.
Something that scales with mailbox size, network latency or similar is happening on the UI thread in Microsoft Outlook. This boggles my mind every time it happens, which is every day.
Something tells me Outlook is a huge pile of legacy C++ that no one dares rewrite any major part of even in the face of issues like this.
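The usual cure for slow work on a UI thread is to move the blocking call onto a worker thread so the thread that repaints stays free. In this sketch Python's asyncio event loop stands in for a UI thread; it is an analogy, not a claim about how Outlook is built. `scan_mailbox` and its 1 ms-per-message cost are hypothetical, and `ui_heartbeat` plays the role of repainting.

```python
import asyncio
import time


def scan_mailbox(message_count: int) -> int:
    """Hypothetical blocking work whose cost grows with mailbox size."""
    time.sleep(0.001 * message_count)  # assumed cost: 1 ms per message
    return message_count


async def ui_heartbeat(ticks: list, stop: asyncio.Event) -> None:
    """Stand-in for repainting: records a tick every 10 ms while the loop is free."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def run_scan(message_count: int, offload: bool = True) -> tuple:
    """Run the scan and report how many heartbeat ticks happened meanwhile."""
    ticks: list = []
    stop = asyncio.Event()
    heartbeat = asyncio.create_task(ui_heartbeat(ticks, stop))
    if offload:
        # Worker thread: the event loop keeps running the heartbeat.
        result = await asyncio.to_thread(scan_mailbox, message_count)
    else:
        # Called directly: the loop is blocked, like a frozen UI thread.
        result = scan_mailbox(message_count)
    stop.set()
    await heartbeat
    return result, len(ticks)


if __name__ == "__main__":
    print("blocking:", asyncio.run(run_scan(300, offload=False)))
    print("offloaded:", asyncio.run(run_scan(300, offload=True)))
```

With `offload=False` the heartbeat never gets to run before the scan finishes, so it records zero ticks; with `offload=True` it keeps ticking the whole time. `asyncio.to_thread` needs Python 3.9 or later.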
The Mac, web, and mobile versions are getting a whole lot of feature releases in comparison to the Windows client, but virtually all of them are intended to bring those other clients to feature parity with the Windows client. I've interpreted this as more of Microsoft treating these alternate platforms as first-class citizens finally, rather than as neglect of the Windows clients.
This also isn't limited to just the Outlook client - it's a trend across the entire Office suite. For example, Power Query[2] is a really neat and powerful ETL system hiding within Excel (as well as used in PowerBI, and originating in SQL Server Analysis Services). It's been available for years in Excel for Windows, but only recently became usable on Excel for Mac thanks to a major refactor[3] to port it to .NET Core and strip out Windows dependencies. Eventually this will reach feature parity with and replace the legacy Power Query code in the Windows Excel client, then all net new development will be cross-platform and available on both clients simultaneously.
All things considered, I view the resource allocation across the platforms as a positive. Presuming the feature development happening for the non-Windows clients is as cross-platform as the Power Query effort, it'll eventually lead to cleaned-up Windows clients with decades of legacy cruft refactored out, feature parity and consistency across platforms, and substantially greater velocity as the resources currently split between platforms consolidate and focus on a unified, cross-platform, cleaned-up codebase. And if the refactors continue to prioritize .NET Core, it also bodes well for an official Linux client eventually.
[1] https://docs.microsoft.com/en-us/officeupdates/semi-annual-e...
[2] https://docs.microsoft.com/en-us/power-query/power-query-wha...
[3] https://devblogs.microsoft.com/dotnet/using-net-core-to-prov...
Faulting application name: OUTLOOK.EXE, version: 16.0.13001.20266, time stamp: 0x5ef2a169
Faulting module name: mso98win32client.dll, version: 0.0.0.0, time stamp: 0x5ef2771f
Exception code: 0xc0000005
Fault offset: 0x00000000000beef2
Faulting process id: 0x3be0
Faulting application start time: 0x01d65ad25f628bf7
Faulting application path: C:\Program Files\Microsoft Office\root\Office16\OUTLOOK.EXE
Faulting module path: C:\Program Files\Common Files\Microsoft Shared\Office16\mso98win32client.dll
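Event-log fault records like this one are plain "Key: value" text, so pulling out the exception code and faulting module is straightforward. A minimal sketch, assuming each field sits on its own line as in the record above; `parse_fault_record`, `summarize`, and the one-entry `KNOWN_CODES` table are hypothetical helpers, not part of any Windows tooling.

```python
SAMPLE = r"""Faulting application name: OUTLOOK.EXE, version: 16.0.13001.20266, time stamp: 0x5ef2a169
Faulting module name: mso98win32client.dll, version: 0.0.0.0, time stamp: 0x5ef2771f
Exception code: 0xc0000005
Faulting module path: C:\Program Files\Common Files\Microsoft Shared\Office16\mso98win32client.dll"""

KNOWN_CODES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}  # hypothetical, deliberately tiny


def parse_fault_record(text: str) -> dict:
    """Map each 'Key: value' line to a dict entry, splitting on the first ': ' only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines without a 'Key: ' prefix
            fields[key.strip()] = value.strip()
    return fields


def summarize(fields: dict) -> str:
    """One-line summary: which app crashed, in which module, with which code."""
    code = int(fields["Exception code"], 16)  # int() accepts the 0x prefix with base 16
    module = fields["Faulting module name"].split(",")[0]
    app = fields["Faulting application name"].split(",")[0]
    name = KNOWN_CODES.get(code, "unknown status")
    return f"{app} crashed in {module} with 0x{code:08X} ({name})"
```

Splitting on the first `": "` rather than every colon keeps Windows paths such as `C:\Program Files\...` intact.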
It's also very nice how easy it is to roll back one program on Windows! officec2rclient.exe /update user updatetoversion=16.0.12827.20470
After 5 hours of failed attempts and many failed System Restore points, the updatetoversion command above worked on multiple PCs.
I only shut down "my" work laptop on weekends.
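When the same rollback has to be applied on several PCs, it can help to build the command in one place. This sketch only assembles the argument list for the `officec2rclient.exe` command quoted in this thread; it does not run anything. The helper name is hypothetical, the `C:\Program Files` fallback is an assumption for when the `ProgramFiles` environment variable is not set, and the pinned build is simply the one reported to work here, so check which build is right for your own machines.

```python
import ntpath
import os

PINNED_BUILD = "16.0.12827.20470"  # build reported to work in this thread; verify for your environment


def c2r_rollback_command(build: str = PINNED_BUILD) -> list:
    """Assemble the officec2rclient.exe rollback command as an argument list (nothing is executed)."""
    # %ProgramFiles% is read from the environment; the fallback path is an assumption.
    program_files = os.environ.get("ProgramFiles", r"C:\Program Files")
    exe = ntpath.join(program_files, "Common Files", "microsoft shared",
                      "ClickToRun", "officec2rclient.exe")
    return [exe, "/update", "user", f"updatetoversion={build}"]
```

On Windows, the list could then be handed to `subprocess.run(cmd, check=True)`; passing a list avoids quoting the path with spaces by hand.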
"%Programfiles%\Common Files\microsoft shared\ClickToRun\officec2rclient.exe" /update user updatetoversion=16.0.12827.20470
Some have a victim pool, er, test pool, to help protect against things like this.
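One way to build a test pool like this is with deterministic "rings": each machine is hashed into a bucket so it always lands in the same ring, and an update is released one ring at a time. A minimal sketch; the ring sizes and function names are assumptions for illustration, not Microsoft's actual release-ring configuration.

```python
import hashlib

# Assumed ring layout: ring 0 is the small "victim pool" that gets updates first.
RING_SHARES = (0.05, 0.25, 0.70)


def assign_ring(machine_id: str, shares: tuple = RING_SHARES) -> int:
    """Deterministically map a machine ID to a ring index."""
    digest = hashlib.sha256(machine_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1]
    cumulative = 0.0
    for ring, share in enumerate(shares):
        cumulative += share
        if bucket < cumulative:
            return ring
    return len(shares) - 1  # guards against floating-point rounding at the top end


def update_allowed(machine_id: str, released_through_ring: int) -> bool:
    """A machine installs the update once the rollout has reached its ring."""
    return assign_ring(machine_id) <= released_through_ring
```

Hashing instead of picking machines at random means a machine's ring never changes between updates, so the same small group keeps absorbing the risk first.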
Disliking something can be quite relative I guess...
i) local search isn't great and
ii) in general, the application UI is very 'delicate' -- e.g., I have mail rules that move mailing lists to a folder. I browse through them from time to time, but housekeeping them is a nightmare -- Outlook consistently freezes while deleting large numbers of emails.
I don’t think it’s a coincidence that quality plummeted afterwards.
Going by the calls I've gotten, my tiny little customer base has had something like 25 people "down" today. At least ActiveSync on phones and OWA were still working. I know some people were miffed that full-blown Outlook was unusable because it's a major part of their workflow.
How did this get past QA? It's simply mind-blowing...
I had to do that this morning on my work machine, not sure if it was related to this issue.
Yeah, that's disconcerting from a support standpoint. Fortunately, I was able to remotely load their event logs and get the status code. DDG pointed right to the above article.
(Safe mode did not work)
https://redmondmag.com/articles/2019/05/10/repairing-office-...
"%Programfiles%\Common Files\microsoft shared\ClickToRun\officec2rclient.exe" /update user updatetoversion=16.0.12827.20470
Boss: "Why didn't you get any work done?"
Middle manager: "The dog (Microsoft Outlook) ate my homework (email)"