According to `issquareup.com` there was a 13-hour outage yesterday; many API endpoints returned 500 errors, locking customers out of their POS and ecommerce sites.
It seems unlikely Square is going to post a full postmortem, but does anyone have an inside scoop on what engineering at Square is like now and how they could have a half-day outage with almost no communication?
The Square outage wasn't just credit card processing, it was their entire product. You couldn't log in, update your ecommerce site, accept online orders, book appointments, anything. Even things that had nothing to do with accepting payments were broken.
Oddly, some parts of their API continued to work. When we take online purchases we have a checkbox that lets customers keep their card on file (the card is stored with Square; we keep only the card ID Square generates). Our software calls the Square Customers API to store the card first, before it tries to create a payment. So during the outage, all these customers were able to put their cards on file, but the payment then failed within the same second. As customers kept re-submitting, we ended up with tons of duplicate cards on file.
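One common way to make a store-then-charge flow safe to re-submit is to derive an idempotency key from the order rather than from each request, so every re-submission of the same order maps to the same stored card. This is a minimal sketch under stated assumptions: `FakeProcessor`, `store_card`, `charge`, `checkout_key` and `checkout` are hypothetical names, and the in-memory processor is a stand-in, not Square's SDK or API. Real processors differ in how they treat a reused key sent with a different card token, so check the processor's own idempotency documentation before relying on this.

```python
import hashlib
from dataclasses import dataclass, field


class PaymentError(Exception):
    """Raised by the simulated processor when a payment call fails."""


@dataclass
class FakeProcessor:
    """Hypothetical in-memory stand-in for a card processor (NOT Square's SDK).

    store_card honours idempotency keys: a repeated key returns the card ID
    created the first time instead of creating a new card.
    """
    payments_up: bool = True
    cards: dict = field(default_factory=dict)    # card_id -> customer_id
    _by_key: dict = field(default_factory=dict)  # idempotency key -> card_id

    def store_card(self, customer_id: str, card_token: str, idempotency_key: str) -> str:
        if idempotency_key in self._by_key:
            # Same key seen before: return the existing card, don't duplicate it.
            return self._by_key[idempotency_key]
        card_id = f"card_{len(self.cards) + 1}"
        self.cards[card_id] = customer_id
        self._by_key[idempotency_key] = card_id
        return card_id

    def charge(self, card_id: str, amount_cents: int) -> str:
        if not self.payments_up:
            # Simulates the outage: card storage works, payments fail.
            raise PaymentError("500 Internal Server Error")
        return f"payment_for_{card_id}"


def checkout_key(order_id: str, step: str) -> str:
    # Derive the key from the order, not from the HTTP request, so a customer
    # re-submitting the same order reuses the same key.
    return hashlib.sha256(f"{order_id}:{step}".encode()).hexdigest()


def checkout(proc: FakeProcessor, customer_id: str, order_id: str,
             card_token: str, amount_cents: int) -> str:
    # Store the card first (as in the flow described above), then charge it.
    card_id = proc.store_card(customer_id, card_token, checkout_key(order_id, "store_card"))
    return proc.charge(card_id, amount_cents)


if __name__ == "__main__":
    proc = FakeProcessor(payments_up=False)
    for attempt in range(3):  # customer hits "Pay" three times during the outage
        try:
            checkout(proc, "cust_1", "order_42", f"nonce_{attempt}", 1999)
        except PaymentError:
            pass
    print(len(proc.cards))
```

With payments failing, three re-submissions of `order_42` leave one card on file instead of three. An alternative design is to charge first and save the card only after the payment succeeds, if the processor's card-on-file flow allows that ordering.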
My bet is ransomware that corrupted data and required restoring everything from backups. Remember, your backups aren't backups until you've validated they can be restored!
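The "validate your restores" advice can be made concrete with a small sketch. It uses Python's standard `sqlite3` module (`Connection.backup` and `PRAGMA integrity_check`) purely as a stand-in database; `table_digest` and `backup_and_verify` are hypothetical helper names, and nothing here reflects what Square actually runs. The idea is to restore the backup into a fresh database and compare its contents against the source, instead of only checking that a backup file exists.

```python
import hashlib
import sqlite3


def table_digest(conn: sqlite3.Connection, table: str) -> str:
    # Hash every row in rowid order so two copies can be compared.
    # Assumes trusted table names and ordinary (rowid) tables.
    h = hashlib.sha256()
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY rowid"):
        h.update(repr(row).encode())
    return h.hexdigest()


def backup_and_verify(src: sqlite3.Connection, backup_path: str, tables: list) -> bool:
    # 1. Take the backup into a file.
    dest = sqlite3.connect(backup_path)
    src.backup(dest)
    dest.close()

    # 2. Restore it: copy the backup file into a fresh in-memory database,
    #    independent of the source connection.
    backup_file = sqlite3.connect(backup_path)
    restored = sqlite3.connect(":memory:")
    backup_file.backup(restored)
    backup_file.close()

    try:
        # 3. Check the restored copy is structurally sound...
        if restored.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            return False
        # 4. ...and that its contents match the source, table by table.
        return all(table_digest(src, t) == table_digest(restored, t) for t in tables)
    finally:
        restored.close()
```

In practice the restore step would run on separate hardware from separate storage, on a schedule, so a failed restore is found before it is needed.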