We had secrets sprinkled in source control, in plain text in build systems, and available through settings screens in the application.
So many sites do this: allowing major changes (like resetting credentials/passwords) to take effect immediately, simply by opening a "magic link" sent by email.
I think that this "immediately" is a major security antipattern.
I prefer it when such changes have a "cooldown" period of, say, 72 hours, during which the change is "pending" but not yet effective, and during which the user can veto it. They could do that either by logging in to the site (where they'd see a warning that a major configuration change is in progress) and denying the change there, or by opening another "magic link", sent by email, that denies the change.
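The scheme above can be sketched as a pending-change record with a veto token. Everything here (names, the 72-hour window, the token length) is an illustrative assumption, not any site's actual implementation:

```python
import secrets
from dataclasses import dataclass, field

COOLDOWN_SECONDS = 72 * 3600  # hypothetical 72-hour veto window


@dataclass
class PendingChange:
    user_id: str
    new_password_hash: str
    requested_at: float  # unix timestamp of the request
    veto_token: str = field(default_factory=lambda: secrets.token_urlsafe(32))
    vetoed: bool = False

    def is_effective(self, now: float) -> bool:
        # The change only takes effect after the cooldown, and only if nobody vetoed it.
        return not self.vetoed and now >= self.requested_at + COOLDOWN_SECONDS


def veto(change: PendingChange, token: str) -> bool:
    # Called from the "deny this change" magic link, or from a logged-in session.
    if secrets.compare_digest(token, change.veto_token):
        change.vetoed = True
        return True
    return False
```

During the window, any logged-in session would be shown a warning that a change is pending; either veto path flips `vetoed` and the change never lands.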
It's not a perfect solution but it stops so many of these oh-so-common attacks dead in their tracks.
Because there's a big difference between being able to read an email meant for someone else (as happened here, on the server side) and being able to prevent a legit user from receiving emails while also preventing that legit user from logging in to the website with their correct credentials.
As others have mentioned though, if you as a user know someone has access to said password and you're resetting it as an emergency, that's 3 more days of some hacker being able to log in with your password!
I see your point, but a timeout is a suboptimal solution.
That's a different threat: an attacker knowing your password. Well... how do websites that allow instant credential resets by email typically deal with a password reset requested by someone who knows the current password? Instant change there too ("enter your current password / enter your new password twice"), and the legitimate user is locked out of his own account. I don't see how a cooldown is worse than that.
That said, the 10 minutes others suggested is way too short: it wouldn't catch attacks happening at night.
But to answer your question: it really depends on what it is that you are protecting. For the vast majority of sites I use, I don't see how 72 hours without access would be that problematic. Not logged in to StackOverflow for 3 days? Not a problem. Not logged in to HN for 3 days? Not a problem. Not logged in to Twitter for 3 days? I can live with that. Etc.
The question is: how much convenience are you willing to trade for security?
Awesome writeup - this gave me a good laugh :-)
Zawinski's Law: "Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can."
The story behind that route might be interesting... See, originally Stack Overflow didn't have passwords - all logins were done via OpenID, so any credential management you'd need to do was done through your provider (Google, LiveJournal, myOpenID, etc.). This made account recovery assistance pretty simple: given a verified email address, the system would just send that address an email reminding the owner of any and all OpenID providers that they'd associated with their account. From there, it was up to the account owner to work with a provider to do things like reset passwords.
Skip forward a few years, and Stack Overflow had its own OpenID provider - now you could sign up with an email and password just like a normal site, except really you were creating an account on https://openid.stackexchange.com/ - so the recovery process remained pretty much the same, just with a new provider that happened to be run by the same company.
So far so good... Except, this was awkward to explain to folks. Really, that was what ended up killing OpenID: folks wanted a "Google" or "Facebook" button, not a whitepaper on fancy new authentication systems.
At this point Stack Overflow decided to try to streamline the login process, making signing up and logging in with their own provider seamless: no need to know anything about OpenID. Now recovery emails started including password reset links, and also reduced or removed information on other OpenID providers that were associated with the account in an effort to reduce confusion. The decision tree for generating those emails got complex.
And the decision tree for supporting users got complex as well. Support staff got frustrated; they'd been used to knowing what would and wouldn't be in a recovery email, and had a pile of templates ready to help folks navigate login issues based on that. But now they were getting replies back from folks who were confused and upset because their recovery email didn't contain information that the support person had asserted it would!
This was the genesis of the vulnerable route: a way for support staff to ensure that they were providing accurate information to users about how they could recover their accounts. By the time of this attack, it was already obsolete; the login system had been redesigned twice since the confusing and complex system that first required it. It was vestigial and forgotten... The ideal breeding ground for vulnerabilities.
(source: I worked at Stack Overflow through the time period described in this post, and was involved in support during the period when the relevant route was useful)
(I overlapped with Shog at Stack Overflow.)
Still, there is room for improvement. What confused me a lot recently was that the reset link sent to a certain email is not necessarily for the login associated with that email.
I tried to log in at Stack Overflow after a long time. Entered my current mail and pw. Did not work, clicked pw recovery, received mail, reset pw, got logged in. So far so good.
Logged out, couldn't log in again. After a few password resets I realized that, while the mail was sent to my current address, the reset link actually was for the pw of a login associated with an old email.
At least for me that was not clear from the recovery email. Here is the full text with only email redacted:
> Account Recovery - Stack Overflow
> We received an account recovery request on Stack Overflow for new@example.com.
> If you initiated this request, reset your password here.
> You can use any of the following credentials to log in to your account:
> Email and Password (old@example.com)
> Email and Password (new@example.com)
> Once logged in, you can review existing credentials and add new ones. Simply visit your profile, click on Edit Profile & Settings and My Logins.
To be clear, "reset your password here." is a link and it changes the pw only for old@example.com.
> After attempting to access some URLs, to which this level of access does not allow, they use account recovery to attempt to recover access to a developer’s account (a higher privilege level again) but are unable to intercept the email that was sent. However, there is a route on dev that can show email content to CMs and they use this to obtain the magic link used to reset credentials.
Many of these debugging tools are great for devs to test things quickly, but I've always felt very wary of having them exist in an app without strict access control and 2FA. Ideally you'd not have them in the app at all, maybe just on local dev.
I'm imagining that after a security issue is identified, the steps taken are roughly in the order below, with close-ish durations. I guess my question is: why does it take 20 months from start to blog post?
- Contain the issue (1 wk)
- Remove the threat (1 wk)
- Build up remedies (a few months)
- Check and recheck what happened to make sure you're accurate when submitting final reports (a few months)
- Release a blog post (1 month)
The timeline is a cool day by day instance, but I just don't understand the larger timeline.
Discovery, immediate mitigation, deeper mitigation, general notice, notifying affected users - all of these can happen pretty quickly once the ball is rolling. But once you're dealing with "the law" in any capacity, you are constrained in what details you can share broadly, and when.
I'm happy we were finally able to share this level of detail.
I have database connection strings and passwords as ENV variables, but I still don't know what the best practice is. Let's say someone gets access to the server: they can still read the ENV vars, right? It definitely prevents accidentally checking them into your git repo, but still. Does anyone have a good recommendation for storing credentials like database passwords in a secure way?
Correct. The easiest way is to look at `/proc/$pid/environ`. It contains the NUL-separated (`\0`) variables for that process.
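For example, a few lines of Python are enough to parse a process's environment out of procfs, assuming a Linux host and read permission on the target process (the function name is just for illustration):

```python
def read_environ(pid="self"):
    # /proc/<pid>/environ holds NUL-separated KEY=VALUE entries
    # captured at exec time (Linux only).
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode()] = value.decode(errors="replace")
    return env
```

Reading another user's process requires the appropriate privileges, but anyone who can run code as the service account - or as root - sees every secret the service was started with.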
(FYI: I'm very familiar with BeyondCorp, as I was on an adjacent team when it was invented, and I was an SRE at Stack Overflow when the incident happened.)
*> Hardening code paths that allow access into our dev tier. We cannot take our dev tier off of the internet because we have to be able to test integrations with third-party systems that send inbound webhooks, etc. Instead, we made sure that access can only be gained with access keys obtained by employees and that features such as impersonation only allow de-escalation—i.e. it only allows lower or equal privilege users to the currently authenticated user. We also removed functionality that allowed viewing emails, in particular account recovery emails.*
There was no "unauthenticated" access into dev - the access key here is what allows login at all to our dev environment, but the attacker was able to bypass that protection.
"2FA", as commonly implemented in many scenarios is weak and only helps address certain scenarios -- TOTP tokens, for example, are pretty trivial to compromise. Critical infrastructure needs hardened tokens and clients with more controls.
> Sunday May 5th
> ...a login request is crafted to our dev tier that is able to bypass the access controls limiting login to those users with an access key. The attacker is able to successfully log in to the development tier.
> Our dev tier was configured to allow impersonation of all users for testing purposes, and the attacker eventually finds a URL that allows them to elevate their privilege level to that of a Community Manager (CM). This level of access is a superset of the access available to site moderators.
EDIT: clarified that the report was held back
And affected users were notified once identified, which was shortly after the announcement: https://stackoverflow.blog/2019/05/17/update-to-security-inc...
This is an update with more details, which was held back for legal reasons.
2. The attacker was able to log in to the dev environment with their credentials from prod (stackoverflow.com), via a replay attack based on a prod login.
3. The dev environment allows viewing outgoing emails, including password-reset magic links. The attacker triggered a password reset on a dev account and changed the credentials. This gave them access to "site settings."
4. Settings listed TeamCity credentials. The attacker logged into TeamCity.
5. Attacker spends a day or so getting up to speed with TeamCity, in part by reading StackOverflow questions.
6. Attacker browses the build server file system, which includes a plaintext SSH key for GitHub.
7. Attacker clones all the repos.
8. Attacker alters the build system to execute a SQL migration that escalates them to a super-moderator on production (Saturday May 11th).
9. Community members make a security report on Sunday May 12th; Stack Overflow's response finds the TeamCity account was compromised and takes it offline.
10. Stack Overflow determines the full extent of the attack over the next few days.
https://stackoverflow.com/legal/privacy-policy
GDPR anyone?
Clear as day that they're doing exactly that. You agree to this when you use the site.
However, what makes it legal isn't whether it is written in the ToS or in a cookie banner.
AFAIK what matters is either:
- if you have a specific, valid (according to the GDPR) reason,
- or if you have the user's free and informed consent.
... and yes, I think a number of the things I still see on the web are not OK:
- Dark patterns where, if you click "manage settings", everything is opted out, but there's a big green "Accept everything" and a small bland "Confirm my choices"? Doesn't fly, because of the rule that it should be equally easy to opt out.
- Cookie banners with no real opt out? No way.
- Cookie banners where you have to deselect 927 "partners"? Also no way.
The only ones that seem legal are those that either use a bare minimum of cookies for preserving state, or those that let you opt out directly but inform you that ads might become less relevant.