Post Mortem on Salt Incident

(blog.algolia.com)

118 points · sfg75 · 6y ago · 65 comments


41 comments · 10 top-level

cetra3 · 6y ago · 12 in thread

This whole salt-stack incident could've been handled a lot better by salt themselves:

- the notification was a week ago to a small mailing list, which is tucked away on their site

- no notification to the people who registered when downloading salt (at least I never received an email, but still get plenty of marketing spam)

- no posts on social media as far as I can tell, I couldn't find a tweet, anything on reddit, or anything on hn.

- they only blogged about it on their official site yesterday, way after damage had been done

- one week's notice between the initial announcement and the patch coming out. The patch being released is basically a disclosure of the vulnerability

- the patch was released late Thursday early Friday depending on your timezone, giving attackers the weekend head start

- the official salt docker images were only patched yesterday

- You can't get a patch for older versions without filling out a form and supplying details

- Ubuntu and other repositories are still vulnerable

mtam · 6y ago

+1. However, from what I read, the vulnerability can only be exploited if the attacker has network access to the salt master's port, which should never occur. The people that got compromised had Salt exposed to the Internet, which is obviously ridiculous.

Not trying to downplay the critical nature of the vulnerability but the ones that were compromised by this issue have deeper security issues to deal with.

mike_d · 6y ago

> has network access to the salt master's port, which should never occur

You seem to subscribe to the "hard shell, soft gooey center" network security philosophy. Should people expose an Oracle server to the internet? Absolutely not. Does moving it behind a firewall change the fact that every mildly skilled exploit developer is sitting on an Oracle 0day? Absolutely not.

People have legitimate reasons for exposing Salt to the internet. I do. It's how I bootstrap random VMs and bare metal from the internet. But in my case the attack was mitigated by the fact that Salt cascades changes in a bunch of other systems and re-masters minions to a host only reachable over a tunnel. I blew away the internet master, restored from a backup, and patched.

> the ones that were compromised by this issue have deeper security issues to deal with

Or it was just another Monday. When you become sufficiently large you deal with incidents on a daily basis. Kudos to the people who publicly postmortem and talk about what went well and what didn't.

(For the record, I've already been working for a few months on a move to Ansible for non-security reasons)


StreamBright · 6y ago

Coconut security is not great either. Hard shell, soft internals. Not exposing the ports to the internet is just one layer.

tasssko · 6y ago

+1, agreed, but exposing salt to the internet is not the problem. A simple IP whitelist ingress firewall rule on the salt master port would have helped; blocking access on this port entirely is also possible. With cloud services it has become trivial to group server resources so that members of the same group can communicate with each other. I don’t use salt; however, I am not a proponent of network isolation as a form of security.
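As a rough illustration of the allow-list idea, here is a minimal Python sketch of the check such a rule performs. It is not Salt's or any firewall's actual configuration: `ALLOWED_NETWORKS` and `is_allowed` are hypothetical names, and the networks are documentation-only placeholder ranges standing in for your own minions' addresses.

```python
import ipaddress

# Hypothetical allow-list. These are documentation-only ranges (RFC 5737),
# used here as placeholders for the networks your minions connect from.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(source_ip: str) -> bool:
    """Return True if source_ip falls inside any allow-listed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("192.0.2.44", "203.0.113.9"):
        print(candidate, "allowed" if is_allowed(candidate) else "blocked")
```

In practice this decision would live in the firewall or security group rather than in application code; the sketch only shows the membership test.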


Shish2k · 6y ago

I was in this situation; I went with “salt master exposed to the internet” because it’s the only service on that box - if I’d wrapped it in a VPN, then I’m replacing one exposed service with a different exposed service, and VPNs aren’t immune to exploits either (plus an extra layer of configuration means an extra layer of things that can go wrong)

CJefferson · 6y ago

If they wrote software which should never be visible to the internet, they should have made that clearer.

It's far too easy to make something internet-visible. They could have set up a simple check to see if the service is internet-visible, and refused to work if it was.
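A minimal sketch of what such a startup check might look like, in Python. This is a heuristic under stated assumptions, not something Salt implements: `looks_internet_visible` and `refuse_if_exposed` are hypothetical names, and the check only inspects the configured bind address, so it cannot see NAT, port forwarding, or firewall rules.

```python
import ipaddress


def looks_internet_visible(bind_address: str) -> bool:
    """Heuristic: could a service bound to this address be reached publicly?"""
    addr = ipaddress.ip_address(bind_address)
    if addr.is_unspecified:
        # 0.0.0.0 and :: listen on every interface, which may include a public one.
        return True
    # is_global is False for private, loopback, and link-local addresses.
    return addr.is_global


def refuse_if_exposed(bind_address: str, allow_public: bool = False) -> None:
    """Raise unless the operator explicitly opted in to a public bind address."""
    if looks_internet_visible(bind_address) and not allow_public:
        raise RuntimeError(
            f"refusing to start: {bind_address} may be reachable from the internet"
        )
```

The explicit `allow_public` opt-in reflects the point made elsewhere in the thread that some people do have legitimate reasons to expose the service.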

cetra3 · 6y ago

If you look at their current `hardening` document it still has pretty unclear language about what is acceptable and what isn't.

> Use a hardened bastion server or a VPN to restrict direct access to the Salt master from the internet

Is this SSH access or is this access to the salt master from minions? Or just access in general?


bawolff · 6y ago

> one week's notice between the initial announcement and the patch coming out. The patch being released is basically a disclosure of the vulnerability

While your other points may be valid, one week should be plenty of time between announcement and patch. Any longer and I would call the timetable problematic.

aneutron · 6y ago

You have clearly never worked at a large enough OLD corporation.

One week is nothing compared to what it would take to upgrade your configuration management system.


isodude · 6y ago

Weird that something with this title got buried here on HN: https://news.ycombinator.com/item?id=23041528

section_me · 6y ago

It was posted, it just got no traction (e.g. https://news.ycombinator.com/item?id=22972100, posted 11 days ago). But yeah, the lack of posts about it is surprising.

cat199 · 6y ago

> - Ubuntu and other repositories are still vulnerable

isn't really salt's problem though.. same could be said for relying on any distro-provided package

VWWHFSfQ · 6y ago · 6 in thread

The intruders had root access to every server in a salt deployment for who knows how long and yet everyone is claiming there's no evidence that any data or secrets (customer's or otherwise) were exfiltrated from the network. This is a very dangerous assumption. Nobody has any idea what was run on the servers since it seems that once the initial attack script was deployed it downloaded and executed new scripts every 60s and then removed themselves. Pretty standard C&C ops. It may have started as a mining operation, but that doesn't mean it was the only thing it was doing.

Jedd · 6y ago

> ... and yet everyone is claiming there's no evidence that any data or secrets (customer's or otherwise) were exfiltrated from the network.

A number of people have carefully reviewed the payload that was deployed to servers, especially during what we're calling v1-v4 of the attack. (v5 onwards got more complex, but that wasn't until Monday, with variability for timezone.)

> Nobody has any idea what was run on the servers ...

Well that's not true - there's a number of victims that have useful IDS tools, including auditd, plus the review of binaries and shell scripts deployed, etc.

Some of us also have netflow collection at the edge, and can review connections initiated from within our networks.

> ... once the initial attack script was deployed it downloaded and executed new scripts every 60s and then removed themselves.

I don't think any of us have found scripts that removed themselves. While that may sound naive, there's a few researchers that have been analysing these tools, including via large honeypot networks, and this just hasn't (at least for the first 2-3 days) been a profile of the attack.

Thankfully - and I appreciate it's very weird to say this - the initial attacks were very much vanilla crypto currency mining opportunities. It could have been a lot worse, and algolia's assessment matches a lot of other independent assessments on this front.

VWWHFSfQ · 6y ago

I hope for everyone's sake that it was just a naive crypto mining operation. But given the length of time this vulnerability was available, and the extent of access it allowed, I just find it very hard to say with any certainty that we know everything that it was doing. Exploits like this get passed around in nefarious circles pretty regularly. One of the scripts I saw went to great lengths to eliminate competing crypto miners from the systems so they could run their own. That tells me there were multiple people (or groups) exploiting this in competition with each other.

You said the v5 of the attack got more sophisticated. How do we know there wasn't a "v0" that was even more sophisticated and innocuous? You can't trust the server logs. Firewall tables were flushed, SELinux was disabled. It's just really hard to say the full extent of damages.


johann-algolia · 6y ago

Hello,

I'll try to give you some insight as I'm a security engineer at Algolia.

Your concern is valid, and it's true, we cannot know for sure. That's the reason why, as explained in the blog post, we are reinstalling all impacted servers and rotating our secrets. If our assumption is false, this should contain the issue.

That being said, we have good reasons to make that assumption.

- Our analysis of the incident and of how the malware behaved on our systems didn't find any evidence of data being accessed or transferred.

- There are other public analyses of the malware. Other companies that were hit reached the same conclusions as us, and you can have a look at https://saltexploit.com/, which maintains an interesting list of what is known about the attack, how it behaved, and how it's evolving fast to adapt.

I hope this answers your concern.

lasdfas · 6y ago

I agree. I would like to see more details of how they determined it was only crypto mining. Finding only mining scripts in your logs doesn't mean they were not running other code once they had root.

sterlind · 6y ago

It seems bizarre to me that a crypto miner got in. It wouldn't make much money on regular CPUs, and the high processor usage would immediately draw attention. So it looks like a low-effort botnet, which is embarrassing to get pwned by.

(The coin mining could be a cover like you mention, but it seems unlikely since it naturally draws attention.)


nemo136 · 6y ago

running the virus code in a container / vm and checking what gets modified
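One way to do the "checking what gets modified" part is to hash the filesystem before and after the run and compare the two snapshots. The Python sketch below assumes the snapshots are taken from outside the sandbox (for example against a mounted disk image), since anything running as root inside it could tamper with the result. `snapshot` and `diff_snapshots` are hypothetical helper names, and this only covers file changes, not network or in-memory activity.

```python
import hashlib
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each regular file under root to the SHA-256 of its contents."""
    result = {}
    for path in Path(root).rglob("*"):
        # Skip directories and symlinks; only hash real files.
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(root))] = digest
    return result


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report files that were added, removed, or changed between two snapshots."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(
            p for p in before.keys() & after.keys() if before[p] != after[p]
        ),
    }
```

Typical use would be `before = snapshot(image_root)`, run the sample, then `diff_snapshots(before, snapshot(image_root))`.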

mtam · 6y ago · 5 in thread

“We’ve secured the impacted SaltStack service by updating it and adding additional IP filtering, allowing only our servers to connect to it.”

So this means they had Salt master ports publicly accessible? Why would anyone have salt ports open/exposed to public/internet?

dijit · 6y ago

> Why would anyone have salt ports open/exposed to public/internet?

If you're bootstrapping random servers, this is a fine approach.

The whole Salt connection methodology is 'trust on first connect' (a bit like the default SSH) with a manual stage in accepting an incoming request and the connection stream is encrypted.

If you're using salt to bootstrap your VPN servers or network appliances then it's understandable that you'd have it exposed to a more public network, and the documentation was clear that this was fine.

Not everything is a virtual machine on a cloud provider.

alexandercrohde · 6y ago

Kind of a tough situation. I personally wouldn't be ready to accept this is the last such vulnerability that will be found.

In light of this attack, maybe going forward have a setup script that creates an SSH tunnel back to a machine that can talk to the salt-master for you. You could then have VPN, but if it's flakey at all, it could cost the ability to update machines.

Or perhaps (and I say this as a saltstack user) ansible really is the more secure model for those scenarios.

darkwater · 6y ago

> If you're bootstrapping random servers, this is a fine approach.

Define "random". I think there is an alternative method that doesn't involve exposing your CM server on the Internet for almost any definition of random. In the Algolia case that's pretty certain, because they now filter access by IP (so they KNOW the IPs).


mirimir · 6y ago

Yeah, that jumped out for me too. I'm guessing that they didn't want to deploy some sort of private network layer.

lykr0n · 6y ago

That's easier said than done. There are no simple cross-cloud-provider solutions for private networking other than ZeroTier, which has its own issues.


kureikain · 6y ago · 4 in thread

It's weird that these salt masters are reachable from the internet and that they can sleep well with it.

Even with the zero-trust network or BeyondCorp idea, I still find the extra layer of protection a VPC gives really valuable. A few years ago there was an issue with the K8s API server, and updating K8s isn't a walk in the park. I felt relaxed back then because we had everything inside a VPC.

You can use SSH or a VPN to access services inside the VPC. But any tool that has permission to manage your infrastructure should never be exposed to the internet.

Same thing with Jenkins: if you are using Jenkins to manage Terraform or trigger Ansible/Salt/Chef runs, make sure Jenkins is not reachable from the internet. Use a different method to route webhooks into it.

trabant00 · 6y ago

I never understood the current trend of saying the VPN is a thing of the past. Redundancy in security layers is how you don't get affected by every CVE out there.

Imo this is THE lesson to learn from this story.

Secondary: salt and ansible are not very mature yet.

dijit · 6y ago

Salt is definitely immature (been using it for 5 years and the situation has actually gotten worse in that time) but Ansible is a weird thing to group.

What issues do you have with Ansible?

darkwater · 6y ago

Yeah, I completely agree and really don't see the point of having a Configuration Management server facing the Internet and basically having all your servers connect to it through the Internet! The BeyondCorp idea of eliminating the roadwarrior concept is one thing; having your infra management exposed to CVEs in the wild is another!

For Jenkins it's a bit more complicated because of GitHub webhooks, although GitHub does publish its IPs in a programmatic form so you can whitelist them.

kureikain · 6y ago

For Jenkins, what I do is:

1. Configure a webhook override in Jenkins, so that Jenkins registers something like https://ci-webhook.domain.com as the GitHub webhook.

2. This ci-webhook is a simple webapp that validates the webhook and, if it's valid (signed by the correct key), writes the payload to an SQS queue.

3. A small daemon, running on the same Jenkins master, pulls from the SQS queue and replays the payload to the local Jenkins.

I used to rely on the GitHub IP whitelist, but one day I realized anyone can hit my Jenkins by using GitHub.
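The validation step in the list above can be sketched in Python as follows. It assumes GitHub's HMAC-SHA256 scheme, where the `X-Hub-Signature-256` header carries `sha256=` followed by the hex digest of the raw request body keyed with the webhook secret. The function names are hypothetical, and `enqueue` is a stand-in for whatever writes to the queue (SQS in the setup described), injected so the sketch stays free of cloud dependencies.

```python
import hashlib
import hmac
from typing import Callable


def signature_is_valid(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Check a 'sha256=<hex>' signature header against the raw payload."""
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def relay_webhook(
    secret: bytes,
    payload: bytes,
    signature_header: str,
    enqueue: Callable[[bytes], None],
) -> bool:
    """Forward the payload to the queue only if its signature verifies."""
    if not signature_is_valid(secret, payload, signature_header):
        return False
    enqueue(payload)
    return True
```

Because only signed payloads reach the queue, the daemon next to Jenkins never has to accept inbound connections from the internet.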


lrpublic · 6y ago · 2 in thread

Trusting a central control server is the fundamental mistake here.

It creates a very high value target that is difficult to secure.

I prefer a model where the management commands are signed at a management workstation and those commands are pushed by the server and authenticated at the managed node against a security policy.
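A very rough Python sketch of that shape, under loud assumptions: it is not the commenter's unpublished tooling, and it uses a shared-key HMAC from the standard library as a stand-in for real signatures. A production design would want public-key signatures (for example Ed25519 via a third-party library) so that a compromised server or node cannot forge commands. `ALLOWED_ACTIONS` and the action names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical node-side security policy: the only actions this node will run.
ALLOWED_ACTIONS = {"pkg.install", "service.restart"}


def sign_command(key: bytes, command: dict) -> dict:
    """Workstation side: serialize the command canonically and attach a tag."""
    body = json.dumps(command, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def accept_command(key: bytes, envelope: dict) -> Optional[dict]:
    """Node side: verify the tag, then check the action against the policy."""
    body = envelope["body"].encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), envelope["tag"].encode()):
        return None  # tampered with in transit, or signed with another key
    command = json.loads(body)
    if command.get("action") not in ALLOWED_ACTIONS:
        return None  # authentic, but not permitted by this node's policy
    return command
```

The server in the middle only relays the envelope; it never needs the key, so compromising it does not by itself grant the ability to issue commands.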

brianjlogan · 6y ago

What configuration management tools use this methodology?

lrpublic · 6y ago

A couple that I’ve built - they are not commercially available.

I’d consider open sourcing something based on them if there’s sufficient interest.

Perhaps as an integration for one of the major players.

alexbrower · 6y ago · 2 in thread

Can anyone describe the business benefits of an algolia implementation (vs Elasticsearch?) for a company that doesn't heavily rely on content searches? It seems expensive and something that I'd build on my own.

(Disclaimer: long-time operator and fledgling programmer)

aseure · 6y ago

Disclaimer: I'm a developer at Algolia.

IMHO the two main advantages in favor of Algolia, are the sane defaults for relevancy and speed and the fact that the service is hosted and can grow with your business without having dedicated engineers to manage both the configuration and the infrastructure.

Also, on top of the Algolia services per se (search, analytics, recommendation, etc.), we're providing a lot of backend and frontend libraries which one would otherwise need to reimplement when using an Elasticsearch- or Solr-based implementation.

vegannet · 6y ago

Search is hard to get right and the cost of Algolia is negligible vs. doing it yourself. As a programmer, every line of code you write is a line of code you own: the less code you own in production, the better off you are. Algolia has saved us hundreds of hours which translates to tens of thousands of dollars.

hawaiian · 6y ago

I haven't been a fan of Salt since learning they decided to roll their own encryption.

You don't have to look that far to find problems with that:

https://github.com/saltstack/salt/commit/5dd304276ba5745ec21...

0x0 · 6y ago

Both this and the Ghost CMS updates seem to hint that the only reason this was discovered was the fact that loud crypto miners were exhausting resources. What are the chances a quieter attacker hasn't thoroughly ploughed through the entire infrastructure days ahead?

Also think about how many years this vuln has been present and exposed. Who's to know blackhats haven't sat on this 0day for years, quietly compromising private keys and other data? Spooky.

ciprian_craciun · 6y ago

I've seen mentioned in the comments various "deployment" tools (or call them "configuration management" if you will) being called "insecure" or "immature", or one being claimed better than another; however I think this is a good opportunity to talk about a deeper problem, namely the architectural choices each tool has taken.

These choices all impact the reliability and security of the resulting system, especially the following:

* do they rely on SSH, or have they implemented their own authentication / authorization techniques? (personally I would be very reluctant to trust anything that just listens on a network port for deployment commands and isn't SSH;)

* do the agents run with full `root` privileges, or is there a builtin mechanism that allows the agent to act only in a limited capacity, within the confines of a set of whitelisted actions? (perhaps even requiring a secondary authentication mechanism for certain "sensitive" actions, for example something integrated with `sudo`, that provides a sort of 2-factor-authentication with a human in the loop;)

* do the operators have enough "visibility" into what is happening during the deployments? (more specifically, are the deployment scripts easily auditable or are they a spaghetti of dependencies? are the concrete actions to be taken clearly described, or are they hidden in the source code of the tool?)

* are there builtin mechanisms to "verify" the results of the deployments?

* and building upon the previous item, are there mechanisms to continuously "verify" if the deployment hasn't changed behind the scenes?

I understand that some of these features wouldn't have directly helped to prevent this particular case; however, they would have helped with alerting and diagnosis.

vbernat · 6y ago

As a point of comparison, you can also expose Puppet masters to the public Internet but Puppet is using HTTP/HTTPS as a transport, so it is trivial to put a reverse proxy in front of it, requiring a valid certificate (managed and signed by Puppet) to contact the service. This way, no need to maintain a whitelist of legitimate clients.
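To illustrate the "require a valid certificate" part, here is a minimal Python sketch of a TLS server context that rejects any client without a certificate signed by a trusted CA. It is a stand-in for the reverse proxy configuration, not Puppet's actual setup; `client_cert_required_context` is a hypothetical helper, and the file paths passed to it would be your own CA and server certificate files.

```python
import ssl
from typing import Optional


def client_cert_required_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that demands a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Fail the handshake unless the client presents a certificate that verifies.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Trust only certificates signed by this CA (e.g. the one that signs agents).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The server's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

With this in place, the set of legitimate clients is defined by who holds a signed certificate rather than by a list of IP addresses.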
