This alone is probably manageable; it might even be simple, if painful, to handle for the 2-15 Twitter employees (pre-firing) with the specialized knowledge. If 3 people knew the disaster recovery plan and all of them got fired because they were so busy maintaining things and fighting fires that they failed to get good reviews by building things, well, I wouldn't be surprised. Likewise, the employees trusted with extreme disaster recovery mechanisms are not the poor souls on H-1Bs who don't have the option of leaving easily, so the people trusted with access might have already jumped ship, since they aren't being coerced into staying on board with a madman.
The real existential threat is another problem compounding on top of this, or a disastrous recovery effort. Auto-remediation systems could do something awful. A master database could fall over and a replica be promoted, but what if that happens twice, or four times? Without Puppet to configure replacement machines appropriately, there could be a very real problem very quickly. Similarly, extremely powerful tools, like a root ssh key, might be brought out, but those keys have no seat-belts, and one mistyped command could be catastrophic. Sometimes bigger disasters are made trying to fix smaller ones.
Puppet can be in the critical path of both recovery (via config change) and capacity.
Same goes when someone lists all the reasons why a proposal isn't viable. "Great, so we'll address those and be golden then?" Often they list them as facts, without considering (or being able to imagine) that the proposal could be made viable with additional effort.
It would only be really problematic if they also lost SSH access to the machines Puppet manages. If you have root access, the fix is not exactly hard.
But then they fired people that did have access, so that might also be a problem
We made sure all of our machines can be accessed both by Puppet and by SSH partly for that reason; we've had both the accident of someone fucking up Puppet, and of someone fucking up the SSH config, rendering machines impossible to log into (the lessons were learned and etched in stone).
So really, depending on who has access to what, it can be anything from "just pipe a list of hosts to a few ssh commands fixing it" to "get access to the server manually and change stuff, or redeploy the machine from scratch". Again, assuming muski boy didn't fire the wrong people
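As a sketch of the "pipe a list of hosts to a few ssh commands" end of that spectrum (the hostnames here are made up; the agent-side fix of wiping `/etc/puppetlabs/puppet/ssl` is the one Puppet's own cert-regeneration docs describe, and the dry-run default means nothing actually executes until you flip it):

```shell
# Fan a one-off fix out to every host in a list. With DRY_RUN=1 (the
# default here) it only prints what it would run, so you can eyeball
# the blast radius before actually touching production.
remediate() {
  # Wiping the agent's ssl dir forces it to request a fresh cert from
  # the (newly re-keyed) puppet master on its next run.
  cmd="ssh root@$1 rm -rf /etc/puppetlabs/puppet/ssl"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $cmd"
  else
    $cmd
  fi
}

printf '%s\n' web01 web02 db01 > hosts.txt   # stand-in host list
while read -r host; do
  remediate "$host"
done < hosts.txt
```

Whether this is feasible at all, of course, depends on someone still holding a root key that isn't itself distributed by Puppet.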
> But then they fired people that did have access, so that might also be a problem
Oh my, wouldn't that be delicious...
Gotta wonder how you'd go about fixing that, though. Assuming that those people's access was also tied to their employment and irrevocably voided when they were fired: I guess it would depend on how well those machines are secured against attackers with access to the hardware.
The way forward is to generate a new CA root certificate.
> and they can no longer run puppet because the puppet master's CA cert expired
They can reconfigure internal tools to use the new CA root certificate, or rather one of the signed intermediate certificates.
> and they can't get a new one because no one has access.
They can simply generate new CA root certificates, and sign or create new intermediate certificates.
> They no longer can mint certs.
Yes, they, can...
> My limited understanding in this area is that this is...very bad
No, it, is, not...
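For the concrete mechanics, a minimal sketch of minting a replacement trust chain with openssl (all filenames and subject names here are made up, and a real intermediate would also carry CA:TRUE basicConstraints via an extensions file, omitted for brevity):

```shell
# Generate a new root CA key and a 10-year self-signed root certificate
openssl genrsa -out new-root-ca.key 4096
openssl req -x509 -new -sha256 -days 3650 -key new-root-ca.key \
  -subj "/CN=Example Internal Root CA" -out new-root-ca.crt

# Generate an intermediate key and CSR, then sign it with the new root
openssl genrsa -out intermediate.key 4096
openssl req -new -key intermediate.key \
  -subj "/CN=Example Intermediate CA" -out intermediate.csr
openssl x509 -req -sha256 -days 1825 -in intermediate.csr \
  -CA new-root-ca.crt -CAkey new-root-ca.key -CAcreateserial \
  -out intermediate.crt

# Sanity check: the intermediate chains to the new root
openssl verify -CAfile new-root-ca.crt intermediate.crt
```

The openssl invocations are the easy part; the real work is distributing the new root to every agent and internal tool that pins the old one.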
There are two immediate issues that come to mind.
* Twitter was so awful before that it relied on people to safeguard the keys to the kingdom. This is very bad practice, and one of the many things Musk will no doubt be fixing. For any mission-critical assets, especially certificates but also passwords, current corporate practice is to keep a secure ledger of these that can be accessed by the board of directors, the executive managers, and designated maintainers. At no point should a password ever be entrusted to a person, but rather to a "role" that functions as the one with access: say, for example, the CIO/CTO and their subordinates.
* The second issue is the one everyone is fixating on, and that's firing important people, which puts the company at risk. This is a big issue, and Musk certainly could have done a better job of scoping out who represents a single point of failure at Twitter, eliminating that risk, and then proceeding with the culling. In a modern enterprise, no single person should be capable of putting the entire operation at risk. It's just that simple. So in a way, Musk accelerated what was probably inevitable at Twitter already. They were probably precariously close to destruction already, and now they can learn the hard way not to repeat these mistakes.
LOL, you realize all the PEOPLE you list as the PEOPLE who should be able to manage the keys to the kingdom are PEOPLE? Board of directors: fired on day one of the Musk takeover. Executive managers: many fired on day one by Musk as well. Designated maintainers: for all we know, they could have been fired in the purge, or quit when Musk offered the 3 months' severance.
All systems require people to run them.
Serious question... How do I build a system that grants access to a company role, not a person? In other words, if the CIO is fired, how does this system ensure that the new CIO can access it, and the old one no longer can?
If we tie it to the HR system, whoever admins that effectively has the keys to the kingdom. Same for Active Directory or any other technical solution.
Maybe in hacker movies. In real life, you try your best to avoid anyone having access to keys or passwords, and rely on HSMs, cloud KMS, secrets-management services, etc. Access to those things is controlled by your security team, with multi-factor authentication, often stored in safes, with alerts fired when they are used (because they should never be used). The audit logs that trigger these alerts should be written to WORM storage, so you can track access back down to individuals, and so that you know when you need to rotate secrets accessed by humans. Ideally your CA infrastructure automatically rotates and distributes.
There's absolutely no way in hell you should allow your board to have access to these things.
Most companies slowly work their way towards full automation, and until that happens, your security team usually owns manual rotations of critical systems like this. Only a fucking moron would fire all of these people.
maybe another one is: assume you will lose access to the HSM. sure, spinning up a new trust chain is annoying, but it wouldn't take that long to do. totally agree this post is overblown
This is because the whole idea is that you have inaccessible, locked-down production servers that only Puppet (which is driven from a central, governed configuration-management source) has the authority to configure, i.e. no SSH and no root access.
Thus leaving the only option being to physically visit each server at the datacenter and issue the commands.
[1] https://www.puppet.com/docs/puppet/5.5/ssl_regenerate_certif...
Circular dependencies can absolutely wreck you. For example, Puppet could configure sudoers, and without the Puppet config being applied, people who would normally expect access might not have it. So now you have to find a privileged ssh key for the un-configured machines.
I would be surprised if twitter did not have a physical vault with a USB drive with a root SSH key on it. With that you can do just about everything.
I would be most terrified of machine churn. Auto-remediation systems or elastic capacity systems can result in lost capacity that can't come back until the configuration problem is resolved.
Very simple operation... if you have working SSH access with root. If they don't, well...
Or the VMware esxi emulated graphical console, etc.
Or if it's a bunch of bare metal machines, hopefully someone old-school in the organization thought to deploy 48/96-port rs232 console serial concentrators and wire them up to the db9 serial port on each physical server. And you didn't disable all local serial tty in your operating config.
I've received calls from past employers, usually when they migrate a site I worked on to a new CMS or platform. There is some critical service (AWS or CDN credentials, something domain-related, etc.) that no one knows who has access to... Happily, those appear to get resolved... but this... yikes (if true)
It doesn't have to be a departure on bad terms, if they needed my TOTP codes I can't help them. That secret is already gone.
But, well, if you fuck up your CM...
Beyond that, though: Internal build systems? Data encryption? User client auth to critical services? Internal app mTLS for data exchanges? The list of possibilities goes on and on…
I thought SOX mandated this sort of internal control - after all, Twitter basically seems to be full of infrastructure risks that would (and did) negatively impact them financially in a material way.
No key access? Why didn't they print it out and stick it in a safe deposit box, which is what a couple of startups I've been with have done...along with a couple of other key pieces of paper. Physical backup.
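A paper backup like that is also easy to make verifiable. A sketch, using a throwaway key generated on the spot as a stand-in for the real secret: keep a checksum on the same page as the printout, so whoever retypes or scans it back in can prove they reconstructed the key bit-for-bit.

```shell
# A throwaway key standing in for the real secret (don't run this demo
# with the production key on a networked box)
openssl genrsa -out secret.key 2048

# PEM is already printable text, so the printout can be the file itself;
# print its checksum on the same page
sha256sum secret.key

# ...later, after retyping/scanning the printout back in as restored.key:
cp secret.key restored.key          # simulating a faithful re-entry
[ "$(sha256sum < secret.key)" = "$(sha256sum < restored.key)" ] \
  && echo "restore verified"
```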
Say you're planning, well, _anything_, and someone says "but in five years, a weird billionaire might buy the company and mismanage it to such an extent that your contingency plans don't work". There's a good argument that the proper response is that (a) that is largely the weird billionaire's problem and (b) that it is impossible to defend against an arbitrarily incompetent speculative future weird billionaire.
If someone takes a hammer to an electricity distribution board and electrocutes themselves, the normal response is not "well, that's the electrician's fault; they should have thought of that".
If true, this would "reflect rather badly" on exactly one person. But, y'know, it'll need to join a rather long queue of poorly reflecting things.
It's just that nobody plans for "a bus hit our entire ops team"