What's particularly impressive about Tramp is how well other Emacs packages tend to work with it. For instance, you can run Magit over Tramp, or, better put: Magit just works in Tramp buffers. The same goes for language servers. It's kind of wild when you think about what's happening under the hood.
And under the hood it's still transferring whole files. It's not editing remotely in place.
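For anyone who hasn't used it: Tramp hooks into Emacs's ordinary file operations, so a remote path opens like any local one (the user, host, and path names below are placeholders):

```
C-x C-f /ssh:alice@prod-web-01:/var/log/app.log          ; direct SSH
C-x C-f /ssh:alice@bastion|ssh:alice@db-01:/etc/hosts    ; multi-hop via a jump host
```

The `|` syntax chains hops, which is exactly the break-glass-via-jump-host pattern discussed below, just driven from the editor.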
That's interesting. I know some places go to great lengths to keep developers from accessing production without some sort of break-glass procedure through a jump host. I'm curious if they all know about this sort of loophole.
1. You don't have to expose a jump host at all, which is one less exposed asset to manage and worry about.
2. Your security team should already be collecting CloudTrail logs, so they get auditing of SSM/SSH "for free".
3. You can control SSM access via your SSO provider, which means you can trivially enforce a bunch of policies all in one place vs having to configure SSHD.
4. You can control SSM access via IAM.
5. You can limit session duration easily.
6. No more SSH agent hijacking (at least, I don't think so).
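On points 4 and 5: a minimal sketch of what the IAM side can look like, restricting sessions to instances carrying a particular tag (the tag key and value here are placeholders; `ssm:resourceTag/*` is a real condition key for `ssm:StartSession`):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:StartSession",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringEquals": { "ssm:resourceTag/Environment": "staging" }
      }
    }
  ]
}
```

Session duration (point 5) isn't set in the policy itself; it's capped separately via the Session Manager preferences for the account.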
I also wouldn't call this a loophole; you have to be explicitly granted permission to use SSM.
Perhaps not the best wording on my part. I was aware of SSM, but not aware of the SSH tunneling features. I'm wondering if that's common. Is the SSH tunneling controlled separately, or on by default if SSM is on?
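To answer the mechanics question: the SSH support goes through a separate SSM document, `AWS-StartSSHSession`, which tunnels port 22 over a session. It's wired up on the client side with a `ProxyCommand` (this is essentially the snippet from AWS's own docs):

```
# ~/.ssh/config
host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
```

Because `ssm:StartSession` policies can name the session document as a resource, an admin can allow plain interactive sessions while denying the SSH-tunneling document, so it is controllable separately rather than automatically on.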
If there's some tricky bug in production, then one can create some sort of debugging service that runs on another port and deploy it to investigate the bug, or use management and monitoring tools. Copying files up to production is something that should only be done by an automated deployment script.
If you are under time pressure to fix an escalation from a high-profile customer and you don't have such a service yet, do you make the customer wait while you write one, or do you just use command-line access? Or, if you already have such a service but it lacks the diagnostics needed to investigate this particular problem, do you make the customer wait while you enhance it, or do you just use command-line access? Or you make your debug service totally generic (allow it to run arbitrary code supplied by the user), in which case it can do anything the command line can; but then how is that actually any more secure than more standard means of command-line access? Plus, it adds friction, which may slow down resolution.
> or use management and monitoring tools.
Often these work fine for some problems, and then you get a problem which they don't cover adequately, and you need to go beyond them.
Seems to be at odds with
>then one can create some sort of debugging service that runs on another port and deploy it to investigate the bug
In many cases, that's just SSH. In most cases, I'm not copying files around; I want to connect to the real environment where firewall rules, API keys, permission systems, overlay networks, etc. are in place. If there's a stuck process (say, lock contention), it's much easier to just SSH in, run gdb, and check the stack to see what it's doing. Some languages, like Java, have pretty rich tooling out of the box for remotely connecting to processes. For others, like Python and Ruby, you just use gdb.
Either way, there's no copying of data necessary; you just need access to the running process. For a large system with hundreds of identical servers, I don't want to deploy a debug service everywhere; I just want to connect to the one with an issue and check it.
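The gdb workflow above is hard to demo in text, but the same idea (dump every thread's stack in a live process without restarting anything) can be sketched from inside Python; the "stuck" worker here is simulated with a lock, standing in for real contention:

```python
import sys
import threading
import time
import traceback

# Hypothetical stand-in for a wedged production worker: a thread
# blocked forever on a lock someone else is holding.
lock = threading.Lock()
lock.acquire()

def stuck_worker():
    lock.acquire()  # blocks forever -- this is the "stuck" frame

threading.Thread(target=stuck_worker, daemon=True).start()
time.sleep(0.2)  # give the worker time to block

# The in-process equivalent of gdb's `thread apply all bt`:
# snapshot every thread's current Python stack.
dumps = {
    tid: "".join(traceback.format_stack(frame))
    for tid, frame in sys._current_frames().items()
}
for tid, stack in dumps.items():
    print(f"--- thread {tid} ---\n{stack}")
```

The stuck worker shows up with `stuck_worker` at the top of its stack, which is usually all you need to identify the contended lock.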
Snapshotting works sometimes, but I used stuck processes as an example since that's usually where all this remote/log/etc. stuff falls apart. And, as it so happens, things like lock contention tend to be really hard to recreate in synthetic or simulated environments that don't have real, authentic load.
Keep in mind that doesn't mean "go crazy with `root` in production". You can combine that strategy with scripting and tooling to drain/isolate/quarantine servers where the stuck process is still running but they don't have live traffic being routed to them.
I see this "ZOMG NO ONE TOUCH PROD" mentality a lot in highly regulated environments, but it's usually more sustainable to isolate the in-scope system's functionality as narrowly as possible, to avoid dragging unnecessarily large parts of the stack into scope (e.g., put the billing functionality in a microservice to limit PCI scope).
But what about when things don't work the way they should?
Want to debug network connectivity issues? See which process is hogging the CPU? Investigate installation/deploy problems? Reinvent the wheel, or use what's already there.
Tramp does not need scp to transfer files; it can just as easily multiplex them over the shell connection using base64 or uuencoding.
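The trick is that base64 turns arbitrary bytes into plain ASCII, which survives any text-only shell channel. A minimal sketch of the mechanism, with the "remote shell" simulated by a local subprocess (on a real host Tramp would run something like `base64 -d > file` instead):

```python
import base64
import subprocess
import sys

# Arbitrary binary payload, including bytes that would mangle a raw tty.
payload = b"\x00\xffarbitrary binary bytes\n"

# Local side: encode to plain ASCII so it can be typed into the shell channel.
encoded = base64.b64encode(payload).decode("ascii")

# "Remote" side: decode stdin back to the original bytes. A Python one-liner
# stands in for `base64 -d` for portability of this sketch.
remote_decode = subprocess.run(
    [sys.executable, "-c",
     "import sys, base64; sys.stdout.buffer.write(base64.b64decode(sys.stdin.read()))"],
    input=encoded.encode("ascii"),
    capture_output=True,
    check=True,
)
decoded = remote_decode.stdout
print(decoded == payload)
```

No scp, no sftp, no extra port: just the shell session you already have.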
SSM is definitely not the most secure way[0]. It's super complex and deeply integrated into the rest of AWS, and it isn't cross-cloud to GCP, Azure, DO, etc., so now everyone needs an AWS account just to log into a Linux server.
Worse, IAM roles are powerful but easy to misconfigure, and that's before getting into how hard they are to apply with any granularity because of the policy length limitations[1], so you're likely giving everyone access to log into every instance without even knowing it.
0. https://cloudonaut.io/aws-ssm-is-a-trojan-horse-fix-it-now/
1. https://aws.amazon.com/premiumsupport/knowledge-center/iam-i...
I was mentioning that particular misfeature because it was a personal annoyance of mine. Oh well, I suppose everything is about customer lock-in these days.