I hope this is useful. To learn more about the project, please see https://github.com/fduran/sadservers
Yup, that's what I was afraid of.
We ask the engineer who is proctoring the interview to think about the following question: Would you want to pair with that engineer again?
If that answer is no, then we probably won't go further because pairing with engineers to troubleshoot is what we do every day.
Some great resumes have died on not knowing how to see what's running on port 80.
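For reference, a minimal sketch of one way to answer the port 80 question on a Linux box. The function names `port_to_hex` and `listeners_on_port` are made up for illustration, and the `/proc/net/tcp` fallback is an assumption for machines without `ss` (it only covers IPv4).

```shell
#!/bin/sh
# Convert a decimal port to the 4-digit hex form used in /proc/net/tcp.
port_to_hex() {
    printf '%04X\n' "$1"
}

# List listening TCP sockets on a port (default 80).
listeners_on_port() {
    port="${1:-80}"
    if command -v ss >/dev/null 2>&1; then
        # -l listening, -t TCP, -n numeric, -p owning process
        # (process names for other users' sockets need root)
        ss -ltnp "sport = :$port"
    else
        # Fallback (IPv4 only): state 0A in /proc/net/tcp means LISTEN.
        awk -v hex=":$(port_to_hex "$port")" \
            '$4 == "0A" && $2 ~ hex"$" { print $2 }' /proc/net/tcp
    fi
}
```

Calling `listeners_on_port 80` as root should show the owning process when `ss` is available; `lsof -i :80` or `netstat -ltnp` are common alternatives.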
One example: we had them SSH in, then download and extract a tarball (the Linux source, but the content doesn't matter). Sometimes they'd gunzip to stdout. The reaction tells you a lot. "lol whoopsie" followed by a quick fix: person knows what they're doing. "uh… what is going on? did I break it?" followed by general cluelessness… maybe not.
That did occasionally break tmux, though.
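To make the tarball slip concrete, here is a small sketch on a throwaway archive. The directory and file names are invented for the demo; the real exercise used the Linux source.

```shell
#!/bin/sh
# Build a tiny tarball to practise on (names are made up for the demo).
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/out"
echo "hello" > "$workdir/src/README"
tar -czf "$workdir/src.tar.gz" -C "$workdir" src

# The intended step: decompress and extract in one go.
tar -xzf "$workdir/src.tar.gz" -C "$workdir/out"

# The slip: gunzip -c writes the decompressed tar stream to stdout.
# Piped into tar it is harmless; sent to a terminal it is a screenful of binary.
listing=$(gunzip -c "$workdir/src.tar.gz" | tar -tf -)
echo "$listing"
```

Running `gunzip -c` without the pipe is what produces the "did I break it?" moment.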
Part of it was "what are the specs of this thing you're SSH'd into?" and we had one candidate who was adamant the numbers must be wrong: 2 GiB is too little RAM, no machine is that small! Yeah, we didn't spin up a 128 GiB VM for your interview…
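One possible way to answer the specs question, sketched for a Linux machine with `nproc` and `/proc/meminfo` available (both are assumptions about the box, not something the exam is known to require):

```shell
#!/bin/sh
# Report CPU count and total RAM of the current Linux machine.
cpus=$(nproc)
mem_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
mem_gib=$(awk -v kib="$mem_kib" 'BEGIN { printf "%.1f", kib / 1048576 }')
echo "CPUs: $cpus"
echo "RAM:  $mem_gib GiB"
```

MemTotal is usually a little below the nominal size because the kernel reserves some memory, so a "2 GiB" VM tends to report slightly less.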
Rejecting someone because they can't recall the correct netstat syntax doesn't seem like good hiring practice, but I assume in good faith that's not what you meant :)
> My org has run a hands-on technical exam with a stack of linux admin basics ... they are based on real problems we've had and the feedback is overwhelmingly "this was one of the best technical interviews I've ever had."
You essentially answered your own question.
Putting thought into the interview process and working with candidates through real problems is valuable. I cannot say the same for outsourcing or "automating" this portion of an interview using 3rd party SaaS.
Even so, test taking can be stressful but it's arguably less stressful than actual production support with people waiting on the result. Whether people really want to put candidates in a stressful situation is up to them. SadServers seems like it's somewhere in the middle compared with some of the things I've seen. One job interview put me in a room with a boot CD and an ancient computer with a CD-ROM drive so slow you got exactly one chance to boot the media and recover the system in the time limit. But the job was for a trading company, so if you couldn't handle that they didn't want you. It was a fun exercise but would I do that to someone else? Probably not.
[...] Please don't pick the most provocative thing in an article or post to complain about in the thread.
Heck, I'm not even asking for an email (and I had to do extra session management coding for that).
It's certainly better than some crappy whiteboarding session or, worse, a take-home test.
You might instead want to have a smaller pool of (larger) servers that you run co-resident VMs on with https://firecracker-microvm.github.io/. That will avoid account limits and also keep your AWS costs more predictable.
> You might instead want to have a smaller pool of (larger) servers that you run co-resident VMs on with https://firecracker-microvm.github.io/. That will avoid account limits and also keep your AWS costs more predictable.
I'd imagine (still waiting for it to load lmao) most of it could be containers too.
I like making jokes with coworkers about implementing this or that bit of infra with WASM-based tools mostly to get a rise out of them but each time I make the joke I look into some of the tools or projects and the balance of joke to "I'm actually serious" shifts a little bit to the right.
trollface.jpg
VMs are designed from the ground up to isolate guests, rather than focusing on application deployment.
Firecracker is the modern container alternative in untrusted compute scenarios, with Fly.io even converting container images into Firecracker VMs.
Are SREs and DevOps tasked with administration of operating systems?
yes, eventually.
You can dress it up in all the fancy terms that you like, but devops and SREs are sysadmins with better PR.
It's critical that SREs understand _how_ to debug a system, so that they can work out how to put in fixes and/or design better systems.
Why would you need to understand how something works? Just use containers. /s
[Edited my compensation numbers to avoid down votes - yikes]
My guess is trying to sell high end services as a "principal software engineer" isn't going to be enough to justify that cash comp to a lot of people hiring.
If it does
while true; do echo hello >> bad.log; done
Then renaming bad.log will not solve the challenge. (Yes, these are bad solutions, since the instructions explicitly said to stop the process which is writing.)
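A minimal sketch of why the rename fails for a loop like that one. The temporary directory and the short `sleep` calls are additions for the demo (fractional `sleep` is assumed to be supported, as it is in GNU coreutils and BusyBox).

```shell
#!/bin/sh
# Show that moving the log away does not stop a writer that reopens it by name.
workdir=$(mktemp -d)

# The writer loop, slowed down with a short sleep (added for the demo).
( while true; do echo hello >> "$workdir/bad.log"; sleep 0.1; done ) &
writer_pid=$!

sleep 0.5
mv "$workdir/bad.log" "$workdir/bad.log.old"   # the "fix" that does not work
sleep 0.5

# Each >> reopens bad.log by name, so the file comes back.
if [ -e "$workdir/bad.log" ]; then
    recreated=yes
else
    recreated=no
fi
echo "bad.log recreated after rename: $recreated"

# The real fix: stop the process that is doing the writing.
kill "$writer_pid"
wait "$writer_pid" 2>/dev/null || true
rm -rf "$workdir"
```

A writer that kept a single file descriptor open would behave differently (it would keep appending to the renamed file), but either way the process is still running.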
An error occurred (VcpuLimitExceeded) when calling the RunInstances operation: You have requested more vCPU capacity than your current vCPU limit of 64 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.
Maybe something like https://leaningtech.com/webvm-server-less-x86-virtual-machin... would be cheaper and more reliable for this kind of thing?
Mitigation: temporarily reducing server lifetime so more people can try.
I'd start by moving the test VMs to bare-metal servers running libvirt. You can get a 128GB RAM server for ~110 EUR and that should be able to run around 120 concurrent VMs assuming 1GB of RAM to each (CPU isn't a major issue in this case).
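The "around 120" figure works out if some RAM is held back for the host. A quick sketch of that arithmetic, where the 8 GB reserve is an assumption chosen to match the estimate, not a number from the comment:

```shell
#!/bin/sh
total_gb=128        # RAM in the server, from the comment above
per_vm_gb=1         # RAM per test VM, from the comment above
host_reserve_gb=8   # assumption: headroom for the host and hypervisor

max_vms=$(( (total_gb - host_reserve_gb) / per_vm_gb ))
echo "max concurrent VMs: $max_vms"
```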
You'd get SSH access to the VM, then submit a diagnostic report of what was broken (and how you fixed it).
Reminded me of how Red Hat used to run their certification test (RHCE). I probably still have the live CDs for my RHCE laying around somewhere.
Usually a simple combination of immutable files, SELinux policies, and typos in configuration files was enough for most of the challenges. Though now and again you'd find they'd given you a server with packages removed, or not yet installed.
I've long wanted for some sort of mock, "things are broken - I want to see how you think" approach for sysad
188 points by fduran 3 hours ago | unvote | flag | hide | past | favorite | 68 comments
If you click 'favorite' it will save it to your favorites list. This is a publicly visible list - yours is https://news.ycombinator.com/favorites?id=bravetraveler and mine is https://news.ycombinator.com/favorites?id=shagie which makes it easy to get bookmark-style functionality within HN.
As I tend to favorite less often than I comment, it makes it easier to find those things I want to find again.
The HN interface too tends to just have my eyes filter out those links... but that's no defense.
Especially good to know that it's publicly viewable!
Not that I'm particularly worried about being outed by anything I favorite here, it's just good to be mindful of the data we make and where it goes.
It seems like this is a similar SaaS.
(FWIW I think this is a very cool and fun educational project regardless of what usefulness it might or might not have in IT hiring decisions, and I'm looking forward to playing with it)
After choosing a problem, the endpoint you poll at https://sadservers.com/celery-progress/xxxx repeatedly returns {pending: true, current: 0, total: 100, percent: 0} for me.
I've been waiting a while for the "sad server" to come up for me and read the scenario (saint john) whilst waiting.
lsof was the first thing that came to mind after reading the scenario.
I guess that once I actually get a "sad server" I'll make it "happy" quickly :)
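For anyone who wants to try the lsof idea, a rough sketch of finding which processes hold a given file open. The function name `pids_with_open_file` is made up, and the `/proc` scan is an assumed fallback for Linux machines without `lsof`.

```shell
#!/bin/sh
# Print the PIDs of processes that currently hold the given file open.
pids_with_open_file() {
    target=$(readlink -f "$1") || return 1
    if command -v lsof >/dev/null 2>&1; then
        # -t prints bare PIDs, one per line
        lsof -t "$target"
    else
        # Fallback: compare every /proc/<pid>/fd symlink to the target path.
        for fd in /proc/[0-9]*/fd/*; do
            if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
                pid=${fd#/proc/}
                echo "${pid%%/*}"
            fi
        done | sort -u
    fi
}
```

One caveat: a writer that opens the file, appends, and closes it again on every iteration only holds it open for an instant, so a single snapshot can miss it and may need repeating.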
I was the founder of a school training software engineers. We had an infrastructure track that got a lot of our students into SRE positions. When asking employers for feedback about our grads, one piece of feedback kept coming up: they lack experience when it comes to troubleshooting.
So I went on a quest to simulate that infra debugging while in an academic context.
I came up with the idea of giving students broken servers. I used Docker containers and would set up a simple workload and mess it up with classic issues.
Needless to say students generally did not like it :) debugging isn’t fun. But it did help a lot.