Show HN: SadServers – Test your Linux troubleshooting skills

(sadservers.com)

597 points · fduran · 3y ago · 128 comments

Hello, I'm building SadServers.com, a SaaS where users can test their Linux troubleshooting skills on real Linux servers in a "Capture the Flag" fashion.

I hope this is useful. To learn more about the project, please see https://github.com/fduran/sadservers

128 comments

92 comments · 29 top-level

deeblering43y ago· 13 in thread

> It's also my not-so-secret hope that a sophisticated enough version of SadServers could be used by tech companies (or for companies that carry on job interviews on their behalf) to automate or facilitate the Linux troubleshooting interview section.

Yup, that's what I was afraid of.

lbotos3y ago

Why are you afraid of this? My org has run a hands-on technical exam with a stack of linux admin basics (I won't enumerate them here because people do their research) but they are based on real problems we've had and the feedback is overwhelmingly "this was one of the best technical interviews I've ever had."

We ask the engineer who is proctoring the interview to think about the following question: Would you want to pair with that engineer again?

If that answer is no, then we probably won't go further because pairing with engineers to troubleshoot is what we do every day.

Some great resumes have died with not knowing how to see what's running on port 80.
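For readers wondering what the port 80 question looks like in practice: on most modern distributions `ss -ltnp` (or `lsof -i :80`) lists listening TCP sockets along with the owning process. The sketch below shows roughly where that information comes from by reading the kernel's `/proc/net/tcp` table directly. The function name and the sample table are made up for illustration, and it only covers IPv4 TCP.

```shell
# listeners_on_port PORT [FILE]
# Prints the local ADDR:PORT (hex) of every socket in LISTEN state on PORT.
# FILE defaults to /proc/net/tcp; passing a file makes it easy to try offline.
listeners_on_port() {
  port_hex=$(printf '%04X' "$1")
  # Column 2 is the local "ADDR:PORT" in hex; column 4 is the state (0A = LISTEN).
  awk -v p="$port_hex" \
    'NR > 1 { split($2, a, ":"); if (a[2] == p && $4 == "0A") print $2 }' \
    "${2:-/proc/net/tcp}"
}

# Hypothetical sample table: a listener on port 80 (0x0050), a listener on
# port 8080 (0x1F90), and an established connection that is not a listener.
port_sample=$(mktemp)
cat > "$port_sample" <<'EOF'
  sl  local_address rem_address   st
   0: 00000000:0050 00000000:0000 0A
   1: 0100007F:1F90 00000000:0000 0A
   2: 0100007F:0050 0100007F:C350 01
EOF

listeners_on_port 80 "$port_sample"
```

Calling `listeners_on_port 80` with no file argument reads the live table on a Linux host. Mapping a socket back to a process name is the part `ss -p` and `lsof` do for you.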

deathanatos3y ago

Yeah, we did this at a previous employer.

One example: we had them SSH in, then download and extract a tarball (the Linux source, but the content doesn't matter). Sometimes they'd gunzip to stdout. The reaction tells you a lot. "lol whoopsie" followed by a quick fix: the person knows what they're doing. "uh… what is going on? did I break it?" followed by general cluelessness… maybe not.

That did occasionally break tmux, though.
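A minimal sketch of the tarball step, using a tiny throwaway archive instead of the Linux source (all paths here are temporary and made up). It shows the one-step extraction next to the two-step form where `gunzip -c` has to be piped somewhere, which is the spot where output can end up on the terminal.

```shell
# Build a small throwaway tarball to practise on.
tar_work=$(mktemp -d)
mkdir -p "$tar_work/src/demo"
echo "hello" > "$tar_work/src/demo/README"
tar -czf "$tar_work/demo.tar.gz" -C "$tar_work/src" demo

# One-step extraction: -x extract, -z gunzip, -f archive file, -C target directory.
mkdir "$tar_work/out1"
tar -xzf "$tar_work/demo.tar.gz" -C "$tar_work/out1"

# Two-step form: `gunzip -c` writes the decompressed tar stream to stdout,
# so it must be piped into tar. Without the pipe, raw binary hits the terminal.
mkdir "$tar_work/out2"
gunzip -c "$tar_work/demo.tar.gz" | tar -xf - -C "$tar_work/out2"

cat "$tar_work/out1/demo/README" "$tar_work/out2/demo/README"
```

If a terminal does get garbled this way, running `reset` usually restores it.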

Part of it was "what are the specs of this thing you're SSH'd into?" and we had one candidate who was adamant the numbers must be wrong: 2 GiB is too little RAM, no machine is that small! Yeah, we didn't spin up a 128 GiB VM for your interview…
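One way to answer the "what are the specs" question, sketched under the assumption of a Linux box with `/proc` mounted: `MemTotal` in `/proc/meminfo` gives installed RAM and `nproc` gives the CPU count. The helper name and the sample file are hypothetical. The sample mimics the 2 GiB VM from the anecdote.

```shell
# mem_total_gib [FILE]
# Prints total RAM in GiB from a meminfo-style file (default: /proc/meminfo).
# The kernel labels the unit "kB", but the value is in units of 1024 bytes.
mem_total_gib() {
  LC_ALL=C awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Hypothetical sample mimicking a 2 GiB VM.
meminfo_sample=$(mktemp)
printf 'MemTotal:        2097152 kB\nMemFree:          524288 kB\n' > "$meminfo_sample"
mem_total_gib "$meminfo_sample"

# On a live Linux machine, report the real values as well.
if [ -r /proc/meminfo ]; then
  echo "This machine: $(mem_total_gib) GiB RAM, $(nproc) CPUs"
fi
```

`free -h` and `lscpu` present the same information in a friendlier form where they are installed.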

joenot4433y ago

If you give the person you're interviewing access to the same tools they'd have in a regular day on the job (Google, manpages, etc.), I'd say that's a fair and probably relatively enjoyable interview.

Rejecting someone because they can't recall the correct netstat syntax doesn't seem like good hiring practice, but I assume in good faith that's not what you meant :)

mathverse3y ago

People in higher-up positions like yourself will rarely be subjected to testing with tools like this. You are basically trying to remove the human from the equation and industrialize the whole process.

deeblering43y ago

> Why are you afraid of this?

> My org has run a hands-on technical exam with a stack of linux admin basics ... they are based on real problems we've had and the feedback is overwhelmingly "this was one of the best technical interviews I've ever had."

You essentially answered your own question.

Putting thought into the interview process and working with candidates through real problems is valuable. I cannot say the same for outsourcing or "automating" this portion of an interview using 3rd party SaaS.

rednerrus3y ago

We do this in our org as well. 30 minutes of troubleshooting Linux issues is a good way to evaluate a candidate's experience. We run it as a team exercise with the candidate, so we also get the added bonus of seeing how they work in a team setting, how they communicate, etc.

Nextgrid3y ago

Is it bad though? The problem with Leetcode is that it's an extremely unrealistic test. This on the other hand seems like it actually tests real-world scenarios, and you can get there without grinding. I'm pretty sure I can pass all the tests they've currently got despite having no formal sysadmin experience, just using common developer knowledge, common sense and strategic Google-fu.

technofiend3y ago

The Red Hat Certified System Admin, Red Hat Certified System Engineer and similar tests require practical, general hands-on skills to solve broken systems. The performance tuning and troubleshooting exams go into more detail and more complex scenarios. No internet access, but resources are available if you understand how to use them. I would never suggest people should hire solely on those certs, but if someone takes the time to complete 7 hands-on tests for the certified architect certification, it's a strong indicator they have skills.

Even so, test taking can be stressful, but it's arguably less stressful than actual production support with people waiting on the result. Whether people really want to put candidates in a stressful situation is up to them. SadServers seems like it's somewhere in the middle compared with some of the things I've seen. One job interview put me in a room with a boot CD and an ancient computer with a CD-ROM drive so slow you got exactly one chance to boot the media and recover the system in the time limit. But the job was for a trading company, so if you couldn't handle that they didn't want you. It was a fun exercise, but would I do that to someone else? Probably not.

pvg3y ago

Please don't post shallow dismissals, especially of other people's work.
[...] Please don't pick the most provocative thing in an article or post to complain about in the thread.

https://news.ycombinator.com/newsguidelines.html

Sebguer3y ago

Already exists. I can't remember the name, but the infra company that I used to work for used one of these as part of their interview loop.

fduranOP3y ago

That doesn't mean that I'd charge individual users :-)

Heck, I'm not even asking for an email (and I had to do extra session management coding for that).

KaiserPro3y ago

but why? a real test that is repeatable, realistic and not _overly_ hard. Sure, for a junior software engineer it's a bad fit, but for a devops/SRE/sysadmin it's a great fit.

it's certainly better than some crappy whiteboarding session or, worse, a take-home test.

aliqot3y ago

I knew this is where it was headed :/

apawloski3y ago· 12 in thread

Based on your architecture diagram it looks like you're spinning up an instance per-user? As you're probably finding now, you will hit AWS limits quickly.

You might instead want to have a smaller pool of (larger) servers that you run co-resident VMs on with https://firecracker-microvm.github.io/. That will avoid account limits and also keep your AWS costs more predictable.

ilyt3y ago

That's kind of a nice use case for the WASM machine/Linux emulators: then you just need to provide an image and the user can run it in the browser.

> You might instead want to have a smaller pool of (larger) servers that you run co-resident VMs on with https://firecracker-microvm.github.io/. That will avoid account limits and also keep your AWS costs more predictable.

I'd imagine (still waiting for it to load lmao) most of it could be containers too.

twalla3y ago

Someone else linked https://github.com/copy/v86 which seems really neat.

I like making jokes with coworkers about implementing this or that bit of infra with WASM-based tools mostly to get a rise out of them but each time I make the joke I look into some of the tools or projects and the balance of joke to "I'm actually serious" shifts a little bit to the right.

encryptluks23y ago

So then the user experience will be poor due to the slowness and non-standard implementation. A better solution IMO would be to provide a container with SSH access.

yamtaddle3y ago

Just run them in Linux VMs with WASM, on the users' browsers. Make them all pay for it with higher utility bills and greater wear & tear on their hardware.

trollface.jpg

freeone30003y ago

This is actually a good idea for this -- the user wants the education, they can pay for it with their own hardware. Keep your costs low!

BossingAround3y ago

Why not spin up containers instead of VMs? Seems to me containers would fit much better than VMs.

paulfurtado3y ago

If the goal of the test is to debug a sad Linux server, containers are going to severely limit the ways in which the server can be sad, aren't they?

spiffytech3y ago

Containers have a history of escape vulnerabilities, for reasons like sharing a kernel with the host and other containers.

VMs are designed from the ground up to isolate guests, rather than focusing on application deployment.

Firecracker is the modern container alternative in untrusted compute scenarios, with Fly.io even converting container images into Firecracker VMs.

cogman103y ago

Bypassing container security is easier than bypassing VM security.

temp08263y ago

I haven't fully grokked this yet, but one trick I've used in the past to get around limits is AWS Organizations, creating a sub-account per property. A bit more setup but can keep things cleaner administratively.

icedchai3y ago

AWS will raise limits if you ask. Increasing EC2 instance limits usually has a quick turnaround.

fduranOP3y ago

Yes thanks!

DeathArrow3y ago· 6 in thread

>Practice for your next SRE/DevOps interview.

Are SREs and DevOps tasked with administration of operating systems?

KaiserPro3y ago

> Are SREs and DevOps tasked with administration of operating systems?

yes, eventually.

you can dress it up in all the fancy terms that you like, but devops and SREs are sysadmins with better PR.

it's critical that SREs understand _how_ to debug a system, so that they can work out how to put in fixes and/or design better systems.

asmr3y ago

Both SRE and DevOps are essentially evolved sysadmin roles. The DevOps philosophy is cross-functional and many sysadmins have adopted a DevOps approach. The latest edition of the classic sysadmin book "The Practice of System and Network Administration" is now centered around DevOps.

dsr_3y ago

If you have ops somewhere in your responsibilities, then yes.

jabroni_salad3y ago

depends on what layer the issue is happening at. I know everyone thinks the OS has been abstracted away but my ticket queue says otherwise. "yaml engineering" is just a control surface, I still need to pop the hood often.

jen_h3y ago

Yeah. Random data point: One of my most favorite SRE interviews ever (serious fun!) involved hands-on troubleshooting that eventually required gdb.

BossingAround3y ago

How do you automate something you can't do manually?

andrewmcwatters3y ago· 6 in thread

My only feedback is that this is unrealistic because today developers wouldn’t try to debug something, they’d just destroy the instance, push a commit and hope it fixed something infra related then recreate it.

Why would you need to understand how something works? Just use containers. /s

vsareto3y ago

Developers just need to understand everything because we need developers to do everything and meet all deadlines. We wouldn't dare consider a support role that could troubleshoot it because then there would be no point to having developers that can do everything! /s

cube003y ago

Support doesn't deliver features, we need new features! /s

grepLeigh3y ago

If most developers can't debug a VM, then anyone who can will be able to charge a premium. If you have a proficiency in ops, remember that the next time you negotiate a compensation package.

[Edited my compensation numbers to avoid down votes - yikes]

andrewmcwatters3y ago

I feel like you definitely have to target particular companies and more specifically specific titles and skills to offer to do so.

My guess is trying to sell high end services as a "principal software engineer" isn't going to be enough to justify that cash comp to a lot of people hiring.

sshd3y ago

This is so sad but so true!

edmcnulty1013y ago

If it's dumb and it works, it's not dumb.

yubiox3y ago· 4 in thread

Can't get to the first problem because of the HN hug, but anyway there are fake ways to "solve" it, like renaming the logfile (the check they use for "solved" is provided).

BossingAround3y ago

This is a self-test, not a certification. The goal is not to defeat the verification goal, but to learn something. So yeah, it's perfectly acceptable that the tests are not bullet-proof.

Timja3y ago

Depends on how the broken program writes to the log.

If it does

    while true; do echo hello >> bad.log; done

Then renaming bad.log will not solve the challenge.
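A small sketch of why that is, using a slowed-down copy of the loop above and a temporary directory (both choices are assumptions for the demo). Because `>>` reopens `bad.log` by name on every iteration, the file reappears right after it is renamed. A writer that instead kept one file descriptor open would simply carry on writing into the renamed file, so either way the rename stops nothing.

```shell
rename_work=$(mktemp -d)

# Writer that reopens bad.log by name on every iteration.
( while true; do echo hello >> "$rename_work/bad.log"; sleep 0.1; done ) &
writer=$!

sleep 0.5                                              # let it write a few lines
mv "$rename_work/bad.log" "$rename_work/bad.log.old"   # the "fake fix"
sleep 0.5                                              # the next >> recreates bad.log

if [ -e "$rename_work/bad.log" ]; then recreated=yes; else recreated=no; fi
echo "bad.log recreated after rename: $recreated"

# The actual fix: stop the process that is doing the writing.
kill "$writer"
wait "$writer" 2>/dev/null || true
```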

teddyh3y ago

Replace it with a symlink to /dev/null! Or /dev/full if we feel like it.

(Yes, these are bad solutions, since the instructions explicitly said to stop the process which is writing.)

fduranOP3y ago

There are ways to cheat but not so simple; there's a script that checks for the solution and a hash of the script is checked for modifications.
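How SadServers actually implements this is not shown in the thread. As a rough illustration of the general idea only, the sketch below records a SHA-256 digest of a stand-in check script and refuses to trust the script once its contents change. The file names, the placeholder checker, and the `verify_checker` helper are all hypothetical.

```shell
hash_work=$(mktemp -d)
check="$hash_work/check.sh"

# Hypothetical stand-in for the solution-checking script.
cat > "$check" <<'EOF'
#!/bin/sh
echo "placeholder check"
EOF

# Record the known-good digest once, somewhere the player cannot edit.
expected=$(sha256sum "$check" | cut -d' ' -f1)

# verify_checker FILE DIGEST: succeeds only if FILE still hashes to DIGEST.
verify_checker() {
  actual=$(sha256sum "$1" | cut -d' ' -f1)
  [ "$actual" = "$2" ]
}

verify_checker "$check" "$expected" && echo "checker intact"

# A player appends a line so the checker always passes...
echo 'exit 0' >> "$check"
verify_checker "$check" "$expected" || echo "checker was modified"
```

A digest only detects tampering with the script itself. It says nothing about whether the state the script inspects was faked, which is the kind of cheat discussed above.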

dugmartin3y ago· 3 in thread

I'd suggest integrating https://bellard.org/jslinux/ and running the VM in the browser if you can - then you can scale without running out of resources.

fduranOP3y ago

Thanks, I've been looking at WASM, e.g. https://github.com/snaplet/postgres-wasm/tree/main/packages/... ; it would certainly simplify everything to "download a fat file".

jodrellblank3y ago

Have you seen https://copy.sh/v86/ ? It doesn't run as fast as jslinux but is BSD-licensed, on GitHub, and supports resuming the VM from a snapshot.

https://github.com/copy/v86

m00dy3y ago

Or a Linux kernel port to WebAssembly.

vermon3y ago· 3 in thread

Seems like it's out of capacity:

    An error occurred (VcpuLimitExceeded) when calling the RunInstances operation: You have requested more vCPU capacity than your current vCPU limit of 64 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.

Maybe something like https://leaningtech.com/webvm-server-less-x86-virtual-machin... would be cheaper and more reliable for this kind of thing?

fduranOP3y ago

Yes, HN effect lol-sob.

Mitigation: reducing server lifetime temporarily so more people can try.

warent3y ago

Usually I roll my eyes when someone posts their own website to HN and it crashes under load. But given the nature and complexity of yours I think there's room for understanding and patience :)

Nextgrid3y ago

Scaling this service without breaking the bank could become its own "sad server" scenario.

I'd start by moving the test VMs to bare-metal servers running libvirt. You can get a 128GB RAM server for ~110 EUR and that should be able to run around 120 concurrent VMs assuming 1GB of RAM to each (CPU isn't a major issue in this case).
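The arithmetic behind "around 120" can be sketched as follows. The 128 GB host and 1 GB per VM come from the comment. The 8 GiB held back for the host OS and hypervisor is an assumption chosen to match the estimate, not a measured figure.

```shell
host_ram_gib=128      # from the comment
vm_ram_gib=1          # from the comment
host_reserved_gib=8   # assumption: headroom for the host OS and libvirt/QEMU overhead

# Integer division: how many fixed-size guests fit in the remaining RAM.
max_vms=$(( (host_ram_gib - host_reserved_gib) / vm_ram_gib ))
echo "Roughly $max_vms concurrent VMs per host"
```

Per-VM overhead in the hypervisor would lower this number somewhat, while memory overcommit or page sharing could raise it.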

bm-rf3y ago· 3 in thread

I'm assuming you're spinning up an EC2 instance for each lab. What do you think about using pre-built Docker images for each challenge instead? That way they can spin up in just a couple of seconds. It might also be cheaper.

fduranOP3y ago

I wanted to do full VMs rather than Docker images but yes I could do Docker images or dedicated big instances with VMs on top like somebody else is suggesting.

bravetraveler3y ago

Not a bad idea, but something to consider: this limits the options for kernel-level things quite considerably.

clvx3y ago

probably lxd would be better.

grepLeigh3y ago· 2 in thread

Very cool! This reminds me of the ops challenge @ Slack. I'm not sure if they still do this, but the SRE/platform infra interview used to involve a VM running a malfunctioning LAMP stack.

You'd get SSH access to the VM, then submit a diagnostic report of what was broken (and how you fixed it).

Reminded me of how Red Hat used to run their certification test (RHCE). I probably still have the live CDs for my RHCE lying around somewhere.

stevekemp3y ago

I've had interviews like that in the past, and really enjoyed them. Much better than "Draw an architecture diagram for how you'd handle a serverless IoT application" - where you lose points, silently, because you didn't pick something the interviewer expected you to do.

Usually a simple combination of immutable files, SELinux policies, and typos in configuration files were enough for most of the challenges. Though now and again you'd find they'd given you a server with packages removed, or not yet installed.

fduranOP3y ago

Oh that reminds me, I loved the original Stripe CTF, it's been 10 years already! https://twitter.com/fduran/status/240321390698442753

bravetraveler3y ago· 2 in thread

Commenting to give this a try later. I've routinely been the person to get these kinds of gremlins escalated.

I've long wanted some sort of mock "things are broken - I want to see how you think" approach for sysad

shagie3y ago

In the "tricks of hacker news" -

     188 points by fduran 3 hours ago | unvote | flag | hide | past | favorite | 68 comments

If you click 'favorite' it will save it to your favorites list. This is a publicly visible list - yours is https://news.ycombinator.com/favorites?id=bravetraveler and mine is https://news.ycombinator.com/favorites?id=shagie which makes it easy to get bookmark-style functionality within HN.

As I tend to favorite less often than I comment, it makes it easier to find those things I want to find again.

bravetraveler3y ago

Much appreciated! I'm woeful about not using features like this; it's a character fault at this point.

The HN interface too tends to just have my eyes filter out those links... but that's no defense.

Especially good to know that it's publicly viewable!

Not that I'm particularly worried of being outed by anything I favorite here, it's just good to be mindful of the data we make and where it goes.

hotpotamus3y ago· 2 in thread

Are you familiar with Trueability? https://www.trueability.com/

It seems like this is a similar SaaS.

fduranOP3y ago

Didn't know about this one. There are quite a few labs/sandbox SaaS, but what I've seen so far is that they are more for training with a "follow the recipe" model (do this, do that to configure something) rather than "this (real) server is broken, fix it (with possibly different solutions)", which IMHO is more real-life and useful.

hotpotamus3y ago

I believe the company was founded by some coworkers of mine way back when at Rackspace who often interviewed Linux admins with a lab VM and I assume they just automated the setup and spun it off as their own business. At least that's what happened as far as I can tell; I didn't know the parties involved.

b200003y ago· 2 in thread

did you read up on the problems with leetcode?

fduranOP3y ago

Hi, not sure what the question means, I came up with the scenarios not copying from leetcode if that's what you mean.

pxc3y ago

I think they mean 'are you aware of the limitations of Leetcode-like tests and the downsides of their (over)use in hiring processes?'

(FWIW I think this is a very cool and fun educational project regardless of what usefulness it might or might not have in IT hiring decisions, and I'm looking forward to playing with it)

jer0me3y ago· 1 in thread

New challenge: Fix SadServers’ sad servers

vetkat3y ago

And while we’re at it, we might as well write a wrapper around low-upvote Server Fault questions in the hope that they attract more attention when the problem is gamified.

BossingAround3y ago· 1 in thread

I'd love to get the actual VM content offline, packaged as Vagrantfiles or Containerfiles. Love the idea though! Go to Pluralsight and pitch it to them :)

fduranOP3y ago

A few people have suggested offering content offline as a Docker image etc, good idea, thanks.

Timja3y ago· 1 in thread

The idea is really cool, but all I see is "Waiting for server..." and nothing happens.

kiyundai3y ago

That's the trick: you failed the first challenge, "Did you try to turn it off and on again?"

lagrange773y ago· 1 in thread

Really cool idea.

After choosing a problem, the endpoint you poll at https://sadservers.com/celery-progress/xxxx repeatedly returns {pending: true, current: 0, total: 100, percent: 0} for me.

fduranOP3y ago

Yes, good catch (I should forbid internet access to this endpoint). The poor queue is waiting on a VM to come up, but there's no quota left until other VMs are garbage-collected.
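A client that polls a progress endpoint like this generally wants an upper bound so it can give up instead of waiting forever. The sketch below is a generic bounded poll loop, not the site's actual client code. `poll_until_ready` and `fake_fetch` are hypothetical, `fake_fetch` stands in for something like `curl -s` against the progress URL, and the exact JSON shape (quoted keys, `"pending": false` when done) is an assumption based on the response quoted above.

```shell
# poll_until_ready FETCH_CMD MAX_ATTEMPTS DELAY_SECONDS
# Runs FETCH_CMD until its output stops reporting "pending": true, or gives up.
poll_until_ready() {
  fetch=$1; max=$2; delay=$3
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    body=$("$fetch")
    # An empty body (failed request) is treated the same as "still pending".
    if [ -n "$body" ] && ! printf '%s\n' "$body" | grep -Eq '"pending": *true'; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "gave up after $max attempts" >&2
  return 1
}

# Hypothetical stand-in for the real HTTP call: pending twice, then done.
# The counter lives in a file because $(...) runs the function in a subshell.
poll_state=$(mktemp)
echo 0 > "$poll_state"
fake_fetch() {
  n=$(( $(cat "$poll_state") + 1 ))
  echo "$n" > "$poll_state"
  if [ "$n" -lt 3 ]; then
    echo '{"pending": true, "current": 0, "total": 100, "percent": 0}'
  else
    echo '{"pending": false, "current": 100, "total": 100, "percent": 100}'
  fi
}

poll_until_ready fake_fetch 5 0.1
```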

mewse-hn3y ago· 1 in thread

Completed the first challenge and it was a lot of fun - spoiler I've never had to use the 'lsof' command before.

nobody99993y ago

>Completed the first challenge and it was a lot of fun - spoiler I've never had to use the 'lsof' command before.

I've been waiting a while for the "sad server" to come up for me and read the scenario (saint john) whilst waiting.

lsof was the first thing that came to mind after reading the scenario.

I guess that once I actually get a "sad server" I'll make it "happy" quickly :)
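For anyone who has not used it, `lsof FILE` (or `fuser FILE`) lists the processes that have a file open. The sketch below reproduces that lookup by hand on Linux by resolving the symlinks under `/proc/PID/fd`, with a made-up background writer as the "rogue" process. It is an illustration of the technique, not the scenario's actual setup, and without root it only sees processes whose `fd` directory is readable to the current user.

```shell
rogue_log=$(mktemp)

# Stand-in "rogue" process: holds the log open on stdout and keeps appending.
( exec >> "$rogue_log"; while true; do echo hello; sleep 0.2; done ) &
rogue=$!
sleep 0.3

# writers_of FILE: print the PIDs that currently have FILE open.
writers_of() {
  target=$(readlink -f "$1")
  for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
      pid=${fd#/proc/}        # strip the leading "/proc/"
      echo "${pid%%/*}"       # keep only the PID component
    fi
  done | sort -un
}

found=$(writers_of "$rogue_log")
echo "PIDs with the log open: $found"

# Once the writer is identified, stop it.
kill "$rogue"
wait "$rogue" 2>/dev/null || true
```

The list can include short-lived children such as the `sleep` process, since they inherit the open file descriptor from the writer.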

sylvainkalache3y ago

Very cool project!

I was the founder of a school training software engineers; we had an infrastructure track that got a lot of our students to land SRE positions. When we asked employers for feedback about our grads, one piece of feedback kept coming up: they lack experience when it comes to troubleshooting.

So I went on a quest to simulate that infra debugging while in an academic context.

I came up with the idea of giving students broken servers. I used Docker containers and would set up a simple workload and mess it up with classic issues.

Needless to say students generally did not like it :) debugging isn’t fun. But it did help a lot.

computershit3y ago

I love this idea, I'll definitely try it out when provisioning for scenario machines is up again. Nice work.

PanosJee3y ago

Hack The Box -> Fix The Box

yapril3y ago

Can I download the images so I can run them on my own machine? I'd really appreciate it, I've got an interview very soon :)

fzyzcjy3y ago

This looks interesting! But it keeps loading forever saying "Your server is being created" (hit VM limit again?)

10g1k3y ago

"Have you turned it off and on again?"

N3Xxus_63y ago

Well this sucks I wanted to try it lol. It's timing out for me or throws an error.

arwt3y ago

Interesting idea! Looking forward to trying this once some VMs are available. :-)

ASalazarMX3y ago

I only want to say that I love the name SadServers. Strongly memorable.

diffcheck3y ago

The tasks load forever. Is that challenge zero?

imwillofficial3y ago

This is badass, just what I need!

Pr0ject2173y ago

Cool!

j / k navigate · click thread line to collapse

128 comments

92 comments · 29 top-level

deeblering43y ago· 13 in thread

Yup, that's what I was afraid of.

lbotos3y ago

We ask the engineer who is proctoring the interview to think about the following question: Would you want to pair with that engineer again?

If that answer is no, then we probably won't go further because pairing with engineers to troubleshoot is what we do every day.

Some great resumes have died with not knowing how to see what's running on port 80.

deathanatos3y ago

Yeah, we did this at a previous employer.

That did occasionally break tmux, though.

1 more reply

joenot4433y ago

Rejecting someone because they can't recall the correct netstat syntax doesn't seem like good hiring practice, but I assume in good faith that's not what you meant :)

2 more replies

mathverse3y ago

People in higher up positions like yourself will rarely be subjected to testing with tools like this. You are basically trying to remove the human from equation and industrialize the whole process.

3 more replies

deeblering43y ago

> Why are you afraid of this?

You essentially answered your own question.

rednerrus3y ago

Nextgrid3y ago

technofiend3y ago

pvg3y ago

Please don't post shallow dismissals, especially of other people's work.
[...] Please don't pick the most provocative thing in an article or post to complain about in the thread.

https://news.ycombinator.com/newsguidelines.html

Sebguer3y ago

Already exists. I can't remember the name, but the infra company that I used to work for used one of these as part of their interview loop.

fduranOP3y ago

That doesn't mean that I'd charge individual users :-)

Heck, I'm not even asking for an email (and I had to do extra session management coding for that).

KaiserPro3y ago

but why? a real test that is repeatable, realistic and not _overly_ hard. Sure for a junior software its a bad fit. but for a devop/sre/sysadmin, its a great fit.

its certainly better than some crappy whiteboarding session, or worse a take home test.

aliqot3y ago

I knew this is where it was headed :/

apawloski3y ago· 12 in thread

Based on your architecture diagram it looks like you're spinning up an instance per-user? As you're probably finding now, you will hit AWS limits quickly.

ilyt3y ago

That's kinda nice use case for the WASM machine/linux emulators, then you just need to provide image and user can run it in the browser

I'd imagine (still waiting for it to load lmao) most of it could be containers too.

twalla3y ago

Someone else linked https://github.com/copy/v86 which seems really neat.

encryptluks23y ago

So then users experience will be poor due to the slowness and non-standard implementation. A better solution IMO would be to provide a container with SSH access.

yamtaddle3y ago

Just run them in Linux VMs with WASM, on the users' browsers. Make them all pay for it with higher utility bills and greater wear & tear on their hardware.

trollface.jpg

freeone30003y ago

This is actually a good idea for this -- the user wants the education, they can pay for it with their own hardware. Keep your costs low!

1 more reply

BossingAround3y ago

Why not spin up containers instead of VMs? Seems to me containers would fit much better than VMs.

paulfurtado3y ago

If the goal of the test is to debug a sad linux server, containers are going to severely limit what ways the server can be sad in, isn't it?

1 more reply

spiffytech3y ago

Containers have a history of escape vulnerabilities, for reasons like sharing a kernel with the host and other containers.

VMs are designed from the ground up to isolate guests, rather than focusing on application deployment.

Firecracker is the modern container alternative in untrusted compute scenarios, with Fly.io even converting container images into Firecracker VMs.

2 more replies

cogman103y ago

Bypassing container security is easier than bypassing VM security.

1 more reply

temp08263y ago

icedchai3y ago

AWS will raise limits if you ask. Increasing EC2 instance limits is usually a quick turn around.

2 more replies

fduranOP3y ago

Yes thanks!

DeathArrow3y ago· 6 in thread

>Practice for your next SRE/DevOps interview.

Are SREs and DevOps tasked with administration of operating systems?

KaiserPro3y ago

> Are SREs and DevOps tasked with administration of operating systems?

yes, eventually.

you can dress it up in all the fancy terms that you like. but devops and SREs are sysadmins with better PR.

its critical that SREs understand _how_ to debug a system, so that they can work out how to put in fixes, and or design better systems.

asmr3y ago

dsr_3y ago

If you have ops somewhere in your responsibilities, then yes.

jabroni_salad3y ago

jen_h3y ago

Yeah. Random data point: One of my most favorite SRE interviews ever (serious fun!) involved hands-on troubleshooting that eventually required gdb.

BossingAround3y ago

How do you automate something you can't do manually?

andrewmcwatters3y ago· 6 in thread

Why would you need to understand how something works? Just use containers. /s

vsareto3y ago

cube003y ago

Support doesn't deliver features, we need new features! /s

grepLeigh3y ago

If most developers can't debug a VM, then anyone who can will be able to charge a premium. If you have a proficiency in ops, remember that the next time you negotiate a compensation package.

[Edited my compensation numbers to avoid down votes - yikes]

andrewmcwatters3y ago

I feel like you definitely have to target particular companies and more specifically specific titles and skills to offer to do so.

My guess is trying to sell high end services as a "principal software engineer" isn't going to be enough to justify that cash comp to a lot of people hiring.

1 more reply

sshd3y ago

This is so sad but so true!

edmcnulty1013y ago

If its dumb and it works it's not dumb.

yubiox3y ago· 4 in thread

Can't get to the first problem because of HN hug but anyway there are fake ways to "solve" it like renaming the logfile (what they test for solved is provided).

BossingAround3y ago

This is a self-test, not a certification. The goal is not to defeat the verification goal, but to learn something. So yeah, it's perfectly acceptable that the tests are not bullet-proof.

Timja3y ago

Depends on how the broken program writes to the log.

If it does

    while true; do echo hello >> bad.log; done

Then renaming bad.log will not solve the challenge.

teddyh3y ago

Replace it with a symlink to /dev/null! Or /dev/full if we feel like it.

(Yes, these are bad solutions, since the instructions explicitly said to stop the process which is writing.)

1 more reply

fduranOP3y ago

There are ways to cheat but not so simple; there's a script that checks for the solution and a hash of the script is checked for modifications.

dugmartin3y ago· 3 in thread

I'd suggest integrating https://bellard.org/jslinux/ and running the VM in the browser if you can - then you can scale without running out of resources.

fduranOP3y ago

Thanks, I've been looking at WASM, for ex https://github.com/snaplet/postgres-wasm/tree/main/packages/... , it would certainly simplify everything to "download a fat file".

jodrellblank3y ago

Have you seen https://copy.sh/v86/ ? It doesn't run as fast as jslinux but is BSD Licensed, on Github, and supports resuming the VM from a snapshot.

https://github.com/copy/v86

1 more reply

m00dy3y ago

or linux kernel port on webassembly.

vermon3y ago· 3 in thread

Seems like it's out of capacity:

    An error occurred (VcpuLimitExceeded) when calling the RunInstances operation: You have requested more vCPU capacity than your current vCPU limit of 64 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.

Maybe something like https://leaningtech.com/webvm-server-less-x86-virtual-machin... would be cheaper and more reliable for this kind of thing?

fduranOP3y ago

Yes, HN effect lol-sob.

Mitigation: reducing servers life time temporarily so more people can try.

warent3y ago

Usually I roll my eyes when someone posts their own website to HN and it crashes under load. But given the nature and complexity of yours I think there's room for understanding and patience :)

1 more reply

Nextgrid3y ago

Scaling this service without breaking the bank could become its own "sad server" scenario.

bm-rf3y ago· 3 in thread

fduranOP3y ago

I wanted to do full VMs rather than Docker images but yes I could do Docker images or dedicated big instances with VMs on top like somebody else is suggesting.

bravetraveler3y ago

Not a bad idea but something to consider; this limits the options for kernel level things quite considerably

clvx3y ago

probably lxd would be better.

grepLeigh3y ago· 2 in thread

Very cool! This reminds me of the ops challenge @ Slack. I'm not sure if they still do this, but the SRE/platform infra interview used to involve a VM running a malfunctioning LAMP stack.

You'd get SSH access to the VM, then submit a diagnostic report of what was broken (and how you fixed it).

Reminded me of how Red Hat used to run their certification test (RHCE). I probably still have the live CDs for my RHCE laying around somewhere.

stevekemp3y ago

fduranOP3y ago

Oh that reminds me, I loved the original Stripe CTF, it's been 10 years already! https://twitter.com/fduran/status/240321390698442753

bravetraveler3y ago· 2 in thread

Commenting to give this a try later. I've routinely been the person these kinds of gremlins get escalated to.

I've long wanted some sort of mock "things are broken - I want to see how you think" approach for sysadmins.

shagie3y ago

In the "tricks of hacker news" -

     188 points by fduran 3 hours ago | unvote | flag | hide | past | favorite | 68 comments

As I tend to favorite less often than I comment, it makes it easier to find those things I want to find again.

bravetraveler3y ago

Much appreciated! I'm woeful about not using features like this; it's a character fault at this point.

The HN interface too tends to just have my eyes filter out those links... but that's no defense.

Especially good to know that it's publicly viewable!

Not that I'm particularly worried about being outed by anything I favorite here; it's just good to be mindful of the data we make and where it goes.

hotpotamus3y ago· 2 in thread

Are you familiar with Trueability? https://www.trueability.com/

It seems like this is a similar SaaS.

fduranOP3y ago

hotpotamus3y ago

b200003y ago· 2 in thread

did you read up on the problems with leetcode?

fduranOP3y ago

Hi, not sure what the question means, I came up with the scenarios not copying from leetcode if that's what you mean.

pxc3y ago

I think they mean 'are you aware of the limitations of Leetcode-like tests and the downsides of their (over)use in hiring processes?'

(FWIW I think this is a very cool and fun educational project regardless of what usefulness it might or might not have in IT hiring decisions, and I'm looking forward to playing with it)

jer0me3y ago· 1 in thread

New challenge: Fix SadServers’ sad servers

vetkat3y ago

And while we’re at it, we might as well write a wrapper around low-upvote Server Fault questions in the hope that they attract more attention when the problem is gamified.

BossingAround3y ago· 1 in thread

I'd love to get the actual VM content offline, packaged as Vagrantfiles or Containerfiles. Love the idea though! Go to Pluralsight and pitch it to them :)

fduranOP3y ago

A few people have suggested offering content offline as a Docker image etc, good idea, thanks.

Timja3y ago· 1 in thread

The idea is really cool, but all I see is "Waiting for server..." and nothing happens.

kiyundai3y ago

That's the trick: you failed the first challenge, "Did you try to turn it off and on again?"

lagrange773y ago· 1 in thread

Really cool idea.

After choosing a problem, the endpoint you poll at https://sadservers.com/celery-progress/xxxx repeatedly returns {pending: true, current: 0, total: 100, percent: 0} for me.

fduranOP3y ago

Yes, good catch (I should forbid internet access to this endpoint). The poor queue is waiting for a VM to come up, but there's no quota left until other VMs are garbage-collected.
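For anyone scripting against a progress endpoint like the one quoted above, here is a minimal sketch of polling with a deadline, so a client gives up instead of spinning forever while the queue has no quota. It only assumes the `pending` field seen in the quoted response; `wait_for_task`, `fetch_status`, and the timeout values are hypothetical names and numbers for illustration, not anything SadServers actually uses.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_task(
    fetch_status: Callable[[], Dict],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Dict]:
    """Poll until the task stops reporting pending, or give up at the deadline.

    fetch_status is a hypothetical callable that returns the decoded JSON
    status, e.g. {"pending": True, "current": 0, "total": 100, "percent": 0}.
    Returns the first non-pending status, or None on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if not status.get("pending", False):
            return status
        # Stop if another full interval would take us past the deadline.
        if clock() + interval_s > deadline:
            return None
        sleep(interval_s)
```

In practice `fetch_status` would wrap an HTTP GET and JSON decode; passing `sleep` and `clock` in makes the loop easy to exercise without real waiting.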

mewse-hn3y ago· 1 in thread

Completed the first challenge and it was a lot of fun - spoiler I've never had to use the 'lsof' command before.

nobody99993y ago

>Completed the first challenge and it was a lot of fun - spoiler I've never had to use the 'lsof' command before.

I've been waiting a while for the "sad server" to come up for me and read the scenario (saint john) whilst waiting.

lsof was the first thing that came to mind after reading the scenario.

I guess that once I actually get a "sad server" I'll make it "happy" quickly :)

sylvainkalache3y ago

Very cool project!

So I went on a quest to simulate that infra debugging while in an academic context.

I came up with the idea of giving students broken servers. I used Docker containers and would set up a simple workload and mess it up with classic issues.

Needless to say students generally did not like it :) debugging isn’t fun. But it did help a lot.

computershit3y ago

I love this idea, I'll definitely try it out when provisioning for scenario machines is up again. Nice work.

PanosJee3y ago

Hack The Box -> Fix The Box

yapril3y ago

Can I download the images so I can run them on my own machine? I'd really appreciate it; I've got an interview very soon :)

fzyzcjy3y ago

This looks interesting! But it keeps loading forever saying "Your server is being created" (hit VM limit again?)

10g1k3y ago

"Have you turned it off and on again?"

N3Xxus_63y ago

Well this sucks I wanted to try it lol. It's timing out for me or throws an error.

arwt3y ago

Interesting idea! Looking forward to trying this once some VMs are available. :-)

ASalazarMX3y ago

I only want to say that I love the name SadServers. Strongly memorable.

diffcheck3y ago

The tasks are loading infinitely; is it a zero challenge?

imwillofficial3y ago

This is badass, just what I need!

Pr0ject2173y ago

Cool!
