But on new hardware, you're dealing with gremlins. Physical factors can affect hardware, and sometimes you just can't be sure if it's your code, Cthulhu, or the position of the moon. For example, a higher temperature can either decrease or increase crystal frequency. Firmware/software can affect hardware through power consumption and other routes. It can be really hard to know if it's hardware or software, except in hindsight.
Catching something that happens frequently is one thing; you have oscilloscopes and stuff. But good luck hunting down an issue that happens once per month across 20 test devices. Sometimes these things can take more than half a year to catch and fix.
When you mess up in a kernel driver, the result is often something pretty bad. Like crashing or freezing the system, or worse, silent corruption. Freezes and crashes are great if you can use a kernel debugger or get the dumps. Not so great when it happens on the other side of the world without any low-level data to work with. You also have to really understand kernel interfaces, OS power management, driver life-cycle issues and performance ramifications, to name a few.
I don't mean low level is necessarily hard, but I do mean it really, really depends!
It literally took months to find the cause, while in the meantime the people that didn’t want us to succeed (think big-company politics) were scoring points against us. We did everything we could think of, fixed a lot of bugs, tried stress-testing the system, things like that. Nothing worked. We were met with skepticism when we started saying it was the hardware. They said it can’t be the hardware....
Except this time it was — and we found out almost purely by good luck. One of the hardware guys happened to have an oscilloscope hooked up to a system when the reboot happened. The reboot was preceded by ‘something strange’ on the scope (I’m likely missing stuff here since this is memory from a few years back).
It turned out the motherboard had a bug which would manifest when the system went into a certain lower power state. That’s why stress-testing never caused it. In fact, the testing we did was actively preventing us from catching it!!
Lesson well and truly learned.
On a related note - is it just me or did Intel have a lot of trouble with their power-management implementation a few years back? I’ve since worked on another product that would randomly hard-freeze, and sure enough, the Intel errata doc for the CPU/SoC in question (an Atom something-or-other) listed issues with S-states that meant we needed to limit them (to S1, I believe) in the BIOS, which did indeed fix the freeze issue.
To anyone that wants to avoid the long road here, realise that modern Windows, Linux and macOS systems never freeze or kernel panic without a special reason. Part of what I had to unlearn many years ago was the notion that on Windows, freezes and blue-screens were things that ‘just happened’ in the course of otherwise regular operation.
Low-level can get you, even if you aren’t doing low-level work.
> On a related note - is it just me or did Intel have a lot of trouble with their power-management implementation a few years back?
Everyone seems to have troubles with their power management stuff! Power state transitions are tricky in hardware, firmware and software. There are a ton of corner cases and assumptions. For example, you have to be careful (in software!) that you don't cause voltage dips by switching on chips and their peripherals too quickly. See [0].
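To make the voltage-dip point concrete, here is a minimal sketch of staggered power-up: enable one rail, wait for it to settle, then move on to the next. Everything in it is an assumption for illustration: the rail names, the `enable` callback and the 2 ms settle time are hypothetical, and on real hardware this would be C in firmware or a driver, with timings taken from the datasheets. Python just keeps it short.

```python
import time
from typing import Callable, Iterable, List


def power_up_staggered(
    rails: Iterable[str],
    enable: Callable[[str], None],
    settle_s: float = 0.002,  # hypothetical settle time, not from any datasheet
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Enable each rail in order, pausing after each one.

    Switching everything on at once makes the inrush currents add up,
    which is what can pull the supply voltage down. Spacing the enables
    out gives the supply time to recover between steps.
    """
    enabled = []
    for rail in rails:
        enable(rail)     # hypothetical callback, e.g. sets a GPIO or PMIC bit
        enabled.append(rail)
        sleep(settle_s)  # let the supply recover before the next load
    return enabled


if __name__ == "__main__":
    # Stand-in for real hardware access: just log what would be switched on.
    power_up_staggered(["core", "ddr", "wifi"], enable=lambda r: print("enable", r))
```

The `sleep` parameter is injectable so the ordering can be checked in a test without waiting on a real clock.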
CPUs, chipsets and peripherals have pages and pages of errata that BIOS, microcode/firmware updates and operating systems work around.
"But... but I AM the hardware vendor! sob"
And yes, Stack Overflow is really not going to help you with pretty much any of this. When doing low-level programming, better get comfortable with the fact that there's often no one in the world who can answer your question. Find it out yourself.
So all I can say is it might be, or it might not.
On my first serious project at my first job I estimated the back-end part of an app as 75% and the front-end as 25%. It ended up the exact reverse of that. And in the following years the pattern repeated over and over.
Everybody seems to think that UI is just about placing buttons and the backend is the heavy lifting. In reality, front-end requirements change more often and in more fundamental ways, the code is harder to test, more third-party dependencies are involved, and even the dumb KLOC metric ends up bigger.
By any measure, low-level backend will be harder than high-level backend, and low-level frontend will be harder than high-level frontend. Try implementing an HTTP server in plain C vs. plain Java (no third-party frameworks) and you will see. Being easier is practically the definition of high-level; he is just comparing things that are too different.
No, I don't think so. I think he's doing exactly what he said - comparing low-level with high-level. The low level stuff he was working on doesn't sound like it was a backend; it sounds like it was an embedded device running bare-metal (no OS). You wouldn't do a backend (database server, say) that way (or so I suspect).
Is there anything remotely that good going currently? Usually stuff is on Medium so I just avoid clicking it.
Edit: reminds me a tiny bit of Catcher in the Rye, stylistically
The ARM64 architecture reference manual is 5242 pages, and that doesn't even include things like the GPU, CPU (554 pages for A57), interrupt controller (240 pages for the main GIC version 2 doc), DMA controller (100 pages) and tons more you need to do low-level stuff.
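For a rough sense of scale, adding up just the page counts quoted above (the dictionary labels are shorthand, and the GPU and everything else are left out since no numbers were given for them):

```python
# Page counts as quoted in the comment above; labels are shorthand only.
pages = {
    "ARM64 architecture reference manual": 5242,
    "CPU (A57)": 554,
    "interrupt controller (GIC version 2)": 240,
    "DMA controller": 100,
}

total = sum(pages.values())
print(total)  # 6136
```

So that's over six thousand pages before you get to any of the "tons more".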