http://ebpf.io for some more insights.
At the very least, it'll provide some useful tooling for you to debug problems in kernel-space.
https://lore.kernel.org/lkml/20180730163256.GC27761@infradea...
By the way, C Playground is really helpful for teaching an OS course!
I hope C Playground is helpful, and I'm building it with teaching in mind. If you teach anywhere and could find it useful, let me know!
Mostly you just have to steep your brain in it for long enough
So what books can you recommend to understand the above subjects? I know of only UNIX Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers by Curt Schimmel.
>everything is concurrency, at the lowest levels .... after a while I came back to doing kernel stuff and found that with this new background all that hard stuff was trivial and obvious
I see a lot of HDL programmers say this. But how exactly do you map the concepts, given that the language semantics of HDLs and "standard" programming languages are so different?
I'm glad to see you used a VM. That's the first step in the right direction. Others have mentioned that you should've used printk(), which is true.
I'll mention that you can also run the kernel in a debugger: https://www.kernel.org/doc/html/latest/dev-tools/gdb-kernel-...
I did use printk for debugging, but I (incorrectly) assumed it could block. Another commenter pointed out that this is not the case. TIL!
The gdb link looks very helpful and I'll try that next time. Thanks for linking that.
"But when BPF got extended, it allowed users to add code that is executed by the kernel in a safe manner in various points of its execution, not only in the network code."
Read more here:
https://thenewstack.io/how-io_uring-and-ebpf-will-revolution...
(The project I have in mind at the moment is making a bindfs-like filesystem without FUSE, but I’ve had a few different ideas where eBPF seemed like it might have been a good fit if I could figure it out.)
No, printk() is magic. It can be called even in NMI context, which is a worse place. Quoting https://lwn.net/Articles/800946/, "[...] kernel code must be able to call printk() from any context. Calls from atomic context prevent it from blocking; calls from non-maskable interrupts (NMIs) can even rule out the use of spinlocks. [...]"
I wonder if I am the only one who loves debugging difficult/weird problems. It’s something like trying to solve a puzzle. Knowing that the system will never deceive me (if I get deceived, it won't be the system’s fault), and that a perfectly reasonable explanation exists for what I observe, helps me not give up.
Same here. At times, I'd prefer to just work on debugging things for colleagues versus writing rather boring code. It can give some insights when it comes to design, as well as enabling customer support to fix certain issues.
Or if you like networking challenges, having widely distributed users on diverse platforms and networks, and either running your own load balancers or using DNS or application level balancing, so that you can see the actual network flow, and not only the parts that make it through a load balancer.
Of course, it's a lot of frustration when you find the issue, and it's in some random router in some far-off locale with no one you can contact. Things like the Linux large receive offloading bug, where a router would receive larger-than-MTU packets because of offloading, then drop the packet (and send ICMP needs frag) because it's larger than the MTU of the destination address. I fixed the FreeBSD bad behavior when getting such an ICMP, but it would be nice if systems operating as routers would update their kernels a couple of times a decade. I could (and have, elsewhere) rant about more MTU problems, but let's just say, they're out there, they're stupid, and it's hard to get them fixed. Ugh.
Even if you write a kernel driver, iterating over all tasks in the system is a terrible design (there may be millions), not to mention "determining if a task belongs to a C playground program" in the kernel (obviously the kernel should have no knowledge about such specifics).
Of course, if a developer cannot even produce a reasonable overall design, it's not surprising that they aren't capable of writing correct code.
I did spend a lot of time trying to avoid writing a kernel module, and this was the only way I could find to do it :)
Linux has a project called CRIU that can save and restore processes to disk without needing additional kernel modules, so pretty much all state is already gettable and settable from user space.
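To make the "gettable from user space" point concrete, here is a minimal sketch that lists a process's open file descriptors by reading /proc/<pid>/fd. It only illustrates one such procfs interface; I'm not claiming this is how CRIU itself gathers state, and the function name list_fds is made up for the example.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print "fd -> target" for every open descriptor of `pid`.
 * Returns the number of descriptors printed, or -1 if /proc/<pid>/fd
 * cannot be opened (no such process, no permission, no procfs).
 * list_fds is a hypothetical helper name, not part of any library. */
int list_fds(pid_t pid, FILE *out)
{
    char dir_path[64];
    snprintf(dir_path, sizeof dir_path, "/proc/%ld/fd", (long)pid);

    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;               /* skip "." and ".." */

        char link_path[PATH_MAX];
        char target[PATH_MAX];
        snprintf(link_path, sizeof link_path, "%s/%s", dir_path, entry->d_name);

        /* Each entry is a symlink to the file, pipe, or socket it refers to. */
        ssize_t n = readlink(link_path, target, sizeof target - 1);
        if (n < 0)
            continue;               /* descriptor went away while scanning */
        target[n] = '\0';

        fprintf(out, "%s -> %s\n", entry->d_name, target);
        count++;
    }
    closedir(dir);
    return count;
}
```

Calling list_fds(getpid(), stdout) on your own process also shows the descriptor that opendir() is holding during the scan. Inspecting another user's process needs the usual ptrace-style permissions.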
It looks like the author of the piece did something similar, and noted other people doing similar things to themselves.
I wrote some modules to experiment with the Security Module API, because working with the APIs seemed like a good way to learn how they worked, and what was possible beyond just SELinux, AppArmor, etc.:
I've recently been trying to build something similar, visualizing fork/execve/read/write, but using the strace output of a binary, which is much less challenging.
Here's my humble strace visualizer: https://lhoursquentin.github.io/visual-strace/
One thing that looks fishy is this branch:
    if (container_tasks_len == max_container_tasks) {
        printk("cplayground: ERROR: container_tasks list hit capacity! We "
               "may be missing processes from the procfile output.\n");
        break;
    }
Since you said printk can block, why isn't calling it in the RCU critical section a bug? Is it because you immediately break afterwards and don't try to reference the next task?
[edit] And yes, since we break and don't follow the `next` pointer in the linked list, that also shouldn't cause any problems.
[edit 2] a sibling comment by cesarb pointed out that printk actually does not block, since it's important for it to be usable in critical sections to debug when the kernel gets into trouble
Regardless of whether you end up using eBPF or a .ko like you already have, you may have a yet simpler option. By leveraging the loader you can do an interposition trick with LD_PRELOAD to hook C library calls. Maybe this is all you need in order to "help students understand system calls such as open, close, dup2, fork, pipe, and others."
Just a suggestion. Carry on, good show.
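For anyone who hasn't seen the LD_PRELOAD trick, a minimal sketch of an interposer for open() might look like the following. The open_calls counter and the "[hook]" log format are just illustrative choices of mine; the dlsym(RTLD_NEXT, ...) lookup is the standard way to reach the real libc function.

```c
#define _GNU_SOURCE             /* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Number of open() calls seen so far (illustrative only, not thread-safe). */
unsigned long open_calls = 0;

/* Same signature as libc's open(). When this object is listed in
 * LD_PRELOAD, the loader resolves the program's open() calls here first. */
int open(const char *path, int flags, ...)
{
    /* Find the next definition of open() in load order, i.e. libc's. */
    static int (*real_open)(const char *, int, ...) = NULL;
    if (real_open == NULL)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
    if (real_open == NULL) {
        errno = ENOSYS;
        return -1;
    }

    /* The mode argument is only passed when the call may create a file. */
    mode_t mode = 0;
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    int fd = real_open(path, flags, mode);
    int saved_errno = errno;    /* logging must not clobber the caller's errno */

    open_calls++;
    fprintf(stderr, "[hook] open(\"%s\", 0x%x) = %d\n", path, (unsigned)flags, fd);

    errno = saved_errno;
    return fd;
}
```

Assuming the file is saved as hook.c (a made-up name), something like `gcc -shared -fPIC hook.c -o hook.so -ldl` followed by `LD_PRELOAD=./hook.so ./student_program` should log each call. Note the limits: this only sees calls that go through libc's open symbol, so openat(), open64(), raw syscalls, and statically linked binaries bypass it.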