My First Kernel Module: A Debugging Nightmare

(reberhardt.com)

133 points · ksml · 5y ago · 67 comments

ksmlOP5y ago· 8 in thread

Hi HN, this was my first attempt at writing any sort of kernel code. I would love to hear your thoughts on this experience and on the fixes I applied, especially from anyone with more Linux experience than me :)

warybeary5y ago

Have you looked into using eBPF instead of writing a kernel module?

http://ebpf.io for some more insights.

At the very least, it'll provide some useful tooling for you to debug problems in kernel-space.

ksmlOP5y ago

I hadn't considered this! Can eBPF be used to access arbitrary kernel data structures, though?

ylyn5y ago

Seems like someone did try to get those functions exported, but the maintainer rejected it, saying that no driver should be poking so deep into fd internals. Makes sense. Your use case is kind of niche.

https://lore.kernel.org/lkml/20180730163256.GC27761@infradea...

By the way, C Playground is really helpful for teaching an OS course!

ksmlOP5y ago

That is really interesting and good to know -- thanks for that!

I hope C Playground is helpful, and I'm building it with teaching in mind. If you teach anywhere and could find it useful, let me know!

waiseristy5y ago

That entire email chain was unpleasant. Are Linux maintainers typically that combative?

ylyn5y ago

Here's a hack you could use to get around the functions not being exported: https://github.com/anbox/anbox-modules/blob/master/binder/de...

Soft5y ago

This will stop working since kallsyms_lookup_name is no longer exported by recent kernels. See [1].

[1]: https://lwn.net/Articles/813350/

ksmlOP5y ago

Oh, that's clever! I might try that. I really don't feel comfortable building my own kernel

Taniwha5y ago· 6 in thread

So a story: I've been a kernel hack since Unix V6, made a living doing it one way or another for over half my life ... learning to think about concurrency, time, interrupts, race conditions etc is hard, very hard - I got pretty good at it ... but then my career took a diversion, I designed chips for a decade or so, everything is concurrency, at the lowest levels .... after a while I came back to doing kernel stuff and found that with this new background all that hard stuff was trivial and obvious.

Mostly you just have to steep your brain in it for long enough

suifbwish5y ago

That’s exactly it. It’s the only way to master something. The more varied exposure we have over time to the core ideas of a discipline, the more we come to master the thought process of comprehending its limits and possibilities, to the extent where we can make it do whatever we like.

febed5y ago

How did you pivot from kernel programming to designing chips? Did you already have a background in embedded electronics?

Taniwha5y ago

I'm self-trained in electronics. I'd started building NuBus cards for Macs and was hired as an architect for new graphics cards ... Started using C as an architectural reference language, and from there it was a small step to using Verilog instead ... Pretty soon I was building CPUs .... I've always been the hardware guy who understands software, and/or the software guy who understands hardware

rramadass5y ago

>learning to think about concurrency, time, interrupts, race conditions

So what books can you recommend to understand the above subjects? I know of only UNIX Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers by Curt Schimmel.

>everything is concurrency, at the lowest levels .... after a while I came back to doing kernel stuff and found that with this new background all that hard stuff was trivial and obvious

I see a lot of HDL programmers say this. But how exactly do you map the concepts since the very language semantics between HDLs and "Standard" computer languages are different?

arkj5y ago

Consider the simplest RISC execution path, from the software view there is an instruction executing in one cycle but from the hardware view in the same cycle along with the execute there is a different decode and fetch happening.
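To make that concrete, here is a toy sketch in C of a three-stage fetch/decode/execute pipeline. It is only an illustration: the stage names come from the comment above, while `run_pipeline`, `EMPTY`, and the integer "instructions" are made up for the example and do not model any real CPU.

```c
#include <stdio.h>

#define EMPTY (-1)  /* marks a pipeline stage that holds no instruction */

/* Push n instructions (numbered 0..n-1) through a 3-stage pipeline and
 * return how many clock cycles it takes until the last one has executed. */
int run_pipeline(int n, int verbose)
{
    int fetch = EMPTY, decode = EMPTY, execute = EMPTY;
    int next = 0, retired = 0, cycle = 0;

    while (retired < n) {
        /* In hardware all three stage registers update on the same clock
         * edge. In sequential C we emulate that by assigning back to front,
         * so each stage sees the value its predecessor held last cycle. */
        execute = decode;
        decode = fetch;
        fetch = (next < n) ? next++ : EMPTY;
        cycle++;

        if (verbose)
            printf("cycle %d: fetch=%d decode=%d execute=%d\n",
                   cycle, fetch, decode, execute);
        if (execute != EMPTY)
            retired++;
    }
    return cycle;
}
```

Calling `run_pipeline(3, 1)` prints one line per cycle; in the third cycle instruction 0 is executing while 1 is being decoded and 2 is being fetched, which is the "three things in one cycle" view that software normally hides.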

ksmlOP5y ago

Concurrency is still hard for me, but I do find it getting much easier over the years :) thanks for the story!

lallysingh5y ago· 5 in thread

eBPF is honestly the first thing to try before writing a module.

I'm glad to see you used a VM. That's the first step in the right direction. Others have mentioned that you should've used printk(), which is true.

I'll mention that you can also run the kernel in a debugger: https://www.kernel.org/doc/html/latest/dev-tools/gdb-kernel-...

ksmlOP5y ago

I hadn't considered eBPF because I needed some pretty obscure information from the kernel internals (i.e. the addresses of the `struct file`s) and I didn't realize eBPF was as capable as it is. Another commenter suggested trying it, though, so I'm checking it out now!

I did use printk for debugging, but I (incorrectly) assumed it could block. Another commenter pointed out that this is not the case. TIL!

The gdb link looks very helpful and I'll try that next time. Thanks for linking that.

PeterCorless5y ago

Yeah, my mind immediately went to eBPF too.

"But when BPF got extended, it allowed users to add code that is executed by the kernel in a safe manner in various points of its execution, not only in the network code."

wolfgang425y ago

Where would you look for a list of what you can do with eBPF and how? (I think maybe I’m searching for a list of hook points?) I keep seeing tantalizing hints about all of the things it lets you do, but the tutorials I’ve seen only seem to cover networking and tracing.

(The project I have in mind at the moment is making a bindfs-like filesystem without FUSE, but I’ve had a few different ideas where eBPF seemed like it might have been a good fit if I could figure it out.)

dmitris5y ago

https://github.com/iovisor/bcc/blob/master/docs/reference_gu... lists the hook points where BPF code can be attached. Also take a look at https://blogs.oracle.com/linux/notes-on-bpf-1 (there are follow-ups - https://blogs.oracle.com/linux/notes-on-bpf-7 has the links at the bottom) and from the Linux source, https://github.com/torvalds/linux/blob/master/include/uapi/l.... https://docs.cilium.io/en/latest/bpf/ is an extensive reference but with an emphasis on the network-related areas (xdp, tc).

lallysingh5y ago

There are 1-2 good books on it. I've skimmed them via O'Reilly Safari when I needed something in the past.

cesarb5y ago· 4 in thread

> However, printk can block (while allocating memory)

No, printk() is magic. It can be called even in NMI context, which is a worse place. Quoting https://lwn.net/Articles/800946/, "[...] kernel code must be able to call printk() from any context. Calls from atomic context prevent it from blocking; calls from non-maskable interrupts (NMIs) can even rule out the use of spinlocks. [...]"

ksmlOP5y ago

This is really good to know. I had assumed it could block when allocating memory for the formatted string buffer, but the rationale explained in that article makes a lot of sense. Being able to use printk simplifies things a lot.

kanox5y ago

Also: allocating memory with GFP_ATOMIC doesn't sleep.

m4635y ago

now that is technical leadership.

msla5y ago

If you have to rely on printf debugging, you go to epic lengths to ensure printf always works.

noncoml5y ago· 4 in thread

I see the word “nightmare” used a lot in this article.

I wonder if I am the only one that loves debugging difficult/weird problems. It’s something like trying to solve a puzzle. Knowing that the system will never deceive me (it will not be the system’s fault if I get deceived), and that a perfectly reasonable explanation exists for what I observe, helps me not give up.

zaptheimpaler5y ago

Same. I would love a job consisting solely of jumping into big hairy systems and debugging weird issues. It's much more interesting to understand how exactly things work at every level of the stack (the bottom of the stack being OS/kernel or even hardware stuff, not a backend endpoint or database) than to write code.

zerkten5y ago

> I wonder if I am the only one that loves debugging difficult/weird problems.

Same here. At times, I'd prefer to just work on debugging things for colleagues versus writing rather boring code. It can give some insights when it comes to design, as well as enabling customer support to fix certain issues.

toast05y ago

It helps to have colleagues that break things in interesting ways. ;) Also important is a supportive manager, and a 'real job' that is usually time flexible; you might need to drop what you're working on to debug an issue when it's happening, so that needs to be mostly OK.

Or if you like networking challenges, having widely distributed users on diverse platforms and networks, and either running your own load balancers or using DNS or application level balancing, so that you can see the actual network flow, and not only the parts that make it through a load balancer.

Of course, it's a lot of frustration when you find the issue, and it's in some random router in some far-off locale with no way to contact its operator. Things like the Linux large receive offloading bug that would receive larger-than-MTU packets because of offloading, then drop the packet (and send ICMP needs frag) because it's larger than the MTU of the destination address. I fixed the FreeBSD bad behavior when getting such an ICMP, but it would be nice if systems operating as routers would update their kernels a couple of times a decade. I could (and have, elsewhere) rant about more MTU problems, but let's just say, they're out there, they're stupid, and it's hard to get them fixed. Ugh.

Glyptodon5y ago

I enjoy it, but hate that it's almost always for something that needed to be figured out yesterday.

devit5y ago· 3 in thread

You can do most or all of that by reading /proc/<pid>/fdinfo/<fd> and /proc/<pid>/fd/<fd> or by making system calls on the affected fds (which you can do e.g. by injecting code with LD_PRELOAD or ptrace or with nsenter with fd namespace or equivalent C code).
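As a rough sketch of the /proc route, the C below reads the symlink under /proc/self/fd/ and the `pos:` field of /proc/self/fdinfo/ for a descriptor in the calling process. The helper names `fd_path` and `fd_offset` are made up for this example, and it uses `self` rather than a pid only so that it runs without extra privileges.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve the path an fd refers to via the /proc/self/fd/<fd> symlink.
 * Returns 0 on success, -1 on failure. */
int fd_path(int fd, char *buf, size_t buflen)
{
    char link[64];
    ssize_t n;

    if (buflen == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    n = readlink(link, buf, buflen - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';  /* readlink() does not NUL-terminate */
    return 0;
}

/* Read the "pos:" field (current file offset) from /proc/self/fdinfo/<fd>.
 * Returns the offset, or -1 on failure. */
long long fd_offset(int fd)
{
    char path[64], line[256];
    long long pos = -1;
    FILE *f;

    snprintf(path, sizeof path, "/proc/self/fdinfo/%d", fd);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "pos: %lld", &pos) == 1)
            break;
    }
    fclose(f);
    return pos;
}
```

To inspect another process, replace `self` with its pid (subject to the usual permission checks). Note that a `dup()`'d descriptor reports the same `pos`, but equal offsets on their own do not prove that two descriptors share one open file.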

Even if you write a kernel driver, iterating over all tasks in the system is a terrible design (there may be millions), not to mention "determining if a task belongs to a C playground program" in the kernel (obviously the kernel should have no knowledge about such specifics).

Of course, if a developer cannot even produce a reasonable overall design, it's not surprising that they aren't capable of writing correct code.

nosefrog5y ago

"Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Please don't fulminate. Please don't sneer, including at the rest of the community."

https://news.ycombinator.com/newsguidelines.html

ksmlOP5y ago

I actually cannot get enough information from doing that. Crucially, I need to be able to recognize whether two file descriptors point to the same open `struct file`. (To be clear, this isn't the same as whether they're pointing to the same file path. I need to know when the two file descriptors are sharing the same cursor.) There is no way to do this using existing APIs, because there is nothing identifying a `struct file` besides the memory address of the struct. (The "open file IDs" I mention are hashes of the `struct file` address.)

I did spend a lot of time trying to avoid writing a kernel module, and this was the only way I could find to do it :)

devit5y ago

You can use the kcmp system call with the KCMP_FILE argument to find out if two fds point to the same file structure (of course you must use this as the custom comparison function of a sort algorithm so you don't end up with quadratic run time).
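A minimal sketch of that check in C, assuming a Linux kernel built with kcmp support and headers that provide `SYS_kcmp` and `<linux/kcmp.h>`. glibc has no wrapper for kcmp, so this goes through `syscall()`; the helper name `same_open_file` is made up for the example.

```c
#define _GNU_SOURCE
#include <linux/kcmp.h>   /* KCMP_FILE */
#include <sys/syscall.h>  /* SYS_kcmp */
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd_a in process pid_a and fd_b in process pid_b refer to the
 * same open file description, 0 if they do not, and -1 if kcmp() failed
 * (for example ENOSYS on a kernel built without it, or EPERM when a seccomp
 * filter or ptrace permission check rejects the call). */
int same_open_file(pid_t pid_a, int fd_a, pid_t pid_b, int fd_b)
{
    long r = syscall(SYS_kcmp, pid_a, pid_b, KCMP_FILE,
                     (unsigned long)fd_a, (unsigned long)fd_b);
    if (r < 0)
        return -1;
    return r == 0;  /* kcmp returns 0 when the two objects are equal */
}
```

For example, `same_open_file(getpid(), fd, getpid(), dup(fd))` should report 1, while the two ends of a `pipe()` should report 0. When the objects differ, kcmp returns 1 or 2 to give a consistent ordering between them, which is what makes the sort-based grouping mentioned above possible.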

Linux has a project called CRIU that can save and restore processes to disk without needing additional kernel modules, so pretty much all state is already gettable and settable from user space.

megous5y ago· 2 in thread

Linux has some debug options that could have probably helped here. It's a good idea to enable them when developing new code.

https://megous.com/dl/tmp/b6e8f550de4539a8.png

ksmlOP5y ago

Ah! This would have been really helpful!

PeterCorless5y ago

Hackernews at its very best.

sweettea5y ago· 2 in thread

You probably already did this, but for the audience: one of the best ways to make sure you're using a function reasonably is to use elixir.bootlin.com to look at other uses and make sure you're using the function similarly. For instance, check out https://elixir.bootlin.com/linux/latest/A/ident/for_each_pro... .

ksmlOP5y ago

Elixir was extremely helpful to me! It didn't always help me understand _why_ code was written the way it was (hence my incorrect use of rcu_read_lock), but it was very helpful to see some examples.

stevekemp5y ago

I've not done too much kernel programming, but for sure I know that looking for existing uses of code is very helpful.

It looks like the author of the piece did something similar, and noted other people doing similar things to themselves.

I wrote some modules to experiment with the Security Module API, because working with the APIs seemed like a good way to learn how they worked, and what was possible beyond just SELinux, AppArmor, etc.:

https://github.com/skx/linux-security-modules

lhoursquentin5y ago· 2 in thread

Great post, also love what you are trying to do with C playground, this is awesome!

I've recently been trying to build something similar, visualizing fork/execve/read/write, but using the strace output of a binary, which is much less challenging.

ksmlOP5y ago

Thank you! It's open source, and I'd love to hear if you have any suggestions for it. Would also love to see what you're building!

lhoursquentin5y ago

Cool I'll definitely try to set it up in the coming days!

Here's my humble strace visualizer: https://lhoursquentin.github.io/visual-strace/

nosefrog5y ago· 1 in thread

Great story! I've had a lot of debugging nightmares, but thankfully never anything as bad as that.

One thing that looks fishy is this branch:

  if (container_tasks_len == max_container_tasks) {
    printk("cplayground: ERROR: container_tasks list hit capacity! We "
    "may be missing processes from the procfile output.\n");
    break;
  }

Since you said printk can block, why isn't calling it in the rcu critical section a bug? Is it because you immediately break afterwards and don't try to reference the next task?

ksmlOP5y ago

That's a good point. I'm hoping that this never gets hit, and if that line ever appears in the logs, then things are already broken. However, it's probably better to improve the failure mode where possible :)

[edit] and yes, since we break and don't follow the `next` pointer in the linked list, that also shouldn't cause any problems.

[edit 2] a sibling comment by cesarb pointed out that printk actually does not block, since it's important for it to be usable in critical sections to debug when the kernel gets into trouble

wyldfire5y ago

My knee-jerk reaction on reading this article and seeing a kernel module near 'nodejs' was to grumble and say "wtf they clearly didn't need a kernel module for this". But upon reading deeper I see that accessing the kernel is kinda appropriate.

Regardless of whether you end up using eBPF or a .ko like you already have, you may have a yet simpler option. By leveraging the loader you can do an interposition trick with LD_PRELOAD to hook C library accesses. Maybe this is all you need in order to "help students understand system calls such as open, close, dup2, fork, pipe, and others."
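For anyone who hasn't seen the interposition trick, here is a minimal sketch for a single call, `close()`. Everything specific is an assumption for illustration: the counter, the `traced_close_count` helper, and the log format are made up, and the wrapper forwards with a raw `syscall()` to keep the example short (the more common approach is to look up the real function with `dlsym(RTLD_NEXT, "close")`).

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>  /* SYS_close */
#include <unistd.h>

static int traced_closes;  /* hypothetical counter, for illustration only */

/* Same name and signature as libc's close(). When this object is preloaded,
 * the dynamic linker resolves the program's close() calls to this function
 * before it looks in libc. */
int close(int fd)
{
    traced_closes++;
    dprintf(STDERR_FILENO, "[trace] close(%d)\n", fd);
    /* Forward directly to the kernel so the call still does its job. */
    return (int)syscall(SYS_close, fd);
}

/* Hypothetical helper: how many close() calls have been seen so far. */
int traced_close_count(void)
{
    return traced_closes;
}
```

Built as a shared object (for example `gcc -shared -fPIC -o libtraceclose.so traceclose.c`, file names made up) and run with `LD_PRELOAD=./libtraceclose.so ./some_program`, this logs every `close()` the program makes through libc. It only sees calls that go through the dynamic linker, so statically linked binaries and direct syscalls slip past it.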

Just a suggestion. Carry on, good show.

egberts15y ago

Takes me back to the days of ATM device driver debugging. I’ve written 9 kernel drivers. All in all, a dedicated standalone terminal attached to the serial port of the target is still your best friend.

secondcoming5y ago

Great article! Reminds me of when I was working on a bug in a phone kernel and adding its equivalent of printk() made the bug disappear! Lauterbach time!

pjmlp5y ago

Back in the Windows NT/2000 days, IIS executed as part of the kernel, debugging ISAPI extensions was an exercise in patience every time a programming error crashed the kernel and a reboot was in order.

known5y ago

Free Book https://www.tldp.org/LDP/lkmpg/2.6/html/lkmpg.html

foxhlchen5y ago

nice article but I think op should use debugfs instead of /proc. debugfs is designed for this purpose.
