The author needed to use unsafe in order to pass his pointer to libmodem, but libmodem is going to require a pointer with static lifetime itself. Had the author supplied one, the issue would have been prevented in the first place.
I can see why you wouldn't want to use static: it hinders testability. But that means you need to ensure that the pointer you supply to libmodem outlives libmodem. I would use RAII to do that in C++, and I am sure in Rust you could/would do the same.
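For what it's worth, the RAII idea might look roughly like this in Rust. This is only a sketch under assumptions: `ModemConfig`, `Modem`, and `is_running` are made-up names, and the `RUNNING` flag stands in for the real C init/shutdown calls, which are not shown here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for "the C library is initialised and holding our pointer".
static RUNNING: AtomicBool = AtomicBool::new(false);

/// Hypothetical config type; the real one has different fields.
pub struct ModemConfig {
    pub shmem_size: u32,
}

/// Hypothetical handle. Borrowing the config for `'a` means the borrow
/// checker refuses to let the config be dropped while the handle exists.
pub struct Modem<'a> {
    config: &'a ModemConfig,
}

impl<'a> Modem<'a> {
    pub fn init(config: &'a ModemConfig) -> Self {
        // A real wrapper would call the C init function here.
        RUNNING.store(true, Ordering::SeqCst);
        Modem { config }
    }

    pub fn shmem_size(&self) -> u32 {
        self.config.shmem_size
    }
}

impl Drop for Modem<'_> {
    fn drop(&mut self) {
        // A real wrapper would call the C shutdown function here,
        // so the library stops using the pointer before the config goes away.
        RUNNING.store(false, Ordering::SeqCst);
    }
}

pub fn is_running() -> bool {
    RUNNING.load(Ordering::SeqCst)
}
```

One caveat: `std::mem::forget` can skip the `Drop`, so a guard like this is a convenience rather than an airtight guarantee, which is one reason a `'static` bound is the more common choice.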
I guess I am asking: is there anything here that a libmodem written in Rust would have magically solved? It feels like wishful thinking, but I am open to learning where I am mistaken.
In any case, kudos for finding this bug. Having worked with Zephyr/nRF Connect SDK and this exact chip myself, I can definitely relate to the pain they (can) bring.
But the custom Rust wrapper was composed as a game of telephone (ugh), with the author blindly mimicking "Jonathan" who seemed to have been blindly mimicking a sloppy (and later repaired) example from Nordic.
The argument is that if the library and its internals were originally written in Rust, which has richer semantics for object lifetimes, Rust would have been able to formally convey that the input data needed to outlive the individual function call, throwing an error at compile time.
The wrapper could have enforced this constraint itself, as it probably does now, but the handoff between Rust and C needs somebody to account for and understand the by-convention stuff in C so that it can be expressed formally in Rust, and that human process failed to happen here.
I'm not following your comment, but I think the point is simply "the lifetime of the config is in the function signature, rather than hopefully (sometimes) being in the documentation, and hopefully (sometimes) correct".
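To make "the lifetime is in the function signature" concrete, here is a minimal sketch. Everything in it is hypothetical (`ModemConfig`, `modem_init`, `stored_shmem_size`, the field names), and the `STORED` static only imitates a C library that keeps the pointer it was handed; it is not libmodem's actual API.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical stand-in for the config struct a C library expects.
#[repr(C)]
pub struct ModemConfig {
    pub shmem_base: u32,
    pub shmem_size: u32,
}

// Imitates the library's internal state: it retains the pointer after init returns.
static STORED: AtomicPtr<ModemConfig> = AtomicPtr::new(std::ptr::null_mut());

/// Hypothetical wrapper. The `&'static` bound states in the signature
/// that the config must outlive the call, instead of leaving it to the docs.
pub fn modem_init(config: &'static ModemConfig) {
    STORED.store(config as *const ModemConfig as *mut ModemConfig, Ordering::SeqCst);
}

/// Reads back through the retained pointer, as the library would do later on.
pub fn stored_shmem_size() -> Option<u32> {
    let p = STORED.load(Ordering::SeqCst);
    if p.is_null() {
        None
    } else {
        // Sound only because `modem_init` accepts nothing shorter than 'static.
        Some(unsafe { (*p).shmem_size })
    }
}

/// A config that really does live for the whole program.
pub static CONFIG: ModemConfig = ModemConfig {
    shmem_base: 0x2000_0000,
    shmem_size: 0x1000,
};
```

With this signature, `modem_init(&CONFIG)` compiles, while borrowing a stack variable (`let cfg = ModemConfig { .. }; modem_init(&cfg);`) is rejected at compile time because `cfg` does not live long enough.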
The assumption that nobody ever makes mistakes is mistake number one.
Reading the article (nice troubleshooting story!), my summary, as a C programmer, is that the "C Interface" here "takes ownership". Given that C cannot express this properly, a pointer is passed, and the called function "simply" assumes that from then on, what was given to it will remain.
As "semantics", this (the need to pass an "owned" piece of data to a function) isn't unusual, irrespective of the programming language. It's just that in Rust this is explicit in the interface (if the func takes a non-ref arg, or a shared smart ref of sorts), while in C ... this can lead to errors of the observed kind. I haven't looked at whether any of the sources or docs of libmodem say "this pointer must be either global/static or malloc'ed (and the caller shall not free it)".
A Rust wrapper for this could, and possibly should, "leak a reference" here: something that prevents the initialisation object from being dropped. Yes, accepted, that needs "nasty hacks", whether static lifetime, Pin, manual drops, explicit Arc leaks, ... It is possible, though.
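As one illustration of the "leak" option, `Box::leak` turns a heap allocation into a `&'static` reference that is never dropped. This is a sketch, not what the actual wrapper does: `ModemConfig` and `leak_config` are hypothetical names, and on a `no_std` target it assumes an allocator is available.

```rust
/// Hypothetical config type; the real one has different fields.
pub struct ModemConfig {
    pub shmem_size: u32,
}

/// Moves the config to the heap and deliberately never frees it,
/// so the returned reference stays valid for the rest of the program.
pub fn leak_config(config: ModemConfig) -> &'static ModemConfig {
    Box::leak(Box::new(config))
}
```

The cost is a one-time allocation that is never reclaimed, which is usually acceptable for something initialised once at startup.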
It'd be nice if libmodem were stricter about such ownership, agreed, and then a Rust wrapper could take advantage. It takes time to evolve; is there a bug report / enhancement request out there for this in libmodem?
The end of the post says
> This would have been so simple to put in the docs. I've opened a ticket on their DevZone forum. As of writing they've still not updated the docs of the init function.
And they've replied
> Thank you for reporting this, it will be fixed in the next `libmodem` release by the end of the month.
I'm still not sure I understand why he couldn't just diff between versions. And the black box thing seems like a fool's errand. If changing the order of random things makes the issue go away, you can't change anything. The only thing you can do is use the binary you already have, especially because even if you have two non-working versions, fixing one doesn't necessarily fix the other. This debug effort felt very sloppy.
It's also weird looking at a lot of this code. The first assembly function pushes 4 values on the stack and only needed to push 2. I've had my fair share of bugs that make me go to the disassembly, but that also felt like a waste of time here. The author evidently did not have enough of a grasp of what to expect for it to help at all.
While it's true that nRF should've put something in a log, the author admits they don't support this development flow. It's like the old adage about APIs: any change, no matter how small, will break someone's usage.
The tooling around this is much easier to deal with, of course, since it's all just Windows and there's a bunch of sane debug layers that you can use.
Massive respect to be able to debug the issue on an embedded system!
I guess not, as the code seems to run on the microcontroller, but I remember getting at least some warning from valgrind in similar situations.
For example, I've encountered hardware that would occasionally write unexpected-error details to a memory location that was completely undocumented. And if you expect more than a shrug from a vendor after pointing out such things, well...