So I made a tool for the job. It is tailored to this specific workflow, but I think it's generic enough to be useful in other cases too.
I thought it'd be complicated, but it turned out to be dead simple: get the device context of the desktop, BitBlt it to a bitmap on a timer, draw the mouse cursor on top, check whether the frame changed since the last one, and if so save it to disk. Once the capture is done, go through the saved images and pack them into a GIF. The rest is just some UI work to make things convenient, but out of the way at the same time.
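To illustrate the "see if it changed, then save" step, here is a minimal sketch in Python. It is not the tool's actual code: the real thing uses Win32 GDI calls, so the screen grab here is abstracted into a hypothetical `grab` callable, frames are assumed to be raw bytes, and comparing SHA-256 digests is just one convenient way to detect a change (a plain byte compare would do as well). The `frame_delays` helper is also an assumption about how gaps between kept frames could become per-frame GIF delays.

```python
import hashlib
from typing import Callable, List, Tuple

Frame = bytes  # assumed representation: raw pixel data of one screenshot


def capture_changed_frames(
    grab: Callable[[], Frame], ticks: int
) -> List[Tuple[int, Frame]]:
    """Call grab() once per timer tick; keep a frame only if it differs
    from the previously grabbed one. Returns (tick_index, frame) pairs."""
    kept: List[Tuple[int, Frame]] = []
    last_digest = None
    for tick in range(ticks):
        frame = grab()
        digest = hashlib.sha256(frame).digest()
        if digest != last_digest:
            # Screen content changed: this is where a frame would be saved.
            kept.append((tick, frame))
            last_digest = digest
    return kept


def frame_delays(
    kept: List[Tuple[int, Frame]], total_ticks: int, tick_ms: int
) -> List[int]:
    """How long each kept frame stays on screen, in milliseconds."""
    indices = [tick for tick, _ in kept] + [total_ticks]
    return [(b - a) * tick_ms for a, b in zip(indices, indices[1:])]


# Fake "screen" that changes on ticks 2 and 5.
fake_screen = iter([b"A", b"A", b"B", b"B", b"B", b"A"])
kept = capture_changed_frames(lambda: next(fake_screen), ticks=6)
delays = frame_delays(kept, total_ticks=6, tick_ms=100)
```

Skipping unchanged frames keeps the number of saved images small when the screen is mostly idle, and the recorded tick indices preserve the timing for the packing step.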
For reference, GifCam [1] was close, but it has palette encoding issues that create visible artifacts in the output. ScreenToGif [2] is a .NET app; it asked to install .NET 4.8, which in turn asked to reboot the computer, etc. It also comes with a lot of extra stuff (it has a ribbon-style menu bar!) and requires quite a bit of clicking to get things done. Others were even worse off in comparison. So, as per usual: if you want things done your way, you do it yourself.
I completely forgot about LiceCap! A notable option with very good provenance. Still a bit too much UI for my taste :)