Also, I'd rather use a tasklet than a kthread to offload 'work' from interrupt contexts.
Amen. On another platform, I ended up mmapping the registers and writing to them from userspace (which is obviously only useful when no interrupt handlers need to be attached and the register word in question has no other functions that would require synchronized access). Implementing the proper kernel GPIO API (plus device tree etc.) would have taken more than a week (this was a rather obscure PowerPC platform).
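For anyone wondering what the userspace route roughly looks like, here is a minimal sketch, assuming the registers are reachable through /dev/mem and the process has permission to open it. The helper names (`map_regs`, `reg_set_bits`, `reg_clear_bits`) are made up for illustration, and the base address has to come from your SoC's reference manual.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of register space starting at the
 * page-aligned offset `phys_base` of `path` (typically "/dev/mem").
 * Returns NULL on failure. */
static volatile uint32_t *map_regs(const char *path, off_t phys_base, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys_base);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Read-modify-write on a register word. This is NOT atomic: if the kernel
 * or another process changes other bits of the same word in between, one
 * of the updates is lost. That is the "synchronized access" caveat. */
static void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg |= mask;
}

static void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg &= ~mask;
}
```

Usage would be along the lines of `regs = map_regs("/dev/mem", GPIO_BASE, 4096);` followed by `reg_set_bits(&regs[DATA_REG], 1u << pin);`, where `GPIO_BASE` and `DATA_REG` are placeholders for values from the hardware documentation.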
Implementing a GPIO controller driver on top of gpiolib means that all other drivers and userland tools will be able to interface with it at no cost. Writing a device driver that consumes a GPIO through gpiolib means that you'll be able to use your driver even if the GPIO gets wired to an external I2C "expander" or something like that.
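To illustrate the consumer side, here is a rough sketch of a platform driver that takes a GPIO through the descriptor-based gpiolib interface. The driver name, the "enable" line name and the compatible string are all invented for the example; the point is only that nothing in the driver cares which controller (SoC bank or I2C expander) actually provides the line.

```c
#include <linux/err.h>
#include <linux/gpio/consumer.h>
#include <linux/mod_devicetable.h>
#include <linux/module.h>
#include <linux/platform_device.h>

static int sketch_probe(struct platform_device *pdev)
{
	struct gpio_desc *enable;

	/* Looks up the "enable-gpios" property of this device's DT node
	 * and configures the line as an output driven low. */
	enable = devm_gpiod_get(&pdev->dev, "enable", GPIOD_OUT_LOW);
	if (IS_ERR(enable))
		return PTR_ERR(enable);

	/* The _cansleep variant is also valid when the line sits behind
	 * a slow bus such as I2C or SPI. */
	gpiod_set_value_cansleep(enable, 1);
	return 0;
}

static const struct of_device_id sketch_of_match[] = {
	{ .compatible = "acme,sketch-device" }, /* hypothetical compatible */
	{ }
};
MODULE_DEVICE_TABLE(of, sketch_of_match);

static struct platform_driver sketch_driver = {
	.probe = sketch_probe,
	.driver = {
		.name = "sketch-device",
		.of_match_table = sketch_of_match,
	},
};
module_platform_driver(sketch_driver);

MODULE_LICENSE("GPL");
```

Moving the line from an SoC pin to an expander would then only change the `enable-gpios` entry in the device tree, not the driver.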
Pushing your logic to the extreme, one might as well get rid of the kernel and write on bare metal to make sure not to waste a single CPU cycle or byte of memory on an abstraction layer. What is a kernel in the end if not a set of standardized APIs?
It's true that there's some bloat in a few of the kernel's APIs (video comes to mind, although I won't pretend I could do it better), but that shouldn't be a reason to "always try to do without having to use all of these APIs". Unless you write code that's only intended to run on one particular software configuration, on a single board, without any possible evolution or reuse, it's really asking for trouble in the long term.
If I have to integrate a third-party driver and I see that it uses some custom-made GPIO interface instead of the standard kernel API, I'm going to be very annoyed.
If you use GPIOs for a couple of LEDs, pin configuration and stuff like that, it's perfectly adequate and won't require a driver anyway. If your GPIO expander is on I2C then you clearly are not worrying about latency anyway, so gpiolib is just fine.
If you write a driver to use the GPIOs for more critical, time-sensitive things (overcurrent/thermal protection and that sort of thing, or bit-banging), and then realize that it's actually pretty crap and you'd be better off mmapping /dev/mem and poking at the registers directly from userland to get the performance/latency you need, then your driver was a waste of time if you went that way first...
Unlike Linux user space, the Linux kernel space has support for interrupts. The first example in this article demonstrates how you can write an LKM that uses GPIOs and interrupts to achieve a faster response time than is possible in user space.
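As a rough idea of what such a module can look like (this is a sketch, not the article's own listing), the code below toggles an LED from a GPIO interrupt handler. It assumes the legacy integer-based GPIO API, which newer kernels deprecate in favour of the descriptor-based one, and the two pin numbers are placeholders.

```c
#include <linux/gpio.h>
#include <linux/interrupt.h>
#include <linux/module.h>

#define BUTTON_GPIO 115 /* hypothetical input pin */
#define LED_GPIO     49 /* hypothetical output pin */

static unsigned int irq_number;
static bool led_on;

/* Runs in hard interrupt context: keep it short and never sleep.
 * gpio_set_value() is only safe here if the controller does not sleep. */
static irqreturn_t button_isr(int irq, void *dev_id)
{
	led_on = !led_on;
	gpio_set_value(LED_GPIO, led_on);
	return IRQ_HANDLED;
}

static int __init gpio_irq_init(void)
{
	int ret;

	ret = gpio_request(LED_GPIO, "sketch-led");
	if (ret)
		return ret;
	gpio_direction_output(LED_GPIO, 0);

	ret = gpio_request(BUTTON_GPIO, "sketch-button");
	if (ret)
		goto err_led;
	gpio_direction_input(BUTTON_GPIO);

	/* Translate the GPIO number into the IRQ number it is routed to. */
	ret = gpio_to_irq(BUTTON_GPIO);
	if (ret < 0)
		goto err_button;
	irq_number = ret;

	ret = request_irq(irq_number, button_isr, IRQF_TRIGGER_RISING,
			  "sketch-button", NULL);
	if (ret)
		goto err_button;
	return 0;

err_button:
	gpio_free(BUTTON_GPIO);
err_led:
	gpio_free(LED_GPIO);
	return ret;
}

static void __exit gpio_irq_exit(void)
{
	free_irq(irq_number, NULL);
	gpio_free(BUTTON_GPIO);
	gpio_set_value(LED_GPIO, 0);
	gpio_free(LED_GPIO);
}

module_init(gpio_irq_init);
module_exit(gpio_irq_exit);
MODULE_LICENSE("GPL");
```

The response-time gain comes from the handler running directly on the interrupt, with no scheduler wake-up or system call round trip in between.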
I've written a JSON-RPC API which can be used to interact with GPIOs. Currently I set/get a GPIO using sysfs. While handling a request, the sysfs operations take up most of the response time, far more than parsing the request and writing the result to the client. How could I speed up the GPIO interaction with an LKM?
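One thing that may be worth ruling out before writing an LKM (this is an assumption, since the post doesn't say how the sysfs files are accessed): if the `value` file is opened and closed on every request, keeping one descriptor open for the lifetime of the server and rewinding it before each access is usually cheaper. A minimal sketch with invented helper names:

```c
#include <fcntl.h>
#include <unistd.h>

/* Open the value file once at startup, e.g.
 * "/sys/class/gpio/gpio17/value" (the GPIO number is a placeholder). */
static int gpio_value_open(const char *value_path)
{
    return open(value_path, O_RDWR);
}

/* Drive the line high or low. Returns 0 on success, -1 on error. */
static int gpio_value_set(int fd, int on)
{
    char c = on ? '1' : '0';

    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    return write(fd, &c, 1) == 1 ? 0 : -1;
}

/* Read the current level. Returns 0 or 1, or -1 on error. */
static int gpio_value_get(int fd)
{
    char c;

    /* Rewind first: a sysfs attribute must be re-read from offset 0. */
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    if (read(fd, &c, 1) != 1)
        return -1;
    return c == '0' ? 0 : 1;
}
```

This still costs a system call or two per request, so it won't match a kernel-side or memory-mapped approach, but it removes the repeated path lookup and open/close from the hot path.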