So grain of salt.
You've almost certainly never had a system that supported any hardware-accelerated crypto that also required a kernel module.
It's much easier to expose it as CPU extensions.
You could push that all down to the accelerator, but if there are even a few such use cases you might want a dedicated DMA-capable implementation instead.
I've liked it nevertheless for context, as an augmentation to the parent's post.
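
To illustrate the point about CPU extensions: on Linux, crypto instructions show up as plain feature flags that any user-space program can read, with no driver involved. A rough sketch, assuming the usual `/proc/cpuinfo` layout (a `flags` line on x86, a `Features` line on ARM64); the function name and the flag list are my own picks for illustration and are not exhaustive:

```python
# Flag names as Linux prints them in /proc/cpuinfo (illustrative subset).
CRYPTO_FLAGS = {
    "aes", "pclmulqdq", "sha_ni", "vaes", "vpclmulqdq",  # x86
    "pmull", "sha1", "sha2", "sha3", "sha512",           # ARM64 ("aes" is shared)
}


def crypto_extensions(cpuinfo_text):
    """Return the crypto-related CPU flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 lists features under "flags", ARM64 under "Features".
        if sep and key.strip().lower() in ("flags", "features"):
            found |= CRYPTO_FLAGS & set(value.split())
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(sorted(crypto_extensions(f.read())))
    except OSError:
        # Not Linux, or /proc is unavailable.
        print("could not read /proc/cpuinfo")
```

If `aes` is in the result, a crypto library can use the instructions directly from user space, which is why you rarely see a kernel module in the picture.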