Well, when you're doing at least 262,144 (512 x 512) calculations per pixel of each page element, you should expect some slowness (assuming a radius-256 filter requires a 512x512 kernel to process). A less naive implementation blurs horizontally and then vertically, for a rough estimate of `2 * 512 * w * h` calculations per element, where `w` and `h` are the element's width and height in pixels. That can still be a lot when there are many page elements.
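
To put rough numbers on the estimates above, here is a minimal back-of-the-envelope sketch in Python. It only counts operations and does not blur anything. The function names are made up for illustration, and it keeps the same assumption as above: a radius-r blur uses a kernel 2r pixels wide.

```python
def kernel_width(radius):
    # Assumption carried over from the estimate above:
    # a radius-r blur uses a kernel that is 2r pixels wide.
    return 2 * radius


def naive_ops(w, h, radius):
    # Full 2D kernel: every pixel visits k * k neighbours.
    k = kernel_width(radius)
    return w * h * k * k


def separable_ops(w, h, radius):
    # Horizontal pass plus vertical pass: every pixel visits k neighbours twice.
    k = kernel_width(radius)
    return 2 * k * w * h


if __name__ == "__main__":
    w, h, radius = 300, 200, 256  # hypothetical element size in pixels
    print("naive:    ", naive_ops(w, h, radius))
    print("separable:", separable_ops(w, h, radius))
    print("ratio:    ", naive_ops(w, h, radius) // separable_ops(w, h, radius))
```

With these counts, the two-pass version saves a factor of k / 2 (256 for a 512-wide kernel), but the cost still scales with the total pixel area of everything that gets a shadow.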
I guess what I'm trying to say is this: don't expect to render a ton of large-radius shadows in real time unless you're rendering with a modern game engine on a fast GPU.