In which case, bit packing can be problematic, since a packed value can stretch across a cache line boundary?
However, straddling word boundaries is problematic.
When a value spans a word boundary (e.g. because you added a single bit), you not only add two dependent instructions to access the value and consume an extra register, but without a linear access pattern (i.e. random access by id or index) each value access can cost anything up to ~200 cycles. So, depending on the number of instructions per value, this could be anywhere up to 200x slower. This also doesn't consider other possible issues, such as polluting the cache.
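To make the extra work concrete, here is a minimal sketch of reading one bit-packed value. It assumes 64-bit words and LSB-first packing; `packed_get` and its parameters are hypothetical names for illustration, not from any library. The second load with its shift and OR only happens when the value straddles a word boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the idx-th value of `width` bits (1..64) from an array of 64-bit
 * words, packed LSB-first. Hypothetical helper, assuming the caller has
 * allocated enough words to cover the last value. */
static uint64_t packed_get(const uint64_t *words, size_t idx, unsigned width)
{
    assert(width >= 1 && width <= 64);

    uint64_t bit  = (uint64_t)idx * width;      /* absolute bit offset      */
    size_t   word = (size_t)(bit / 64);         /* word holding the low bit */
    unsigned off  = (unsigned)(bit % 64);       /* offset inside that word  */
    uint64_t mask = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);

    uint64_t v = words[word] >> off;            /* common case: one load    */

    if (off + width > 64) {
        /* The value straddles a word boundary: a second load, whose result
         * must be shifted and OR-ed in before the value is usable. Here
         * off > 0 always holds, so the shift count is in 1..63. */
        v |= words[word + 1] << (64 - off);
    }
    return v & mask;
}
```

With 9-bit values, for example, index 7 starts at bit 63 and takes the two-word path, while index 0 needs only the single load.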
The memory bus word size is irrelevant. A cache line fill requires multiple words, but DDR memory is designed to do exactly that efficiently.
So where values fall on (or across) cache lines does matter. Straddling word boundaries has additional problems, and generally implies straddling cache line boundaries as well (unless it's a multiple).
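As a rough way to see how often each kind of straddle occurs for a given packing width, the following sketch counts values that cross a boundary of a chosen unit size. It assumes 64-bit words and 512-bit (64-byte) cache lines; `straddles` and `count_straddles` are hypothetical helpers written for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the idx-th `width`-bit value in a tightly packed stream
 * crosses a boundary of `unit_bits` (e.g. 64 for a word, 512 for a
 * 64-byte cache line). Assumes width >= 1. */
static int straddles(uint64_t idx, unsigned width, unsigned unit_bits)
{
    uint64_t first = idx * width;          /* first bit of the value */
    uint64_t last  = first + width - 1;    /* last bit of the value  */
    return first / unit_bits != last / unit_bits;
}

/* Count how many of the first n packed values straddle such a boundary. */
static size_t count_straddles(size_t n, unsigned width, unsigned unit_bits)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (size_t)straddles(i, width, unit_bits);
    }
    return count;
}
```

Under those assumptions, 512 values packed at 9 bits each give 64 word straddles and 8 cache line straddles, while 8-bit values give none of either.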
Intel will soon be marketing CPUs with 512-bit registers. At that time the register width and the cache line width will be the same. That will be novel.