Lookup tables can be slower. Do not assume that memory is fast. Even if the table is small and fits in cache, it still displaces other useful data from the cache. A short sequence of fast instructions that uses only registers and has no branches is very likely to be faster, possibly much faster, than a table lookup. A single L1 cache miss can take much longer to resolve than it takes to compute that sequence of instructions.
The table in the example is literally one cache line, so it probably won't affect the rest of the program too much. Even so, given typical L1 throughput, I'd bet the bitwise version is faster.
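
To make the comparison concrete, here is a minimal sketch in C. It is not the code from the question. It uses a hypothetical task (counting the set bits in a 6-bit value) chosen only because its table has 64 one-byte entries, which is one cache line on hardware with 64-byte lines. All function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical task: count the set bits in a 6-bit value (0..63). */

/* Table version: 64 one-byte entries. This occupies exactly one cache line
   only if lines are 64 bytes and the array is line-aligned (both assumed,
   neither guaranteed by this code). */
static uint8_t popcount6_table[64];

/* Fill the table once: bits(i) = bits(i / 2) + lowest bit of i. */
static void popcount6_table_init(void)
{
    popcount6_table[0] = 0;
    for (unsigned i = 1; i < 64; i++)
        popcount6_table[i] = (uint8_t)(popcount6_table[i >> 1] + (i & 1u));
}

/* One load from memory per call. */
static unsigned popcount6_lut(unsigned x)
{
    return popcount6_table[x & 63u];
}

/* Bitwise version: registers only, no branches, no memory access. */
static unsigned popcount6_bitwise(unsigned x)
{
    x &= 63u;
    x = x - ((x >> 1) & 0x15u);            /* sums within each 2-bit pair */
    x = (x & 0x33u) + ((x >> 2) & 0x33u);  /* sums within each 4-bit group */
    return (x + (x >> 4)) & 0x7u;          /* add the two groups (max 6) */
}

/* Sanity check that both versions agree on every 6-bit input. */
static void check_versions_agree(void)
{
    popcount6_table_init();
    for (unsigned x = 0; x < 64; x++)
        assert(popcount6_lut(x) == popcount6_bitwise(x));
}
```

Which version wins depends on the CPU and on what else is competing for the cache, so measure inside the real program rather than in an isolated loop where the table is always hot.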