You need at least nlog(n) bits for the same reason as regular shuffling. There are n! permutations and it therefore takes log(n!) bits to uniquely describe a shuffle. nlog(n) is the same as log(n^n) which is obviously greater than log(n!) (this is Stirling’s approximation).
OK, but I was kind of hoping for a treatment that involved the workings of one or both algorithms. They are subject to the same theoretical lower bound, yes. Do they have any advantages or disadvantages relative to each other? Does one of them do a much better job reaching the theoretical bound?
It kind of makes me mad that the very simple RS algorithm (divide and conquer) isn't more widely known given that it's so easy to implement and that it's actually parallelizable unlike Fisher-Yates. I think it's also better pedagogically since you can easily do it with a deck of cards and a coin or some dice while Fisher-Yates is pretty unwieldy for anything beyond very small lists.