One advantage of bus mastering, at least in theory, is that you can do peer-to-peer transfers without going through main system memory.
In practice, I don't think this works out too well for really heavy workloads: both NVIDIA SLI and AMD CrossFire have traditionally relied on their own bridge interconnects, presumably because of limited bus bandwidth.
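For what it's worth, here is a back-of-the-envelope sketch of why skipping main memory is attractive on a shared bus. It assumes a simplified model where every bus crossing occupies the whole bus, so a copy bounced through system memory costs two crossings and a peer-to-peer copy costs one. The function name `transfer_seconds` and the 64 MiB payload are made up for illustration; the 133 MB/s figure is the theoretical peak of classic 32-bit/33 MHz PCI, not a measured rate.

```python
# Rough model only: ignores arbitration, protocol overhead, and latency.

def transfer_seconds(payload_bytes, bus_bytes_per_sec, bus_crossings):
    """Time to move a payload when each crossing occupies the whole shared bus."""
    return bus_crossings * payload_bytes / bus_bytes_per_sec


PCI_PEAK = 133_000_000       # theoretical peak of 32-bit/33 MHz PCI, in bytes/s
PAYLOAD = 64 * 1024 * 1024   # hypothetical 64 MiB buffer

# Device A -> system memory -> device B: the data crosses the bus twice.
via_memory = transfer_seconds(PAYLOAD, PCI_PEAK, bus_crossings=2)

# Device A -> device B directly (bus mastering): the data crosses once.
peer_to_peer = transfer_seconds(PAYLOAD, PCI_PEAK, bus_crossings=1)

print(f"via system memory: {via_memory:.2f} s")
print(f"peer-to-peer:      {peer_to_peer:.2f} s")
```

Under this model peer-to-peer halves the bus time, but both paths are still capped by the same shared bus, which is consistent with heavy workloads ending up on dedicated links instead.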
Remember the days when you had a separate MPEG-2 decoder card that externally proxied VGA from your graphics card?