RL needs a supercomputer, and its code is usually too fragile: a trivial mistake anywhere (a missing constant multiplication, two consecutive lines swapped, etc.) will likely keep your model from ever converging, even if you got everything else right.
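To make the two kinds of mistake concrete, here is a minimal sketch using discounted returns as the example. That choice, the function names and the `gamma` values are my own assumptions for illustration, not taken from any particular codebase. All three functions run without error, which is what makes this class of bug hard to spot.

```python
def discounted_returns(rewards, gamma=0.99):
    """Correct version: G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def returns_missing_gamma(rewards, gamma=0.99):
    """Bug 1: the constant multiplication by gamma is forgotten."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + running  # gamma silently dropped
        returns.append(running)
    returns.reverse()
    return returns


def returns_swapped_lines(rewards, gamma=0.99):
    """Bug 2: the two lines in the loop body are swapped."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        returns.append(running)  # appended before the update
        running = r + gamma * running
    returns.reverse()
    return returns
```

With `rewards = [1.0, 1.0, 1.0]` and `gamma = 0.5`, the correct version gives `[1.75, 1.5, 1.0]`, the first bug gives `[3.0, 2.0, 1.0]`, and the second gives `[1.5, 1.0, 0.0]`. Nothing crashes; the learning signal is just quietly wrong.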
For the problems I've encountered in my work, the hard part of RL is that you need a simulator. Building a reliable and accurate one is often an immense undertaking.
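For a sense of what "a simulator" means at the interface level, here is a toy sketch. The class `CorridorSim`, its parameters and the `reset`/`step` shape are hypothetical, loosely modeled on the common environment convention. The interface itself is the easy part; the undertaking is making the dynamics inside `step` match the real system.

```python
import random


class CorridorSim:
    """Toy simulator: the agent walks cells 0..length-1; the last cell is the goal."""

    def __init__(self, length=5, slip_prob=0.0, seed=0):
        self.length = length
        self.slip_prob = slip_prob  # chance that an action is reversed
        self.rng = random.Random(seed)
        self.pos = 0

    def reset(self):
        """Put the agent back at the start and return the initial state."""
        self.pos = 0
        return self.pos

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (state, reward, done)."""
        if self.rng.random() < self.slip_prob:
            action = -action  # stand-in for real-world noise
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done
```

A training loop would call `reset()` once per episode and then `step(action)` until `done` is true. Here the "physics" is one line; in a real problem that line is where the work goes.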