Wait, what? That is an odd way of defining it. That's like saying Turing machines are an inefficient way to solve TSP. You would, at the least, want to define this in terms of complexity, or put it in the context of domains and observability.
RL is by definition a field about finding efficient solutions to problems in the domain of choice[1]. There are likely regimes in LLM/LRM learning where RL can be quite efficient, even polynomial time in the size of the state space; we just need to explore and find them. For example, you can use dynamic programming as a "more" efficient way to solve MDPs[1], because it is polynomial in the number of states times the number of actions.
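To make the DP point concrete, here is a rough sketch of value iteration on a toy 3-state MDP. The MDP itself, the names (`value_iteration`, `transitions`, `rewards`) and the discount of 0.9 are all made up for illustration, not taken from [1]. The thing to notice is that each sweep is just a loop over states, actions, and successor states, so the per-sweep cost is polynomial in the number of states and actions.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Value iteration for a small tabular MDP (illustrative sketch).

    transitions[s][a] is a list of (probability, next_state) pairs.
    rewards[s][a] is the expected immediate reward for taking a in s.
    Both layouts are assumptions made for this example.
    """
    n_states = len(transitions)
    values = [0.0] * n_states

    def q_values(s):
        # One-step lookahead: reward plus discounted expected next value.
        return [
            rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])
            for a in range(len(transitions[s]))
        ]

    while True:
        delta = 0.0
        # One sweep touches every (state, action, successor) triple once.
        for s in range(n_states):
            best = max(q_values(s))
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place update
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(n_states):
        q = q_values(s)
        policy.append(q.index(max(q)))
    return values, policy


# Hypothetical chain: 0 -> 1 -> 2, where state 2 is absorbing.
# Action 0 = stay, action 1 = move right. Only the step 1 -> 2 pays reward 1.
transitions = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 2)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
rewards = [
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]

values, policy = value_iteration(transitions, rewards)
print(values, policy)
```

This only works because the full model (transitions and rewards) is known and the state space is small enough to enumerate, which is exactly the "domain and observability" context that matters when talking about efficiency.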
[1]https://web.stanford.edu/class/psych209/Readings/SuttonBarto...