no serious, safety critical system uses RL (except tesla "autopilot" and we see how that went). Control theory algorithms can be validated to work within the desired envelope and produce a valid solution.
The big advantage of convexifying the problem, is that when it is convex you have a guarantee it can be solved in fixed time, a major requirement for real time systems