The problem with all the modern graphics APIs isn't sleep()--the graphics card can generally be kept 100% busy, and you kick off another render when the swapchain releases a frame.
The problem is knowing how long the actual render is going to take. Scenes can vary vastly in complexity and predicting how long they will take to render a priori is extremely difficult.
Not an expert in graphics rendering, but could you estimate the render time by calculating (a priori) the number of draw calls needed to render the frame?
Unfortunately no - the cost depends on the number of visible pixels, and worse, there are secondary costs that depend on how those visible pixels are arranged on screen, what part of the texture happens to be there, whether any sparse textures need to get paged in from system memory, whether GPU occlusion culling ends up culling things it normally wouldn't, etc. Other things, like your OS compositor, are also competing for GPU time.
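To make the draw-call point concrete, here is a toy sketch. It is not a real GPU cost model: the two cost constants and the `Draw` / `toy_frame_cost_us` names are made up for illustration, and it ignores all the secondary effects listed above. It only shows that two frames with the same number of draw calls can differ hugely in cost once pixel coverage is taken into account.

```python
from dataclasses import dataclass

# Hypothetical per-unit costs in microseconds. Real GPUs expose no such
# constants; these numbers are invented purely for illustration.
COST_PER_DRAW_US = 5.0
COST_PER_PIXEL_US = 0.002


@dataclass
class Draw:
    visible_pixels: int  # pixels that survive culling and depth testing


def toy_frame_cost_us(draws):
    """Toy model: fixed overhead per draw plus a cost per visible pixel."""
    return sum(COST_PER_DRAW_US + COST_PER_PIXEL_US * d.visible_pixels
               for d in draws)


# Same number of draw calls, very different screen coverage.
far_away = [Draw(visible_pixels=500) for _ in range(100)]
close_up = [Draw(visible_pixels=200_000) for _ in range(100)]

print("draw calls:", len(far_away), "vs", len(close_up))
print("toy cost (us):", toy_frame_cost_us(far_away),
      "vs", toy_frame_cost_us(close_up))
```

In this toy setup the close-up frame comes out dozens of times more expensive than the far-away one despite an identical draw call count, and that is before texture paging, culling behaviour, or the compositor enter the picture.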