That really is the $1m question, huh. Wish I knew, tbh. Even ignoring the unpaid work aspect, the other Catch-22 is this: sure, you might filter out a few "only good on paper" candidates, but if you're hiring "top talent", what is a 5-minute Flask API going to prove?
For a coding eval like that to be useful beyond junior level, it'd need to be decently complex, which usually takes a while to develop. Maybe an open-ended "see how far you can get in 1hr on this complex thing" task, with no expectation of completion stated upfront, could be a fair middle ground, since how someone lays the groundwork is a pretty solid insight into their coding process.