When I interviewed at Stripe, they had a "debug this!" question ready to go in a number of languages. (I think the list was something like: Ruby, Java, Python, JavaScript, Go, maybe one or two more.) I thought this was pretty great, but it also seemed pretty labor-intensive.
That is the way to do it; good on Stripe. It's definitely labor-intensive, but I think you either need to outsource it or be big enough to find the talent to build half a dozen or more idiomatic, reasonably large applications to make it a good test.