Actually you're almost 100% describing how Wimsey works! It's using native df code rather than a UDF of some kind. Under the hood it uses Narwhal's which converts polars style expressions into native pandas/polars/spark/dask code with super minimal overheads.
If you're using a lazy dataframe (via polars, spark etc) Wimsey will force collection, so that can have speed implications. Reason being that I can't find a cross-language way yet of embedding assertions for fail later down the line.