If I were going after the same problem I would try to do a couple of things.
1. Reframe the QA problem to make it smaller. Reducing the number and size of pipelines is a good start. That has a bunch of knock-on benefits beyond correctness.
2. Look at data cleaning technologies. QA on datasets is a variation on this problem. For example if you can develop predicates that check for common safety conditions on data like detecting bad addresses or SSANs you give users immediately usable quality information. There's a lot more you can do here.
Assuming you are working on this project, I wish you good luck. You can contact me at rhodges at altinity dot com if you want to discuss further. I've been dealing with QA problems on data for a long time.