> You likely do not want to do joins on them directly.
I can't see that you have any alternative. If you're looking for any discrepancy in either table, you have to compare both of them completely. Is there an alternative?
(edit: actually there is: the checksumming I mentioned earlier. I've never needed to do that.)
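For the record, a complete comparison doesn't have to be a hand-written join: a set difference in both directions catches missing, extra and changed rows. Here's a minimal sketch in Python with sqlite3, assuming two tables with identical schemas. The table names are hypothetical and are interpolated straight into the SQL, so they must be trusted identifiers.

```python
import sqlite3


def full_diff(conn, left, right):
    """Return (rows only in left, rows only in right), comparing every column."""
    # EXCEPT compares whole rows, so running it in both directions catches
    # missing rows, extra rows, and rows whose values differ.
    only_left = conn.execute(f"SELECT * FROM {left} EXCEPT SELECT * FROM {right}").fetchall()
    only_right = conn.execute(f"SELECT * FROM {right} EXCEPT SELECT * FROM {left}").fetchall()
    return sorted(only_left), sorted(only_right)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO dst VALUES (?, ?)", [(1, "a"), (2, "B")])
print(full_diff(conn, "src", "dst"))
```

Row 2 shows up on both sides because its value changed; row 3 shows up only on the left because it's missing from `dst`.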
> so let's add an order by and limit. Now you'll need to spot differences in a 100-column table, since hiding matching values is even more hand-written / generated SQL.
Yes, I added a limit for this too. Spotting two differing columns out of a hundred isn't a big deal, but having done it, I agree that a GUI or some other assistance would make it a lot nicer.
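Hiding the matching values doesn't have to be hand-written per table; the SQL can be generated from the column list. A minimal sketch in Python with sqlite3, assuming both tables share a schema and a single-column key; the function and table names are hypothetical.

```python
import sqlite3


def column_diffs(conn, left, right, key, limit=10):
    """For rows matched on `key`, return only the columns whose values differ."""
    # Read the column list from an empty result set (assumes both tables share a schema).
    cols = [d[0] for d in conn.execute(f"SELECT * FROM {left} LIMIT 0").description
            if d[0] != key]
    select = ", ".join(f"l.{c}, r.{c}" for c in cols)
    # IS NOT is SQLite's NULL-safe inequality, so NULL vs NULL counts as equal.
    where = " OR ".join(f"l.{c} IS NOT r.{c}" for c in cols)
    sql = (f"SELECT l.{key}, {select} FROM {left} l JOIN {right} r "
           f"ON l.{key} = r.{key} WHERE {where} ORDER BY l.{key} LIMIT ?")
    result = []
    for row in conn.execute(sql, (limit,)):
        values = row[1:]
        # Keep only the (left, right) pairs that actually differ.
        changed = {c: (values[2 * i], values[2 * i + 1])
                   for i, c in enumerate(cols) if values[2 * i] != values[2 * i + 1]}
        result.append((row[0], changed))
    return result


conn = sqlite3.connect(":memory:")
for t in ("a", "b"):
    conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY, c1 TEXT, c2 INTEGER, c3 INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?, ?)", [(1, "x", 1, None), (2, "y", 2, 3)])
conn.executemany("INSERT INTO b VALUES (?, ?, ?, ?)", [(1, "x", 1, None), (2, "y", 5, 3)])
print(column_diffs(conn, "a", "b", "id"))
```

With a hundred columns the output is still one small dict per differing row, and the ORDER BY and LIMIT keep it readable.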
> It's pretty good to know about all problems upfront and not find and fix them one-by-one...
Well, you compare the schemas first; they're small, so you can do this very quickly and pick up errors there one after another. After that you compare the data, and I suppose you could run multiple queries in a batch file and check the results when they're done. I guess I never had enough data to need that.
> ...but creating representative sample of differences is difficult to do with hand-written sql.
I'm not sure what you mean.
> Then what about cross-db comparison,
Straightforward. In fact, if you aren't comparing data across DBs, you aren't doing ETL. (edit: OK, I think I see what you mean. The solutions are roundtripping if the amount of data isn't overwhelming, and checksumming otherwise.)
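The checksumming idea can be sketched as hashing rows in key-ordered buckets on each side and only looking closer at buckets whose digests differ. This is a minimal Python sketch assuming an integer key and that both databases return identical Python types per column; two in-memory sqlite3 databases stand in for the two real ones, and all names are hypothetical. In practice you'd compute the hashes inside each database so that only the digests cross the network; here they're computed in Python to keep the sketch portable.

```python
import hashlib
import sqlite3


def bucket_checksums(conn, table, key, bucket_size=1000):
    """Hash rows in key order, one SHA-256 digest per bucket of integer keys."""
    hashes = {}
    # Selecting the key first guarantees row[0] is the (assumed integer) key.
    for row in conn.execute(f"SELECT {key}, * FROM {table} ORDER BY {key}"):
        h = hashes.setdefault(row[0] // bucket_size, hashlib.sha256())
        # repr() assumes both sides return identical Python types for each column.
        h.update(repr(row).encode())
    return {bucket: h.hexdigest() for bucket, h in hashes.items()}


def mismatched_buckets(conn_a, conn_b, table, key, bucket_size=1000):
    """Buckets whose digests differ (or exist on one side only) need a closer look."""
    a = bucket_checksums(conn_a, table, key, bucket_size)
    b = bucket_checksums(conn_b, table, key, bucket_size)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


db_a, db_b = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (db_a, db_b):
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    db.executemany("INSERT INTO events VALUES (?, ?)", [(i, str(i)) for i in range(3000)])
db_b.execute("UPDATE events SET payload = 'changed' WHERE id = 1500")
print(mismatched_buckets(db_a, db_b, "events", "id"))  # only bucket 1 (ids 1000-1999) differs
```

Only the mismatched buckets then need to be roundtripped and compared row by row.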
> sampling, integrating with CI / github PRs?
I don't understand why any of these are needed, but I'm pretty certain that reflects my lack of experience, so okay.
> Some companies are using complex pipelines that can and sometimes do lose events
Then you've got a software quality problem. It's also not difficult, with joins against primary keys, to efficiently pick up the difference and make things idempotent: just re-run the script. I've done that too.
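The primary-key approach can look roughly like this: an anti-join selects only the rows the destination is missing, so re-running it is harmless. A minimal sketch in Python with sqlite3, assuming `src` and `dst` (hypothetical names) have identical column order and a single-column primary key.

```python
import sqlite3


def backfill_missing(conn, src, dst, key):
    """Copy rows present in src but missing from dst; safe to re-run."""
    # LEFT JOIN ... IS NULL is an anti-join on the primary key: it selects only the
    # rows dst lacks, so a second run finds nothing left to insert.
    # Assumes src and dst have identical column order.
    cur = conn.execute(
        f"INSERT INTO {dst} SELECT s.* FROM {src} s "
        f"LEFT JOIN {dst} d ON s.{key} = d.{key} WHERE d.{key} IS NULL")
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(i, f"e{i}") for i in range(1, 6)])
conn.executemany("INSERT INTO dst VALUES (?, ?)", [(i, f"e{i}") for i in (1, 2, 4)])
print(backfill_missing(conn, "src", "dst", "id"))  # first run inserts ids 3 and 5
print(backfill_missing(conn, "src", "dst", "id"))  # re-run inserts nothing
```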
> Often they don't need their data to be 100% perfect. So they need to know if they lost anything, what exactly and if it's within acceptable range.
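Checking "what exactly, and is it within an acceptable range" can be as simple as a set difference on event IDs plus a threshold. A minimal Python sketch; the function name and the 0.1% default tolerance are hypothetical, and in practice the ID sets would come from the source and destination systems.

```python
def loss_report(source_ids, dest_ids, tolerance=0.001):
    """Report which events were lost and whether the loss ratio is acceptable."""
    source, dest = set(source_ids), set(dest_ids)
    lost = source - dest
    # Fraction of source events that never arrived (0.0 for an empty source).
    ratio = len(lost) / len(source) if source else 0.0
    return {"lost": sorted(lost), "lost_ratio": ratio, "acceptable": ratio <= tolerance}


source = range(1000)
dest = [i for i in source if i not in (3, 7)]
print(loss_report(source, dest))
print(loss_report(source, dest, tolerance=0.005))
```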
Very good point.
> As always, roll-your-own is a valid approach, as is using already available tools to save time
This leads to a very interesting question about upskilling employees versus the cost of tools: good training (say, in SQL) is somewhat expensive, but a lack of knowledge is far more expensive. I don't think this is the right place for that discussion, and I don't want to detract from your product (I do seem to have dissed it, which was not my intention).
> Many large companies like Uber and Spotify
Okay, if we're talking about the scale of companies like these, then we're talking about something well outside my experience.
I'll leave it here; I think I've distracted enough from your launch, and I wish you well with it!