If I’m working with parquet I’ll have duckdb on hand for fiddling with the files. I’m much better at SQL at 2 am than I am at piping Unix tools together over N files.
I have no idea how I’d drop bad rows from this thing with a bash pipeline anyway. I’d need to search one file to find the bad line numbers (grep, I guess; I’d have to look up how to cut out just the line number), and then delete those lines from every file in the zip (??). That sounds a lot harder than a single SELECT WHERE NOT or DELETE WHERE.
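
For what it's worth, here is a rough sketch of the SQL route. It uses Python's built-in sqlite3 as a stand-in for duckdb so it runs without installing anything, and the table names (`readings`, `labels`), the shared `line_no` column, and the rule for what counts as a bad row are all made up for illustration. The idea is just: pick the bad line numbers from one table, then delete those lines from every table.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for duckdb here.
con = sqlite3.connect(":memory:")

# Hypothetical tables, one per file, all keyed by the same line number.
con.execute("CREATE TABLE readings (line_no INTEGER, value REAL)")
con.execute("CREATE TABLE labels (line_no INTEGER, label TEXT)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 3.2), (2, None), (3, 4.1), (4, -1.0)],
)
con.executemany(
    "INSERT INTO labels VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
)

# Step 1: select from one table to find the bad line numbers.
# Hypothetical rule: a reading is bad if it is missing or negative.
con.execute(
    "CREATE TEMP TABLE bad AS "
    "SELECT line_no FROM readings WHERE value IS NULL OR value < 0"
)

# Step 2: delete those lines from every table.
# The table names are hard-coded above, so the f-string is safe here.
for table in ("readings", "labels"):
    con.execute(f"DELETE FROM {table} WHERE line_no IN (SELECT line_no FROM bad)")

remaining_readings = con.execute(
    "SELECT line_no, value FROM readings ORDER BY line_no"
).fetchall()
remaining_labels = con.execute(
    "SELECT line_no, label FROM labels ORDER BY line_no"
).fetchall()
print(remaining_readings)
print(remaining_labels)
```

The bad line numbers go into a temp table first so the list stays fixed while rows are being deleted from the table it was computed from.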