1. the order of single-line output results does not need to be preserved
2. long-running parallel tasks are non-blocking, for efficiency reasons
3. remote computers in a cluster may optionally need to be included
Toy example, preparing an IP blacklist:
cat ./banlist_ipv4.raw | parallel --ungroup --eta --jobs 24 "ipcalc {} | sed '2!d' " | grep -Ev '^(0\.|255\.|127\.)' > ./banlist_ipv4.formatted
In this toy case, the child processes may be launched hundreds of thousands of times, so letting the parallel child processes exit in whatever order they finish, instead of blocking or waiting on one another, reduces the runtime cost.
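A minimal sketch of that out-of-order, non-blocking behavior, assuming GNU xargs (`-P` for concurrency, `-I{}` for substitution) and a `sleep` that accepts fractional seconds; `demo_unordered` is a hypothetical name. Five jobs start together and later jobs sleep less, so each line is written as soon as its own job finishes instead of waiting for the earlier jobs in input order.

```shell
#!/usr/bin/env bash
# Hypothetical demo: job N sleeps (5 - N) tenths of a second, so job 5
# finishes first. xargs -P runs up to 5 jobs at once and does not reorder
# their output, so lines appear in (roughly) completion order.
demo_unordered() {
  seq 5 | xargs -P 5 -I{} sh -c 'sleep "0.$((5 - {}))"; echo "job {}"'
}

demo_unordered
```

On a typical run the lines come out in reverse input order, but that is timing-dependent; only the set of lines is guaranteed, which is why point 1 (order does not need to be preserved) matters.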
It's FOSS; people shouldn't feel entitled to complain about how authors share their work. =3
for i in $(seq 10000); do echo "$((RANDOM % 256)).$((RANDOM % 256)).$((RANDOM % 256)).$((RANDOM % 256))"; done >> banlist_ipv4.raw
When I run the example, the sed '2!d' doesn't work for me. It keeps throwing "bash: !d': event not found", and I can't find the right escaping to make it work.
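That error comes from bash history expansion, not from parallel or sed: in an interactive shell, `!` is expanded even inside double quotes, and according to the bash manual a backslash before `!` inside double quotes stops the expansion but is itself left in place, so escaping does not help. A sketch of two workarounds, assuming the goal is simply to keep line 2 of the ipcalc output (`print_second_line` is a hypothetical helper name): avoid `!` entirely with `sed -n 2p`, or turn off history expansion with `set +H`.

```shell
#!/usr/bin/env bash
# sed '2!d' deletes every line except line 2; sed -n 2p prints only line 2.
# Both keep just the second line, but the second form contains no "!", so
# interactive bash has nothing to history-expand inside double quotes.
print_second_line() {
  sed -n 2p
}

printf 'first\nsecond\nthird\n' | print_second_line

# Alternative (interactive bash only): running `set +H` disables history
# expansion for the session, so "... sed '2!d' ..." is passed through as-is.
```

Applied to the toy example, the parallel part would read `parallel --ungroup --eta --jobs 24 "ipcalc {} | sed -n 2p"`.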
I modified it to this:
time parallel --ungroup --eta --jobs 24 ipcalc --nobinary --nocolor {} :::: banlist_ipv4.raw | awk '/Network/{ print $2 }' > banlist_ipv4.formatted
That ran in 1m3.876s.
I then wrote it with xargs:
time xargs --arg-file=banlist_ipv4.raw --max-procs=24 -I{} ipcalc --nobinary --nocolor {} | awk '/Network/{ print $2 }' > banlist_ipv4.formatted
That runs in 0m42.346s.
I'd love to understand a bit better what you were trying to show with your example, and why parallel takes about 50% longer than xargs in mine (1m3.9s vs. 0m42.3s) for the same input and task.
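One way to check whether the gap is launcher overhead rather than ipcalc work is to time both tools on a no-op command, so only the cost of spawning and reaping jobs is measured. A rough sketch, assuming bash (for the `time` keyword), GNU xargs, and optionally GNU parallel; `N`, `JOBS`, `bench_xargs` and `bench_parallel` are hypothetical names, and N=10000 only mirrors the size of the generated test list. GNU parallel is a Perl program that does more bookkeeping per job than xargs, so it is plausible that more of the per-job cost sits in the launcher, but this sketch does not establish how much on any given machine.

```shell
#!/usr/bin/env bash
# Rough launcher-overhead benchmark: every job runs `true`, so the elapsed
# time is mostly the cost of starting and collecting processes.
N=${N:-10000}     # number of jobs (mirrors the 10000-line test list)
JOBS=${JOBS:-24}  # concurrency, as in the commands above

bench_xargs() {
  seq "$N" | xargs -P "$JOBS" -n 1 true
}

bench_parallel() {
  seq "$N" | parallel --jobs "$JOBS" true
}

echo "xargs, $N no-op jobs:"
time bench_xargs

# Only run the parallel half if the installed `parallel` is GNU parallel
# (moreutils ships an unrelated `parallel` with different options).
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  echo "GNU parallel, $N no-op jobs:"
  time bench_parallel
else
  echo "GNU parallel not found; skipping" >&2
fi
```

If the no-op gap grows roughly in proportion to N, the difference is likely fixed per-job launcher cost rather than anything specific to ipcalc or awk.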
Removing the --eta calculation from the parallel example doesn't change the runtime, and neither does adding pv for progress to the xargs example; neither costs a meaningful amount.
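For reference, one way to add pv progress to the xargs pipeline, assuming pv's `-l` (count lines instead of bytes) and `-s` (expected total, read as a line count when combined with `-l`); `with_progress` is a hypothetical helper that falls back to plain `cat` when pv is not installed. The bar tracks lines handed to xargs, so it can run ahead of the ipcalc jobs that have actually finished, since input sits in the pipe and in xargs's read buffer before each job is launched.

```shell
#!/usr/bin/env bash
# Print FILE to stdout, showing a line-based progress bar on stderr when pv
# is installed. -l makes pv count lines, and with -l the -s total is a line
# count too, so the bar reaches 100% after the last input line.
with_progress() {
  local file=$1 lines
  if command -v pv >/dev/null 2>&1; then
    lines=$(wc -l < "$file")
    pv -l -s "$((lines))" "$file"  # $((...)) drops padding some wc versions add
  else
    cat "$file"  # no pv: pass the data through unchanged
  fi
}
```

Usage with the xargs command above: `with_progress banlist_ipv4.raw | xargs --max-procs=24 -I{} ipcalc --nobinary --nocolor {} | awk '/Network/{ print $2 }' > banlist_ipv4.formatted`.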