~$ parallel --citation
~$ ./test.sh
#!/bin/bash
echo "note: bench-marking a long running task"
echo "BASH_VERSION=${BASH_VERSION}"
date
# --jobs 0 runs as many jobs in parallel as possible; the default without --jobs is one job per CPU core
cat "./blacklistp_p2p" | parallel --ungroup --eta --jobs 0 "ipcalc {} | sed '2!d' " | grep -Ev '^(0\.|255\.|127\.)' >> ./blacklist_p2p_converted
date
exit 0
#note the cluster network version after test run
For your example, the only difference I saw was the ">>", which shouldn't prevent the example from running.
In practice, we saw around a 30% reduction in completion time for this task, which we attributed to the random queue blocking time xargs generates to preserve ordered output.
Thus, unless you pin all CPU cores with hundreds of processes for several minutes, the overall completion time may differ from what you expected.
Note that the performance likely also depends on the Linux kernel's in-RAM page cache for your filesystem, and on the variability of the child process execution time. For example, if you are running it in an emulated/VM/WSL environment, the batching may behave differently.
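Because of the page cache effect, a single timed run can mislead. A minimal sketch of repeating a run, assuming bash; the helper name bench and the temporary input file are made up for illustration and are not part of the original test.sh:

```shell
#!/bin/bash
# Hypothetical helper: run a command RUNS times and print the elapsed
# wall-clock seconds of each run, using bash's built-in SECONDS counter.
bench() {
    local runs=$1 i start
    shift
    for (( i = 1; i <= runs; i++ )); do
        start=$SECONDS
        "$@" > /dev/null
        echo "run ${i}: $(( SECONDS - start ))s"
    done
}

# Illustrative input only; substitute the real pipeline wrapped in a script.
input=$(mktemp)
seq 1 100000 > "$input"
# The first run may read from disk; later runs are more likely to be
# served from the page cache, so compare the later runs with each other.
bench 3 wc -l "$input"
rm -f "$input"
```

SECONDS has one-second resolution, which is only adequate for a task that runs for minutes, as this one does.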
On average, this toy checks/converts several hundred thousand IP ranges such as "3.0.0.0 - 3.127.255.255" into CIDR subnet notation such as "3.0.0.0/9". My example may contain typos, as it is quite dated.
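For anyone without ipcalc at hand, the conversion itself can be sketched in plain bash. This is not what ipcalc does internally; it is a simplified stand-in that assumes each range is exactly one aligned CIDR block and that octets have no leading zeros. The function names ip_to_int and range_to_cidr are made up for illustration:

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert "START - END" to CIDR notation.
# Assumption: the range is exactly one aligned CIDR block;
# otherwise nothing is printed and the function returns 1.
range_to_cidr() {
    local start end size prefix
    start=$(ip_to_int "${1%% *}")   # text before the first space
    end=$(ip_to_int "${1##* }")     # text after the last space
    size=$(( end - start + 1 ))
    # The size must be a power of two and the start aligned to it.
    (( size > 0 && (size & (size - 1)) == 0 && start % size == 0 )) || return 1
    prefix=32
    while (( size > 1 )); do
        size=$(( size >> 1 ))
        prefix=$(( prefix - 1 ))
    done
    echo "${1%% *}/${prefix}"
}

range_to_cidr "3.0.0.0 - 3.127.255.255"   # prints 3.0.0.0/9
```

Ranges that span several CIDR blocks would need to be split first, which this sketch does not attempt.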
YMMV, and I hope you are able to replicate the fun =)