From the man page:
"--citation Print the BibTeX entry for GNU parallel and silence citation notice. If it is impossible for you to run --bibtex you can use --will-cite. If you use --will-cite in scripts to be run by others you are making it harder for others to see the citation notice. The development of GNU parallel is indirectly financed through citations, so if your users do not know they should cite then you are making it harder to finance development. However, if you pay 10000 EUR, you should feel free to use --will-cite in scripts."
Asking for donations/citations is one thing, but putting this junk in the man page about 10000 EUR and nagging users is quite an annoyance. How GNU allows such junk in their man pages puzzles me. Obviously the GPL allows one to remove the nagware and redistribute, but I don't know if anyone has forked it.
It's a great tool I'm sure, but I've been able to get by using just xargs, flock, etc., for most use cases.
That isn't remotely the case, so until then, blaming FOSS authors for some experimentation is just unwarranted.
Citing it or not is an issue of academic practice/considerations (whether its use was a significant part of the research etc.). Mandating it through nag messages is too much.
What's next? make will print ads while the compilation runs? GIMP will watermark my images if I don't pay 10K or promise to cite it when I make figures for my paper?
So again, my main confusion is about how this can be an official GNU tool.
They spent a lot of time and effort, and made a cool thing and gave it away for free. If it bothers you so much, just add the flag. Or patch it out.
"GNU Parallel is indirectly funded through citations. It is therefore important for the long term survival of GNU Parallel that it is cited. The citation notice makes users aware of this."
It's a bit like saying:
"Webkit is indirectly funded by iPhones. It is therefore important for the long term survival of Webkit that people purchase iPhones. The iPhone notice makes users aware of this."
Imagine if every utility, library, or driver in a typical Linux distribution took this approach. :(
I encourage Debian et al. to adopt a "no nagware" policy.
> Programs whose authors encourage the user to make donations are fine for the main distribution, provided that the authors do not claim that not donating is immoral, unethical, illegal or something similar; in such a case they must go in non-free.
BTW, the nagware code has been removed in Debian unstable:
To me the dialog box is actually worse, because the program often blocks until you close the dialog box (not 100% sure if that is the case with Firefox).
With GNU Parallel you run 'parallel --citation' once, and you are done. We are talking an effort of 15 seconds or less.
When I install a library I often have to run the install command and it often takes longer than 15 seconds.
Finally, I would like to understand why you do not just use another utility? Would that not solve your issue?
- --joblog writes out a detailed logfile of the jobs, which can be used to resume interrupted runs with --resume{,-failed}
- `--slf filename` can be used to provide a list of ssh logins to remote worker nodes to run jobs. Importantly, parallel will automatically reread this list when it changes. This lets you very easily distribute batch jobs across preemptible gcloud vms (or ec2 spot instances) and gracefully handle worker nodes appearing/disappearing with just a few lines of bash https://gist.github.com/gpittarelli/5e14fb772ce0230a3c40ffad...
- When used with bash, parallel can run bash functions if you export them with `export -f functionName` .
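A quick sketch tying those last two points together (`doubleit` is a made-up function name, and the joblog path is arbitrary):

```shell
# Define and export a bash function so parallel's worker shells can see it.
doubleit() { echo $(( $1 * 2 )); }
export -f doubleit

# Run it over some inputs, 2 jobs at a time, recording each job in a log.
# If the run is interrupted, rerunning the same command with --resume
# skips the jobs already marked done in the joblog.
printf '%s\n' 1 2 3 | parallel -j2 --joblog /tmp/demo.joblog doubleit {}
```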
Additionally, you can run a series of unrelated commands that aren't from a list/piped in with parallel using the `--` syntax:
`parallel -j 3 -- ls df "echo hi"`
You can limit system load using parallel, which as far as I know isn't possible with xargs: `parallel --load L`, where L is the load average (or percentage of CPU cores, e.g. `--load 100%`) you want to remain beneath.
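For instance, a sketch with made-up file names:

```shell
# Compress logs in parallel, but only start new jobs while the load
# average stays below 4; with a percentage like --load 100%, the limit
# scales with the number of CPU cores instead.
ls *.log | parallel --load 4 gzip {}
```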
- Splitting input lines into multiple fields and building more complex commands from them
- Running jobs on remote nodes
- Pausing/resuming batch jobs (--joblog)
- ETA and progress bars
- Passing data to programs on stdin and generally many, many other ways of distributing and collecting data that xargs can't do
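As a sketch of the field-splitting point (the CSV lines here are invented):

```shell
# --colsep splits each input line on the separator; the fields then
# become {1}, {2}, ... in the command template.
printf 'alice,3\nbob,5\n' |
  parallel --colsep ',' echo '{1} has {2} items'
```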
You can see a bunch of examples at: https://www.gnu.org/software/parallel/man.html
$ PAGER=cat man xargs | wc -l
259
$ PAGER=cat man parallel | wc -l
3985

Alternatively: PAGER="wc -l" man xargs
(although my man page for xargs is just 211 lines)
I normally use xargs for simple things, and if it's a regular business operation I'd set up a task queue, but there's a fair amount of work in the middle where it's nice to have a solid tool with most of the features you could want built in and tested.
You might want to look at:
https://unix.stackexchange.com/questions/104778/gnu-parallel...
I'll say that field separation / null termination is a bit annoying for xargs/find etc., but more so perhaps for novice shell users. I do like shell pipelines, but quoting can be gnarly.
For ad-hoc system modifications I've found myself using tmux's synchronize-panes feature, or xargs. For anything bigger or more involved then I break out Ansible/Chef/Puppet depending on which client project I'm working on.
I remember one place I worked at had a huge elaborate configuration/deployment system hand written by the head IT guy which used Parallel+bash+perl extensively. Thing is, while it was a great system, I could make the same changes in Ansible or Puppet with a couple of lines and push them within minutes, while making changes using the hand written system might take hours. Plus no logging and poor error handling led to all sorts of problems with that system, despite it being a real labour of love by that wacky Finnish dude.
However this sheet is really nice because it is just one side of a letter/A4 piece of paper and lays out the information clearly. I definitely want to mess around with Parallel now because of this cheat sheet. I wonder how it was typeset or laid out on the page? I try to write my own cheat sheets but they always seem way too sparse with too much white space. Maybe it is written in LaTeX or similar.
I also use it as a rudimentary queue system for stacking up the next jobs (while scripts stack up the next jobs, but..).
It had a bit of a learning curve because the docs are really technical and not geared enough towards new users, but reading and re-reading and trying some examples helped cement my understanding.
Here are a few ways I use it:
echo "Number of RAR archives: $(ls *.rar | wc -l)"
ls *.rar | parallel -j0 1_1_rarFilesExtraction
ls -d stocks_all/Intraday/*.txt | parallel -j${ccj}% 1_2_stockFileProcessing {}
I'd like to scale this to work with multiple machines (as Parallel can do), but I'm really tempted to write my own parallel processor just to rely on my own code.
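A hedged sketch of the multi-machine version, assuming passwordless ssh to host1/host2 (hypothetical names) and that 1_2_stockFileProcessing is available on each worker (--env can ship an exported bash function):

```shell
# -S lists workers (":" means "also run locally"); --transfer copies each
# input file to the worker, --return fetches the named result file back,
# and --cleanup removes the remote copies afterwards.
ls stocks_all/Intraday/*.txt |
  parallel -S host1,host2,: --transfer --return {.}.out --cleanup \
    1_2_stockFileProcessing {}
```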
I wrote a very different style of command parallelizer that I named lateral. It doesn't require constructing elaborate commandlines that define all of your work at once. You start a server, and separate invocations of 'lateral run' add your commands to a queue to run on the server, including their filedescriptors. It makes for easier parallelization of complex arguments.
Take a look if this sort of thing interests you, as I haven't seen anyone write one like this before. Its primary difference is the ease with which each separate command can output to its own log, and the lack of need to play games with shell quoting and positional arguments.
Check it out: https://github.com/akramer/lateral
Can you make a comparison between lateral and sem?
Can a single lateral server queue be used across multiple host machines? And in the other direction, can lateral launch and monitor processes that reside across multiple machines?
https://github.com/Miserlou/Loop
The author of GNU Parallel wrote a pretty detailed comparison, which you can find in the linked README.