From the man page:
"--citation Print the BibTeX entry for GNU parallel and silence citation notice. If it is impossible for you to run --bibtex you can use --will-cite. If you use --will-cite in scripts to be run by others you are making it harder for others to see the citation notice. The development of GNU parallel is indirectly financed through citations, so if your users do not know they should cite then you are making it harder to finance development. However, if you pay 10000 EUR, you should feel free to use --will-cite in scripts."
Asking for donations/citations is one thing, but putting this junk in the man page about 10000 EUR and nagging users is quite an annoyance. How GNU allows such junk in their man pages puzzles me. Obviously the GPL allows one to remove the nagware and redistribute, but I don't know if anyone has forked it.
It's a great tool I'm sure, but I've been able to get by using just xargs, flock, etc., for most use cases.
That isn't remotely the case, so until then, blaming FOSS authors for some experimentation is just unwarranted.
Citing it or not is an issue of academic practice/considerations (whether its use was a significant part of the research etc.). Mandating it through nag messages is too much.
What's next? make will print ads while the compilation runs? GIMP will watermark my images if I don't pay 10K or promise to cite it when I make figures for my paper?
So again, my main confusion is about how this can be an official GNU tool.
They spent a lot of time and effort, and made a cool thing and gave it away for free. If it bothers you so much, just add the flag. Or patch it out.
"GNU Parallel is indirectly funded through citations. It is therefore important for the long term survival of GNU Parallel that it is cited. The citation notice makes users aware of this."
It's a bit like saying:
"Webkit is indirectly funded by iPhones. It is therefore important for the long term survival of Webkit that people purchase iPhones. The iPhone notice makes users aware of this."
Imagine if every utility, library, or driver in a typical Linux distribution took this approach. :(
I encourage Debian et al. to adopt a "no nagware" policy.
> Programs whose authors encourage the user to make donations are fine for the main distribution, provided that the authors do not claim that not donating is immoral, unethical, illegal or something similar; in such a case they must go in non-free.
BTW, the nagware code has been removed in Debian unstable:
To me the dialog box is actually worse, because the program often blocks until you close the dialog box (not 100% sure if that is the case with Firefox).
With GNU Parallel you run 'parallel --citation' once, and you are done. We are talking an effort of 15 seconds or less.
When I install a library I often have to run the install command and it often takes longer than 15 seconds.
Finally, I would like to understand why you do not just use another utility? Would that not solve your issue?
- --joblog writes out a detailed logfile of the jobs, which can be used to resume interrupted runs with --resume{,-failed}
- `--slf filename` can be used to provide a list of ssh logins to remote worker nodes to run jobs. Importantly, parallel will automatically reread this list when it changes. This lets you very easily distribute batch jobs across preemptible gcloud vms (or ec2 spot instances) and gracefully handle worker nodes appearing/disappearing with just a few lines of bash https://gist.github.com/gpittarelli/5e14fb772ce0230a3c40ffad...
- When used with bash, parallel can run bash functions if you export them with `export -f functionName` .
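A quick sketch tying those last two points together (`doubleit` is a made-up function name, and the joblog path is arbitrary):

```shell
# Define and export a bash function so parallel's worker shells can see it.
doubleit() { echo $(( $1 * 2 )); }
export -f doubleit

# Run it over some inputs, 2 jobs at a time, recording each job in a log.
# If the run is interrupted, rerunning the same command with --resume
# skips the jobs already marked done in the joblog.
printf '%s\n' 1 2 3 | parallel -j2 --joblog /tmp/demo.joblog doubleit {}
```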
Additionally, you can run a series of unrelated commands that aren't from a list/piped in with parallel using the `--` syntax:
`parallel -j 3 -- ls df "echo hi"`
You can limit system load using parallel, which as far as I know isn't possible with xargs: `parallel --load L`, where L is the load average (or percentage of CPU cores, e.g. `--load 100%`) you want to remain beneath.
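For instance, a sketch with made-up file names:

```shell
# Compress logs in parallel, but only start new jobs while the load
# average stays below 4; with a percentage like --load 100%, the limit
# scales with the number of CPU cores instead.
ls *.log | parallel --load 4 gzip {}
```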
- Splitting input lines into multiple fields and building more complex commands from them
- Running jobs on remote nodes
- Pausing/resuming batch jobs (--joblog)
- ETA and progress bars
- Passing data to programs on stdin and generally many, many other ways of distributing and collecting data that xargs can't do
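As a sketch of the field-splitting point (the CSV lines here are invented):

```shell
# --colsep splits each input line on the separator; the fields then
# become {1}, {2}, ... in the command template.
printf 'alice,3\nbob,5\n' |
  parallel --colsep ',' echo '{1} has {2} items'
```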
You can see a bunch of examples at: https://www.gnu.org/software/parallel/man.html
$ PAGER=cat man xargs | wc -l
259
$ PAGER=cat man parallel | wc -l
3985

Alternatively: PAGER="wc -l" man xargs
(although my man page for xargs is just 211 lines)
I normally use xargs for simple things, and if it's a regular business operation I'd set up a task queue, but there's a fair amount of work in the middle where it's nice to have a solid tool with most of the features you could want built in and tested.
You might want to look at:
https://unix.stackexchange.com/questions/104778/gnu-parallel...
I'll say that field separation / null termination is a bit annoying for xargs/find etc., but more so perhaps for novice shell users. I do like shell pipelines, but quoting can be gnarly.
For ad-hoc system modifications I've found myself using tmux's synchronize-panes feature, or xargs. For anything bigger or more involved then I break out Ansible/Chef/Puppet depending on which client project I'm working on.
I remember one place I worked at had a huge elaborate configuration/deployment system hand written by the head IT guy which used Parallel+bash+perl extensively. Thing is, while it was a great system, I could make the same changes in Ansible or Puppet with a couple of lines and push them within minutes, while making changes using the hand written system might take hours. Plus no logging and poor error handling led to all sorts of problems with that system, despite it being a real labour of love by that wacky Finnish dude.
However this sheet is really nice because it is just one side of a letter/A4 piece of paper and lays out the information clearly. I definitely want to mess around with Parallel now because of this cheat sheet. I wonder how it was typeset or laid out on the page? I try to write my own cheat sheets but they always seem way too sparse with too much white space. Maybe it is written in LaTeX or similar.
I also use it as a rudimentary queue system for stacking up the next jobs (while scripts stack up the next jobs, but..).
It had a bit of a learning curve because the docs are really technical and not geared enough towards new users, but reading and re-reading and trying some examples helped cement my understanding.
Here are a few ways I use it:
echo "Number of RAR archives: $(ls *.rar | wc -l)"
ls *.rar | parallel -j0 1_1_rarFilesExtraction
ls -d stocks_all/Intraday/*.txt | parallel -j${ccj}% 1_2_stockFileProcessing {}
I'd like to scale this to work with multiple machines (as Parallel can do), but I'm really tempted to write my own parallel processor just to rely on my own code.
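A hedged sketch of the multi-machine version, assuming passwordless ssh to host1/host2 (hypothetical names) and that 1_2_stockFileProcessing is available on each worker (--env can ship an exported bash function):

```shell
# -S lists workers (":" means "also run locally"); --transfer copies each
# input file to the worker, --return fetches the named result file back,
# and --cleanup removes the remote copies afterwards.
ls stocks_all/Intraday/*.txt |
  parallel -S host1,host2,: --transfer --return {.}.out --cleanup \
    1_2_stockFileProcessing {}
```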
I wrote a very different style of command parallelizer that I named lateral. It doesn't require constructing elaborate commandlines that define all of your work at once. You start a server, and separate invocations of 'lateral run' add your commands to a queue to run on the server, including their filedescriptors. It makes for easier parallelization of complex arguments.
Take a look if this sort of thing interests you, as I haven't seen anyone write one like this before. Its primary difference is the ease with which each separate command can output to its own log, and the lack of need to play games with shell quoting and positional arguments.
Check it out: https://github.com/akramer/lateral
Can you make a comparison between lateral and sem?
Can a single lateral server queue be used across multiple host machines? And in the other direction, can lateral launch and monitor processes that reside across multiple machines?
https://github.com/Miserlou/Loop
The author of GNU Parallel wrote a pretty detailed comparison, which you can find in the linked README.