In a proper programming language, we'd have something like
parallel [1..5], i => { sleep random()*10+5; possibly_flaky i }
// [{"Seq": 4, "Host": ":", "Starttime": 1692491267...
And `parallel` would only have to worry about parallelization.

Instead, the shell environment forces programs to invent their own parameter separator (`:::`), a templating format (`{1}`), and a way to output a list of structures (CSV-like). You can see the same issues in `find`, where the exec terminator is `\;`, the template is `{}`, and the output is delimited by `\n` or `\0`. And `xargs` does it in yet another way.
It's very hard to acquire and retain mastery over a toolbox where every tool reinvents the basics. If you ever found yourself searching "find exec syntax" multiple times in a week, it's not your fault.
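To make that concrete, here is the same "run a command per item" task in each dialect (a hedged sketch: the temp dir keeps it self-contained, and the GNU parallel line is shown commented out in case the tool isn't installed):

```shell
# One task, three syntaxes.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"

# find: template is {}, terminator is \; (quoted so the shell doesn't eat it)
find "$dir" -name '*.txt' -exec basename {} \;

# xargs: items arrive on stdin; -I declares the replacement template
printf '%s\n' a b | xargs -I {} echo got {}

# parallel: items follow :::, and the template is again {}
# parallel echo job {} ::: a b

rm -r "$dir"
```

Same job, three separators and two template conventions.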
As for alternatives, I'm a fan of YSH[1] (Javascript-like), Nushell[2] (reinvented from first principles for simplicity and safety) and Fish[3] (bash-like but without the footguns). Nushell is probably my favorite of the bunch; here's a parallel example:
ls | where type == dir | par-each { |it|
    { name: $it.name, len: (ls $it.name | length) }
}
[1] https://www.oilshell.org/release/latest/doc/ysh-tour.html

It isn't even just the newer shells that have solved this; zsh also has a solution out of the box¹. The extensive globbing support in zsh can largely replace `find`, and things like zargs allow you to reuse your common knowledge throughout the shell.
For example, performing your first example with zargs would use regular option separators (`--`), regular expansion (`{1..5}`), and standard shell constructs for the commands to execute.
I'll contrive an example based on your file counter, but slightly different to show some other functionality.
f() { fs=($1/*(.)); jo $1=$#fs }
zargs -P 32 -n1 -- **/*(/) -- f
That should recursively list directories, counting only the files within each, and output JSONL² that can be further mangled within the shell. You could just as easily populate an associative array for further work, or $whatever. Unlike bash, zsh has reasonable behaviour around quoting and whitespace too.

Edit to add: I'm not suggesting zargs is a replacement for parallel, but if you're only using a small subset of its functionality then it may be able to replace that.
¹ https://zsh.sourceforge.io/Doc/Release/User-Contributions.ht...
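For reference, a few zsh glob qualifiers doing find-style work (an illustrative sketch, wrapped in `zsh -c` so it runs from any shell; assumes zsh is installed, and `N` adds nullglob so empty matches don't error):

```shell
# (.) plain files, (/) directories, (om) newest-first by mtime, [1,3] first three
zsh -c 'print -l -- *(N.om[1,3])'   # the three most recently modified files here
zsh -c 'print -l -- **/*(N/)'       # every directory, recursively
```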
Spending extra time doing simple things — because you need to Google e.g. "how to pass multiple space-separated arguments from a string to a command" — is also a waste of time.
Honest question, as I’m struggling to leave the shell environment once the program gets too large. I could use Perl, but $? and the likes get quickly out of hand. Python’s support for pipes was difficult last time I used it, but that may have changed. What would you recommend?
GNU xargs implements limited parallelization, and is compiled C. This functionality is present within busybox, including the Windows version.
https://www.linuxjournal.com/content/parallel-shells-xargs-u...
GNU Parallel has much greater functionality, but it is not as widely available as xargs.
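A minimal sketch of that limited-but-ubiquitous parallelism (the -P flag is in GNU, busybox, and BSD xargs; the item lands in $0 of the spawned shell):

```shell
# -n1: one item per invocation; -P4: up to four invocations at once.
# Eight 0.2s jobs complete in roughly two "waves" instead of serially.
seq 1 8 | xargs -n1 -P4 sh -c 'sleep 0.2; echo "job $0 done"'
```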
If you feel like the answer is rewriting the shell, the answer is practically never rewriting the shell. It's learning to use it.
parallel 'sleep {= $_=rand()*10+5; =} ; possibly_flaky {}' ::: {1..5}
The {= =} escapes to Perl, so you have a full programming language available.

Contrast that with GPU shaders, where one C-style loop operates on buffers separate from system memory and can't access system services like network sockets or files. GPUs have around 32 or 64 physical cores, so theoretically that many shaders could run simultaneously, although we rarely see that in practice. And we'd need bare-metal drivers to access the GPU cores directly; does anyone know of any?
The closest thing now is Apple's M1 line, but it has specialized NN and GPU cores, so missed out on the potential of true symmetric multiprocessing.
The reason I care about this so much is that with this amount of computing power, kids could run genetic algorithms and other "embarrassingly parallel" code that solves problems about as well as NNs in many cases. Instead we're going to end up with yet another billion dollar bubble that locks us into whatever AI status quo that the tech industry manages to come up with. And everyone seems to love it. It reminds me of the scene in Star Wars III when Padme notes how liberty dies with thunderous applause.
2) Shared resources (in-memory mutable data, hardware devices) mean the ratio of contention to CPU work goes up when you have more cores.
3) Cores on a single die need to share the same constraints - thermal limits and transistor count. So you're best off having enough powerful cores to get you to a sweet spot of single-core performance vs multi-core parallelism.
4) It's hard to provide a performant and useful many-core machine model. Cache coherence makes it easier to program a many-core machine, but limits performance. Without it, you're stuck with distributed systems-style problems.
An AMD GPU is a grid of independent compute units on a memory hierarchy. At the fine grain, it's a scalar integer unit (branches, arithmetic) and a predicated vector unit, with an instruction pointer. A ballpark of 80 of those can be on a given compute unit at the same time, executed in some order and partially simultaneously by the scheduler. A GPU has on the order of 100 compute units, so that's ~8k completely independent programs running at the same time.
You've got a variety of programming languages available to work with that. There's a shared address space with other GPUs and the system processors, direct access to system and GPU local memory. Also some other memory you can use for fast coordination between small numbers of programs.
There's a bit of a disconnect between graphics shaders, the ROCm compute stack and what you can build on the hardware if so inclined. The future you want is here today, it just has a different name to what you expected.
If there's no straightforward way to do that, then I'm afraid that hardware represents a huge investment in the wrong direction.
Because a GPU can be built from the general-purpose multicore CPU I'm talking about. But a CPU can't be built from a GPU.
What I'm getting at is that if I have to "drop down" to an orthodox way of solving problems, rather than being able to solve them in the freeform way that my instincts leads me, then I will always be stifled.
Intel ca. 2010, probably
The real problem we are facing is that our programming models aren't parallel by default.
>By Moore's Law, we could have had MIPS machines with 1000 cores around 2010, and 100,000 to 1 million cores today, for under $1000.
You can have 10000 RISC-V cores on an FPGA but nobody cares. Why? Because even a bit serial processor (that means it processes one bit per clock cycle, or 32 clock cycles for a 32 bit addition) runs into memory bandwidth limitations very quickly if you have enough of them. Main memory is very slow compared to registers and caches. The only way to utilize this many cores is by having a workload that is entirely latency bound. Your memory access pattern is perfectly unpredictable. The moment you add caching, the number of cores you can have shrinks dramatically and companies like AMD are not slimming down their CPUs, they are adding more and more cache. Their highest end processors have almost a gigabyte of cache.
I agree about the programming models not being parallel by default, and that's one of the things that I specifically rail against in most of my comments. MATLAB/Octave is a good introduction to what parallel programming could be. Also the endless doubling down on large caches, because the multicore design I have in mind would mostly eliminate cache and use that die area for cores and local memories.
I think we're slightly talking past each other here though. The CPU I want to build would have around 10-256 cores on 90s tech. So the same transistors holding 1 Pentium Pro would allow for 1-2 orders of magnitude more MIPS or RISC-V cores and local memories. The design is so simple that I think that's why it was missed by the big fabs.
Today there's little demand for 1000+ cores, but that's partly because nobody can see what they could do. But we can't design the thing, because the status quo has us all working pedal to the metal in first gear to make rent. It's a chicken and egg problem that has a lower likelihood of being solved as time goes on. Which is why I think we're on the wrong timeline, because if the system worked then actual innovation would become more accessible over time.
At least, 6000+ 32-bit multiplies per clock tick on ~2GHz+ clocks. Even cheap GPUs easily are 2000+ shaders.
> GPUs have around 32 or 64 physical cores
NVidia SMs and AMD WGPs are not "cores", they are... weird things. They have many shaders inside of them and have huge amounts of parallelism.
As far as grunt-work goes, a "multiplier unit" (literally A x B) is perhaps the most accurate count to compare CPU cores vs GPU "cores", because the concept of CPU-core vs GPU WGP / SM is too weird and different to directly compare.
Split up that WGP / SM into individual multipliers... and also split up the ~3 64-bit multipliers or ~48 CPU SIMD multipliers per core (3x 512-bit on Intel AVX512 cores), and its perhaps a more fair comparison point.
---------
Back 20 years ago, you'd only have 1x multiplier on a CPU core like a Pentium 4, maybe as many as 4x with the 128-bit SSE instructions.
But today, even 1x core from Intel (3x 512-bit SIMD) or 1x core from AMD (4x 256-bit SIMD) has many, many, many more parallel elements compared to a 2004-era CPU core.
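The back-of-envelope counting, using the comment's own convention of 32-bit multiply lanes per core (shell arithmetic, figures as claimed above):

```shell
# lanes = SIMD units * (register width / 32-bit lane)
echo "Intel AVX-512 core: $(( 3 * 512 / 32 )) lanes"   # 48
echo "AMD 4x256-bit core: $(( 4 * 256 / 32 )) lanes"   # 32
echo "Pentium 4 w/ SSE:   $(( 1 * 128 / 32 )) lanes"   # 4
```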
They aren't weird things. They are the equivalent of CPU cores. By your logic CPU cores aren't CPU cores, "they are... weird things" because of SMT.
https://en.wikipedia.org/wiki/Cray_X-MP
  Price: US$7.9 million in 1977 (equivalent to $38.2 million in 2022)
  Weight: 5.5 tons (Cray-1A)
  Power: 115 kW @ 208 V 400 Hz[1]
  CPU: 64-bit processor @ 80 MHz[1]
  Memory: 8.39 Megabytes (up to 1 048 576 words)[1]
  Storage: 303 Megabytes (DD19 Unit)[1]
  FLOPS: 160 MFLOPS
In 2070 it still won't be enough for you. It never will be enough.

Many thanks to Ole Tange for developing the wonderful tool and helping users on the Stack Overflow sites to this day.
Shameless plug, I am developing a tutorial on GNU Parallel to be presented at eScience conference in Cyprus this year: https://www.escience-conference.org/2023/tutorials/gnu_paral...
mkdir /tmp/g
seq 1 10 | tr \\n \\0 |
xargs -0n2 -P4 bash -c 't=$EPOCHREALTIME; sleep $((RANDOM%5)); echo "$@" >/tmp/g/$t' d0
cat /tmp/g/*
Another one is xargs -P "$(nproc)" --process-slot-var=s sh -c 'grep X "$@" >>/tmp/g.$s' d0
cat /tmp/g.*
You can also cobble together that second style with a custom config setup wherein a command is given $s and responds with some host names and there might be an `ssh` in front of the `grep`, for example. That `d0` argument (for $0) is a bit janky and there can be shell quoting issues, of course. But then again, you may not have hostile filenames/whatever. Remote loadavg adaptation might be nice, but then again, maybe you control all the remotes. Similarly, I could not get back-to-back executions of the EPOCHREALTIME thing closer than 250 microseconds. So, collision basically will not happen even though it probably could in theory.

It’s like xargs with sane defaults and a couple tricks of its own.
It looks like the actual name of the task-spooler command on Debian after install is “tsp”, not “ts”. So no collision :)
Now it just remains to be seen whether the package by default allows the tasks to continue running after I log out, or whether systemd will annoyingly kill them after I disconnect from ssh - the same way systemd annoyingly kills my “screen” sessions when I disconnect, unless I do some cumbersome thing on each of my systems to stop it :(
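If systemd-logind is indeed the killer, the usual escape hatch is "lingering" (a configuration sketch; requires systemd, and root to enable it for other users):

```shell
# Keep this user's background processes (screen, tmux, tsp jobs) alive after logout
loginctl enable-linger "$USER"

# Or disable the behaviour system-wide in /etc/systemd/logind.conf:
#   KillUserProcesses=no
```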
And still answering every xargs Stackoverflow question with "you should use GNU Parallel" instead of answering the question? That really gets old quickly when googling for xarg answers.
These are just some of the reasons I'll never use parallel. xargs is perfectly fine for most usecases, and it can do everything I need it to.
IIRC the citation notice was cleared by Stallman as GPL compatible. I’d be surprised if anyone’s paid, I assumed that’s rhetoric to imply the value of a citation, or lack of citation, for anyone publishing scientific works.
> These are just some of the reasons I’ll never use parallel.
Hey I’ve actually ranted on HN before about the citation notice (e.g. https://news.ycombinator.com/item?id=15319715) - in part because I find the language of the notice a little misleading; it’s not tradition to write citations for tools used to conduct research, but it is a requirement (not just tradition) to cite academic sources. If I used parallel to speed up some calculations, that doesn’t justify an academic citation. I don’t cite bash or python or C++ when I write papers either. On the other hand, if I’m writing a computer science paper about how to parallelize code, and especially if I compare it to GNU Parallel, then a citation isn’t optional, and I don’t need a guilt trip to add one; it’ll get requested in review, and the paper rejected without it. Is there even a journal publication to cite? (Edit: found it - the request is to cite an article in USENIX magazine.) So I find the notice a little irritating, and I’m not sure who it’s aimed at exactly, or what the history of Ole feeling snubbed by scientists really is. Maybe some people were trying to compete with GNU Parallel and failing to cite it? Maybe Ole is paid by an organization that appreciates citations and will continue to fund development on Parallel if there’s evidence of its use in academia?
That said, GNU Parallel really is totally awesome, the documentation is amazing, and the citation notice is a one-time thing you can silence permanently. I don’t think the notice is a good reason to never use Parallel, and I do think Parallel is worth using, FWIW.
This is true, but it also makes it very hard for academics and PhD students who mainly write software rather than papers. They get no citations and eventually have to leave academia.
If we had a better practice of citing central software we use - at least the academic software that wants to be cited - we could have a more flourishing ecosystem of such software funded by the universities.
Academics seem to have a very blinkered attitude to this. I wrote some software that was popular for a while in a niche field, and people were forever asking me to waste my time by 'publishing' the manual in some pointless journal so that they could cite something and give me credit. Writing useful software counts for less in that world than publishing another pointless paper that no-one will read.
This doesn't scale. Imagine if all the software you used nagged you and had their own individual methods to silence them. I don't think this would be reasonable.
What makes this particular software so special?
Do you have a source for this? I’m confused by this, as GPL section 7 is pretty clear that additional restrictions are effectively void. I suppose it’s technically not contrary to the GPL to idly state those restrictions, but it is contrary to the GPL to expect them to do anything. If the author is deliberately including an impotent clause in the hope that people will follow it anyway, I feel that trying to confuse or scare people into doing something the GPL gives them explicit permission to do is contrary to the spirit of the GPL.
Furthermore, trying to retaliate against people who (as permitted by the GPL) remove the citation notice, as the author here has done, seems very contrary to the spirit of the GPL.
I really hope that whoever adjudicates these disputes regarding licence agreements doesn't care what a random person says about it.
https://gitlab.archlinux.org/archlinux/packaging/packages/pa...
Debian too (thanks to iib for pointing this out)
https://salsa.debian.org/med-team/parallel/-/tree/master/deb...
And looks like the author is aware of both:
https://gitlab.archlinux.org/archlinux/packaging/packages/pa...
But yeah, if the guy wants to have the name of the app mentioned, there is a BSD license for that I thin...
- # *YOU* will be harming free software by removing the notice. You
- # accept to be added to a public hall of shame by removing the
- # line. That includes you, George and Andreas.
The Open source way of buzzing a contestant:

  [david@pc ~]$ echo foo | parallel echo
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:
Tange, O. (2023, July 22). GNU Parallel 20230722 ('Приго́жин').
Zenodo. https://doi.org/10.5281/zenodo.8175685
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#citation-notice
To silence this citation notice: run 'parallel --citation' once.
foo
  [david@pc ~]$

> Tange, O. (2023, July 22). GNU Parallel 20230722 ('Приго́жин').

Looks like the latest release is named after Prigozhin. Yeah, probably that one, although I couldn't find anything in the mailing list to confirm it.

https://en.wikipedia.org/wiki/Yevgeny_Prigozhin
edit: all releases are named after current political events:
You can add any message you want into your GPL program. Also, a GPL program does not have to be free.
This has nothing to do with the GPL. You can say in your program that 'by using this software you agree that you're a cat' and license it under the GPL.
That does not mean the GPL relates to cats in any way.
> All other non-permissive additional terms are considered “further restrictions” within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term.
Debian explains this further in their patch file.
If I put the GPL in my software and add a file next to it that says "Also you can't use this software if you make more than $100k/year", I've pretty clearly added an additional clause that's incompatible with the GPL.
It says "please cite" and "feel free to not cite if you pay".
It doesn't say "must cite" or "you may only not cite if you pay".
IANAL, but it doesn't seem like it would interact with the GPL at all. So the worst that could be said is that the implementation is annoying or in poor taste.
I agree that in this case it’s likely not enforceable/binding especially since the GPL specifically allows you to ignore those terms. Hopefully that’s legally binding in your jurisdiction vs the other party.
But it’s a straightforward clickwrap agreement, even if the terms are non-monetary the GPL simply doesn’t allow these at all. Can’t place any stipulations on how the user uses the software.
What gives?
The startups you mention actually changed their license. That's what GNU Parallel would have to do to make this extra condition ok, but he won't do it because being a GPL-licensed GNU tool is critical to its popularity in the first place.
Part of the issue is that Ole’s citation notice doesn’t appear at first glance to some people to be compatible with the GPL. You have to read the language carefully, and read the history of GNU Parallel’s citation notice, to understand that the notice is not a licensing term.
Another part of the issue is that the notice doesn’t sound like someone just trying to make a living. It sounds like a demand or even a veiled threat, and one that is inflicted on everyone, not just academics. It’s not exactly clear about what the legal requirements even are.
I’m in favor of Ole getting citations, and I’m in favor of his right to ask. But the way it’s being asked for rubs me the wrong way a little bit, and it’s rubbed other people the wrong way a little bit ever since it was introduced. BTW, the whole reason it seems like all hell breaks loose, and the only reason this matters is precisely because the software is widely used. If it wasn’t widely used and it didn’t sit under the GNU umbrella, you’d never hear about this.
[0] https://gitlab.archlinux.org/archlinux/packaging/packages/pa...
https://lists.gnu.org/archive/html/parallel/2013-11/msg00006...
https://GitHub.com/shenwei356/rush
As you mention xargs has parallel capabilities and gargs is Apache licensed software that fixes some of xargs shortcomings:
https://GitHub.com/brentp/gargs
No reason to use gnu parallel.
Where do you get the "or pay 10000€" part from? As far as I remember, the software, unless told otherwise, asks authors of scientific papers to cite GNU Parallel if they used it when writing their papers. And it doesn't force it; it's not part of the license, but asks you to do so as it's academic tradition to use citations.
You could just ignore the citation and not break the license, no one would think less of you for doing so.
Most likely from the manpage:
If you use --will-cite in scripts to be run by others you are
making it harder for others to see the citation notice. The
development of GNU parallel is indirectly financed through
citations, so if your users do not know they should cite then you
are making it harder to finance development. However, if you pay
10000 EUR, you have done your part to finance future development
and should feel free to use --will-cite in scripts.
If you do not want to help financing future development by letting
other users see the citation notice or by paying, then please
consider using another tool instead of GNU parallel. You can find
some of the alternatives in man parallel_alternatives.
FWIW some distros remove the nagging message (e.g. mine - openSUSE - has it removed, and the patch seems to come from Debian so I'd guess Debian and its derivatives also remove it).

"If you pay 10000 EUR you should feel free to use GNU Parallel without citing."
find ./ -type f -iname '*.jpg' -size +1M -print0 | parallel -0 mogrify -format webp -quality 80 {}

xargs -n 1 -P 8

#!/usr/bin/env bash
set -e
main() {
    if [ "$1" = "handle-file" ]; then
        shift
        handle-file "$@"
    else
        find . \
            -type f \
            -not -path '*/optimized/*' \
            -print0 \
            | xargs \
                -0 \
                -L 1 \
                -P 8 \
                -I {} \
                bash -c "cd \"$PWD\" && \"$0\" handle-file \"{}\""
    fi
}

handle-file() {
    echo "handle-file $1 ..."
}
main "$@"

* Grouped output (prevents one process from writing output in the middle of another's output)
* In-order output (task a output first, task b output second, even though they ran in parallel)
* Better handling of special characters
* Remote execution
More here: https://www.gnu.org/software/parallel/parallel_alternatives....
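The ordering point is easy to demonstrate (a sketch assuming GNU parallel is installed; -k is short for --keep-order):

```shell
# The 0.2s job finishes last, yet with -k its output still comes first,
# because parallel buffers output and releases it in argument order.
parallel -k 'sleep {}; echo "slept {}"' ::: 0.2 0.1
```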
I was a little put off by the annoying/scary citation issue mentioned by another commenter, so I am not sure I will use parallel.
I want to pipe the output of parallel processes into a utility that I wrote for progress printing (https://github.com/titzer/progress), but I think that neither of these solutions work; my progress utility will have to do this on its own.
Edited to add: finally got signed in to work, you create the script via:
parallel --embed > scriptname.sh
It's about 14,000 lines of awesome and works on "ash, bash, dash, ksh, sh, and zsh".

In this case, we don't have control over the docker images used to build our apps.
See also https://hn.algolia.com/?q=gnu+parallel for other related discussions.
For example, translating a large list of IPv4 ranges into a standard format for a firewall rule-set parser:
cat ~/blacklist.p2p | parallel --ungroup --eta --jobs 20 "ipcalc {} | sed '2!d' " | grep -Ev '^(0.|255.|127.)' >> ~/blacklist_p2p_converted
Makes an annoyingly slow task tolerable, as parallel doesn't block while fetching to preserve order. We probably should rewrite this to be more efficient, but this task is run infrequently.
Happy computing =)
While there are cases when it makes sense to stick to what is specified by POSIX, there are also cases when the POSIX specification is so obsolete that using POSIX instead of some free ubiquitous programs is a big mistake.
Among these latter cases are writing scripts for a POSIX shell instead of writing them for bash and using xargs instead of parallel.
Second paragraph: I want to test my test-tester.
OP 100% fell down a rabbit-hole.
"they execute extensive scenarios against a live service over HTTP"
Any time I've seen people think they've needed to test live services, over HTTP... it means that there are far deeper issues.
Curious if anyone else has experiences with it, honestly been surprised at how little I've heard about it
I was expecting something as simple as 'parallel -j10 curl https://whatever' but couldn't find the right syntax in less time than it took me to prepare a dirty shell script that did the same.
wrk -t2 -c100 -d30s -R2000 http://127.0.0.1:8080/index.html
> This runs a benchmark for 30 seconds, using 2 threads, keeping 100 HTTP connections open, and a constant throughput of 2000 requests per second (total, across all connections combined).

Some distros include `ab`[2], which is also good, but wrk2 improves on it (and on wrk version 1) in multiple ways, so that's what I use myself.
parallel -j 10 curl 2> /dev/null \
  ::: $(for i in {1..10};do echo 'https://whatever.com';done)

/usr/bin/parallel

But because of the mini learning curve on each use and because I find I need a little more boilerplate to use parallel, I use xargs -P more often, only using parallel when I need its special features (e.g. multiple hosts or collating the output streams).
Oh also, parallel itself can be a bit of a resource hog. (Obviously that depends a lot on how you're using it-- but I mean in cases where xargs' usage is unnoticeable I sometimes have to change the size of my jobs to get parallel out of the way).
Thanks for the link to the book: https://zenodo.org/record/1146014
Here are a bunch of examples: https://www.gnu.org/software/parallel/parallel_examples.html
A fun one I end up using ~monthly or so for various things (usually with more switches added as needed):
GNU Parallel as queue system/batch manager
# start queue
true >jobqueue; tail -n+0 -f jobqueue | parallel
# add job
echo my_command my_arg >> jobqueue
# to start queue for remote execution
true >jobqueue; tail -n+0 -f jobqueue | parallel -S ..

Before I learned of parallel, I tried a hack where I'd manually assemble jobs into batches, and wait on the batches before starting the next. It achieved very low system utilization, because inevitably, one job in each batch takes much longer than the rest. A slight improvement (still not good) is to use `split` to chop your jobs file into $num_cores chunks, and background each chunk. But still, this gets low utilization. The problem is that you aren't using a thread/worker pool.
Parallel (or, TIL, xargs) can maintain 100% system utilization, until the very last $num_cores jobs.
But it can be done in pure BASH: https://gist.github.com/mped-oticon/b11dafa937e694ce4fa6fbf2...
GNU parallel supports expansion, which bash_parallel doesn't. However bash_parallel works with bash functions, which GNU parallel doesn't.
Nix + GDAL + GNUParallel + autoscaling groups === massive geospatial data processing pipeline
#!/bin/bash
cat - | parallel --line-buffer --pipe --roundrobin jq "$@"
E.g. in Python this would all be very easy to do. Just start a bunch of threads and e.g. invoke subprocess.run() from them.
I am trying to use Python by default when writing scripts nowadays, but sometimes the best tool for the job isn't Python or writing your own Python.
From this perspective, the languages of the glue, the libraries, and the external code all matter less than the ease of writing the glue; interfacing with the external code; and maintaining the libraries. The best language for this probably comes down to a combination of what you're comfortable writing (and reading, and maintaining) and what kinds of tasks you're trying to solve.
For me personally, using Python glue and libraries strikes a pretty good balance here. Writing a script "in Python" doesn't mean you need to reinvent the wheel. If you think `parallel` provides a better interface for map-reduce parallelism than `subprocess` (or than a library function you've written on top of `subprocess`), no problem: you can just call `parallel` from Python (and you'll probably find yourself writing a library function on top of it to abstract away the fact that it's a shell script).
But if you're much more effective working in Bash than Python, then writing your glue and developing your libraries in Bash could be the way to go.
Done that many, many times and honestly combining python with parallel is in many cases the best way to go. Write your python script to be as fast as possible on one core and then use parallel to run it on all your cores. This has the added advantage that you can go from running on all the cores on your machine to running on all the cores on a 100 machine cluster by just changing a couple of lines of code.
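A runnable sketch of that pattern (worker.py here is a throwaway stand-in for the real single-core script, and the node1,node2 hosts in the comment are hypothetical; assumes GNU parallel and python3 are installed):

```shell
# Build a tiny single-core worker, then fan it out with parallel.
dir=$(mktemp -d)
cat > "$dir/worker.py" <<'EOF'
import sys
print(sys.argv[1], "->", len(open(sys.argv[1]).read()), "bytes")
EOF
printf 'aa' > "$dir/a"; printf 'bbbb' > "$dir/b"

parallel python3 "$dir/worker.py" ::: "$dir/a" "$dir/b"

# Scaling out to a cluster is then roughly one flag (hypothetical hosts):
#   parallel -S node1,node2 --transferfile {} python3 worker.py ::: data/*
rm -r "$dir"
```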
Then you _have_ to resort to 'wait <pid>' with the 20 lines of bash code needed to manage all those PIDs. I have a large editor bash snippet just for that.
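For reference, the hand-rolled version usually ends up looking something like this (a sketch; `jobs -rp` and `wait -n` need bash, the latter 4.3+):

```shell
#!/usr/bin/env bash
# Poor man's worker pool: never more than $max background jobs at once.
max=4
for i in $(seq 1 8); do
    while [ "$(jobs -rp | wc -l)" -ge "$max" ]; do
        wait -n    # block until any one job exits
    done
    { sleep 0.1; echo "job $i done"; } &
done
wait   # drain the remaining jobs
```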
It might be a culture thing. In .NET code I see people running things in parallel a lot within code but maybe this is less so for linux tools.
Maybe functional programming style could lend to a parallel-first programming style, with heuristics to decide when it isn’t worth it.
Imagine a world where there were only GPUs for example - then everyone by default would be running parallel-first code, and in that imaginary world you would need to do nothing to run a series of bash commands piping into each other in parallel.