The Mighty Named Pipe (opens in new tab)

(vincebuffalo.com)

385 pointsvsbuffalo11y ago96 comments

96 comments

64 comments · 16 top-level

aidos11y ago· 15 in thread

Nice article. Really easy to follow introduction.

I only discovered process substitution a few months ago but it's already become a frequently used tool in my kit.

One thing that I find a little annoying about unix commands sometimes is how hard it can be to google for them. '<()', nope, "command as file argument to other command unix," nope. The first couple of times I tried to use it, I knew it existed but struggled to find any documentation. "Damnit, I know it's something like that, how does it work again?..."

Unless you know to look for "Process Substitution" it can be hard to find information on these things. And that's once you even know these things exist....

Anyone know a good resource I should be using when I find myself in a situation like that?

david-given11y ago

Be aware that process substitution (and named pipes) can bite you in the arse in some situations --- for example, if the program expects to be able to seek in the file. Pipes don't support this and the program will see it as an I/O error. This'd be fine if programs just errored out cleanly but they frequently don't check that seeking succeeds. unzip treats a named pipe as a corrupt zipfile, for example:

  $ unzip <(cat z)
  Archive:  /dev/fd/63
    End-of-central-directory signature not found.  Either this file is not
    a zipfile, or it constitutes one disk of a multi-part archive.  In the
    latter case the central directory and zipfile comment will be found on
    the last disk(s) of this archive.
  unzip:  cannot find zipfile directory in one of /dev/fd/63 or
        /dev/fd/63.zip, and cannot find /dev/fd/63.ZIP, period.

aidos11y ago

Useful. I'm assuming those are cases where normal pipes would fail too? So you can't do:

    cat z | unzip   # I know, uuoc, demo only

It's just with the process substitution you have more flexibility to shoot yourself in the foot?

2 more replies

dandelany11y ago

I often use SymbolHound in these cases - it's a search engine with support for special characters; for example: http://symbolhound.com/?q=%3C%28%29

aidos11y ago

Thanks for that. I've already made use of it since you pointed it out.

Filligree11y ago

For this in particular, try the Advanced Bash Scripting doc, http://www.tldp.org/LDP/abs/html/

There's a bunch of interesting constructs there. Most of them also apply to improved shells such as zsh, though some are just pointless there.

AceJohnny211y ago

I've seen the ABS guide criticized for being obsolete and recommending wrong or obsolete best practices. A recommended replacement is The Bash Hacker's Wiki: http://wiki.bash-hackers.org/doku.php

Which itself recommends as best (current) alternative: Greg's Bash Guide: http://mywiki.wooledge.org/BashGuide

icebraining11y ago

man pages!

  $ man bash
  /<\(

Drops you right into the Process Substitution section.

pygy_11y ago

    /<\(

For those wondering, man, which uses less as a pager, has vi-like key bindings.

"/<\(" starts a regex-based search for "<(" (you must escape the open paren).

This is the origin of the regex literal syntax in most programming languages that have them. It was first introduced by Ken Thompson in the "ed" text editor.

aidos11y ago

Ahhhh..haaa...ha....DOH! I've never even thought of looking at the manpage for bash before. Thanks, you've just made my life better.

3 more replies

Kiro11y ago

A bit OT but I don't understand why Google doesn't supply a way to do strict searches where everything you input is interpreted literally.

reitanqild11y ago

They have, I have complained loudly about this[1], never hard anything back (this is SOP I understand), but I have seen improvements last year.

Double quotes around part of a query means make sure this part is actually matched in the index. (I think they still annoy me be including sites that are linked to using this phrase[2], but that is understandable.)

Then there is the "verbatim" setting that you can activate under search tools > "All results" dropdown.

[1]:And the reason they annoyed me was because they would still fuzz my queries despite me doublequoting and choosing verbatim.

[2]: To verify this you could open the cached version and on top of the page you'd see something along the lines of: "the following terms exist only in links pointing to this page."

1 more reply

perlgeek11y ago

Because if you want to ignore punctuation and case in normal situations, you leave them out of the search index. And then you can't query the same search index for punctuation and/or case-sensitive queries.

So they'd have to create a second index for probably less than 0.01% of their queries, and that second index would be larger and harder to compress.

As much as I'd love to see a strict search, from a business perspective I don't think it makes sense to a provide one.

el_benhameen11y ago

I wish they'd supply that too, but they do seem to have gotten better at interpreting literally when it makes sense in context. I've been learning C# and have found, for example, that searches with the term "C#" return the appropriate resources when in the past I'd have probably seen results for C.

1 more reply

staticshock11y ago

#bash on freenode, and http://mywiki.wooledge.org/BashFAQ

mkramlich11y ago

books on UNIX and shells

amelius11y ago· 6 in thread

If you like pipes, then you will love lazy evaluation. It is unfortunate, though, that Unix doesn't support that (operations can block when "writing" only, not when "nobody is reading").

falcolas11y ago

If nobody is reading, you will eventually fill the pipe buffer (which is about 4k), and the writing will stop. It's a bigger queue than most of us would expect when compared to generator expressions, but it can and does create back pressure while making reads efficient.

noselasd11y ago

*about 4k

64k on linux these days.

valarauca111y ago

Lazy evaluation with pipes would be problematic because alerts/echos would be ignored by default as they are necessarily part of the stdin/stdout chain.

I.E.: This section of code

     let a = "some_file_name".as_string();
     println!("Opening: {}", a);
     let path = std::path(a);
     let mut fd = std::io::open(path);

would get optimized to

     let mut fd = std::io::open(std::path("some_file_name".as_string()));

with strict lazy evaluation. The user feedback is removed, which is a big part of shell scripting.

amelius11y ago

I guess you found another issue with Unix. The user does not care in general how something is performed, just that it is performed correctly and with good performance.

I guess an OS should be functional at its interface to the user, and only imperative deep down to keep things running efficiently.

However, note that this hypothetical functional layer on top also would ensure efficiency, as it enables lazy evaluation. This type of efficiency could under certain circumstances be even more valuable than the bare-metal performance of system programming languages.

1 more reply

ajuc11y ago

I don't know, I love pipes and I'm on the fence regarding strict vs lazy.

BTW: when is nobody reading in pipes? There's always implicit

    &> stdout

added.

EDIT: oh, right, named pipes.

jquast11y ago

If you write to a named pipe, the call to write(2) will block until somebody opens it for reading and begins to read(2) it.

mhax11y ago· 6 in thread

I've used *nix for ~15 years and never used a named pipe or process substitution before. Great to know about!

masklinn11y ago

Beware though, process substitution is not POSIX and not supported in all shells. It isn't in pdksh or ash/dash for instance.

It's a ksh93 extension adopted by bash and zsh.

a3n11y ago

Named pipes have been rare for me, but simple process substitution is every day.

Very often I do something like this in quick succession. Command line editing makes this trivial.

  $ find . -name "*blarg*.cpp"
  # Some output that looks like what I'm looking for.
  
  # Run the same find again in a process, and grep for something.
  $ grep -i "blooey" $(find . -name "*blarg*.cpp")
  
  # Yep, those are the files I'm looking for, so dig in.
  # Note the additional -l in grep, and the nested processes.
  $ vim $(grep -il "blooey" $(find . -name "*blarg*.cpp"))

tne11y ago

Same here, except I typically use $(!!) to re-run the previous command. I find it faster than command-line editing.

    $ find . -name "*blarg*.cpp"
    $ grep -i "blooey" $(!!)
    $ vim $(!! -l)

Granted, you can only append new arguments and using the other ! commands will often be less practical than editing. Still, it's amazing how frequently this is sufficient.

I've always thought it'd be nice if there was a `set` option or something similar that would make bash record command lines and cache output automatically in implicit variables, so that it doesn't re-run the commands. The semantics are definitely different and you wouldn't want this enabled at all times, but for certain kinds of sessions it would be very handy.

EDIT: lazyjones beat me to it.

1 more reply

icebraining11y ago

That's actually command substitution, not process substitution :)

2 more replies

lazyjones11y ago

> Command line editing makes this trivial.

I'm lazy, so I typically do this (2nd step) to avoid the extra key strokes necessary for editing:

  $ grep -i "blooey" $(!!)

Also very useful for avoiding editing in order to do something else with the same argument as in the previous command: !$, i.e.:

  $ foo myfile
  $ bla !$

jbnicolai11y ago

You could just use a pipe here though, which would also make it more easy to read. e.g.:

    $ find . -name '*blarg*.cpp' | grep -li blooey | vi -

2 more replies

unhammer11y ago· 4 in thread

Once you discover <() it's hard not to (ab)use it everywhere :-)

    # avoid temporary files when some program needs two inputs:
    join -e0 -o0,1.1,2.1 -a1 -a2 -j2 -t$'\t' \
      <(sort -k2,2 -t$'\t' freq/forms.${lang}) \
      <(sort -k2,2 -t$'\t' freq/lms.${lang})
    
    # gawk doesn't care if it's given a regular file or the output fd of some process:
    gawk -v dict=<(munge_dict) -f compound_translate.awk <in.txt
    
    # prepend a header:
    cat <(echo -e "${word}\t% ${lang}\tsum" | tr [:lower:] [:upper:]) \
        <(coverage ${lang})

repsilat11y ago

> # gawk doesn't care if it's given a regular file or the output fd of some process:

Something wonderful I found out the other day: Bash executes scripts as it parses them, so you can do all kinds of awful things. For starters,

    bash <(yes echo hello)

will have bash execute an infinite script that looks like

    echo hello
    echo hello
    echo hello
    ...

without trying to load the whole thing first.

After that, you can move onto having a script append to itself and whatever other dreadful things you can think of.

unhammer11y ago

That's actually one of the things that I really dislike with bash, that it doesn't read the whole script before executing it. I've been bitten by it before, when I write some long-running script, then e.g. write a comment at the top of it as it's running, and then when bash looks for the next command, it's shifted a bit and I get (at best) a syntax error and have to re-run :-(

2 more replies

thristian11y ago

Fun fact: That's how goto worked in the earliest versions of Unix, before the Bourne shell was invented: goto was an external command that would seek() in the shell-script until it found the label it was looking for, and when control returned to the shell it would just continue executing from the new location.

To this day, when the shell launches your program, you can find the shell-script it's executing as file-descriptor 255, just in case you want to play any flow-control shenanigans.

jquast11y ago

one of my favorites is using diff across output of two programs, diff thinks they are files:

diff -u <(zipinfo archive.zip.bak) <(zipinfo archive.zip)

AndrewSB11y ago· 4 in thread

Does anyone have a working link to Gary Bernhardt's The Unix Chainsaw, as mentioned in the article?

sikhnerd11y ago

I found a high quality copy I had downloaded: http://sikhnerd.com/downloaded_vids/02-gary-bernhardt.mp4 (703M)

dtmooreiv11y ago

https://www.youtube.com/watch?v=sCZJblyT_XM

agumonkey11y ago

That's the kind of video I might have downloaded. At least I hope so. Gonna check my backups.

update 1 : found it, time to upload.

AndrewSB11y ago

Thank you kind sir. Waiting for a link

1 more reply

frankerz11y ago· 4 in thread

How does the > process substitution differ from simply piping the output with | ?

For example (from Wikipedia)

tee >(wc -l >&2) < bigfile | gzip > bigfile.gz

tee < bigfile | wc -l | gzip > bigfile.gz

unhammer11y ago

Say that you have a program that splits its output into two files, each given by command line arguments. A normal run would be

    <input.txt munge-data-and-split -o1 out1.txt -o2 out2.txt

but since the output is huge and your disk is old and dying, you want to run xz on it before saving it to disk, so use >():

    <input.txt munge-data-and-split -o1 >(xz - > out1.txt) -o2 >(xz - > out2.txt)

If you want to do several things in there, I recommend defining a function for clarity:

    pp () { sort -k2,3 -t$'\t' | xz - ; }
    <input.txt munge-data-and-split -o1 >(pp > out1.txt) -o2 >(pp > out2.txt)

aidos11y ago

In the tee case the substitution is actually going somewhere different than standard out (that's what tee does).

so:

    cmd1 | tee out.txt | cmd2

So tee is splitting the stream into two outputs, one that carries on out stdout (into cmd2) and the other one that is redirected into out.txt.

With process substitution you can do extra stuff on the way out, I guess (I've never seen it used for output before).

It looks like in the example given they're writing wc stuff to stderr while zipping the content (over stdout).

Nice to see that example, I hadn't even thought about the usefulness of process substitution for outputting like this!

cnvogel11y ago

When you connect to processes in a pipe such as ...

    a | b

you connect stdout (fd #1) of a to stdin (fd #0) of b. Technically, the shell process will create a pipe, which is two filedescriptors connected back to back. It then will fork two times (create two copies of itself) where it replaces standard output (filedescriptor 1) of the first copy by one end of the pipe and replaces standard input (filedescriptor 0) of the second copy by the other end of the pipe. Then the first copy will replace itself (exec) by a, the second copy will replace itself (exec) by b. Everything that a writes to stdout will appear on stdin of b.

But nothing prevents the shell from replacing any other filedescriptor by pipes. And when you create a subprocess by writing "<(c)" in your commandline, it's just one additional fork for the shell, and one additional filedescriptor pair to be created. One side, as in the simple case, will replace stdin (fd #0) of "c"... and because the input side of this pipe doesn't have a predefined output of "a" (stdout is already taken by "|b") the shell will somehow have to tell "a" what filedescriptor the pipe uses. Under Linux one can refer to opened filedescriptors as "/dev/fd/<FDNUM>" (symlink to /proc/self/fd/<FDNUM> which itself is a symlik to /proc/<PID>/fd/<FDNUM>), so that's what's replaced as a "name" to refer to the substituted process on "a"'s command line:

Try this:

    $ echo $$
    12345  # <--- PID of your shell
    $ tee >( sort ) >( sort ) >( sort ) otherfile | sort

and in a second terminal

    $ pstree 12345 # <--- PID of your shell

    zsh,301
      ├─sort,3600 # <-- this one reads from the other end of the shell's fd #14
      ├─sort,3601 # <-- this one reads from the other end of the shell's fd #15
      ├─sort,3602 # <-- this one reads from the other end of the shell's fd #15
      ├─sort,3604 # <-- this one reads from stdout of tee
      └─tee,3603 /proc/self/fd/14 /proc/self/fd/15 /proc/self/fd/16 otherfile

If your system doesn't support the convenient /proc/self/fd/<NUM> shortcut, the shell might decide not to create a pipe, but rather create temporary fifos in /tmp and use those to connect the filedescriptors.

http://man7.org/linux/man-pages/man2/pipe.2.html

http://linux.die.net/man/2/dup

You can watch the syscalls as they are made:

    $ strace -fe fork,pipe,close,dup,dup2,execve bash -c 'tee <(sort) <(sort)'

joosters11y ago

It allows multiple, parallel pipes to each individual command, where the | allows just one.

Dewie11y ago· 3 in thread

Pipes are very cool and useful, but it's hard for me to understand this common worship of something like that. Yes, it's useful and elegant, but is it really the best thing since Jesus Christ?

AndrewWright11y ago

Maybe it's not the best thing since Jesus, but it's worth celebrating its birthday http://blog.fugue.it/2013-10-07-pipeday.html

Dewie11y ago

Wow. I guess that's what I get for not being totally enamoured of Unix.

JustSomeNobody11y ago

No, that's not why you were down voted. You were down voted because you were condescending to the people who enjoy working with *nix.

1 more reply

chuckcode11y ago· 2 in thread

Anybody know of a way to increase the buffer size of pipes? I've experienced cases where piping a really fast program to a slow one caused them both to go slower as the OS pauses first program writing when pipe buffer is full. This seemed to ruin the caching for the first program and caused them both to be slower even though normally pipes are faster as you're not touching disk.

jquast11y ago

Both mbuffer and pv by default contain fairly large in-memory buffers for pipe data, and accept parameters for particularly large buffers.

http://www.maier-komor.de/mbuffer.html http://www.ivarch.com/programs/pv.shtml

chuckcode11y ago

Thanks - hoping that there was a built in solution but a buffer program makes sense

jamesrom11y ago· 2 in thread

Is this guy a bioinformatician? I think he's a bioinformatician.

Can't be sure if he is a bioinformatician because he never really mentions that he is a bioinformatician.

pmags11y ago

Seems entirely appropriate given his blog post, and others like it on his site as well as the book he wrote, are clearly aimed at people interested in learning bioinformatics.

bsenftner11y ago

The tone made me stop reading. It reads like a child's thoughts in a comic book.

larsf11y ago· 1 in thread

Pipes are probably the original instantiation of dataflow processing (dating back to the 1960s). I gave a tech talk on some of the frameworks: https://www.youtube.com/watch?v=3oaelUXh7sE

And my company creates a cool dataflow platform - https://composableanalytics.com

noselasd11y ago

http://doc.cat-v.org/unix/pipes/ . And there's a bit more about how pipes came to be in unix here: http://cm.bell-labs.com/who/dmr/hist.html

anateus11y ago· 1 in thread

In fish shell the canonical example is this:

   diff (sort a.txt|psub) (sort b.txt|psub)

The psub command performs the process substitution.

frankerz11y ago

It seems like fish shell's ">" process substitution equivalence is not working as well as bash's though

https://github.com/fish-shell/fish-shell/issues/1786

Malarkey7311y ago

Vince Buffalo is author of the best book on bioinformatics: Bioinformatics Data Skills (O'Reilly). It's worth a read for learning unix/bash style data science of any flavour.

Or even if you think you know unix/bash and data there are new and unexpected snippets every few pages that surprise you.

dbbolton11y ago

In zsh, =(cmd) will create a temporary file, <(cmd) will create a named pipe, and $(cmd) creates a subshell. There are also fancy options that use MULTIOS. For example:

    paste <(cut -f1 file1) <(cut -f3 file2) | tee >(process1) >(process2) >/dev/null

can be re-written as:

    paste <(cut -f1 file1) <(cut -f3 file2) > >(process1) > >(process2)

http://zsh.sourceforge.net/Doc/Release/Expansion.html#Proces...

http://zsh.sourceforge.net/Doc/Release/Redirection.html#Redi...

baschism11y ago

AFAIK process substitution is a bash-ism (not part of POSIX spec for /bin/sh). I recently had to go with the slightly less wieldy named pipes in a dash environment and put the pipe setup, command execution and teardown in a script.

leni53611y ago

moreutils [1] has some really cool programs for pipe handling.

pee: tee standard input to pipes sponge: soak up standard input and write to a file ts: timestamp standard input vipe: insert a text editor into a pipe

[1] https://joeyh.name/code/moreutils/

hitlin3711y ago

i heard somewhere that go follows unix pipe link interfaces.

j / k navigate · click thread line to collapse

96 comments

64 comments · 16 top-level

aidos11y ago· 15 in thread

Nice article. Really easy to follow introduction.

I only discovered process substitution a few months ago but it's already become a frequently used tool in my kit.

Unless you know to look for "Process Substitution" it can be hard to find information on these things. And that's once you even know these things exist....

Anyone know a good resource I should be using when I find myself in a situation like that?

david-given11y ago

  $ unzip <(cat z)
  Archive:  /dev/fd/63
    End-of-central-directory signature not found.  Either this file is not
    a zipfile, or it constitutes one disk of a multi-part archive.  In the
    latter case the central directory and zipfile comment will be found on
    the last disk(s) of this archive.
  unzip:  cannot find zipfile directory in one of /dev/fd/63 or
        /dev/fd/63.zip, and cannot find /dev/fd/63.ZIP, period.

aidos11y ago

Useful. I'm assuming those are cases where normal pipes would fail too? So you can't do:

    cat z | unzip   # I know, uuoc, demo only

It's just with the process substitution you have more flexibility to shoot yourself in the foot?

2 more replies

dandelany11y ago

I often use SymbolHound in these cases - it's a search engine with support for special characters; for example: http://symbolhound.com/?q=%3C%28%29

aidos11y ago

Thanks for that. I've already made use of it since you pointed it out.

Filligree11y ago

For this in particular, try the Advanced Bash Scripting doc, http://www.tldp.org/LDP/abs/html/

There's a bunch of interesting constructs there. Most of them also apply to improved shells such as zsh, though some are just pointless there.

AceJohnny211y ago

I've seen the ABS guide criticized for being obsolete and recommending wrong or obsolete best practices. A recommended replacement is The Bash Hacker's Wiki: http://wiki.bash-hackers.org/doku.php

Which itself recommends as best (current) alternative: Greg's Bash Guide: http://mywiki.wooledge.org/BashGuide

icebraining11y ago

man pages!

  $ man bash
  /<\(

Drops you right into the Process Substitution section.

pygy_11y ago

    /<\(

For those wondering, man, which uses less as a pager, has vi-like key bindings.

"/<\(" starts a regex-based search for "<(" (you must escape the open paren).

This is the origin of the regex literal syntax in most programming languages that have them. It was first introduced by Ken Thompson in the "ed" text editor.

aidos11y ago

Ahhhh..haaa...ha....DOH! I've never even thought of looking at the manpage for bash before. Thanks, you've just made my life better.

3 more replies

Kiro11y ago

A bit OT but I don't understand why Google doesn't supply a way to do strict searches where everything you input is interpreted literally.

reitanqild11y ago

They have, I have complained loudly about this[1], never hard anything back (this is SOP I understand), but I have seen improvements last year.

Then there is the "verbatim" setting that you can activate under search tools > "All results" dropdown.

[1]:And the reason they annoyed me was because they would still fuzz my queries despite me doublequoting and choosing verbatim.

[2]: To verify this you could open the cached version and on top of the page you'd see something along the lines of: "the following terms exist only in links pointing to this page."

1 more reply

perlgeek11y ago

So they'd have to create a second index for probably less than 0.01% of their queries, and that second index would be larger and harder to compress.

As much as I'd love to see a strict search, from a business perspective I don't think it makes sense to a provide one.

el_benhameen11y ago

1 more reply

staticshock11y ago

#bash on freenode, and http://mywiki.wooledge.org/BashFAQ

mkramlich11y ago

books on UNIX and shells

amelius11y ago· 6 in thread

If you like pipes, then you will love lazy evaluation. It is unfortunate, though, that Unix doesn't support that (operations can block when "writing" only, not when "nobody is reading").

falcolas11y ago

noselasd11y ago

*about 4k

64k on linux these days.

valarauca111y ago

Lazy evaluation with pipes would be problematic because alerts/echos would be ignored by default as they are necessarily part of the stdin/stdout chain.

I.E.: This section of code

     let a = "some_file_name".as_string();
     println!("Opening: {}", a);
     let path = std::path(a);
     let mut fd = std::io::open(path);

would get optimized to

     let mut fd = std::io::open(std::path("some_file_name".as_string()));

with strict lazy evaluation. The user feedback is removed, which is a big part of shell scripting.

amelius11y ago

I guess you found another issue with Unix. The user does not care in general how something is performed, just that it is performed correctly and with good performance.

I guess an OS should be functional at its interface to the user, and only imperative deep down to keep things running efficiently.

1 more reply

ajuc11y ago

I don't know, I love pipes and I'm on the fence regarding strict vs lazy.

BTW: when is nobody reading in pipes? There's always implicit

    &> stdout

added.

EDIT: oh, right, named pipes.

jquast11y ago

If you write to a named pipe, the call to write(2) will block until somebody opens it for reading and begins to read(2) it.

mhax11y ago· 6 in thread

I've used *nix for ~15 years and never used a named pipe or process substitution before. Great to know about!

masklinn11y ago

Beware though, process substitution is not POSIX and not supported in all shells. It isn't in pdksh or ash/dash for instance.

It's a ksh93 extension adopted by bash and zsh.

a3n11y ago

Named pipes have been rare for me, but simple process substitution is every day.

Very often I do something like this in quick succession. Command line editing makes this trivial.

  $ find . -name "*blarg*.cpp"
  # Some output that looks like what I'm looking for.
  
  # Run the same find again in a process, and grep for something.
  $ grep -i "blooey" $(find . -name "*blarg*.cpp")
  
  # Yep, those are the files I'm looking for, so dig in.
  # Note the additional -l in grep, and the nested processes.
  $ vim $(grep -il "blooey" $(find . -name "*blarg*.cpp"))

tne11y ago

Same here, except I typically use $(!!) to re-run the previous command. I find it faster than command-line editing.

    $ find . -name "*blarg*.cpp"
    $ grep -i "blooey" $(!!)
    $ vim $(!! -l)

Granted, you can only append new arguments and using the other ! commands will often be less practical than editing. Still, it's amazing how frequently this is sufficient.

EDIT: lazyjones beat me to it.

1 more reply

icebraining11y ago

That's actually command substitution, not process substitution :)

2 more replies

lazyjones11y ago

> Command line editing makes this trivial.

I'm lazy, so I typically do this (2nd step) to avoid the extra key strokes necessary for editing:

  $ grep -i "blooey" $(!!)

Also very useful for avoiding editing in order to do something else with the same argument as in the previous command: !$, i.e.:

  $ foo myfile
  $ bla !$

jbnicolai11y ago

You could just use a pipe here though, which would also make it more easy to read. e.g.:

    $ find . -name '*blarg*.cpp' | grep -li blooey | vi -

2 more replies

unhammer11y ago· 4 in thread

Once you discover <() it's hard not to (ab)use it everywhere :-)

    # avoid temporary files when some program needs two inputs:
    join -e0 -o0,1.1,2.1 -a1 -a2 -j2 -t$'\t' \
      <(sort -k2,2 -t$'\t' freq/forms.${lang}) \
      <(sort -k2,2 -t$'\t' freq/lms.${lang})
    
    # gawk doesn't care if it's given a regular file or the output fd of some process:
    gawk -v dict=<(munge_dict) -f compound_translate.awk <in.txt
    
    # prepend a header:
    cat <(echo -e "${word}\t% ${lang}\tsum" | tr [:lower:] [:upper:]) \
        <(coverage ${lang})

repsilat11y ago

> # gawk doesn't care if it's given a regular file or the output fd of some process:

Something wonderful I found out the other day: Bash executes scripts as it parses them, so you can do all kinds of awful things. For starters,

    bash <(yes echo hello)

will have bash execute an infinite script that looks like

    echo hello
    echo hello
    echo hello
    ...

without trying to load the whole thing first.

After that, you can move onto having a script append to itself and whatever other dreadful things you can think of.

unhammer11y ago

2 more replies

thristian11y ago

To this day, when the shell launches your program, you can find the shell-script it's executing as file-descriptor 255, just in case you want to play any flow-control shenanigans.

jquast11y ago

one of my favorites is using diff across output of two programs, diff thinks they are files:

diff -u <(zipinfo archive.zip.bak) <(zipinfo archive.zip)

AndrewSB11y ago· 4 in thread

Does anyone have a working link to Gary Bernhardt's The Unix Chainsaw, as mentioned in the article?

sikhnerd11y ago

I found a high quality copy I had downloaded: http://sikhnerd.com/downloaded_vids/02-gary-bernhardt.mp4 (703M)

dtmooreiv11y ago

https://www.youtube.com/watch?v=sCZJblyT_XM

agumonkey11y ago

That's the kind of video I might have downloaded. At least I hope so. Gonna check my backups.

update 1 : found it, time to upload.

AndrewSB11y ago

Thank you kind sir. Waiting for a link

1 more reply

frankerz11y ago· 4 in thread

How does the > process substitution differ from simply piping the output with | ?

For example (from Wikipedia)

tee >(wc -l >&2) < bigfile | gzip > bigfile.gz

tee < bigfile | wc -l | gzip > bigfile.gz

unhammer11y ago

Say that you have a program that splits its output into two files, each given by command line arguments. A normal run would be

    <input.txt munge-data-and-split -o1 out1.txt -o2 out2.txt

but since the output is huge and your disk is old and dying, you want to run xz on it before saving it to disk, so use >():

    <input.txt munge-data-and-split -o1 >(xz - > out1.txt) -o2 >(xz - > out2.txt)

If you want to do several things in there, I recommend defining a function for clarity:

    pp () { sort -k2,3 -t$'\t' | xz - ; }
    <input.txt munge-data-and-split -o1 >(pp > out1.txt) -o2 >(pp > out2.txt)

aidos11y ago

In the tee case the substitution is actually going somewhere different than standard out (that's what tee does).

so:

    cmd1 | tee out.txt | cmd2

So tee is splitting the stream into two outputs, one that carries on out stdout (into cmd2) and the other one that is redirected into out.txt.

With process substitution you can do extra stuff on the way out, I guess (I've never seen it used for output before).

It looks like in the example given they're writing wc stuff to stderr while zipping the content (over stdout).

Nice to see that example, I hadn't even thought about the usefulness of process substitution for outputting like this!

cnvogel11y ago

When you connect to processes in a pipe such as ...

    a | b

Try this:

    $ echo $$
    12345  # <--- PID of your shell
    $ tee >( sort ) >( sort ) >( sort ) otherfile | sort

and in a second terminal

    $ pstree 12345 # <--- PID of your shell

    zsh,301
      ├─sort,3600 # <-- this one reads from the other end of the shell's fd #14
      ├─sort,3601 # <-- this one reads from the other end of the shell's fd #15
      ├─sort,3602 # <-- this one reads from the other end of the shell's fd #15
      ├─sort,3604 # <-- this one reads from stdout of tee
      └─tee,3603 /proc/self/fd/14 /proc/self/fd/15 /proc/self/fd/16 otherfile

http://man7.org/linux/man-pages/man2/pipe.2.html

http://linux.die.net/man/2/dup

You can watch the syscalls as they are made:

    $ strace -fe fork,pipe,close,dup,dup2,execve bash -c 'tee <(sort) <(sort)'

joosters11y ago

It allows multiple, parallel pipes to each individual command, where the | allows just one.

Dewie11y ago· 3 in thread

Pipes are very cool and useful, but it's hard for me to understand this common worship of something like that. Yes, it's useful and elegant, but is it really the best thing since Jesus Christ?

AndrewWright11y ago

Maybe it's not the best thing since Jesus, but it's worth celebrating its birthday http://blog.fugue.it/2013-10-07-pipeday.html

Dewie11y ago

Wow. I guess that's what I get for not being totally enamoured of Unix.

JustSomeNobody11y ago

No, that's not why you were down voted. You were down voted because you were condescending to the people who enjoy working with *nix.

1 more reply

chuckcode11y ago· 2 in thread

jquast11y ago

Both mbuffer and pv by default contain fairly large in-memory buffers for pipe data, and accept parameters for particularly large buffers.

http://www.maier-komor.de/mbuffer.html http://www.ivarch.com/programs/pv.shtml

chuckcode11y ago

Thanks - hoping that there was a built in solution but a buffer program makes sense

jamesrom11y ago· 2 in thread

Is this guy a bioinformatician? I think he's a bioinformatician.

Can't be sure if he is a bioinformatician because he never really mentions that he is a bioinformatician.

pmags11y ago

Seems entirely appropriate given his blog post, and others like it on his site as well as the book he wrote, are clearly aimed at people interested in learning bioinformatics.

bsenftner11y ago

The tone made me stop reading. It reads like a child's thoughts in a comic book.

larsf11y ago· 1 in thread

Pipes are probably the original instantiation of dataflow processing (dating back to the 1960s). I gave a tech talk on some of the frameworks: https://www.youtube.com/watch?v=3oaelUXh7sE

And my company creates a cool dataflow platform - https://composableanalytics.com

noselasd11y ago

http://doc.cat-v.org/unix/pipes/ . And there's a bit more about how pipes came to be in unix here: http://cm.bell-labs.com/who/dmr/hist.html

anateus11y ago· 1 in thread

In fish shell the canonical example is this:

   diff (sort a.txt|psub) (sort b.txt|psub)

The psub command performs the process substitution.

frankerz11y ago

It seems like fish shell's ">" process substitution equivalence is not working as well as bash's though

https://github.com/fish-shell/fish-shell/issues/1786

Malarkey7311y ago

Vince Buffalo is author of the best book on bioinformatics: Bioinformatics Data Skills (O'Reilly). It's worth a read for learning unix/bash style data science of any flavour.

Or even if you think you know unix/bash and data there are new and unexpected snippets every few pages that surprise you.

dbbolton11y ago

In zsh, =(cmd) will create a temporary file, <(cmd) will create a named pipe, and $(cmd) creates a subshell. There are also fancy options that use MULTIOS. For example:

    paste <(cut -f1 file1) <(cut -f3 file2) | tee >(process1) >(process2) >/dev/null

can be re-written as:

    paste <(cut -f1 file1) <(cut -f3 file2) > >(process1) > >(process2)

http://zsh.sourceforge.net/Doc/Release/Expansion.html#Proces...

http://zsh.sourceforge.net/Doc/Release/Redirection.html#Redi...

baschism11y ago

leni53611y ago

moreutils [1] has some really cool programs for pipe handling.

pee: tee standard input to pipes sponge: soak up standard input and write to a file ts: timestamp standard input vipe: insert a text editor into a pipe

[1] https://joeyh.name/code/moreutils/

hitlin3711y ago

i heard somewhere that go follows unix pipe link interfaces.

j / k navigate · click thread line to collapse