I only discovered process substitution a few months ago but it's already become a frequently used tool in my kit.
One thing I find a little annoying about unix commands is how hard they can be to google for. '<()', nope; "command as file argument to other command unix," nope. The first couple of times I tried to use it, I knew it existed but struggled to find any documentation. "Damnit, I know it's something like that, how does it work again?..."
Unless you know to look for "Process Substitution" it can be hard to find information on these things. And that's once you even know these things exist....
Anyone know a good resource I should be using when I find myself in a situation like that?
$ unzip <(cat z)
Archive: /dev/fd/63
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /dev/fd/63 or
/dev/fd/63.zip, and cannot find /dev/fd/63.ZIP, period.

cat z | unzip # I know, uuoc, demo only
It's just that with process substitution you have more flexibility to shoot yourself in the foot.

There's a bunch of interesting constructs there. Most of them also apply to improved shells such as zsh, though some are just pointless there.
Which itself recommends Greg's Bash Guide as the best (current) alternative: http://mywiki.wooledge.org/BashGuide
$ man bash
/<\(
Drops you right into the Process Substitution section.

/<\(

For those wondering: man uses less as a pager, and less has vi-like key bindings. "/<\(" starts a regex-based search for "<(" (you have to escape the open paren).
This is the origin of the /.../ regex literal syntax in most programming languages that have one. It was first introduced by Ken Thompson in the "ed" text editor.
Double quotes around part of a query mean: make sure this part is actually matched in the index. (I think they still annoy me by including sites that are merely linked to using this phrase[2], but that is understandable.)
Then there is the "verbatim" setting that you can activate under search tools > "All results" dropdown.
[1]: And the reason they annoyed me was that they would still fuzz my queries despite my double-quoting and choosing verbatim.
[2]: To verify this you could open the cached version and on top of the page you'd see something along the lines of: "the following terms exist only in links pointing to this page."
So they'd have to create a second index for probably less than 0.01% of their queries, and that second index would be larger and harder to compress.
As much as I'd love to see a strict search, from a business perspective I don't think it makes sense to provide one.
# avoid temporary files when some program needs two inputs:
join -e0 -o0,1.1,2.1 -a1 -a2 -j2 -t$'\t' \
<(sort -k2,2 -t$'\t' freq/forms.${lang}) \
<(sort -k2,2 -t$'\t' freq/lms.${lang})
# gawk doesn't care if it's given a regular file or the output fd of some process:
gawk -v dict=<(munge_dict) -f compound_translate.awk <in.txt
# prepend a header:
cat <(echo -e "${word}\t% ${lang}\tsum" | tr '[:lower:]' '[:upper:]') \
    <(coverage ${lang})

Something wonderful I found out the other day: Bash executes scripts as it parses them, so you can do all kinds of awful things. For starters,
bash <(yes echo hello)
will have bash execute an infinite script that looks like

echo hello
echo hello
echo hello
...

without trying to load the whole thing first.

After that, you can move on to having a script append to itself, and whatever other dreadful things you can think of.
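The self-appending trick is easy to demonstrate. A sketch of my own (the temp-file setup is mine): because bash keeps reading the script file as it executes, a line appended mid-run still gets executed.

```shell
#!/usr/bin/env bash
# A script that appends a command to itself while running.
# Bash reads the file incrementally, so it picks up the new line.
script=$(mktemp)
cat > "$script" <<'EOF'
echo first
echo 'echo appended' >> "$0"
EOF
bash "$script"    # prints "first", then "appended"
```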
To this day, when the shell launches your program, you can find the shell-script it's executing as file-descriptor 255, just in case you want to play any flow-control shenanigans.
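You can check the fd-255 claim yourself (assuming Linux with /proc; the temp-file setup is mine):

```shell
#!/usr/bin/env bash
# From inside a script, fd 255 points back at the script file itself.
script=$(mktemp)
echo 'readlink /proc/$$/fd/255' > "$script"
bash "$script"    # prints the path of the script file
```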
diff -u <(zipinfo archive.zip.bak) <(zipinfo archive.zip)
And my company creates a cool dataflow platform - https://composableanalytics.com
Even if you think you know unix/bash and data, there are new and unexpected snippets every few pages that will surprise you.
paste <(cut -f1 file1) <(cut -f3 file2) | tee >(process1) >(process2) >/dev/null
can be re-written with output process substitution as:

paste <(cut -f1 file1) <(cut -f3 file2) | tee >(process1) > >(process2)

(Note that `> >(process1) > >(process2)` on its own wouldn't work: the second redirection of stdout replaces the first, so process1 would see no input. You still need tee to duplicate the stream.)
http://zsh.sourceforge.net/Doc/Release/Expansion.html#Proces...
http://zsh.sourceforge.net/Doc/Release/Redirection.html#Redi...
64k on linux these days.
I.e., this section of code

let a = "some_file_name".to_string();
println!("Opening: {}", a);
let path = std::path::Path::new(&a);
let mut fd = std::fs::File::open(path);

would get optimized to

let mut fd = std::fs::File::open(std::path::Path::new("some_file_name"));

with strict lazy evaluation. The user feedback is removed, which is a big part of shell scripting.

I guess an OS should be functional at its interface to the user, and only imperative deep down to keep things running efficiently.
However, note that this hypothetical functional layer on top also would ensure efficiency, as it enables lazy evaluation. This type of efficiency could under certain circumstances be even more valuable than the bare-metal performance of system programming languages.
BTW: when is nobody reading in pipes? There's always an implicit

&> stdout

added.

EDIT: oh, right, named pipes.
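For the named-pipe case, what the shell does for you with <() can be spelled out by hand with mkfifo, roughly the fallback some shells use. A minimal sketch of my own:

```shell
#!/usr/bin/env bash
# Manual version of  cat <(echo hello)  using a named pipe:
dir=$(mktemp -d)
mkfifo "$dir/p"
echo hello > "$dir/p" &   # writer blocks until a reader opens the fifo
cat "$dir/p"              # prints: hello
wait
rm -r "$dir"
```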
It's a ksh93 extension adopted by bash and zsh.
Very often I do something like this in quick succession. Command line editing makes this trivial.
$ find . -name "*blarg*.cpp"
# Some output that looks like what I'm looking for.
# Run the same find again in a process, and grep for something.
$ grep -i "blooey" $(find . -name "*blarg*.cpp")
# Yep, those are the files I'm looking for, so dig in.
# Note the additional -l in grep, and the nested processes.
$ vim $(grep -il "blooey" $(find . -name "*blarg*.cpp"))

$ find . -name "*blarg*.cpp"
$ grep -i "blooey" $(!!)
$ vim $(!! -l)
Granted, you can only append new arguments, and using the other ! commands will often be less practical than editing. Still, it's amazing how frequently this is sufficient.

I've always thought it'd be nice if there were a `set` option or something similar that would make bash record command lines and cache their output automatically in implicit variables, so that it doesn't re-run the commands. The semantics are definitely different and you wouldn't want this enabled at all times, but for certain kinds of sessions it would be very handy.
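A tiny approximation of that caching idea works today with a wrapper function that stashes the last command's output in a variable (the function and variable names here are mine, not a real bash feature):

```shell
#!/usr/bin/env bash
# keep: run a command, print its output, and cache it in $LAST
keep() { LAST=$("$@"); printf '%s\n' "$LAST"; }

keep find . -name '*blarg*.cpp'
grep -i blooey $LAST    # re-uses the cached file list; find is not re-run
```

$LAST is deliberately left unquoted in the grep so the cached list word-splits into separate arguments, just like $(!!) would.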
EDIT: lazyjones beat me to it.
I'm lazy, so I typically do this (2nd step) to avoid the extra key strokes necessary for editing:
$ grep -i "blooey" $(!!)
Also very useful for avoiding editing in order to do something else with the same argument as in the previous command: !$, i.e.:

$ foo myfile
$ bla !$

$ find . -name '*blarg*.cpp' | grep -li blooey | vi -

In fish:

diff (sort a.txt|psub) (sort b.txt|psub)
The psub command performs the process substitution.
For example (from Wikipedia)
tee >(wc -l >&2) < bigfile | gzip > bigfile.gz
vs
tee < bigfile | wc -l | gzip > bigfile.gz
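Worth spelling out: those two pipelines are not equivalent. The first gzips bigfile while wc counts a copy of it on stderr; the second feeds the file to wc and then gzips the line count. A quick check (the sample file contents are mine):

```shell
#!/usr/bin/env bash
printf 'a\nb\nc\n' > bigfile
tee >(wc -l >&2) < bigfile | gzip > bigfile.gz   # stderr: 3
gunzip -c bigfile.gz    # the original three lines, not the number 3
```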
<input.txt munge-data-and-split -o1 out1.txt -o2 out2.txt
but since the output is huge and your disk is old and dying, you want to run xz on it before saving it to disk, so use >():

<input.txt munge-data-and-split -o1 >(xz - > out1.txt) -o2 >(xz - > out2.txt)
If you want to do several things in there, I recommend defining a function for clarity: pp () { sort -k2,3 -t$'\t' | xz - ; }
<input.txt munge-data-and-split -o1 >(pp > out1.txt) -o2 >(pp > out2.txt)

So:
cmd1 | tee out.txt | cmd2
So tee is splitting the stream into two outputs: one that carries on down stdout (into cmd2) and another that is redirected into out.txt.

With process substitution you can do extra stuff on the way out, I guess (I've never seen it used for output before).
It looks like in the example given they're writing wc stuff to stderr while zipping the content (over stdout).
Nice to see that example, I hadn't even thought about the usefulness of process substitution for outputting like this!
a | b
you connect stdout (fd #1) of a to stdin (fd #0) of b. Technically, the shell process will create a pipe, which is two filedescriptors connected back to back. It then forks twice (creates two copies of itself), replacing standard output (fd #1) of the first copy with one end of the pipe and standard input (fd #0) of the second copy with the other end. Then the first copy replaces itself (exec) with a, and the second copy replaces itself with b. Everything that a writes to stdout appears on stdin of b.

But nothing prevents the shell from replacing any other filedescriptor with a pipe. And when you create a subprocess by writing "<(c)" on your command line, it's just one additional fork for the shell, and one additional filedescriptor pair to be created. One side, much as in the simple case, replaces stdout (fd #1) of "c" (for ">(c)" it would be stdin)... and because the other end of this pipe has no predefined filedescriptor in "a" (a's stdin and stdout may already be taken, e.g. by "| b"), the shell has to somehow tell "a" which filedescriptor the pipe uses. Under Linux one can refer to open filedescriptors as "/dev/fd/<FDNUM>" (a symlink to /proc/self/fd/<FDNUM>, which is itself a symlink to /proc/<PID>/fd/<FDNUM>), so that's what gets substituted as a "name" for the process on "a"'s command line.
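You can see that naming mechanism directly. echo doesn't read its arguments, it just prints them, so it reveals the path the shell substituted (a small sketch of my own):

```shell
#!/usr/bin/env bash
# echo just prints its argument, revealing the substituted pipe's name:
echo <(true)     # something like /dev/fd/63
ls -l <(true)    # shows it resolves to a pipe
```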
Try this:
$ echo $$
12345 # <--- PID of your shell
$ tee >( sort ) >( sort ) >( sort ) otherfile | sort
and in a second terminal:

$ pstree 12345 # <--- PID of your shell
zsh,12345
├─sort,3600 # <-- this one reads from the other end of the shell's fd #14
├─sort,3601 # <-- this one reads from the other end of the shell's fd #15
├─sort,3602 # <-- this one reads from the other end of the shell's fd #16
├─sort,3604 # <-- this one reads from stdout of tee
└─tee,3603 /proc/self/fd/14 /proc/self/fd/15 /proc/self/fd/16 otherfile
If your system doesn't support the convenient /proc/self/fd/<NUM> shortcut, the shell might decide not to create a pipe, but rather create temporary fifos in /tmp and use those to connect the filedescriptors.

http://man7.org/linux/man-pages/man2/pipe.2.html
http://linux.die.net/man/2/dup
You can watch the syscalls as they are made:
$ strace -fe fork,pipe,close,dup,dup2,execve bash -c 'tee <(sort) <(sort)'

http://www.maier-komor.de/mbuffer.html
http://www.ivarch.com/programs/pv.shtml
Can't be sure if he is a bioinformatician because he never really mentions that he is a bioinformatician.
pee: tee standard input to pipes
sponge: soak up standard input and write to a file
ts: timestamp standard input
vipe: insert a text editor into a pipe