How do Unix pipes work?

(vegardstikbakke.com)

269 points · v3gas · 6y ago · 61 comments

29 comments · 9 top-level

nneonneo · 6y ago · 11 in thread

This article promotes bad practices for dealing with SIGPIPE.

1. Closing stderr in Python is not a good idea because that’ll swallow any other errors that occur at exit. Redirecting stdout to devnull is really just a way to prevent the flushed output from going to the now-closed stdout and triggering another SIGPIPE. That’s preferable to closing stderr and losing error output at exit.

2. Ignoring SIGPIPE is a terrible idea for a process that should do stream processing. Try making a yes clone and ignoring SIGPIPE - your process will likely run forever trying to shove “y” into a closed pipe. There’s a reason SIGPIPE was invented! Very few programs bother to check the return value from write/printf/etc.

seneca · 6y ago

Agreed. I like articles like this because they show the learning process, which I think is super valuable. However, they really need a big disclaimer stating the author is experimenting and doesn't know the correct answer. Otherwise people stumble upon it and take it as authoritative.

v3gas (OP) · 6y ago

Great point, I should probably add that disclaimer!

pixelbeat__ · 6y ago

Agreed. Correct handling of SIGPIPE is quite subtle and often done incorrectly. I've some notes on SIGPIPE considerations at:

http://www.pixelbeat.org/programming/sigpipe_handling.html

loeg · 6y ago

> 1. Closing stderr in Python is not a good idea because that’ll swallow any other errors that occur at exit. Redirecting stdout to devnull is really just a way to prevent the flushed output from going to the now-closed stdout and triggering another SIGPIPE. That’s preferable to closing stderr and losing error output at exit.

I don't follow. The standard for EPIPE/SIGPIPE handling is to silently exit with an error status. It's fine to close stderr to prevent spurious warning messages about flushing stdout.

> 2. Ignoring SIGPIPE is a terrible idea for a process that should do stream processing. Try making a yes clone and ignoring SIGPIPE - your process will likely run forever trying to shove “y” into a closed pipe. There’s a reason SIGPIPE was invented! Very few programs bother to check the return value from write/printf/etc.

Programs can correctly handle lost pipes while masking SIGPIPE entirely, with error checking alone. Python's BrokenPipeError is raised on the basis of EPIPE, not SIGPIPE.

Re: programs not checking error returns of write() and close(): that is not really true in a language like Python, where exceptions are raised on IO errors. It always does the check, and the unwinder aborts the program if nothing handles the error. SIGPIPE is completely unnecessary for Python programs. (It's also not necessary for C programs, but I guess AT&T didn't want to fix their programs to check for errors.)
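
For illustration, a minimal sketch of the EPIPE path described above, assuming a stock CPython on a POSIX system (where the interpreter sets SIGPIPE to ignored at startup). The function name is made up for this example.

```python
import errno
import os


def write_to_closed_pipe():
    """Write to a pipe whose read end is closed and report the errno seen."""
    r, w = os.pipe()
    os.close(r)  # no reader is left on this pipe
    try:
        # With SIGPIPE ignored, write(2) fails with EPIPE instead of
        # killing the process, and Python turns that into an exception.
        os.write(w, b"y\n")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(w)
    return None


if write_to_closed_pipe() == errno.EPIPE:
    print("BrokenPipeError carried errno EPIPE")
```

No signal handler is involved here: the exception comes purely from the failed write.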

nneonneo · 6y ago

The suppression of SIGPIPE was done in the Go code, not in the Python code.

Does the POSIX standard mandate that programs receiving EPIPE/SIGPIPE die silently? I don’t know of such a rule, and there are plenty of programs that violate this. Python is a bit too verbose with the errors (with a full traceback and two copies of the error), so suppressing those errors somehow seems like a good idea for a general-purpose command line tool.
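
One way to get that quiet exit, sketched here along the lines of the pattern in the Python `signal` module documentation: catch BrokenPipeError, point stdout's descriptor at /dev/null so the flush at interpreter exit cannot fail again, and exit with a non-zero status. The `CHILD` script and the `run_with_early_reader` helper are hypothetical names, and the parent closing its end of the pipe stands in for `head -1` exiting.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: a tiny `yes` clone that exits quietly on EPIPE.
CHILD = textwrap.dedent("""
    import os
    import sys
    try:
        while True:
            print("y")
    except BrokenPipeError:
        # stdout is gone; redirect fd 1 to /dev/null so the implicit
        # flush at exit does not raise a second error.
        os.dup2(os.open(os.devnull, os.O_WRONLY), sys.stdout.fileno())
        sys.exit(1)
""")


def run_with_early_reader():
    """Read one line from the child, then close the pipe like `head -1`."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    first = proc.stdout.readline()
    proc.stdout.close()        # the reader goes away
    err = proc.stderr.read()   # returns once the child has exited
    proc.wait()
    return first, proc.returncode, err
```

stderr is left alone in the child, so any unrelated error at exit would still be visible.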


asveikau · 6y ago

The biggest reason I can think of not to close stderr (or any other of the standard handles) is that the next open(2) call is likely to get that same descriptor recycled. So now joe random open file is going to receive all the error messages from random libraries, or perhaps even from unrelated programs you may have forked with that file as fd 2.
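
A small sketch of that recycling, using a throwaway pipe descriptor instead of fd 2 so nothing important gets closed. The function name is made up, and this assumes a single-threaded process so that nothing else opens or closes descriptors in between.

```python
import os


def reopened_fd_after_close():
    """Close a descriptor, open something else, and compare the numbers."""
    r, w = os.pipe()   # r is the lowest free descriptor at this point
    os.close(r)        # r is now the lowest free descriptor again
    new_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        # POSIX hands out the lowest-numbered free descriptor, so the
        # unrelated open() receives the number that was just closed.
        return r, new_fd
    finally:
        os.close(new_fd)
        os.close(w)


closed, reused = reopened_fd_after_close()
print(f"closed fd {closed}, next open() returned fd {reused}")
```

Swap the pipe descriptor for fd 2 and the "unrelated file" becomes the destination of everything written to stderr.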

developer2 · 6y ago

The biggest takeaway here IMO is that Python breaks the standard contract regarding signals–at least for SIGPIPE. Python should not be catching and throwing an exception for SIGPIPE; it should simply exit immediately, which is literally the default POSIX behaviour... unless a script/program/process specifically installs a signal handler to perform cleanup before exit. Python has some pretty awful behaviours built into it, and this is one of them.

Half of this article is not "How do Unix pipes work", but "how to fix broken SIGPIPE handling in Python".

loeg · 6y ago

Python doesn't catch SIGPIPE, it ignores it.

The exception is raised from a -1/EPIPE return from libc write().

I fully agree that Python is often a bad citizen in terms of signal handling: it wants to only process signals on 'the main thread', but also wants end-users to fully control signal handling. The two ideas are sort of at odds, and in general I find handling signals in Python frustrating.

microtherion · 6y ago

Another bad practice, IMHO, is not to quit the program once the exception is thrown. Instead, the loop continues and the rest of the input gets fed to /dev/null

v3gas (OP) · 6y ago

Thanks! So the preferred way is to redirect to /dev/null?

hoytech · 6y ago

It's better to just do nothing and allow SIGPIPE to kill your program. That is the reason this signal exists. Python is not a good unix citizen in this case. Compare it to Perl for example where nothing special is required to do the right thing:

    $ perl -E 'say "y" while 1' | head -1
    y
    $
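
For comparison, a Python script can opt back into that behaviour by restoring the default SIGPIPE disposition before it writes anything. A minimal sketch, with `CHILD`, `KILLED_BY_SIGPIPE` and `run_until_reader_leaves` as hypothetical names; the parent closing the pipe plays the role of `head -1`.

```python
import signal
import subprocess
import sys

# Hypothetical child script: restore the default disposition, then write forever.
CHILD = """
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
while True:
    print("y")
"""

# subprocess reports death by signal N as a return code of -N.
KILLED_BY_SIGPIPE = -signal.SIGPIPE


def run_until_reader_leaves():
    """Read one line from the child, close the pipe, and collect its status."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    first = proc.stdout.readline()
    proc.stdout.close()  # the next write in the child raises SIGPIPE
    proc.wait()
    return first, proc.returncode
```

The child ends without a traceback because the kernel terminates it; no Python exception is ever raised.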


cperciva · 6y ago · 5 in thread

    If we cat this file, it will be printed to the terminal.
    > cat brothers_karamazov.txt
    ... many lines of text!
    ***FINIS***
    It takes a noticeable amount of time to finish.

The amount of time it takes for cat(1) to read and output the file is almost certainly insignificant. The time the author is noticing is probably related to how long it takes for his console to process the text.

kccqzy · 6y ago

Agreed.

This can be easily verified by putting `time` in front of the cat to measure the time taken. Even for huge text files, the wall clock time might be significant but the "user" time is likely still zero.
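
A rough Python equivalent of `time cat FILE > /dev/null`, as a sketch: it assumes a `cat` binary on the PATH, and `time_cat` is a made-up helper name. Sending the output to /dev/null takes the terminal out of the measurement.

```python
import os
import subprocess
import time


def time_cat(path):
    """Return (wall, user, system) seconds for `cat path > /dev/null`."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(["cat", path], stdout=subprocess.DEVNULL, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    # children_* accumulate CPU time of child processes that were waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return wall, user, system
```

Calling `time_cat("brothers_karamazov.txt")` and comparing the wall time with how long the same file takes to scroll past in a terminal should show where the time goes.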

wolf550e · 6y ago

or redirect cat to /dev/null and see how fast that is

happytoexplain · 6y ago

This is the first thing I noticed too.

>how does cat know to stop when head is finished

I'm no expert on Unix, so correct me if I'm wrong, but surely this line of reasoning is misleading because pipes create a unidirectional data flow, so `cat` cannot know anything about `head`. It does not "stop" - it passes the whole text along just as it did without the pipe. As you said, the delay comes in printing to the console, not in the `cat` command.

kyuudou · 6y ago

This is a great example of Useless Use of cat and why it is bad - the full text is indeed sent through the pipe simply for head to chop n initial lines.

I've actually had "developers" go "but, readability". Yea ok.

cperciva · 6y ago

Pipes aren't completely unidirectional. You get one bit of information flowing back: Whether the read end of the pipe is still open.
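
That one bit can even be observed without writing anything. A sketch using `select.poll`, with a made-up function name: on Linux the write end reports POLLERR once the last reader is gone; other systems may report POLLHUP instead, so the sketch accepts either.

```python
import os
import select


def writer_sees_reader_gone():
    """Poll the write end of a pipe before and after the reader closes."""
    r, w = os.pipe()
    poller = select.poll()
    # Register with an empty event mask: error and hangup conditions are
    # reported regardless of what was requested.
    poller.register(w, 0)
    gone = select.POLLERR | select.POLLHUP

    before = any(events & gone for _, events in poller.poll(0))
    os.close(r)  # the reader goes away
    after = any(events & gone for _, events in poller.poll(0))

    os.close(w)
    return before, after
```

A writer that never polls learns the same thing later, from SIGPIPE or EPIPE on its next write.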

ur-whale · 6y ago · 4 in thread

This article only shows basic usage of pipes (this is what they mean by "how pipes work"), but doesn't explain at all how pipes work (as in: how they are implemented).

userbinator · 6y ago

It's implemented as a buffer and some associated state. A process that writes to the buffer can do so until it is full, at which point the thread is suspended (blocked on the write() call) until it is not full. The read() side is similar --- reads return successive data in the buffer unless it is empty, at which point the read() call will block.
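
The buffer can be poked at from user space. A sketch with a made-up function name: the write end is switched to non-blocking mode so that the point where a normal writer would be suspended shows up as BlockingIOError instead. The capacity is system-dependent, so no particular number is assumed.

```python
import os


def fill_pipe(chunk=4096):
    """Fill a pipe until it would block, drain it, then write again."""
    r, w = os.pipe()
    os.set_blocking(w, False)
    written = 0
    try:
        while True:
            written += os.write(w, b"\0" * chunk)
    except BlockingIOError:
        # The buffer is full: a blocking writer would be asleep in write() here.
        pass

    drained = len(os.read(r, written))   # the reader frees up space
    again = os.write(w, b"\0" * chunk)   # so the writer can continue

    os.close(r)
    os.close(w)
    return written, drained, again


written, drained, again = fill_pipe()
print(f"pipe held {written} bytes before the writer would have blocked")
```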

emmelaich · 6y ago

ur-whale probably knows that

chaps · 6y ago

Same, I was hoping for some mention of /proc/PID/fd*, but nothin'.

emmelaich · 6y ago

/proc/ is not fundamental to understanding Unix or Unix pipes and is not present on many Unixes.


mg · 6y ago

I had a nice surprise and learning experience, when I discovered that the output of

     (echo red; echo green 1>&2) | echo blue

is nondeterministic:

http://www.gibney.de/the_output_of_linux_pipes_can_be_indete...

As it turns out, this short line and its behavior nicely demonstrate a bunch of aspects that happen under the hood when you use a pipe.
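
To see the variation without retyping the command, here is a small sketch that runs it repeatedly through `sh` and tallies the distinct outputs. `CMD` is the one-liner from above; `sample_outputs` is a made-up helper. stderr is merged into stdout so the relative order of "green" and "blue" is preserved.

```python
import collections
import subprocess

CMD = "(echo red; echo green 1>&2) | echo blue"


def sample_outputs(runs=50):
    """Run the pipeline `runs` times and count each distinct combined output."""
    counts = collections.Counter()
    for _ in range(runs):
        result = subprocess.run(
            ["sh", "-c", CMD],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep stdout/stderr ordering
            text=True,
        )
        counts[result.stdout] += 1
    return counts


for output, count in sample_outputs().items():
    print(count, repr(output))
```

Which outputs show up, and how often, depends on the shell and on scheduling, so a given machine may need many runs before more than one variant appears.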

pierremenard · 6y ago

See Sections 1.2 & 1.3 of the book on xv6, the MIT Unix teaching OS, for a great intro to FDs and pipes: https://pdos.csail.mit.edu/6.828/2019/xv6/book-riscv-rev0.pd...

ryanmccullagh · 6y ago

Here's something that you should remember about using pipes and fork(2) in Python 3: by default, the CPython runtime creates pipes with O_CLOEXEC set (file descriptors are non-inheritable since Python 3.4).

This means that the pipe's descriptors are closed in the child as soon as it calls exec after the fork, so a program you exec will not see them. Therefore you should explicitly clear the close-on-exec flag, which is a descriptor flag (F_GETFD/F_SETFD), not a file status flag (F_GETFL/F_SETFL):

  fcntl.fcntl(readfd, fcntl.F_SETFD, fcntl.fcntl(readfd, fcntl.F_GETFD) & ~fcntl.FD_CLOEXEC)
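
A short sketch for checking this on your own system, assuming Python 3.4 or later on a POSIX platform; both function names are made up. It shows the default flag on a fresh pipe, the portable `os.set_inheritable` way of clearing it, and that close-on-exec takes effect at exec time: a plain fork() with no exec still shares the pipe.

```python
import fcntl
import os


def pipe_flags():
    """Return (inheritable by default?, FD_CLOEXEC set after set_inheritable?)."""
    r, w = os.pipe()
    try:
        inheritable_by_default = os.get_inheritable(r)
        os.set_inheritable(r, True)  # clears FD_CLOEXEC on the descriptor
        cloexec_after = bool(fcntl.fcntl(r, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
        return inheritable_by_default, cloexec_after
    finally:
        os.close(r)
        os.close(w)


def fork_roundtrip():
    """Child writes, parent reads, without touching any descriptor flags."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: no exec happens, so close-on-exec never comes into play.
        os.close(r)
        os.write(w, b"hello")
        os._exit(0)
    os.close(w)
    data = os.read(r, 16)
    os.waitpid(pid, 0)
    os.close(r)
    return data
```

When launching programs through `subprocess`, the `pass_fds` argument is usually the simpler way to hand a descriptor to the child.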

kccqzy · 6y ago

My own rule of thumb of whether or not to ignore SIGPIPE is simple:

* If you only deal with file descriptors provided to you (stdin, stdout, stderr) as well as some files that you open (including special files like FIFOs), do not ignore SIGPIPE.

* If you deal with sophisticated file descriptors (socket(2) and pipe(2) count as sophisticated), you'd better ignore SIGPIPE, but also make sure to check for EPIPE in every single write.

In my view, SIGPIPE is a kludge so that programs that are too lazy to check for errors from write(2) (and fwrite(3) and related friends) will not waste resources. But if you are dealing with sophisticated file descriptors, there is a lot more happening than just open/read/write and a lot more error cases you must handle, and at that point the incremental cost of handling EPIPE isn't a significant addition.

RoutinePlayer · 6y ago

My favorite sentence from Brian Kernighan's latest book, "UNIX: A History and a Memoir": "Pipes are the quintessential Unix invention, an elegant and efficient way to use temporary connections of programs ..." So I'll read this article :-)

ilammy · 6y ago

Another point where you have to ignore SIGPIPE is concurrent code that handles multiple fds (say, like a web server). In this case you have to ignore the signal and process EPIPE correctly, because the signal is not associated with a particular fd so you cannot tell which one of them failed.
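
A minimal sketch of the per-descriptor bookkeeping, using pipes as stand-ins for client connections. `write_all` is a made-up helper, and the sketch relies on SIGPIPE being ignored (CPython does that at startup), so each failed write reports EPIPE for exactly the descriptor it was issued on.

```python
import os


def write_all(fds, payload=b"ping\n"):
    """Write payload to every descriptor; return the ones whose peer is gone."""
    dead = []
    for fd in fds:
        try:
            os.write(fd, payload)
        except BrokenPipeError:
            # EPIPE came back from this particular write, so there is no
            # doubt about which descriptor lost its reader.
            dead.append(fd)
    return dead
```

A server loop would typically close and forget the descriptors returned here and carry on serving the rest.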
