1 is not the same as dup(1) (opens in new tab)

(blog.gnoack.org)

61 pointsgnoack2y ago68 comments

68 comments

Obscure fact #37: Glibc's implementation of freopen opens /proc/self/fd/<n>, when the pathname is null, e.g.:

  freopen(NULL, "w", stdout);

tsimionescu2y ago

Doesn't the Linux behavior make more sense? It basically guarantees that open() always returns a new file description. Having open() return an existing file description for some special files seems like a recipe for exploits. Dup() already exists for this purpose.

eadmund2y ago

I don’t think it makes more sense — to be honest, the Linux behaviour seems like a bug and the Plan 9 semantics seem to be what is or ought to be intended.

DSMan1952762y ago

I think it has some minor tradeoffs, like the article mentions not all of the underlying files can be opened a second time so that makes them basically unusable in this form. If the `open()` call did a dup instead then you could read/write to it in the same way as the process can regardless of the type of underlying file.

hawski2y ago

Ideally I would like to see both behaviors available, making the current way the preferred one. But AFAIK it would be hard, because /dev/fd/* are just symlinks and this would need a very special case.

tsimionescu2y ago

But what would be the use case of re-opening your own files in this way, that dup() doesn't cover?

I mean, it's cute that it's in principle possible, but what does it actually do that can't be achieved more cleanly in other ways?

kazinator2y ago

It supports the wacky semantics of the freopen function, which is probably why glibc implements freopen by opening /proc/self/fd/<n>.

Specifically, the case of freopen when the path argument is null, which was introduced in C99.

freopen does all the open-time actions such as that the "w" mode truncates the file. So that could be why. If we dup the descriptor, we have to call ftruncate to implement the "w" flag. I think if we just open the /proc item, O_TRUNC will do it for us.

The freopen function is wacky in the first place, and obviously the approach has the flaw that (as we learn from the submitted article) that sockets cannot be opened this way.

Obviously, /proc/.../fd was not designed for freopen.

/proc/<other-than-self>/fd is very useful for implementing, e.g. process substitution in Bash:

  $ diff -u <(this command) <(that command)

This calls diff with /proc/<pid>/fd/<n> /proc/<pid>/fd/<m> which refer to pipes set up by the shell. The diff program thinks it's just opening files.

/proc/self/fd/ exists because there is a /proc/self symlink to /proc/<self-pid>/ not necessarily because it's useful for a process to open its own descriptors this way. Other /proc/self things are useful, like /proc/self/exe to find out where your executable is located.

1 more reply

DSMan1952762y ago

> But what would be the use case of re-opening your own files in this way, that dup() doesn't cover?

Well you can't `dup()` an fd that you don't already have, this would let you do it. Assuming I understand the behavior right it doesn't have to be the original program calling `open()`.

If you're running a separate process and trying to debug another process that's running by peeking at its fds, then being able to just `open()` them is easier and less invasive compared to the alternatives for getting a dup'd fd (Ex. force the program to call `dup()` itself and then send the fd over a socket or something).

nrdvana2y ago

Usually to make a program do something the author didn't think of, by passing it a filename to one of its descriptors where the author only gave you a configuration of a path. For example, starting a service under a monitoring system which gives it a pipe to a logger on fd3, then telling it to write a log file of /dev/fd/3

Joker_vD2y ago

> Unix processes have file descriptors which point to file descriptions (`struct file` in Linux).

Also known as file handles referencing file objects (on Windows). Unix terminology is unnecessarily confusing in this place IMHO.

On the subject of the article itself: why was this change introduced? To give kernel support for reopen(3) since dup(3) already exists?

AshamedCaptain2y ago

I propose to reintroduce the term FCB (File Control Block ) to help clear things up :)

zare_st2y ago

Unix terminology is older. FD is a kernel-user interface based on integer map. HANDLE is a typedef'd pointer. Apples vs oranges.

larsnystrom2y ago

I can't be the only one who dreams of a radically different successor to Linux when I read stuff like this. Like, why do I need to care about file descriptors? Why is everything a file, except its not because it can be a socket, or a pipe, or a block device, or something else I haven't heard about? Why does the OS even care what my current working directory is? How are we ever going to get rid of C when the whole OS interface is defined in C? Even with the simplest program, running in a single thread (or is it pthread?), I can't control the flow of execution because suddently SIGNALS happen and who knows what'll happen then. Like, you could use select(2), but you should use pselect(2) to deal with signals, except select(2) should not be used because it can't deal with PIDs larger than 1024, so use poll(2), or maybe epoll(7), or ppoll(2) if you want to deal with signals, but what happens when you use epoll(7) and have to deal with signals? And did you notice all those numbers next to the syscall names? Wait, what do you mean syscall, is it not a function? And of course, to create a new process, you use a syscall called fork(), because processes and cutlery are intrinsically linked.

Honestly, the OS situation is a mess and I just want to not think about it, but not having an OS is not really an option.

tsimionescu2y ago

You're mixing up many entirely different topics in this rant, so it's hard to unpack.

That we use the term "file descriptors" for pointers from userspace to any kernel object, even those that are not files, is unfortunate, but ultimately just a naming quirk. Windows has a better name, "Handle", but the concept is exactly the same.

The OS includes the file system, and file systems include a notion of paths, and relative paths are really useful. So, the OS helps you by automatically resolving relative paths to your current directory, instead of forcing every application to manually keep track of this.

Linux is perhaps the only popular OS whose interface is not defined in C. All syscalls are clearly documented at the assembler level in Linux, and kept backwards compatible. All other popular OSs (Windows, MacOS, FreeBSD) have a C lib you have to dynamically link if you expect compatibility.

Even if signals weren't a thing, you'd still have to worry about processor interrupts. There is no such thing as a purely single-threaded program on any gpCPU released in the last 30+ years.

The variety of calls in Linux to handle various kinds of events is unfortunate. Windows has a slightly cleaner interface, though even there it's not ideal. Hopefully io_uring will subsume all of the current use cases.

The numbers after the syscalls are related to the man pages where they are documented. Not all that relevant.

Sycalls are not functions, they are specific APIs that the kernel provides to userspace, defined at the assembler level (you put this value in this register/stack and jump to this address/invoke this CPU interrupt). It is up to your language to wrap syscalls into functions, which may have an entirely different calling convention. A kernel can't provide APIs as language-specific functions, as Python's calling convention is vastly different from Haskell's.

Fork() has many meanings that are not related to cutlery, used in CS in other places. Fork() is also an extraordinarily terrible interface for process creation for reasons which have nothing to do with its name. I would be happy if one day Linux gets rid of this insanity and adds a CreateProcess syscall that doesn't have to pretend to copy the entire address space of the current process.

oasisaimlessly2y ago

> I would be happy if one day Linux gets rid of this insanity and adds a CreateProcess syscall that doesn't have to pretend to copy the entire address space of the current process.

fork() is going to exist forever, but posix_spawn() already exists:

https://linux.die.net/man/3/posix_spawn

tsimionescu2y ago

I think clone(), or better yet clone3(), is closer to what I had in mind, as posix_spawn() is not a syscall, it's just a utility function calling fork()/vfork()/clone() and then exec().

the84722y ago

fork+exec is great in so far as it lets you do arbitrarily complex process setup between those syscalls. APIs like posix_spawn are far more restrictive. The issue is the overhead and the restricted post-fork environment in a multi-threaded process. Rather than CreateProcess we need io_uring_spawn[0] + all relevant syscalls ported to io_uring.

https://lwn.net/Articles/908268/

tsimionescu2y ago

If custom process setup code is so common, a better abstraction would have been a CreateProcess() / StartProcess() pair, where CreateProcess() would return a struct that exposes all the necessary methods to control security, FD behavior, working directory etc, and StartProcess() would take that struct and actually run it.

1 more reply

vacuity2y ago

For the syscall raw ASM/libc debate, would it be possible to provide an interface that just does syscalls and separate that from the rest of libc? It would be more inconvenient for people using ASM, but they wouldn't have to conform to libc. I imagine it's a breaking change for everyone, so consider this in a hypothetical OS.

tsimionescu2y ago

It might be possible, but I don't think you'd gain much. Even today, you can dynamically link to libc but only use its syscall interfaces, not anything else from it, not even malloc(). I think most runtimes for GC languages work like this.

However, this still means that your process will be affected by any memory correctness issues in "libsyscall", in addition to the issues in the kernel itself. Plus, the maintainers of libsyscall would have to write it in a bizarre dialect of C that doesn't use any stdlib functions, which might price even more error prone than standard C.

It's perhaps important to note here that the parts of libc that implement syscalls in OpenBSD are not simple syscall-to-C wrappers, they can have quite a bit of code occasionally. And Windows' runtime library is even more complex than that. That's their whole point - they can keep a backwards compatible system interface in spite of significant changes at the syscall layer, probably by doing lots of small pieces of work in userspace to bridge the gap.

1 more reply

knightoffaith2y ago

Could you educate me on what's wrong with fork()?

tsimionescu2y ago

Well, there are two categories of problems.

One is that fork() is by definition a very costly operation (a copy of the entire address space of the current process), and the kernel has to do a lot of work to implement it efficiently (implementing copy-on-write clones of all of the pages of a process). And that all that work is done for nothing in the very very very common case of doing fork() + exec().

The other problem is that the semantics of fork() just fundamentally can't work properly for a multi-threaded process. In any multi-threaded process, if you do fork(), the only thing you can safely do in the child process is to call exec(). Any other call, even a printf() or some path logic, has a very good chance to lead to a deadlock, quite possibly inside malloc() itself.

So fork() as a standalone operatikn is actually an extremely niche utility (duplictaing single-threaded processes) that has been made the main way of spawning new processes. Similarly, exec() by itself is an even more niche utility, sometimes useful for "launcher" style processes.

So, instead of achieving an extremely common task (launch some binary file as a new process) using a dedicated system call, Unix has chosen to define two extremely niche syscalls that you should almost never use individually, but that together can implement this common process, but only with a lot of behind the scenes work to make it efficient.

4 more replies

jcranmer2y ago

There's basically three uses of fork.

The first is to implement POSIX shells, and that's less because this is a good design and more because shells are a wrapper around the original Unix system calls. Note that if you're designing a scripting language that isn't beholden to compatibility with /bin/sh (especially one that can be portable to OSes that don't have fork()!), then you're liable to not design it in such a way that requires you to use fork().

The second use case is an alternative to threads for parallel processing. And there are some reasons that processes can work better than threads for parallel processing. But fork() has such a bad interaction with multithreaded code [1] that you end up having to choose fork() xor threads. And as threading has become an increasingly important part of modern environments, well, given that xor choice, almost everybody is going to come down on the threads side of the equation.

The final use case, and by far the most common, is to be able to spawn a new process. This means you break up one logical system call (spawn) into two (fork + exec), the first of which semantically requires you to do a lot of work (clone memory state) that you're immediately going to throw away. Even in the case where you want to do more expansive process-twiddling magic before spawning the process, there are better designs (especially if you're willing to commit to a handle-based operating system).

Of the three use cases, one amounts to "backwards compatibility", and the other two amount to "fork() is actively fighting you". That is not the hallmark of a good API.

[1] Think things like "locks are held by threads that don't exist."

zokier2y ago

https://news.ycombinator.com/item?id=30502392 340 comments

https://news.ycombinator.com/item?id=19621799 180 comments

https://news.ycombinator.com/item?id=22462628 117 comments

https://news.ycombinator.com/item?id=8204007 314 comments

https://news.ycombinator.com/item?id=31739794 135 comments

https://news.ycombinator.com/item?id=16068305 89 comments

that's some 1000+ comments for light background reading about fork(). I think you can make some aggregate sentiment analysis from that and conclude that fork() is not great.

hnlmorg2y ago

Any sufficiently complex software ecosystem eventually ends up amassing a heap of ugliness due to assumptions made at the time that are no longer correct nor are easy to change. Web development is a great example of this: the modern browser is not that far removed from an operating system and how many quirks do they need to cater for these days plus how many footguns do web developers need to consider?

Also some of your specific concerns are impossible to resolve. Take SIGNALS for example. They're ostensibly just callback functions / events. It's very easy to do event-driven programming if all of your events are being raised by the same language runtime as the code you're writing your application in, but how do you raise an event that crosses application boundaries and where your application code would be written in a different language to the event bus (the kernel in this instance)? You ultimately end up with some kind of IPC ugliness and the best solution in the 70s was SIGNAL. Given its now core to the OS, stripping out SIGNALs from Linux would be as easy as stripping HTML from websites.

There are plenty of radically different successors to Linux though. But they all have their own rough edges too. Ultimately these things are complicated and you always end up making compromises somewhere.

Pesthuf2y ago

Careful now, implying the inventors of gets() may not be flawless beings of divine intellect whose creation has outshone whatever the second coming of Christ may end up becoming will get you a slow and painful death.

bregma2y ago

You could serialize everything through epoll(): use signalfd() to redirect signals into epoll(), use eventfd() for IPC through epoll(), etc. The kernel programming API is that everything is operated on through a file descriptor, not that everything is a file. You misunderstand.

You can program in another language other than C and avoid using GNU libc, or Musl libc, or any other libc, so avoid using the C API to talk to the kernel. Other languages like Rust and Go provide their own runtimes and avoid using the C runtime for syscalls. Syscalls are written in assembly language, or at least syscall(2) itself is because the kernel API is just marshalling and a context switch. You misunderstand.

Oh, and the fork(2) function on Linux is implemented by a libc using the clone syscall(2). The (2) is the chapter in the manual providing the documentation. You misunderstand.

vacuity2y ago

Go has tried to avoid libc where possible, but Linux is rather unusual in that raw syscalls can be done reliability with assembly code. Rust just uses libc to sidestep the hassle (and perhaps better interface with C code, not sure).

crabbone2y ago

While I think that most of your reply is pretending to not notice the real problem and making lame excuses, I specifically want to point out this:

> You can program in another language other than C and avoid using GNU libc

Linux is a set of various system interfaces. Many of which are exposed through libc functions. If you deliberately exclude libc and program something parallel to it instead, then you aren't really using Linux. You've created a hybrid system. This only reinforces OP's complaint about not being able to program in a different language, because, in other words, this means that to C language programmer Linux is available fully, and to other languages the functionality is not completely available.

Take, for example, async I/O, which is implemented in libc around other system primitives, s.a. threads. If you don't use libc, you don't have async I/O, and by extension, you don't have Linux, since Linux is supposed to have that.

thecodedmessage2y ago

This is a surprising take.

Linux as a project is the kernel. Unlike most OSes, the most commonly used libc is maintained separately. As a result, Linux the operating system is the one where it’s the most common and supported to avoid the “standard” libc because the kernel itself has a stable interface.

1 more reply

tsimionescu2y ago

This is simply false. The Linux maintainers have little idea of how different libc implementers expose the interface they provide, and there are non-C languages using those ABIs as well. There is also full documentation for how to access any piece of ABI that Linux exposes without using C.

For example, here is a site that documents all Linux syscalls on x86_64, with the exact registers that they read their arguments from, and the registers they write their results in:

https://filippo.io/linux-syscall-table/

1 more reply

bregma2y ago

Linux is not exclusively GNU/Linux. For example, Musl/Linux (eg. Alpine Linux) and Android/Linux are widespread. The Linux ABI is unchanged.

Legitimate Linux applications are written in Go all the time. Linux applications are often written in Rust and use the Linux ABI without going through any libc. They seem to be doing mighty fine in terms of "being Linux".

This sounds like some variant of the True Scotsman fallacy.

1 more reply

rrdharan2y ago

https://en.m.wikipedia.org/wiki/The_UNIX-HATERS_Handbook

jmprspret2y ago

Ever looked into 9front?

moody__2y ago

Out of curiosity I decided to test these assumptions on 9front and things work as the author would have expected:

cpu% ./dup -proc > out; cat out; echo

cpu% ./dup -dup > out; cat out; echo

The slightly modified code can be found here: http://okturing.com/src/18632/body

This seems to be another case of Linux attempting to emulate Plan 9 design and not quite hitting the mark. In general these operating system interfaces are much more consistent and sane within the Plan 9 environment at (seemingly) every turn. I think a lot of folks see the implementation quirks of Linux and decide its time to toss the baby out with the bath water, but studying Plan 9 really shows how nice things could have been.

thriftwy2y ago

> he whole OS interface is defined in C

What do you think about io_uring? I believe it moves away from OS interface as (C) function calls.

ykonstant2y ago

What is the state of io_uring? Is it being actively used or are there kinks to work out first?

Hendrikto2y ago

Well both. It is already in wide use, but still has some kinks.

1 more reply

stefan_2y ago

fork() is truly the epitome of terrible design. Yet all the OS books praise it as some genius - tells you how little research has progressed...

yau8edq12i2y ago

Have you read an "OS book" written recently?

stefan_2y ago

We shall open the Tanenbaum, now in its fifth edition circa 2023, no doubt still taught to students all over the world:

> Processes are created in Linux in an especially simple manner. The fork system call creates an exact copy of the original process.

> The Linux sequence of clone plus exec is yet more orthogonal, since even more fine-grained building blocks are available. As a general rule, having a small number of orthogonal elements that can be combined in many ways leads to a small, simple, and elegant system.

Yup, still the same bullshit. It is not elegant, it is not fine-grained, it is not simple, it is certainly not small.

1 more reply

zokier2y ago

Fuchsia exists.

j / k navigate · click thread line to collapse

68 comments

kazinator2y ago

Obscure fact #37: Glibc's implementation of freopen opens /proc/self/fd/<n>, when the pathname is null, e.g.:

  freopen(NULL, "w", stdout);

tsimionescu2y ago

eadmund2y ago

I don’t think it makes more sense — to be honest, the Linux behaviour seems like a bug and the Plan 9 semantics seem to be what is or ought to be intended.

DSMan1952762y ago

hawski2y ago

tsimionescu2y ago

But what would be the use case of re-opening your own files in this way, that dup() doesn't cover?

I mean, it's cute that it's in principle possible, but what does it actually do that can't be achieved more cleanly in other ways?

kazinator2y ago

It supports the wacky semantics of the freopen function, which is probably why glibc implements freopen by opening /proc/self/fd/<n>.

Specifically, the case of freopen when the path argument is null, which was introduced in C99.

The freopen function is wacky in the first place, and obviously the approach has the flaw that (as we learn from the submitted article) that sockets cannot be opened this way.

Obviously, /proc/.../fd was not designed for freopen.

/proc/<other-than-self>/fd is very useful for implementing, e.g. process substitution in Bash:

  $ diff -u <(this command) <(that command)

This calls diff with /proc/<pid>/fd/<n> /proc/<pid>/fd/<m> which refer to pipes set up by the shell. The diff program thinks it's just opening files.

1 more reply

DSMan1952762y ago

> But what would be the use case of re-opening your own files in this way, that dup() doesn't cover?

Well you can't `dup()` an fd that you don't already have, this would let you do it. Assuming I understand the behavior right it doesn't have to be the original program calling `open()`.

nrdvana2y ago

Joker_vD2y ago

> Unix processes have file descriptors which point to file descriptions (`struct file` in Linux).

Also known as file handles referencing file objects (on Windows). Unix terminology is unnecessarily confusing in this place IMHO.

On the subject of the article itself: why was this change introduced? To give kernel support for reopen(3) since dup(3) already exists?

AshamedCaptain2y ago

I propose to reintroduce the term FCB (File Control Block ) to help clear things up :)

zare_st2y ago

Unix terminology is older. FD is a kernel-user interface based on integer map. HANDLE is a typedef'd pointer. Apples vs oranges.

larsnystrom2y ago

Honestly, the OS situation is a mess and I just want to not think about it, but not having an OS is not really an option.

tsimionescu2y ago

You're mixing up many entirely different topics in this rant, so it's hard to unpack.

Even if signals weren't a thing, you'd still have to worry about processor interrupts. There is no such thing as a purely single-threaded program on any gpCPU released in the last 30+ years.

The numbers after the syscalls are related to the man pages where they are documented. Not all that relevant.

oasisaimlessly2y ago

> I would be happy if one day Linux gets rid of this insanity and adds a CreateProcess syscall that doesn't have to pretend to copy the entire address space of the current process.

fork() is going to exist forever, but posix_spawn() already exists:

https://linux.die.net/man/3/posix_spawn

tsimionescu2y ago

I think clone(), or better yet clone3(), is closer to what I had in mind, as posix_spawn() is not a syscall, it's just a utility function calling fork()/vfork()/clone() and then exec().

the84722y ago

https://lwn.net/Articles/908268/

tsimionescu2y ago

1 more reply

vacuity2y ago

tsimionescu2y ago

1 more reply

knightoffaith2y ago

Could you educate me on what's wrong with fork()?

tsimionescu2y ago

Well, there are two categories of problems.

4 more replies

jcranmer2y ago

There's basically three uses of fork.

Of the three use cases, one amounts to "backwards compatibility", and the other two amount to "fork() is actively fighting you". That is not the hallmark of a good API.

[1] Think things like "locks are held by threads that don't exist."

zokier2y ago

https://news.ycombinator.com/item?id=30502392 340 comments

https://news.ycombinator.com/item?id=19621799 180 comments

https://news.ycombinator.com/item?id=22462628 117 comments

https://news.ycombinator.com/item?id=8204007 314 comments

https://news.ycombinator.com/item?id=31739794 135 comments

https://news.ycombinator.com/item?id=16068305 89 comments

that's some 1000+ comments for light background reading about fork(). I think you can make some aggregate sentiment analysis from that and conclude that fork() is not great.

hnlmorg2y ago

Pesthuf2y ago

bregma2y ago

Oh, and the fork(2) function on Linux is implemented by a libc using the clone syscall(2). The (2) is the chapter in the manual providing the documentation. You misunderstand.

vacuity2y ago

crabbone2y ago

While I think that most of your reply is pretending to not notice the real problem and making lame excuses, I specifically want to point out this:

> You can program in another language other than C and avoid using GNU libc

thecodedmessage2y ago

This is a surprising take.

1 more reply

tsimionescu2y ago

For example, here is a site that documents all Linux syscalls on x86_64, with the exact registers that they read their arguments from, and the registers they write their results in:

https://filippo.io/linux-syscall-table/

1 more reply

bregma2y ago

Linux is not exclusively GNU/Linux. For example, Musl/Linux (eg. Alpine Linux) and Android/Linux are widespread. The Linux ABI is unchanged.

This sounds like some variant of the True Scotsman fallacy.

1 more reply

rrdharan2y ago

https://en.m.wikipedia.org/wiki/The_UNIX-HATERS_Handbook

jmprspret2y ago

Ever looked into 9front?

moody__2y ago

Out of curiosity I decided to test these assumptions on 9front and things work as the author would have expected:

cpu% ./dup -proc > out; cat out; echo

cpu% ./dup -dup > out; cat out; echo

The slightly modified code can be found here: http://okturing.com/src/18632/body

thriftwy2y ago

> he whole OS interface is defined in C

What do you think about io_uring? I believe it moves away from OS interface as (C) function calls.

ykonstant2y ago

What is the state of io_uring? Is it being actively used or are there kinks to work out first?

Hendrikto2y ago

Well both. It is already in wide use, but still has some kinks.

1 more reply

stefan_2y ago

fork() is truly the epitome of terrible design. Yet all the OS books praise it as some genius - tells you how little research has progressed...

yau8edq12i2y ago

Have you read an "OS book" written recently?

stefan_2y ago

We shall open the Tanenbaum, now in its fifth edition circa 2023, no doubt still taught to students all over the world:

> Processes are created in Linux in an especially simple manner. The fork system call creates an exact copy of the original process.

Yup, still the same bullshit. It is not elegant, it is not fine-grained, it is not simple, it is certainly not small.

1 more reply

zokier2y ago

Fuchsia exists.

j / k navigate · click thread line to collapse