The API design is patently insane, but why can't there be a simple
  if (nfds > FD_SETSIZE) {
    errno = EINVAL;
    return -1;
  }
… or something to prevent "the API is garbage" from escalating all the way into "and now your memory is corrupt and the hackers are in"…?
The issue is that the definition of `fd_set` has a constant size [1]. If you allocate the memory yourself, the select() system call will work with as many file descriptors as you care to pass to it. You can see that both glibc [2] and the kernel [3] support arbitrarily large arrays (well, in the kernel case you'll run into other limitations... but no memory corruption).
[1] https://github.com/bminor/glibc/blob/master/misc/sys/select....
[2] https://github.com/bminor/glibc/blob/master/sysdeps/unix/sys...
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
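A minimal sketch of the allocate-it-yourself approach, assuming the Linux/glibc layout where fd_set is a plain bitmap of long-sized words. The `big_fdset_*` names are hypothetical helpers, not part of any library. Bits are set by hand rather than with FD_SET, since a fortified glibc build may reject fds >= FD_SETSIZE in that macro.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helpers. ASSUMPTION: fd_set is a bitmap made of
 * unsigned-long-sized words, as on Linux with glibc. Not portable. */
#define BIG_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Number of words needed to hold bits 0..nfds-1. */
static size_t big_fdset_words(int nfds)
{
    return ((size_t)nfds + BIG_WORD_BITS - 1) / BIG_WORD_BITS;
}

/* Zero-initialised bitmap large enough for nfds descriptors, or NULL. */
static unsigned long *big_fdset_alloc(int nfds)
{
    return calloc(big_fdset_words(nfds), sizeof(unsigned long));
}

/* Manual equivalent of FD_SET for the heap bitmap. */
static void big_fdset_set(unsigned long *set, int fd)
{
    set[(size_t)fd / BIG_WORD_BITS] |= 1UL << ((size_t)fd % BIG_WORD_BITS);
}

/* Manual equivalent of FD_ISSET for the heap bitmap. */
static int big_fdset_isset(const unsigned long *set, int fd)
{
    return (int)((set[(size_t)fd / BIG_WORD_BITS]
                  >> ((size_t)fd % BIG_WORD_BITS)) & 1UL);
}

/* Wait up to timeout_ms for readability; returns whatever select() returns.
 * The caller must have allocated the set for at least nfds descriptors. */
static int big_fdset_select_read(unsigned long *set, int nfds, int timeout_ms)
{
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    return select(nfds, (fd_set *)set, NULL, NULL, &tv);
}
```

The caller would allocate with `big_fdset_alloc(maxfd + 1)`, mark descriptors with `big_fdset_set`, and pass `maxfd + 1` as nfds. The cast to `fd_set *` relies on the layout assumption above.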
Of course, that's never going to actually happen on implementations people care about between ABI breaking on the one hand and the existence of poll/epoll on the other.
(My biggest concern in practice is random shitty libraries using select behind the scenes and then silently corrupting memory in processes that have more than a few file descriptors.)
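For comparison, a sketch of how a library could wait on a single descriptor with poll() instead, which takes the fd as a plain int and so has no FD_SETSIZE ceiling. `wait_readable` is a hypothetical helper name, not an existing API.

```c
#include <poll.h>

/* Hypothetical helper: returns 1 if fd is readable or hung up,
 * 0 otherwise (including timeout), -1 if poll() itself fails. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;  /* 0: nothing ready in time, -1: error with errno set */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : 0;
}
```

Nothing here indexes into a fixed-size bitmap, so a descriptor numbered 5000 is handled the same way as descriptor 5.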
It's not quite that bad: UNIX has always guaranteed open() will return the lowest unused file descriptor. So in practice, it just limits you to 1024 total open files in the process, which in all fairness probably seemed like an absurdly large number at the time it was designed.
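The lowest-unused-descriptor rule can be observed with a small sketch like the one below. `reopen_reuses_lowest_fd` is a hypothetical name, and the result is only meaningful while no other thread is opening or closing files.

```c
#include <fcntl.h>
#include <unistd.h>

/* Opens /dev/null, closes it, then opens it again.
 * Returns 1 if both opens produced the same descriptor number,
 * 0 if they differed, -1 if open() failed. */
static int reopen_reuses_lowest_fd(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;
    close(first);

    /* The slot just freed is again the lowest unused one. */
    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;
    close(second);

    return first == second;
}
```

Because freed numbers are reused from the bottom, descriptor values only reach 1024 once the process actually holds that many files open at the same time.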
And in all fairness, nobody was thinking of multithreading when these APIs were designed. We're lucky enough that errno mostly works as a thread local rather than a global.
https://github.com/bminor/glibc/commit/a0f33f996f7986dbf3763...
The kernel-side interface probably won't change, because apparently legitimate apps have been allocating fd_sets on the heap to monitor fds >= 1024 and the maintainers don't want to break those:
https://sourceware.org/bugzilla/show_bug.cgi?id=10352#c7
The underlying problem was also discussed in an article by Lennart Poettering (of systemd fame) and posted to HN back in 2021:
Rightly so. Before libev, AIO or whatever were a thing, I used to run network servers 10 or 15 years ago with a redefined __FD_SETSIZE set to 16384 without any problems on Linux (plus appropriate proc and ulimit settings). The whole stack handled it properly, even if it was not officially supported.
The real problem nowadays is that code can easily be handed an fd >= 1024, since you do not control which numbers you get. Put one of those into an fd_set that only supports values up to 1023 and you have a security problem. Plus of course, the later APIs also simply scale better beyond 16k connections.
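A defensive wrapper along these lines is one way to avoid writing past a stack-allocated fd_set. `checked_fd_set` is a hypothetical name for this sketch, not a libc function.

```c
#include <errno.h>
#include <sys/select.h>

/* Hypothetical wrapper around FD_SET: refuses descriptors that do not
 * fit into a standard fd_set instead of writing out of bounds.
 * Returns 0 on success, -1 with errno = EINVAL otherwise. */
static int checked_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    FD_SET(fd, set);
    return 0;
}
```

Callers that get -1 back can fall back to poll() or refuse the connection, rather than silently corrupting whatever sits next to the set.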
Even though most implementations do so, there is no requirement to implement fd_set as a bitmap. It could also be an array of integers. Though this still won’t allow you to select() against an infinite number of file descriptors, it at least allows file descriptor numbers to span the full range of int.
Furthermore, there’s also no requirement that FD_*() corrupt your memory. I get it that these macros can’t return errors back to the caller, but they can always set some kind of flag in the fd_set to indicate that insertion was unsuccessful. select() could check that flag and bail out if set.
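Both ideas can be sketched together: a set stored as an array of integers, with a sticky flag that records a failed insertion. Everything named `arr_fdset*` is hypothetical and the capacity of 64 is arbitrary; no existing libc implements fd_set this way as far as this sketch assumes.

```c
#include <errno.h>
#include <stddef.h>

#define ARR_FDSET_CAP 64  /* arbitrary capacity chosen for the sketch */

/* Hypothetical alternative to a bitmap fd_set: a list of descriptor
 * numbers plus a flag that remembers a failed insertion. */
struct arr_fdset {
    int    fds[ARR_FDSET_CAP];
    size_t count;
    int    overflow;
};

/* Counterpart of FD_ZERO. */
static void arr_fdset_zero(struct arr_fdset *s)
{
    s->count = 0;
    s->overflow = 0;
}

/* Counterpart of FD_SET: it cannot return an error, so a failure
 * is recorded inside the set instead of corrupting memory. */
static void arr_fdset_add(struct arr_fdset *s, int fd)
{
    if (fd < 0 || s->count == ARR_FDSET_CAP) {
        s->overflow = 1;
        return;
    }
    s->fds[s->count++] = fd;
}

/* What a select()-like call could do first: bail out if the flag is set. */
static int arr_fdset_validate(const struct arr_fdset *s)
{
    if (s->overflow) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

The set still holds a bounded number of entries, but any non-negative int is a valid member, and exceeding the capacity turns into an EINVAL at call time.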
But if the address is NOT within the current thread's stack, select() assumes you know what you're doing and will allow the call, trusting you have allocated sufficient memory for the high fds in the fd_set.
No opinion if this is a good decision or not, I just think it's interesting!