Given the right input, it could still do all of those things as recently as a month ago.
(I'm not sure that enforcing this inside the file(1) binary itself is optimal. After all, even though the attack surface is much reduced, file(1) could still have a bug somewhere in the code that runs before the sandbox takes effect. If you could run something like "chpriv -write_to_disk -socket -run_external_program /bin/file" and have the OS enforce it, that would be cool. Someone should create that.)
If by "open sockets" you mean open existing sockets in read-only mode, it does that so it can identify them as sockets. If by "open sockets" you mean create new sockets, I don't think it does that:
https://github.com/threatstack/libmagic/search?utf8=%E2%9C%9...
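For illustration, recognizing a socket doesn't require creating or connecting to one; the mode bits returned by lstat(2) are enough. This is a minimal Python sketch assuming a POSIX system with Unix-domain sockets; the `classify` helper is hypothetical and is not libmagic's actual code:

```python
import os
import socket
import stat
import tempfile


def classify(path: str) -> str:
    """Report the kind of filesystem object at `path` using only lstat(2).

    Nothing is opened, read, or connected to: a socket is recognized
    purely from the mode bits in its inode.
    """
    mode = os.lstat(path).st_mode
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sock_path = os.path.join(d, "demo.sock")
        # Bind a Unix-domain socket only so there is something to classify.
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.bind(sock_path)
        try:
            print(classify(sock_path))
            print(classify(d))
        finally:
            s.close()
```

The point of the sketch is that a type check like this reads metadata only; whether file(1) does anything beyond that for sockets is what the linked search is meant to answer.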
> write to arbitrary files
It appears to do this only when running on OS/2 and looking inside a compressed file. Under those conditions, a temporary file is necessary for platform-specific reasons:
https://github.com/threatstack/libmagic/blob/3dea7072b8d7e92...
https://github.com/threatstack/libmagic/blob/3dea7072b8d7e92...
It also writes to a non-arbitrary mmapped file (the magic database), because that's how such databases work; you query them by writing to them in a particular way:
https://github.com/threatstack/libmagic/blob/3dea7072b8d7e92...
> run external programs
I can't find any examples where it does that. Do you know of any?
This is what I'm saying: given the right input, file(1) could do anything and everything. Yes, it would only be possible because of a bug in file(1), but that's still kind of ridiculous.
We have all sorts of things in place to protect against other bugs (for example, segmentation faults), and we have 27 years of evidence that we need more help.
Well, there's no code in file(1) to do that, but there is code that reads data in and makes decisions based on that data. That means that if your attacker is more careful than the programmer was, you may have handed that attacker a Turing machine.
All file needs to do is scan for some magic byte strings and optionally print the numbers at a handful of offsets. It currently does much more than that, which is why it's insecure and hard to fix.
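A minimal sketch of what that narrower design might look like: a fixed table of magic byte strings matched at known offsets, plus reading integer fields at fixed offsets. The signatures (PNG, ELF, gzip, PDF) are well known, but the table format and the `identify`/`read_u32_be` helpers are hypothetical and are not file(1)'s magic database format. Every read is bounds-checked, so a short or malformed input falls back to a less specific answer instead of reading out of range:

```python
import struct
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

# Hypothetical signature table: (offset, magic bytes, description).
SIGNATURES = [
    (0, PNG_MAGIC, "PNG image data"),
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"%PDF-", "PDF document"),
]


def read_u32_be(buf: bytes, offset: int) -> Optional[int]:
    """Return the big-endian 32-bit integer at `offset`, or None if out of range."""
    if offset < 0 or offset + 4 > len(buf):
        return None
    return struct.unpack_from(">I", buf, offset)[0]


def identify(buf: bytes) -> str:
    """Return a description of `buf`, or "data" if no signature matches."""
    for offset, magic, desc in SIGNATURES:
        # Slicing is bounds-safe: a short buffer yields a short slice that
        # simply fails to compare equal, never an out-of-range read.
        if buf[offset:offset + len(magic)] != magic:
            continue
        if magic == PNG_MAGIC:
            # In a well-formed PNG, the IHDR chunk stores width and height
            # as big-endian 32-bit integers at offsets 16 and 20.
            width = read_u32_be(buf, 16)
            height = read_u32_be(buf, 20)
            if width is not None and height is not None:
                return f"{desc}, {width} x {height}"
        return desc
    return "data"


if __name__ == "__main__":
    header = PNG_MAGIC + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
    print(identify(header))           # PNG image data, 640 x 480
    print(identify(b"\x1f\x8b\x08"))  # gzip compressed data
    print(identify(b"hello"))         # data
```

The design choice here is that the matcher never interprets the input beyond fixed-offset comparisons and fixed-width integer reads, so there is no input-driven control flow for an attacker to steer.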
If you feel the command does more than it needs to, could you call out a few examples of bloated features that you would cut?
But I haven't seen anything suggesting that file isn't similarly problematic. A quote like
> To sum up: If somebody uses 'file' in an unconstrained OS environment on untrusted inputs, and he gets pwnd in the result, then it's not a security problem, it's an incompetence problem - and IMO it should be discussed elsewhere.
does not suggest that the program is very well designed.
Scanning for byte strings with no possibility of a security flaw is a solved problem.