Why bother with argv[0]?

(wietzebeukema.nl)

248 points | wietze | 1y ago | 259 comments

So obviously claiming that there's no good reason for a process to read argv[0] is either demonstrating the author's ignorance or needs a much stronger defense; I'd be fascinated to hear how they think busybox should work on an OpenWrt box with a 16MB root filesystem.

However, I am willing to consider the discussion about whether there could be merit to restricting the ability to write that value; I could imagine a system that populated it only from the actual file name and did not allow it to be written by the parent process or the child process at runtime. The obvious place this still falls apart is that an attacker could just

    ln /bin/curl ./some\ other\ name

but there are sometimes security measures that we use even though they're less than 100% effective, so it is at least conceivable that this might be a trade-off worth making.

gwbas1c1y ago

I agree, I think the author really shot themselves in the foot when they, at length, criticized the merits of a program using argv[0].

The real point is the security flaws in a calling program setting argv[0], because it really, really should be set by the operating system. (As a programmer, I shouldn't have to defend against these kinds of attacks. The OS should block it.)

The criticisms of valid programming practices, IMO, hurt the author's credibility and distract from the real point of the article.

nrdvana1y ago

The real security flaw is extracting a value from a process's own memory to identify what the process is. If you want a secure way to identify what a process is and where it came from, that needs to be a new feature in the OS.

argv[0] was designed to be part of the arguments to the program, and it succeeds perfectly at that task. The problem is that it has been abused by external tools as a way to identify the program just because there was no other alternative.

It has to be writable because the entire argv string (in program memory) is writable and declared as

  int main(int argc, char **argv)

not

  int main(int argc, const char **argv)

and needs to preserve back-compat. Classic C code might be calling strtok on the arguments, so that block of memory needs to remain writable.

shadowgovt1y ago

I see a common anti-pattern in security researchers in that they can lose sight of the human beings who operate the software.

argv[0] should be used by any logging message that purports to report the program name, because argv[0] should be a string the human recognizes as something they invoked. Taking it away would break usability.

This does, of course, imply that the program name is non-constant untrusted data. Which means we shouldn't be making security software that depends on knowing that name.

rzwitserloot1y ago

That seems unnecessarily harsh.

I don't think that's the gist of the article, but the throwaway suggestion of 'just make lots of copies, who cares about diskspace' is insufficient and thus distracts. It's.. a single line about solutions in an article that isn't _about_ solving problems, it's about highlighting a problem exists and that it's worth solving.

I read the article more as: There is __often__ no good reason to use argv[0], and it should be avoided if at all possible, and if it cannot be avoided, it would behoove the industry to work on ways to make sure in the future it can be avoidable.

For example, why in the blazes does windows taskman.exe list argv[0] in the GUI table view? That's just asking for trouble. Show the actual file path, and always an absolute one - that way you avoid confusion about which executable you're actually running, and it's just as readable if not more readable for every app _except_ those who care about argv[0], e.g. if you ran `/bin/dd` and it's actually busybox, in taskman you'd see `/bin/busybox` instead which'd be worse than seeing 'dd'. That is simple enough to solve (add an API call to update _your own process name_ or at least update your own process 'title' which interfaces like ps/taskman can use accordingly), but, now we're talking about coordinating between OS, glibc, busybox, and so on - lots of parties. I don't mind that the article doesn't delve that deep, as that wasn't the point of it. The point is simply to show the problems the kludge of 'we will show argv[0] instead of the executable name' causes.

This article feels more about explaining that in the distant past, a mistake was made with some history as to why that mistake was made and the deleterious practical effects that this mistake is causing or is likely to cause (most of them security related). It's not really about solving the problem; that presumably comes later and should be sketched out by those who are knowledgeable on _that_ subject. That doesn't imply the author is ignorant or that the article is insufficiently defended. Just that it hasn't covered all aspects of what it's writing about.

toast01y ago

> Show the actual file path, and always an absolute one - that way you avoid confusion about which executable you're actually running, and it's just as readable if not more readable for every app _except_ those who care about argv[0], e.g. if you ran `/bin/dd` and it's actually busybox, in taskman you'd see `/bin/busybox` instead which'd be worse than seeing 'dd'.

This was kind of in the middle of your complaint about windows, but then you've got unixy busybox discussion.

On a unix filesystem, a file that's hard linked with multiple names has no single 'actual name'. All of the names are equally valid. You could show the filesystem and inode number, which should uniquely identify the file, but is pretty user unfriendly.
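To make the point about hard links concrete, here is a minimal Python sketch, assuming a filesystem that supports hard links. The helper names `file_identity` and `demo_hard_link`, and the file names `busybox` and `ls`, are made up for illustration. It shows two names resolving to the same (device, inode) pair, with neither being more "real" than the other.

```python
import os
import tempfile


def file_identity(path):
    """Return (device, inode), which identifies a file independent of its name."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo_hard_link():
    """Create one file with two names and report what the filesystem says."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "busybox")   # hypothetical file name
        second = os.path.join(tmp, "ls")       # hypothetical second name
        with open(first, "w") as f:
            f.write("placeholder\n")
        os.link(first, second)  # add a second directory entry for the same inode
        return {
            "same_identity": file_identity(first) == file_identity(second),
            "link_count": os.stat(first).st_nlink,
            "samefile": os.path.samefile(first, second),
        }
```

Calling `demo_hard_link()` returns a small dictionary; the link count reflects how many names point at the inode, but nothing in it says which name was used to start a program.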

mbrumlow1y ago

> highlighting a problem exists

Coding bugs into your programs is not a problem, it’s a bug. None of the weird argv[0] examples can happen on the shell (without escaping), only when using system calls.

The more I read the article, the more I feel this is a reaction to a behavior the author did not expect; they fancy themselves smart, and therefore the last 20 years of usage of this feature are obviously wrong.

ArchOversight1y ago

There's the `setproctitle` in FreeBSD that is designed exactly for a process to update the information that is presented to tools such as ps.

https://man.freebsd.org/cgi/man.cgi?query=setproctitle&aprop...

kelnos1y ago

> it's about highlighting a problem exists and that it's worth solving.

If so, then I disagree with the premise of the article, fundamentally. I don't see a problem. If someone is writing security software and doesn't already know about the mutability of argv[0], and doesn't know that (on Linux at least) /proc/$PID/exe is the only correct way to get the binary backing a process... well, then they have no business writing security software.
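As a rough illustration of the distinction drawn here, the following Python sketch reads both values for a process on Linux. It assumes a mounted `/proc` and permission to inspect the target process; the function names are invented for this example.

```python
import os


def claimed_name(pid):
    """argv[0] as recorded in the process's own argument area (caller-controlled)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    # Arguments are NUL-separated; the first one is argv[0].
    return raw.split(b"\0")[0].decode(errors="replace")


def backing_executable(pid):
    """Path of the binary the kernel reports as backing this process."""
    return os.readlink(f"/proc/{pid}/exe")


def describe(pid):
    """Return both views side by side so a tool can log them together."""
    return {"argv0": claimed_name(pid), "exe": backing_executable(pid)}
```

For example, `describe(os.getpid())` reports what the current process claims to be next to what it actually is. The two can legitimately differ (symlinks, hard links, multi-call binaries), so a mismatch is a signal to record, not proof of an attack.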

There is no problem here. The author is making a big deal about nothing, either because they have a weird axe to grind, or because they're ignorant.

Maxatar1y ago

>Show the actual file path, and always an absolute one

There are numerous reasons why this is not desirable, for example knowing whether an application was called from one symbolic link or a relative path dictates what that application's working directory is.

croes1y ago

It's easy to call something a mistake in hindsight.

You could argue the mistake was made elsewhere so this feature could be abused.

ahoka1y ago

“That is simple enough to solve (add an API call to update _your own process name_ or at least update your own process 'title' which interfaces like ps/taskman can use accordingly)”

We could call it setproctitle, or something. /s

pzmarzly1y ago

Not the author, but there's a good alternative. If busybox was edited to ignore argv[2], then applets could be called via shebangs, instead of symlinks:

    $ echo '#!/path/to/busybox echo' > myecho
    $ chmod +x myecho
    $ ./myecho 123
    ./myecho 123

Right now this doesn't work properly, because "./myecho" (argv[0]) gets placed into argv[2] of the process. Otherwise, this technique IMHO is better than symlinks:

- Each applet uses the same amount of disk space (0 blocks, i.e. the content fits into inode).

- Doesn't read or write to argv[0].

- You could finally rename the applets. This is not that useful if busybox is your only posix userspace implementation, but very useful if you want many implementations to live side-by-side. E.g. on macOS, I'd like to have readlink point to BSD/macOS's readlink, greadlink to GNU coreutils', bbreadlink to busybox's.

But as I said, this doesn't work for now. The best you can do now is to write shell two-liners (https://news.ycombinator.com/item?id=41436012). Some such two-liners may also fit into the inode inlining limit, so that's a plus. But you will have a performance penalty on every call (since sh needs to start up).
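The argv layout this comment relies on can be modeled in a few lines of Python. This is a simplified sketch of Linux behavior, where everything after the interpreter path is passed as one optional argument; other systems split shebang lines differently, and `shebang_argv` is a hypothetical helper, not a real API.

```python
def shebang_argv(first_line, script_path, user_args):
    """Model the argv an interpreter receives when a shebang script is executed.

    Simplification: only a single space is treated as the separator between
    the interpreter path and its one optional argument.
    """
    if not first_line.startswith("#!"):
        raise ValueError("not a shebang line")
    body = first_line[2:].strip()
    interpreter, _, optional_arg = body.partition(" ")
    argv = [interpreter]
    if optional_arg.strip():
        argv.append(optional_arg.strip())  # passed as ONE argument, not split further
    argv.append(script_path)               # the script's own path comes next
    argv.extend(user_args)                 # then whatever the user typed
    return argv
```

With the example from the comment, `shebang_argv("#!/path/to/busybox echo", "./myecho", ["123"])` yields `/path/to/busybox`, `echo`, `./myecho`, `123`, which is why the script path ends up at index 2 and gets echoed.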

cesarb1y ago

> Each applet uses the same amount of disk space (0 blocks, i.e. the content fits into inode).

Is that really the case? AFAIK, OpenWRT uses SquashFS by default, and a quick web search tells me that "[...] In addition, inode and directory data are highly compacted, and packed on byte boundaries. Each compressed inode is on average 8 bytes in length [...]" (https://www.kernel.org/doc/html/latest/filesystems/squashfs....). That is, even if the content fits into the inode, it will make the inode use more space (they're variable-size, unlike on traditional filesystems with fixed-size inodes).

And using hardlinks (traditionally, we use hardlinks with busybox, not symlinks) goes even further: all commands use a single inode, the only extra space needed is for the directory entry (which you need anyway).

alerighi1y ago

Well that would be inefficient. For each command you run, the kernel has to read the file, detect that it has a shebang, parse the shebang line, and then finally load the actual executable in memory. That could be a performance problem, since busybox is typically used in embedded systems that don't have a lot of resources: imagine a shell script that runs a command in a loop, it has to do a lot of extra work.

Finally, symlinks can be relative, while the solution you proposed is not. This is particularly useful for distributing software, e.g. distributing a tar file with busybox itself and its symlinks.

In fact, you don't even need symlinks at all: you can even have hard links, that could even save disk space on embedded filesystems, that are readonly images anyway.

soneil1y ago

I was going to say it'd be easier to have a single script, eg

    #!/bin/sh
    busybox "$0" "$@"

and then every command required could just be a hardlink to the same script, instead of replicating it over and over again for hardcoded command names.

Then I realised the whole point is to posit a world where $0 doesn't exist, and we're not allowed to be clever about it.

account421y ago

Are shebangs recursive? Otherwise this means that busybox can no longer provide /bin/sh.

pie_flavor1y ago

Another example program that reads argv[0] is Rustup, the version manager for Rust. Rust versions can be set per directory either as a machine-specific override or via a file. Rustup is symlinked to all the Rust commands like rustc and cargo, and when invoked as one of those commands, it checks what version it is supposed to be using, and then forwards to that version. I don't see how you'd do this without argv[0] (or a dozen slightly different pointlessly recompiled binaries).
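The dispatch-on-name pattern described here (and used by busybox) can be sketched in Python as follows. The applet table and the name `multibox` are invented for illustration; this is not how Rustup or BusyBox are actually implemented, only the general shape of the technique.

```python
import os

# Hypothetical applet table; a real multi-call binary has one entry per tool.
APPLETS = {
    "echo": lambda args: " ".join(args),
    "true": lambda args: "",
}

MULTIPLEXER_NAME = "multibox"  # assumed name of the shared binary itself


def select_applet(argv):
    """Return (applet_name, remaining_args) for a multi-call invocation."""
    if not argv:
        raise ValueError("empty argv: cannot tell which applet was requested")
    name = os.path.basename(argv[0])
    rest = argv[1:]
    # Invoked under the shared binary's own name: take the applet from argv[1],
    # as in `multibox echo hi`.
    if name == MULTIPLEXER_NAME:
        if not rest:
            raise ValueError("no applet given")
        name, rest = rest[0], rest[1:]
    if name not in APPLETS:
        raise ValueError(f"unknown applet: {name}")
    return name, rest


def run(argv):
    """Dispatch to the selected applet and return its output."""
    name, rest = select_applet(argv)
    return APPLETS[name](rest)
```

Both `run(["/bin/echo", "hi"])` and `run(["multibox", "echo", "hi"])` reach the same applet; only the first form depends on argv[0], which is what lets existing scripts keep calling the tool by its plain name.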

wietzeOP1y ago

For busybox/toybox the argv[0] thing is great, and seems to be the prime example of why argv[0] shouldn't go - yet it is a bit of an anomaly in how argv[0] is used.

If there really is a need for having one executable that comprises multiple commands, is `busybox whoami` instead of `whoami` so much more effort? To me, that would make more sense in terms of what is going on; aliases could be used if one-word commands are preferred. In most non busybox contexts, argv[0] is just an unnecessary addition that, as the linked article shows, can introduce weirdness.

It's clear from the comments there are still many who think argv[0] is a good thing, which is great - I'm glad the post sparked this debate.

blenderob1y ago

> is `busybox whoami` instead of `whoami` so much more effort?

It's not the "more effort" that is the deal breaker here. It is a matter of compliance with specs and user expectations. What you're suggesting would make Busybox very non-POSIXy, very non-Unixy. All scripts written over the last many decades would need to be updated to call `busybox ls` instead of `ls`? How is that a viable solution?

> I'm glad the post sparked this debate.

This is a very strange way to deflect concerns about quality of the article!

sltkr1y ago

`busybox whoami` is probably fine, but having to write `busybox ls`, `busybox grep`, `busybox cp` etc. would get tedious quickly.

Shell aliases don't solve all problems, even if you do:

    alias rm="busybox rm"
    alias xargs="busybox xargs"
    # etc.

you still have to write `xargs busybox rm`, because xargs won't use the shell alias.

But the main problem with this approach is that POSIX and LSB require certain binaries to be available at certain paths. When they're not, most shell scripts will just break.

The minimal standard solution is probably to create shell scripts for all of these, e.g. in /bin/ls:

    #!/bin/sh
    exec /bin/busybox ls "$@"

But this both adds runtime overhead (on every invocation!) and is quite wasteful in terms of disk space. Busybox boasts over 400 tools. At 4 KB per file, that's 1.6 MiB of just shell scripts. Of course that can be less if the file system uses some type of compression which is common on embedded systems where storage space is small, but it still seems to defeat the purpose of using busybox to create a minimal system.
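The space estimate above is simple arithmetic, sketched here in Python. It assumes every wrapper script occupies exactly one full block and takes the 400-tool and 4 KB figures from the comment; the function name is made up.

```python
def wrapper_overhead_bytes(tool_count, block_size_bytes):
    """Space used if every wrapper script occupies one full filesystem block."""
    return tool_count * block_size_bytes


# Figures from the comment: about 400 tools, 4 KiB per file.
total = wrapper_overhead_bytes(tool_count=400, block_size_bytes=4096)
total_mib = total / (1024 * 1024)
```

That works out to 1,638,400 bytes, or 1.5625 MiB, in line with the roughly 1.6 quoted above, before any filesystem compression is taken into account.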

mbrumlow1y ago

Yes. Anybody who has shipped software would say so.

I really don’t think it is a debate. The usage of argv[0] is massively understated by the article. Just go look at gcc or any modern-day compiler. It's used so much that the conversation of "should we" has been hashed out by many different groups, yet they still chose to implement it.

The security concerns are a non-issue, as argv[0] was not the problem. It was the lack of technical knowledge of how systems work and a flaw in the security application.

hinkley1y ago

I think you’re both forgetting that bash has been using this trick for decades.

Bash has an sh compatibility mode that runs when you invoke it as sh.

alerighi1y ago

Well of course it's not only a matter of interactive usage (not least because the busybox shell itself could do the conversion). The problem is scripts, or worse, programs that invoke commands as subprocesses (programs whose source code you may not even have access to!).

What do you do? Replace every single occurrence of each command by prefixing it with `busybox`? Not ideal at all...

epcoa1y ago

https://pubs.opengroup.org/onlinepubs/9699919799/

You appear not to realize that busybox is an essential component of a POSIX-like system.

jimrandomh1y ago

That's fine for when users are interactively typing commands, but it doesn't work when the command is being run by a non-busybox program which expects commands to exist in the standard locations.

Arch-TK1y ago

Restricting setting it would break login.

Not that it couldn't be fixed by changing how we handle login shells but still. Worth remembering.

Similarly the busybox situation could be solved by having busybox ship posix shell wrapper scripts which use `#!/bin/busybox sh` as the shebang and simply consist of a line like `exec /bin/busybox ls "$@"`.

nrclark1y ago

It can already do that, afaik. When I last checked, BusyBox supported installations via 4 methods:

  - symlink
  - hardlink
  - shell script wrappers
  - executable binary wrappers around libbusybox

zekica1y ago

You are over-complicating it, you only need `#!/bin/busybox ls` as the entire contents of the file.

kazinator1y ago

If hard linking (not symbolic) is used to install the BusyBox commands, then instead of argv[0], BusyBox could use the platform-specific means of obtaining the executable name, and take the basename of that path. On Linux this means /proc/self/exe; _NSGetExecutablePath on Darwin; getexecname on Solaris; GetModuleFileName on Windows; ...

mzs1y ago

https://github.com/util-linux/util-linux/blob/master/login-u...

edit: basically login(1) execes your shell with - prepended, so an example where POSIX expects this
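The login-shell convention mentioned here can be modeled with two tiny Python helpers. This sketches the convention only (a leading `-` on argv[0]), not util-linux's actual code, and the function names are invented.

```python
import os


def login_argv0(shell_path):
    """argv[0] a login program conventionally passes: '-' plus the shell's basename."""
    return "-" + os.path.basename(shell_path)


def is_login_shell(argv0):
    """Shells conventionally treat a leading '-' in argv[0] as 'act as a login shell'."""
    return argv0.startswith("-")
```

So a login program starting `/bin/bash` would pass `-bash` as argv[0]. A system that forced argv[0] to equal the file name would need some other channel for this signal.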

marcosdumay1y ago

Is there a good reason for allowing writes to argv at all?

I think any reasons one will find are based on backwards compatibility.

red_admiral1y ago

Yeah, never mind shutdown/reboot, has the author heard of busybox?

zokier1y ago

Isn't the reason for busybox multi-call binary mostly just ELF being bloated? So the answer for resource constrained systems would be to have a more efficient executable format. I don't see why multi-call binary + bunch of symlinks would be intrinsically much more size-efficient than something purpose-built.

yuliyp1y ago

A lot of code is shared between different tools. Busybox has one copy of those. Before you mention shared libraries: there is still overhead, as well as complicating the usage (it needs to find a shared lib when starting instead of just having all it needs in the binary). This isn't really a property of the executable format. Any format would have the same problem.

surajrmal1y ago

You can write a two-line shell script that prepends busybox per command. I've done this on a 16MiB restricted system and while it ate maybe 4k per command, it wasn't a big deal with only 20-30 commands.

blenderob1y ago

What about compiled binaries that for one reason or another are doing an execve() on "/usr/bin/cmp" or some such thing? Do you propose changing every script and every binary on earth that expects Busybox to be a POSIXy, Unixy environment?

Hizonner1y ago

As the original author says (but seems to forget within a paragraph or two), the program should already know what program it is. If you're looking at argv to find out what program you are, you are doing it deeply wrong. It's an argument.

One good use for it is to make a guess as to where your executable is installed. Yes, it would be nice if there were a more certain way to get that... but not for security purposes. You don't want to rely on filenames for security anyway, because anybody can make copies and symlinks and rename files at will, and it's really, really hard to catch all the cases of that. Much harder than, for instance, remembering that argv[0] is a hint from your caller, not gospel from the OS.

In the same way, I know that it's fashionable nowadays for incompetent idiots to write security tools, but a security tool that trusts an argv value for anything much was obviously written by an incompetent idiot, because that's not what they're for.

mariusor1y ago

Am I missing something? You didn't seem to address the case where you actually need to know which program you are: the way busybox provides the whole suite of linux-utils in one binary and requires the command under which it was invoked to know what to do.

avidiax1y ago

It is sometimes used to allow one binary to be the symlink target of hundreds of commands.

Android does this for most common shell commands. Toybox and busybox are examples of such implementations.

https://github.com/landley/toybox

https://en.m.wikipedia.org/wiki/BusyBox

cubist_castle1y ago

I just learned that rustup/rustc/cargo etc. work like this too. I couldn't understand why the gentoo formula was symlinking the same binary to a bunch of aliases.

kbolino1y ago

On my system, these are hardlinks (regular files with a link count >1 and the same inode) rather than symlinks, though I'm not sure why.

alerighi1y ago

And that makes a lot of sense, especially for binaries that are statically linked (as usually are Rust binaries), since that could save a lot of disk space!

duped1y ago

clang does this too.

mistercow1y ago

Also if you want a program to call itself, which is sometimes useful, this way lets you actually call the same program, rather than assuming the name and path.

duped1y ago

Don't do this - if you (reliably) want the path to the current executable there is no portable way to do it, but on Linux you need to readlink /proc/self/exe and on macOS you call _NSGetExecutablePath. I forget the API on Windows.

SoftTalker1y ago

There's no guarantee that the name and the path are still the same executable that is running, or that they even exist anymore.

akira25011y ago

Beware TOCTOU problems when doing this.

fallingsquirrel1y ago

You can do this without assuming the name by execing /proc/$PID/exe. Then you're not vulnerable to the argv[0] spoofing described in the article. (But of course since argv[0] does exist, you should set it properly and pass through your own argv[0] unchanged.)

hi-v-rocknroll1y ago

coreutils-static did this too. The advantage of shared libraries and multiple-use single static binaries is they're only loaded once.

layer81y ago

The article discusses this.

travisgriggs1y ago

> “Should a program be allowed to behave differently based on its name?”

I don’t see why not. It’s allowed to behave differently based on the arguments that follow it. I personally think the genericity of including the program name itself as one of its own calling arguments is really meta cool.

SoftTalker1y ago

One other historical reason for this (also the reason that older unix utilities tend to have such short names) is that people often interacted with unix machines over slow terminals or even paper teletypes. Typing "rm" instead of "remove" or "reboot" instead of "systemctl reboot" was legitimately more convenient.

Arch-TK1y ago

I mean, it's still more convenient to type `rm` rather than `Remove-Item` when doing day-to-day computer tasks on your computer (yes I'm one of those people who lives in a terminal).

It's also certainly better from a readability standpoint to have `Remove-Item` rather than `rm` in a script.

Likewise, I would much rather type `ls -Al` rather than `ls --almost-all --long-listing` (N.B. --long-listing is not the long option for -l, -l has no long option, I just made up an appropriate name) when listing a directory but would probably appreciate the long form in a script.

I think just like we have long options and short options, it would be helpful to have long commands and short commands.

ForOldHack1y ago

As someone who started on ASR-33s, I have empathy for Mr Ritchie and Mr Kernighan.

After all, it's 50% faster to type a two-letter acronym than a TLA.

https://media.wired.com/photos/59327efdf682204f73696446/mast...

Too1y ago

If I download a new version of foo and rename my old version to foo_old_backup_2, should foo_old_backup_2 start behaving differently, just because it has a different name? NO THANKS!

A program should be sandboxed from its environment, including how the user started it. How a user names and organizes his files is a matter between the user and the operating system, not something individual program should care about.

account421y ago

A lot of programs actually do need support files at specific locations (either full paths or relative to the executable) so you already don't get to arbitrarily organize your program binaries any way you want (without adjusting the programs).

shermantanktop1y ago

It’s the equivalent of the HTTP Host header, with similar utility. But I agree with the author that an OS provided trustable structure is a much better way.

marcosdumay1y ago

> an OS provided trustable structure

Repeating the OP, your program takes every other parameter from the caller, why do you insist on the executable name to not be set by him too?

Windows defender is the one that is stupid by using it. Every OS has the real executable name in some place, security software should look there instead.

strawhatguy1y ago

Yes, this is useful for backwards compat too, like bash with an 'sh' mode.

dheera1y ago

There are also multiple reasons for a program to read its own executable.

- Decompressing and inflating a compressed binary block with a generic decompressor at the top (e.g. a bash or python script with a binary blob at the end)

- Checksumming its own executable (skipping the checksum string) to resist virus infection. Not bulletproof but viruses aren't usually smart enough to circumvent this

alkonaut1y ago

Really the weirdness isn't that main is invoked with the program name as argv[0]. The weirdness comes not in main() but in execv. Shouldn't execv have just taken the user provided arguments, prepended the program name (as provided by the OS) and then invoke the main function of the program with that array?

The busybox argument or shutdown/reboot explains why the name of a symlinked binary is helpful as argv[0]. But does the busybox/shutdown case explain why the execv lets the user set the argv[0] value to anything other than what the path says?

MPSimmons1y ago

If not, then busybox is going to need to change a TON

dataflow1y ago

> I don’t see why not. It’s allowed to behave differently based on the arguments that follow it.

That's missing the point, I think.

The real question here is, is the name of a program really an argument to the program, from the user's perspective? I certainly don't blame users that disagree. It's more difficult for them to change argv[0], and the fact that this is possible is not necessarily obvious to them, nor to their users.

If it helps, think of it like this: imagine the file timestamp was similarly passed as argv[-1]. And that the file inode number was passed as argv[-2]. Would it make sense to change behavior on those too?

sokoloff1y ago

> is the name of a program really an argument to the program, from the user's perspective?

When I use busybox [invisibly to me], I sure care that it knows whether I called it as "ls" or as "rm" and that it does the operation that I asked it to do.

theamk1y ago

That's a weird take against argv[0] - all arguments are: "goes against modern design principles" and "can confuse programs which use argv[0] when they wanted "exec" instead"

For the former, I don't see how this goes against modern principles - in presence of symlinks, it is pretty reasonable to want to know both "how was this program called", as well as "what's the actual executable we ended up with". And this does more than just giving multiple names to same program - for example python uses argv[0] to tell if it's inside virtualenv and adjust search paths accordingly. This makes it appear like there are multiple python installs on system, with no extra disk space taken.

For the latter, yes, programs can have bugs and OSes can have non-obvious semantics, and if you are security software, it's very important to be aware of them. I would not mark "argv[0]" as something especially bad from security perspective. All the author's examples would still be possible in hypothetical world where argv[0] is set by system - as nothing stops user from creating a symlink in temporary dir with deceiving name (spaces and quotes are OK in filenames!) and exec'ing it directly. Instead, fix your security software so it quotes argv values?
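The suggestion to quote argv values when displaying them can be illustrated with Python's standard `shlex` module. The two argv lists and the `example.invalid` URL are made up; the point is only that joining with spaces erases argument boundaries while quoting keeps them visible.

```python
import shlex


def render_naive(argv):
    """Joining with spaces loses the boundaries between arguments."""
    return " ".join(argv)


def render_quoted(argv):
    """shlex.join quotes each element, so boundaries stay visible in logs."""
    return shlex.join(argv)


# Hypothetical spoofed argv: argv[0] itself contains spaces and looks like a command line.
spoofed = ["curl -o /tmp/x http://example.invalid", "--version"]
# Hypothetical honest argv that a naive log line cannot tell apart from the spoofed one.
honest = ["curl", "-o", "/tmp/x", "http://example.invalid", "--version"]
```

Here `render_naive` produces identical text for both lists, while `render_quoted(spoofed)` wraps the first element in single quotes and so exposes it as one suspicious argument.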

cedilla1y ago

> all arguments are: "goes against modern design principles"

And the key witness is systemd, which is too young to buy a beer - even in Germany.

wietzeOP1y ago

From a living-off-the-land perspective, having to symlink/hardlink/alias a command is much noisier - and thus easier to detect. So although you are right in saying it wouldn't completely solve the problem, making it a system responsibility would still significantly reduce the scope for abuse.

jrockway1y ago

I think argv[0] is fine. It sounds like there is a lot of bad security scanning software that doesn't understand how the `exec` syscall works. That sounds like their problem and not a fundamental problem with argv[0].

Most people use argv[0] so they can do something like:

   $ mycommand help
   Type `mycommand foo bar` to foo bars.

   $ mycommand1.2.3 help
   Type `mycommand1.2.3 foo bar` to foo bars.

This is admittedly less fun when mycommand is `/home/jrockway/.cache/bazel/_bazel_jrockway/7f95bd5e6dcc2e75a861133ddc7aee82/execroot/_main/bazel-out/k8-fastbuild/mycommand/mycommand_/mycommand` however.

yencabulator1y ago

I routinely use basename(arg0) in my programs.
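A minimal Python sketch of the help-message use case, combining the example above with the basename suggestion. The function name and the fallback `mycommand` are assumptions for illustration.

```python
import os


def usage_line(argv0):
    """Build a help line that echoes the name the user actually typed."""
    # Strip any directory part so long build paths do not clutter the message.
    prog = os.path.basename(argv0) or "mycommand"
    return f"Type `{prog} foo bar` to foo bars."
```

In a real script this would be called as `usage_line(sys.argv[0])`. The name is used only for display here, so a spoofed value can at worst produce a misleading help message.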

denysvitali1y ago

I don't think argv[0] includes the full path (or at least some programming languages strip the whole path and keep only the last part)

denysvitali1y ago

I stand corrected - C, Go and Python are all consistent here and show the full path.

I seem to recall there was a language that only provided the stripped part - but I guess my memory is failing me here. Sorry for the wrong information above.

kelsey987654311y ago

This is how busybox works in 'shim' mode. I am not, however, concerned with the security argument here: if you have the ability to run code, you have the ability to do n to the power of x insidious things, and argv[0] abuse is just one of dozens (hundreds?) of vectors or useful building blocks in an attack. If we are suddenly giving a shit about security on *nixes, we should be looking at deeper SELinux rollouts (ease of use for sysadmins and maintainers, so we never see permissive mode used instead of just applying the difficult-to-remember command that will patch your policy settings). We need root capabilities to continue to be separated in the kernel access control scheme, and we probably need to start using namespaces much more liberally, as projects like Silverblue/Bluefin do by reimplementing the entire OS stack as a series of containers. Stronger container foundations and ease of use for existing security mechanisms will take us much further than worrying about ANYTHING else in the ABI, which by the way will never change as long as Linus is alive, and he will most likely live on forever as an LLM, given the number of mailing list posts he has made over the years.

js21y ago

> Today however, disk space is no longer considered an issue; this is evidenced by macOS Sonoma, where shutdown and reboot are two separate executables.

Try running `ls -li /usr/bin` on macOS and you might be surprised to learn that all of these are a single executable: DeRez, GetFileInfo, Rez, SetFile, SplitForks, ar, as, asa, ... yacc. There's 77 different entries in `/usr/bin` (including `git` and `python3`) that are all links to the same binary (`com.apple.dt.xcode_select.tool-shim`). It's a wrapper that implements the `xcode-select` concept to locate and run the real executable provided by either the Command Line Tools package or a particular Xcode version you may have installed.

And that's not the only one. There's another 68 links starting with `binhex.pl` and ending with `zipdetails` that are a single 811 byte wrapper-script around perl.

Altogether, I see that there are 26 different names that are multiply linked:

  # count inode numbers that appear under more than one name
  ls -li /usr/bin |
  awk '{print $1}' |
  sort | uniq -c |
  awk '$1 > 1' | wc -l

Some of the other examples: less & more, bc & dc, atrm & batch, stat & readlink.

Having a program behave dynamically based on argv[0] is a useful tool in the Unix toolbox. The alternative would be compiling 77 different versions of `tool-shim`, creating 68 different versions of that perl wrapper, etc.

The `git` binary uses this concept too. You can create an executable named `git-foo`, put it anywhere in your PATH, and then call it as `git foo`.
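The `git foo` to `git-foo` lookup can be approximated with `shutil.which`. This is a sketch of the naming convention only, not git's actual resolution logic; `find_external_subcommand` is a hypothetical helper.

```python
import shutil


def find_external_subcommand(tool, subcommand, search_path=None):
    """Look up '<tool>-<subcommand>' on a PATH-style string; return None if absent.

    With search_path=None, shutil.which falls back to the PATH environment variable.
    """
    return shutil.which(f"{tool}-{subcommand}", path=search_path)
```

For instance, `find_external_subcommand("git", "foo")` returns the full path of an executable named `git-foo` if one is on PATH, and `None` otherwise.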

In the end, argv[0] is just an argument that can be used to improve CLI ergonomics and reduce code duplication. It's not solely about disk space. I think that makes it a more common and useful concept than you give it credit for.

As to the rest of the post: I'm not really sure how argv[0] being in the caller's control is any different than the rest of the execution context being in the caller's control: the remaining arguments, the environment, limits on file descriptors, which file descriptors are open, the program's real and effective uid and gid, signals it might receive and so on. These all amount to untrusted input any executable has to be cognizant of, more or less so depending upon what privileges the executable has and what its goals are.

cryptonector1y ago

Besides, disk space is not an issue, but container image size still can be an issue because those have to be copied around the network, and it's easy to have thousands of 10GB images consume more disk space than you might have thought you'd need.

tsujamin1y ago

Surely the duplication would be (mostly) compressed away?

dcminter1y ago

This lost me at "goes against modern design principles" without citing what principle(s) the author had in mind that would proscribe it.

st_goliath1y ago

Given the tone and assumptions the article makes, and the things that are explicitly explained, this seems to be one of those articles where a novice learnt something new and then decided to write an article about it, despite not having fully grasped the concept yet.

As a result, the author has such strange, absolute positions, calling it a legacy that should be abolished (only tangentially knowing some actual use cases), or that strange quote about design principles.

Despite all the talk about security, the whole debacle that argc can be 0 (and argv[0] can be NULL), is completely left aside. This has caused actual security issues quite recently[1].

[1] https://lwn.net/Articles/882799/

gwbas1c1y ago

The security issues the author points out later in the article do have merit.

Unfortunately, the author shot their credibility in the foot by perseverating on the use of argv[0] instead of glossing over it and getting to the point.

rpcope11y ago

Almost any time someone uses the words "legacy" or "modern" in the context of computers, it's a giveaway to me that they have an axe to grind, with few or no deep, substantive reasons. I typically read these as:

"legacy" -> anything that has existed for more than a day that I don't understand and don't like that stops me from poorly reinventing the wheel

"modern" -> anything that I dreamt up or heard some other hipster talk about recently that I got hyped about

hiccuphippo1y ago

I would guess the modern principle of disregard for disk space or memory usage :(

astrobe_1y ago

The Modern principle prescribes that you should never use software that's older than you are. Some cults even prescribe that you should not use a framework for longer than you would wear a pair of socks.

lanstin1y ago

This article seems to be an example of how some common security practices are kind of surface level. If you want to limit what a box can access on the network, do it in the network. Why is security looking for bad URLs in the argv? If you know they are bad, just block them. Or better yet, if they aren't good, don't allow them. And if you want to know what a process is doing, ask the kernel to log its syscalls. If you take away argv[0] you will lose some valuable stuff (cute little busybox links, error logs that have argv[0] in them), and attackers will just name payload.exe ls.exe. And if your network is allow-all, they will still reach the CNC or collector endpoint.

dividuum1y ago

Seriously: Their reason is basically "argv[0] is bad because security snake oil software is garbage":

1) Oh no, the only protection is looking at argv[0]. What kind of clown software is that? Software that notably runs on an already compromised system..

2) No need for argv[0] to fool software that concats argv values with spaces: just run 'curl -o "test.txt |grep" 1.1.1.1'

3) A long argument messes up telemetry? Let's hope that bucket doesn't have more holes.

shermantanktop1y ago

These are all very realistic examples. Should they happen? No, but reality is messy and imperfect. The crappy software you describe would not exist if there were great solutions in this space.

skobes1y ago

"Windows’ own API calls for creating new processes (such as CreateProcess [6], ShellExecute [7]) do not allow you to set argv[0]: it sets it for you, based on how the path to the executable was provided."

Isn't this contradicted by the docs? CreateProcess receives lpApplicationName and lpCommandLine, and they can be different.

DSMan1952761y ago

Yeah, they have this incorrect. If you provide `lpApplicationName` and `lpCommandLine`, then the application name is not automatically added to the command line string; you have to add it yourself to the string provided as `lpCommandLine`. I checked, and the docs for `CreateProcess` briefly mention this issue:

> If both lpApplicationName and lpCommandLine are non-NULL, the null-terminated string pointed to by lpApplicationName specifies the module to execute, and the null-terminated string pointed to by lpCommandLine specifies the command line. The new process can use GetCommandLine to retrieve the entire command line. Console processes written in C can use the argc and argv arguments to parse the command line. _Because argv[0] is the module name, C programmers generally repeat the module name as the first token in the command line._

magicalhippo1y ago

Not the way I understand it. In the execv documentation[1], you pass the program name twice:

int execv(const char *path, char *const argv[]);

The argument path points to a pathname that identifies the new process image file.

The argument argv is an array of character pointers to null-terminated strings. [..] The value in argv[0] should point to a filename string that is associated with the process being started by one of the exec functions.

Windows does not allow you to do that, AFAIK.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/e...
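To make the path/argv[0] split concrete, here is a small Python sketch on a POSIX system. `subprocess.run` takes the file to execute via `executable=` and uses the first list element as argv[0]. The helper names are made up, and the demo assumes `/bin/sh` exists and is a POSIX shell (which sets `$0` from its argv[0] when run with `-c` and no further operands).

```python
import subprocess

def run_with_argv0(program_path, argv0, *args):
    """Execute program_path, but hand it argv0 as its argv[0]."""
    result = subprocess.run(
        [argv0, *args],           # what the child sees as argv
        executable=program_path,  # the file that is actually executed
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

def shell_reported_name(argv0):
    # Assumption: /bin/sh is a POSIX shell; it prints its own $0.
    return run_with_argv0("/bin/sh", argv0, "-c", 'printf %s "$0"')
```

Calling `shell_reported_name("foobar")` runs `/bin/sh`, yet the shell reports the name the parent chose for it.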

skobes1y ago

> Windows does not allow you to do that, AFAIK.

It does though, using the lpCommandLine parameter to CreateProcess as I said.

CreateProcess("main.exe", "foobar", ...)

argv[0] is "foobar"

halayli1y ago

> This seems like a questionable design decision. Should a program be allowed to behave differently based on its name? From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.

No, it doesn't make software less predictable, nor does it go against modern design principles. argv has very handy use cases and can be used to provide a better user experience.

Unless you have evidence to back up your claims, you're just turning a subjective opinion into an objective one without any merit.

Either way, it's software developer choice and irrelevant to the user as much as it is irrelevant to the user whether the developer prefers for(;;) over while(1).

JohnFen1y ago

> Today however, disk space is no longer considered an issue

On desktop machines, perhaps, but this is certainly not true on all platforms Linux runs on.

Suppafly1y ago

Plus the whole "space is not an issue" thing along with "you can just add more ram" is the reason everything is so bloated and slow even on well provisioned machines.

Sohcahtoa821y ago

These days, Windows Calculator takes up more memory than mIRC.

Tell me why a simple calculator app needs more memory than a complete multi-server implementation of the IRC protocol (including SSL/TLS), not to mention a full scripting engine.

citrin_ru1y ago

Until relatively recently, SSDs were not so big (compared to HDDs at a comparable price), and space is an issue on not-so-new desktops. I don't want to upgrade a notebook only because someone thinks that disk space is not an issue.

But of course this is much more of an issue for embedded platforms like routers with OpenWrt.

Hizonner1y ago

No, not on desktop machines either. Executables these days can be enormous.

The author's just dumb.

blenderob1y ago

> argv[0] is a relic of the past

Busybox says hello.

Seriously though, how is this on the front page? Both the premise and conclusions contradict the reality of how argv[0] is used with symbolic links and hard links.

josefx1y ago

Microsoft defender using broken by design detection rules? One could almost think it is an anti virus program.

dotancohen1y ago

I also use argv[0] for the -h help text, to show examples how to use the command.

st_goliath1y ago

There is also a neat little BSD extension, also supported on a number of other Unix-like systems and GNU userspace (i.e. glibc, but also other libcs like Musl):

    extern char *__progname;

which holds the program name without the (optional) invocation path in front of it. Basically the last path component of argv[0].
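For comparison, a rough Python equivalent of "last path component of argv[0]", used to build a help line. `short_program_name`, `usage` and the `"prog"` fallback are illustrative names, and the option text simply mirrors the kind of usage string people print with argv[0].

```python
import os

def short_program_name(argv0, fallback="prog"):
    """Last path component of argv0, similar in spirit to __progname."""
    # basename() yields "" for an empty string or a trailing slash,
    # so fall back to a fixed name in that case.
    return os.path.basename(argv0) or fallback

def usage(argv0):
    return f"usage: {short_program_name(argv0)} [--some-arg] FILENAME"
```

Whether to show the short name or the full invocation path is a per-message choice; keeping both available costs nothing.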

dotancohen1y ago

Nice, thank you.

Brian_K_White1y ago

wattttt? nice thank you

anonymousiam1y ago

I've done this too, but you should remove the path elements from the argv[0] string before you include it in your error/help messages.

Brian_K_White1y ago

Sometimes you want it, sometimes you don't, so it needs to be in there, and sometimes you remove it yourself if your context of the moment doesn't want it.

And neither the want-it nor the don't-want-it case is such an outlier that you can disregard and not serve that case.

Sometimes you're talking to the user about general usage and the full path is a distracting detail and not the important part of the message.

Sometimes the full path and truthful invoked filename are an unnecessary security disclosure like telling a web viewer details about the server.

Sometimes the full path and truthful invoked filename is a necessary fact in debugging, or in errors, or even ordinary non-error logs that aren't public.

jmholla1y ago

You don't need to. Keeping them shows the user exactly how to call the program based on how they called it.

patrickmay1y ago

Same here, as well as for showing an example invocation when the user fails to include a required argument.

kelnos1y ago

> From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.

Says who? I'm not aware of any modern design principles that say anything about this sort of thing.

> argv[0] is ignored (mostly)

Pretty much any program I've written that has a --help option uses argv[0] to print out the usage string, i.e.:

    printf("%s [--some-arg] FILENAME\n", argv[0]);

> First off, argv[0] can be used to fool security software

Then that security software is poorly written. On Linux, the correct way to find the binary of a running process is by calling readlink(2) on /proc/$PID/exe. Assuming security software like this is going to have a lot of OS-specific code, it seems fine to me to expect they use it (and then have to do other things on other OSes).
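A minimal sketch of that lookup in Python, Linux-only. `executable_of` is a hypothetical helper, and the `proc_root` parameter exists only so the function can be pointed at a fake tree for testing.

```python
import os

def executable_of(pid, proc_root="/proc"):
    """Resolve the binary backing a process via the /proc/<pid>/exe symlink."""
    try:
        return os.readlink(os.path.join(proc_root, str(pid), "exe"))
    except OSError:
        # The process is gone, is not ours to inspect, or /proc is unavailable.
        return None
```

Note that on Linux the link target can carry a " (deleted)" suffix when the binary was removed after launch, so callers comparing paths may want to handle that case.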

> Another argument against this design is that if you have two programs that are so similar that it pays off to consolidate them into a single file, is there really a need for two separate programs/program names?

The author is talking about shutdown and restart being symlinks to systemctl on systemd-based systems. But what about something like busybox? busybox contains hundreds of programs, all conveniently in a single, statically-linked binary. On my system it's about 800kB. While I agree that even 250MB is not a big deal for most systems these days, it certainly is a problem for, say, a WiFi router that only has 8MB of flash.

> Ultimately, nobody wants to be bothered by argv[0].

False. I find it useful, and am not "bothered" by it at all. And I suspect security folks aren't really bothered either: the ones that actually know what they're doing look at /proc/$PID/exe when they want to find the binary backing a PID.

This article is kinda lame, and it seems like the author's objections are mostly based on ignorance.

remram1y ago

> From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.

This is not an argument at all, this is a statement that arguments exist. What are they?

It's like saying we shouldn't do something because it's "against best practices". I'm asking why are other practices preferred...

andrewmcwatters1y ago

I wish amateurs would stop propagating the false idea that disk space and memory are cheap and not a problem.

layer81y ago

If nothing else, argv[0] is useful for producing error messages that indicate the name of the executable that is outputting the message.

It's probably a good idea to not have it settable to other values by the invoking process, as is generally the case on Windows (ignoring its Posix subsystem here).

alerighi1y ago

> It's probably a good idea to not have it settable to other values by the invoking process, as is generally the case on Windows (ignoring its Posix subsystem here).

Well, there is a use case where I sometimes set argv[0]. Consider that you want to run yourself as a subprocess. Why would you want to do that? There are plenty of reasons, but in general the thing is that doing things after a fork() is not safe under some circumstances, and thus sometimes you also want to exec yourself.

One technique is to call yourself using another name in argv[0], and then in main take a different flow from the normal command line parsing, without adding an argument that the user could specify if they knew it existed.

Yes, I know that there are a ton of other methods to do the same thing (perhaps an environment variable, for example), but I find the method of argv[0] quite nice and simple to be fair.

account421y ago

You can set the full command line on windows independently from the program path using the standard Win32 CreateProcess(Ex) functions. This includes the part that ends up in argv[0] with your usual C runtime (Windows itself only provides a string and leaves it up to the program to split into arguments which may or may not use the standard CommandLineToArgv* functions - the standard C runtime doesn't and has slightly different escaping rules).

CamJN1y ago

This is near and dear to my heart. I wanted to make a utility to get the arguments of other processes, and found after looking that every single use of the KERN_PROCARGS2 sysctl (used on macOS) on the internet is wrong (they assume argv[0] is not an empty string), including Apple's and Google's. So after making my utility I also made a library out of it, both are bsd-3, but non-gratis: https://getargv.narzt.cam/

CamJN1y ago

On a related note, env vars do not have to be of the form key=value, they are arbitrary NUL-terminated byte strings just like the args.

account421y ago

Is this a parody?

cryptonector1y ago

Please no. If you want to know what a process is running, look carefully in `/proc` or use `lsof` or whatever, but no, please, `argv[0]` is super useful. I use it, lots of people use it. And it's well known that pstrings can be abused to hide things from `ps`, but so what, it's been that way for 4+ decades and it's a well-known "problem" (it's not a problem).

sph1y ago

What a silly post. I use argv[0] in my host-spawn tool (https://github.com/1player/host-spawn) so one can symlink it to a name inside a container and when you run it, it's executed on the host.

    # Inside your container:

    $ flatpak --version
    zsh: command not found: flatpak

    # Have host-spawn handle any flatpak command
    $ ln -s /usr/local/bin/host-spawn /usr/local/bin/flatpak

    # Now flatpak will always be executed on the host
    $ flatpak --version
    Flatpak 1.12.7

I am able to tell the symlink name by reading argv[0] to know which command to run. It is such a powerful and neat UNIX trick that has no simple alternative (in this example one would have to write ad-hoc shell scripts for each command they want to run)

4star3star1y ago

The post is silly because you wrote software that makes use of argv[0]? On the contrary, it opens a discussion about unintended security implications that might be avoided in the future if command line implementation can be reconsidered.

tqwhite1y ago

argv[0] is a parameter. Like any user input, it should be treated skeptically. There is absolutely nothing wrong with allowing more than one way to invoke the same program. This article is simply silly. Fortunately, it will be ignored completely since acting on it would break the universe.

jujube31y ago

Problem: virus scanning software on Windows is broken.

Solution: we should not use argv[0]?

hi-v-rocknroll1y ago

Arguing against legacy quirks is arguing against compatibility and arguing for throwing away decades of code portability guarantees through 20/20 hindsight perfectionism failing to consider the costs and burdens of reimagining the world with bikeshedding rants.

kazinator1y ago

The author doesn't seem to understand that argv[0] can be different due to, for instance, one executable implementing many programs, such as BusyBox and similar projects.

While argv[0] is old, if you had to design it from scratch today, it would still be a good idea to have the program invocation name as an argument.

The idea that anything old must be a historic quirk that we can eliminate today is flawed.

Now argv[0] should not be relied upon for obtaining the executable name, except as a last resort if the program is built for platforms that don't have anything else. But if one executable has multiple program names via symlinks, only argv[0] will distinguish them.

Brian_K_White1y ago

This is stupid. argv0 is just some data like any other data.

It's ridiculously useful aside from the obvious busybox style usage.

It's huge to be able to have a pointer to the directory where the executable resides, so you can package other assets alongside it and have it all work for free without a separate configuration file or env variables etc.

Or for debugging or even non-error logging. You might call a binary from more than one place by other means than symlinks or hard links. You might be running from different mounted filesystems, chroot or container environments etc. A symlink might be in the middle of the path and not the executable name itself. Similarly a mount point.

It's just a random small useful tool like all others. Calling it some kind of security problem is like saying that screwdrivers are a security problem because, aside from turning screws, some people can use screwdrivers to stab people, and we have nut drivers which can serve almost all the same needs for only a little extra work.

If your context of the moment means you have a security concern where you shouldn't trust this bit of data as gospel for some reason, then don't. Treat it like user input and take whatever precautions and fallback measures and sanity checks make sense for you in whatever particular situation you are in.

F-ing dumb.

iainmerrick1y ago

Yes! I was surprised how far down I had to scroll to find somebody mentioning that one.

How else can you write a reasonably robust script that actually, you know, does something? You almost always need to grab some known files by their paths relative to the script.

account421y ago

argv[0] isn't actually a great solution for that, since it isn't required to contain any path and in practice won't, depending on how the program was called.

Some languages don't provide a better solution but they should. For Bash there is ${BASH_SOURCE[0]}. For compiled executables you use OS-provided functions like GetModuleFileName(NULL, ...) on Windows and readlink("/proc/self/exe", ...) on Linux.

PaulHoule1y ago

It’s part of the shambolic world of Unix and C. But “worse is better!”

A good language spec is laid out in a way that reads from front to back with minimized circularity. See Common Lisp, Java, Python, etc.

As a kid in high school checking out Unix manuals and implementing many Unix tools in

https://subethasoftware.com/2022/09/27/exploring-1984-os-9-o...

I struggled with K&R because of the circularity of the book, which was really an anomaly built into C, the culture of C, or both, because C++ books still read this way. C had so many half-baked things, such as an otherwise clean parser that required access to the symbol table. And of course a general fast-and-looseness which led to the buffer overflow problem.

There were other languages which failed to solve the systems programming problem like PL/I and Ada, not to mention ISO Pascal which could have tried but didn’t. (Turbo Pascal proved it could have been done.)

People took until 1990 or so to be able to write good language specs consistently, so we can forgive Unix but boy is it awful if you look closely at it. On the other hand, IBM never did make a universal OS for the “universal” 360, yet Unix proved to be adaptable for almost everything.

zabzonk1y ago

i may have missed it, but where does the C Standard say anything about access to a symbol table? or even if such a thing exists.

and as for IBM i managed to use all sorts of OSs in VMs on IBM hardware back in the 1980s. Which did you have problems with?

PaulHoule1y ago

The parser in C has to keep track of the symbol table to handle cases like

   typedef int myint;
   myint x;

which is unusual among programming languages. Sure I used VM on IBM hardware in the 1980s and it was great. I also used timesharing systems on the PDP-8 (what atrocious hardware!), the PDP-11 and the PDP-10/20 in the 1970s. Although the 360 was superior in so many respects (except for the slow interrupt handling) it failed to break into the huge market for general-purpose timesharing to support software development and such (learning BASIC) until the time microcomputers came along and crushed the timesharing market. (PDP-10 was famously used to develop microcomputer software such as the original Microsoft BASIC and Infocom's z-machine games)

Fred Brooks' project to develop an OS for the 360 was notoriously troubled and IBM belatedly turned to VM as a dark horse. Today it looks ahead of its time (as virtualization became mainstream on x86 in the 00's) but back then IBM was flailing and they wound up with a good software story by accident. It was not really their fault, people just didn't know how to make an OS and the most advanced thinking back then was monstrosities like MULTICS. It was Unix and VAX/VMS that pointed to what a general purpose OS would look like a few years later and there has been relatively little innovation since then because nobody can afford to rearchitect the user space. (e.g. no way you can take out the "bloat" because you'll have to put it back in to run the software you want)

IBM's z-architecture (the other z) has a great software story today (even runs Linux) but it was not the Plan A or even the Plan B.

linsomniac1y ago

"Security" software that trusts /proc/cmdline (and the like), and in particular if it doesn't complain about /proc/cmdline having a mismatch with /proc/exe, doesn't seem like very useful security software to me. Particularly if it's security software that is making some security decisions based on argv[0].

Seems like this security software is broken, not argv[0]
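A sketch of such a consistency check in Python, assuming the per-process files `/proc/<pid>/cmdline` (NUL-separated arguments) and `/proc/<pid>/exe` on Linux. Function names and the `proc_root` parameter are made up for illustration.

```python
import os

def claimed_argv0(pid, proc_root="/proc"):
    """First NUL-separated entry of /proc/<pid>/cmdline, i.e. the claimed argv[0]."""
    with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as f:
        raw = f.read()
    return raw.split(b"\0", 1)[0].decode(errors="replace")

def name_mismatch(pid, proc_root="/proc"):
    """True when argv[0]'s basename differs from the basename of the exe link."""
    exe = os.readlink(os.path.join(proc_root, str(pid), "exe"))
    return os.path.basename(claimed_argv0(pid, proc_root)) != os.path.basename(exe)
```

A mismatch is only a hint, not proof of abuse: busybox-style symlinks, login shells started as `-bash`, and interpreted scripts all produce legitimate differences.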

keepamovin1y ago

This is why we can't have nice things. Security footguns everywhere!

I'm fascinated by the intersection of argv[0], and the execve behavior of replacing the calling program with the called one.

Aside from that, I quite like argv[0], for a much more limited set of reasons than considered in this interesting and comprehensive article. I like the ability to "retitle" a process to put a useful, descriptive, or branded name in there to be seen by ps, et al.

NodeJS also exposes this feature, but not quite as you might expect. Whereas in C, setting argv[0] from within the program's execution context will alter what is observed by ps, in NodeJS process.argv is just a descriptive getter. Setting its slots has no effect outside of its context.

But this is where process.title steps in. Setting process.title allows you to (in an OS-dependent way) change the name reported in ps and similar tools.

Please don't kill argv[0], its lease hath all too short a date

kelsey987654311y ago

Your fascination is rewarded by reading the other man sections such as section three:

https://linux.die.net/man/3/execve

If you already know about the additional man pages beyond user space, i cannot more strongly recommend diving into them. Additionally the gnu 'info coreutils' is a good place to start, as well as the glibc manual.

Dwedit1y ago

How about the part about knowing what the directory the executable was launched from? It could be different than the working directory.

thayne1y ago

> and (especially a few decades ago) can offer cross-platform/backwards syntax compatibility using a shared code base.

This is still very much an issue. For the shutdown and reboot case, the main reason those symlinks exist is backwards compatibility for existing programs and scripts (and muscle memory) that assume there is a shutdown or reboot command, and compatibility with systems that don't use systemd.

Another way to do that could be to use a shell script that execs systemctl, but that requires a separate intermediate shell process, which may have its own compatibility issues.

Another use of argv[0] that isn't discussed at all is putting a hyphen at the beginning of argv[0] for login shells. For example if bash is invoked as the login shell argv[0] is "-bash". That probably wasn't a great design decision, but changing it now would probably cause a lot of breakage.

Arch-TK1y ago

For a command line utility, argv[0] is nice to see in error messages (e.g. `./tool: fatal: Could not open './file' for reading`). When the shell combines stdout and stderr, it's easier to spot exactly what you just typed as argv[0] from all the other output.

For most other things, definitely unnecessary.

johnisgood1y ago

So wait, I should not use `argv` in C's main() or what?

Is it only speaking against `argv[0]` or `argv` in general?

What is this proposed solution if any?

What about `__progname`? The only issue here is that if `argv[0]` is a path, then `__progname` is only the filename. What if I want the path?

gwbas1c1y ago

The author's extensive criticisms of using argv[0] are a distraction from the main point of the article:

Summary: By manipulating argv[0], a malicious program can hide what it's doing in security logs. For example, a malicious program can make "curl -T secret.txt 123.45.67.89" look like "curl localhost | grep -T secret.txt 123.45.67.89" in security logs. A malicious program can also use very large argv[0] values as a DoS attack on system logging, or to truncate malicious arguments.

IMO, operating systems should block this practice.

Unfortunately, the author's extensive criticism of programs reading argv[0] hurt the author's credibility before most people get to the real point of the article.

account421y ago

> IMO, operating systems should block this practice.

The "look like" is not a problem with the OS but a problem with displaying an array as a space-separated string without sufficient quoting or escaping, making things ambiguous.
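A small Python illustration of the difference, using an argv whose first element contains spaces and a pipe character. `naive_log_line` and `quoted_log_line` are illustrative names; `shlex.join` (Python 3.8+) applies POSIX shell quoting to each element.

```python
import shlex

def naive_log_line(argv):
    # Ambiguous: element boundaries are lost once everything is joined.
    return " ".join(argv)

def quoted_log_line(argv):
    # Each element is shell-quoted, so the original array can be recovered.
    return shlex.join(argv)

argv = ["curl localhost | grep", "-T", "secret.txt", "123.45.67.89"]
print(naive_log_line(argv))   # curl localhost | grep -T secret.txt 123.45.67.89
print(quoted_log_line(argv))  # 'curl localhost | grep' -T secret.txt 123.45.67.89
```

The quoted form also round-trips: `shlex.split(quoted_log_line(argv))` gives back the original list, which the space-joined form cannot do.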

tantalor1y ago

The name of something is not an intrinsic property.

barelyauser1y ago

Can something possess extrinsic properties? Or are they an intrinsic property of external things?

samatman1y ago

Yes.

Not only is there a Wikipedia article on it, there's more than one.

Here's the one covering science and engineering, which is the appropriate version for this discussion.

https://en.wikipedia.org/wiki/Intrinsic_and_extrinsic_proper...

account421y ago

Are you asking if a property being intrinsic or not is an intrinsic property of that property?

guappa1y ago

Wait until he finds out about busybox!

Also claiming that the windows API to call a new process is good… wow… I guess he's never had to pass a filename with quotes and spaces in its name. The API expects you to do the escaping yourself. Yes it needs to be escaped, because it's all one single string.

pjc501y ago

There are a number of good things about CreateProcess, but argument passing is not one of them. It's a very longstanding misfeature in the design of CMD.EXE and almost certainly dates from MSDOS and therefore CP/M.

A side effect of that is that programs do their own unescaping. Unix users who are used to quotes being stripped for them may be surprised by this.

timrobinson3331y ago

Many windows programmers fail to appreciate this. If you're using a language that provides argv-style functionality, the quoting and escaping mechanism is entirely at the mercy of that language, so you can't reliably make any general assumptions about how to quote parameters to a command line

t435621y ago

arg0 also contains the path from where the invoker invoked the binary so for me this enables all sorts of binaries that work out where their dependencies are relative to their original binary. That's extremely convenient because you can combine it with $PWD to find out the absolute path to the binary.

One can then guess what the PYTHONPATH and LD_LIBRARY_PATH should be most of the time and save someone from having to set them.

Obviously this is of most use when you're running something you've installed into /opt (e.g. /opt/myprog/bin, /opt/myprog/lib etc) or are running it from the source tree.

account421y ago

> arg0 also contains the path from where the invoker invoked the binary

Not in general it doesn't. Convention for shells is to pass the string the user used to invoke the program which may be an absolute path, a relative path or just a filename resolved against $PATH.

> this enables all sorts of binaries that work out where their dependencies are relative to their original binary

You should use the OS-specific functions to retrieve the current executable path for that: GetModuleFileName(NULL, ...) on Windows and readlink("/proc/self/exe", ...) on Linux. For scripts, look into your interpreter's documentation; e.g. Bash has ${BASH_SOURCE[0]}. Unfortunately POSIX shell scripts are SOL and have to rely on $0 plus some $PATH searching.
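For the fallback case, the "argv[0] plus some $PATH searching" approach looks roughly like this in Python. It is a best-effort guess that trusts a caller-controlled value, so it is shown only to illustrate the three cases (absolute path, relative path, bare name); `locate_from_argv0` is a hypothetical helper.

```python
import os
import shutil

def locate_from_argv0(argv0, cwd, search_path):
    """Best-effort guess at the file behind argv0, mimicking a shell's lookup."""
    if not argv0:
        return None
    if os.sep in argv0:
        # Absolute or relative path: interpret it against the caller's cwd
        # (os.path.join keeps an absolute argv0 unchanged).
        return os.path.normpath(os.path.join(cwd, argv0))
    # Bare name: repeat the $PATH search the shell would have done.
    return shutil.which(argv0, path=search_path)
```

In a real program the `cwd` would have to be captured at startup, before any `chdir`, and the result still says nothing reliable if the parent passed an arbitrary argv[0].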

JoyfulPanda1y ago

Holy moly, the article addresses argv[0] as the problem, while the real problem is that the snake oil industry has no clue what they are doing

suprjami1y ago

It's not often a self-promotion blog post has the entirety of HN telling you you're wrong. Better luck next time lol

anacrolix1y ago

I think the Unix philosophy wins here. It might not be a clean interface but let the implementations decide what to do with it. If you remove it you are more likely to cause issues and have to grow new interfaces elsewhere.

azlev1y ago

I don't think the argv was made with security in mind.

If we want something to be used in security field, the design since day 0 should consider it. Trying to retrofit something will break a lot of things.

nmz1y ago

Really strange that argv[0] has a basically unlimited character size while #! has a hardcoded 256 byte limit.

mannyv1y ago

"Remember, the safest computer is one that's turned off and unplugged."

omphaloskeptic1y ago

Also, on POSIX systems, exec-ing a program with argv[0] starting with ‘-‘ will have it start as a login shell, which is a whole rabbit hole of its own. I’m sure it’s within the security model (and the linked article doesn’t really discuss the concept of OS security models), but it’s still a pretty big shift in behaviour just from adding a character to the argv[0] value

fanf21y ago

No, that’s a property of how shells interpret argv[0], not a property of exec()

account421y ago

> This seems like a questionable design decision.

Nope.

> Should a program be allowed to behave differently based on its name?

Yes. The program can also inspect any other part of its environment, including the parent process. What makes sense to inspect here depends on the particular program in question. The symlink example is still useful today.

> From a 2020s standpoint, this seems highly undesirable

Nope.

> it makes software less predictable

It doesn't. It makes it more predictable if programs can easily provide compatibility interfaces. Yes, you could do the same with a wrapper but removing friction matters.

> and goes against modern design principles.

Then modern design principles can take a hike.

> Today however, disk space is no longer considered an issue

It should be considered an issue though. I buy better hardware to get more use out of it, not for lazy developers to needlessly piss it all away.

This is just yet another example of "security" people trying to make their lives easier by making others' lives harder. And as usual it's only theater, since almost all of the "exploits" apply to arguments as well, which for many programs provide plenty of opportunity to include arbitrary strings. Fix your tools instead of expecting the world to work around their limitations.

hinkley1y ago

> Today however, disk space is no longer considered an issue;

Tell me you don’t use Docker without telling me you don’t use Docker.

I’d argue the certutil problem the author mentions is a flaw in certutil, not argv’s fault. Doesn’t that mean it falls to symlinks as well?

If you look at sudo, it’s generally deny by default. Rename a program all you want, you won’t get to use it unless you can overwrite a program that is in the sudoers file. So I don’t know what nonsense certutil is playing at if it’s using argv to do its job. That’s appalling.

KingOfCoders1y ago

I use argv[0] to monitor the binary by itself and restart when it has changed.

actionfromafar1y ago

How? Checking and storing a checksum, or just file change metadata?

marcosdumay1y ago

Well, I'm not the GP, but probably with an OS file change monitoring API; it differs for each OS, but the mainstream ones all have one.

nottorp1y ago

<Cough> Busybox.

There is life outside the enterprise security theater.

gorjusborg1y ago

Why bother asking?

_xiaz1y ago

L Take

mzs1y ago

"A login shell is one whose first character of argument zero is a -"

yjftsjthsd-h1y ago

    ln /bin/curl ./some\ other\ name

but there are sometimes security measures that we use even though they're less than 100% effective so it at least conceivable that this might be a trade off worth making.

gwbas1c1y ago

I agree, I think the author really shot themselves in the foot when they, at length, criticized the merits of a program using argv[0].

The criticisms of valid programming practices, IMO, hurt the author's credibility and distract from the real point of the article.

nrdvana1y ago

It has to be writable because the entire argv string (in program memory) is writable and declared as

  int main(int argc, char **argv)

not

  int main(int argc, const char **argv)

and needs to preserve back-compat. Classic C code might be calling strtok on the arguments, so that block of memory needs to remain writable.
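A small sketch of the kind of classic code this refers to, assuming an argument such as "a,b,c" that gets split in place. The helper `split_arg` is made up for illustration; `strtok()` is the standard function, and it overwrites each delimiter with a NUL byte, which only works if the argument memory is writable.

```c
#include <string.h>

/* Split one argument in place. strtok() writes '\0' over each delimiter,
 * so arg must point to writable memory (as the strings in argv do).
 * Stores at most max field pointers in out and returns how many were stored.
 * split_arg is a hypothetical helper name. */
size_t split_arg(char *arg, const char *delims, char **out, size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(arg, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;
    return n;
}
```

Calling this on `argv[1]` mutates the argument block of the process, which is exactly the behaviour that a read-only argv would break.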

4 more replies

shadowgovt1y ago

I see a common anti-pattern in security researchers in that they can lose sight of the human beings who operate the software.

This does, of course, imply that the program name is non-constant untrusted data. Which means we shouldn't be making security software that depends on knowing that name.

rzwitserloot1y ago

That seems unnecessarily harsh.

toast01y ago

This was kind of in the middle of your complaint about windows, but then you've got unixy busybox discussion.

2 more replies

mbrumlow1y ago

> highlighting a problem exists

Coding bugs into your programs is not a problem it’s a bug. None of the weird arg[0] examples can happen on the shell (without escaping), only when using system calls.

1 more reply

ArchOversight1y ago

There's the `setproctitle` in FreeBSD that is designed exactly for a process to update the information that is presented to tools such as ps.

https://man.freebsd.org/cgi/man.cgi?query=setproctitle&aprop...

2 more replies

kelnos1y ago

> it's about highlighting a problem exists and that it's worth solving.

There is no problem here. The author is making a big deal about nothing, either because they have a weird axe to grind, or because they're ignorant.

Maxatar1y ago

>Show the actual file path, and always an absolute one

croes1y ago

It's easy to call something a mistake in hindsight.

You could argue the mistake was done elsewhere so this feature could be abused.

ahoka1y ago

“That is simple enough to solve (add an API call to update _your own process name_ or at least update your own process 'title' which interfaces like ps/taskman can use accordingly)“

We could call it setproctitle, or something. \s

pzmarzly1y ago

Not an author, but there's a good alternative. If busybox was edited to ignore argv[2], then applets could be called via shebangs, instead of symlinks:

    $ echo '#!/path/to/busybox echo' > myecho
    $ chmod +x myecho
    $ ./myecho 123
    ./myecho 123

Right now this doesn't work properly, because "./myecho" (argv[0]) gets placed into argv[2] of the process. Otherwise, this technique IMHO is better than symlinks:

- Each applet uses the same amount of disk space (0 blocks, i.e. the content fits into inode).

- Doesn't read or write to argv[0].

cesarb1y ago

> Each applet uses the same amount of disk space (0 blocks, i.e. the content fits into inode).

alerighi1y ago

In fact, you don't even need symlinks at all: you can even have hard links, that could even save disk space on embedded filesystems, that are readonly images anyway.

3 more replies

soneil1y ago

I was going to say it'd be easier to have a single script, eg

    #!/bin/sh
    busybox "$0" "$@"

and then every command required could just be a hardlink to the same script, instead of replicating it over and over again for hardcoded command names.

Then I realised the whole point is to posit a world where $0 doesn't exist, and we're not allowed to be clever about it.

1 more reply

account421y ago

Are shebangs recursive? Otherwise this means that busybox can no longer provide /bin/sh.

pie_flavor1y ago

wietzeOP1y ago

For busybox/toybox the argv[0] thing is great, and seems to be the prime example of why argv[0] shouldn't go - yet it is a bit of an anomaly in how argv[0] is used.

It's clear from the comments there are still many who think argv[0] is a good thing, which is great - I'm glad the post sparked this debate.

blenderob1y ago

> is `busybox whoami` instead of `whoami` so much more effort?

> I'm glad the post sparked this debate.

This is a very strange way to deflect concerns about quality of the article!

2 more replies

sltkr1y ago

`busybox whoami` is probably fine, but having to write `busybox ls`, `busybox grep`, `busybox cp` etc. would get tedious quickly.

Shell aliases don't solve all problems, even if you do:

    alias rm="busybox rm"
    alias xargs="busybox xargs"
    # etc.

you still have to write `xargs busybox rm`, because xargs won't use the shell alias.

But the main problem with this approach is that POSIX and LSB require certain binaries to be available at certain paths. When they're not, most shell scripts will just break.

The minimal standard solution is probably to create shell scripts for all of these, e.g. in /bin/ls:

    #!/bin/sh
    exec /bin/busybox ls "$@"

1 more reply

mbrumlow1y ago

Yes. Anybody who has shipped software would say.

The security concerns are a non issue. As arg[0] was not the problem. It was the lack of technical knowledge of how systems work and a flaw in the security application.

hinkley1y ago

I think you’re both forgetting that bash has been using this trick for decades.

Bash has an sh compatibility mode that runs when you invoke it as sh.

alerighi1y ago

What do you do? Replace every single occurrence of each command by prefixing it with `busybox`? Not ideal at all...

epcoa1y ago

https://pubs.opengroup.org/onlinepubs/9699919799/

You appear not to realize that busybox is an essential component of a POSIX like system.

jimrandomh1y ago

That's fine for when users are interactively typing commands, but it doesn't work when the command is being run by a non-busybox program which expects commands to exist in the standard locations.

Arch-TK1y ago

Restricting setting it would break login.

Not that it couldn't be fixed by changing how we handle login shells but still. Worth remembering.

nrclark1y ago

It can already do that, afaik. When I last checked, BusyBox supported installations via 4 methods:

  - symlink
  - hardlink
  - shell script wrappers
  - executable binary wrappers around libbusybox

1 more reply

zekica1y ago

You are over-complicating it, you only need `#!/bin/busybox ls` as the entire contents of the file.

1 more reply

kazinator1y ago

mzs1y ago

https://github.com/util-linux/util-linux/blob/master/login-u...

edit: basically login(1) execs your shell with - prepended, so this is an example where POSIX expects this

marcosdumay1y ago

Is there a good reason for allowing writes to argv at all?

I think any reasons one will find are based on backwards compatibility.

red_admiral1y ago

Yeah, never mind shutdown/reboot, has the author heard of busybox?

zokier1y ago

yuliyp1y ago

surajrmal1y ago

You can write a 2-line shell script that prepends busybox per command. I've done this on a 16MiB restricted system, and while it ate maybe 4k per command, it wasn't a big deal with only 20-30 commands.

blenderob1y ago

2 more replies

Hizonner1y ago

mariusor1y ago

1 more reply

avidiax1y ago

It is sometimes used to allow one binary to be the symlink target of hundreds of commands.

Android does this for most common shell commands. Toybox and busybox are examples of such implementations.

https://github.com/landley/toybox

https://en.m.wikipedia.org/wiki/BusyBox
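For readers who haven't seen the pattern, a rough sketch of multi-call dispatch follows. The applet table and function names are invented for illustration and are not how Toybox or BusyBox are actually written; the only idea taken from the comment is that the last path component of argv[0] selects the command.

```c
#include <stddef.h>
#include <string.h>

typedef int (*applet_fn)(int argc, char **argv);

/* Hypothetical applets; a real multi-call binary would do actual work. */
static int applet_true(int argc, char **argv)  { (void)argc; (void)argv; return 0; }
static int applet_false(int argc, char **argv) { (void)argc; (void)argv; return 1; }

static const struct { const char *name; applet_fn fn; } applets[] = {
    { "true",  applet_true  },
    { "false", applet_false },
};

/* Pick the applet named by the last path component of argv[0].
 * Returns NULL when argv0 is missing or names no known applet. */
applet_fn find_applet(const char *argv0)
{
    if (argv0 == NULL)
        return NULL;
    const char *slash = strrchr(argv0, '/');
    const char *name = slash ? slash + 1 : argv0;
    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(applets[i].name, name) == 0)
            return applets[i].fn;
    return NULL;
}
```

A `main()` built on this would call `find_applet(argv[0])` and run the result, so a symlink or hard link named `true` and one named `false` behave differently while sharing one binary.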

cubist_castle1y ago

I just learned that rustup/rustc/cargo etc. work like this too. I couldn't understand why the gentoo formula was symlinking the same binary to a bunch of aliases.

kbolino1y ago

On my system, these are hardlinks (regular files with a link count >1 and the same inode) rather than symlinks, though I'm not sure why.
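If you want to check this programmatically rather than by eyeballing `ls -li`, a sketch along these lines should work on POSIX systems. `same_file` is a hypothetical helper; `stat()` and the `st_dev`/`st_ino` fields are standard.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* True when both paths resolve to the same file (same device and inode).
 * stat() follows symlinks, so a symlink and its target also compare equal;
 * use lstat() instead to tell symlinks apart from hard links.
 * same_file is an illustrative name, not a library function. */
bool same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

The `st_nlink` field of the same structure gives the link count mentioned above.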

1 more reply

alerighi1y ago

And that makes a lot of sense, especially for binaries that are statically linked (as usually are Rust binaries), since that could save a lot of disk space!

duped1y ago

clang does this too.

mistercow1y ago

Also if you want a program to call itself, which is sometimes useful, this way lets you actually call the same program, rather than assuming the name and path.

duped1y ago

4 more replies

SoftTalker1y ago

There's no guarantee that the name and the path are still the same executable that is running, or that they even exist anymore.

3 more replies

akira25011y ago

Beware TOCTOU problems when doing this.

fallingsquirrel1y ago

2 more replies

hi-v-rocknroll1y ago

coreutils-static did this too. The advantage of shared libraries and multiple-use single static binaries is they're only loaded once.

layer81y ago

The article discusses this.

travisgriggs1y ago

> “Should a program be allowed to behave differently based on its name?”

SoftTalker1y ago

Arch-TK1y ago

I mean, it's still more convenient to type `rm` rather than `Remove-Item` when doing day-to-day computer tasks on your computer (yes I'm one of those people who lives in a terminal).

It's also certainly better from a readability standpoint to have `Remove-Item` rather than `rm` in a script.

I think just like we have long options and short options, it would be helpful to have long commands and short commands.

1 more reply

ForOldHack1y ago

As someone who started on ASR-33s, I have empathy for Mr Ritchie and Mr Kernighan.

After all, it's 50% faster to type a two-letter acronym than a TLA.

https://media.wired.com/photos/59327efdf682204f73696446/mast...

Too1y ago

If I download a new version of foo and rename my old version to foo_old_backup_2, should foo_old_backup_2 start behaving differently, just because it has a different name? NO THANKS!

account421y ago

shermantanktop1y ago

It’s the equivalent of the HTTP Host header, with similar utility. But I agree with the author that an OS provided trustable structure is a much better way.

marcosdumay1y ago

> an OS provided trustable structure

Repeating the OP, your program takes every other parameter from the caller, why do you insist on the executable name to not be set by him too?

Windows defender is the one that is stupid by using it. Every OS has the real executable name in some place, security software should look there instead.

strawhatguy1y ago

Yes, this is useful for backwards compat too, like bash with an 'sh' mode.

dheera1y ago

There are also multiple reasons for a program to read its own executable.

- Decompressing and inflating a compressed binary block with a generic decompressor at the top (e.g. a bash or python script with a binary blob at the end)

- Checksumming its own executable (skipping the checksum string) to resist virus infection. Not bulletproof but viruses aren't usually smart enough to circumvent this

alkonaut1y ago

MPSimmons1y ago

If not, then busybox is going to need to change a TON

dataflow1y ago

> I don’t see why not. It’s allowed to behave differently based on the arguments that follow it.

That's missing the point, I think.

sokoloff1y ago

> is the name of a program really an argument to the program, from the user's perspective?

When I use busybox [invisibly to me], I sure care that it knows whether I called it as "ls" or as "rm" and that it does the operation that I asked it to do.

theamk1y ago

That's a weird take against argv[0] - all arguments are: "goes against modern design principles" and "can confuse programs which use argv[0] when they wanted "exec" instead"

cedilla1y ago

> all arguments are: "goes against modern design principles"

And the key witness is systemd, which is too young to buy a beer - even in Germany.

wietzeOP1y ago

jrockway1y ago

Most people use argv[0] so they can do something like:

   $ mycommand help
   Type `mycommand foo bar` to foo bars.

   $ mycommand1.2.3 help
   Type `mycommand1.2.3 foo bar` to foo bars.

yencabulator1y ago

I routinely use basename(arg0) in my programs.

denysvitali1y ago

I don't think argv[0] includes the full path (or at least some programming languages strip the whole path and keep only the last part)

denysvitali1y ago

I stand corrected - C, Go and Python are all consistent here and show the full path.

I seem to recall there was a language that only provided the stripped part - but I guess my memory is failing me here. Sorry for the wrong information above.

2 more replies

kelsey987654311y ago

js21y ago

> Today however, disk space is no longer considered an issue; this is evidenced by macOS Sonoma, where shutdown and reboot are two separate executables.

And that's not the only one. There's another 68 links starting with `binhex.pl` and ending with `zipdetails` that are a single 811 byte wrapper-script around perl.

Altogether, I see that there are 26 different names that are multiply linked:

  ls -li /usr/bin |
  awk '{print $1}' |
  sort | uniq -c |
  awk '$1 > 1' | wc -l

Some of the other examples: less & more, bc & dc, atrm & batch, stat & readlink.

The `git` binary uses this concept too. You can create an executable named `git-foo`, put it anywhere in your PATH, and then call it as `git foo`.

cryptonector1y ago

tsujamin1y ago

Surely the duplication would be (mostly) compressed away?

dcminter1y ago

This lost me at "goes against modern design principles" without citing what principle(s) the author had in mind that would proscribe it.

st_goliath1y ago

Despite all the talk about security, the whole debacle that argc can be 0 (and argv[0] can be NULL), is completely left aside. This has caused actual security issues quite recently[1].

[1] https://lwn.net/Articles/882799/
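A defensive sketch for that case, assuming the program only needs a printable name. `progname_or` and its fallback parameter are made up for illustration.

```c
#include <stddef.h>

/* Return argv[0] when it is present and non-empty, else the fallback.
 * Guards against argc == 0 and argv[0] == NULL, both of which a caller
 * can arrange. progname_or is a hypothetical helper name. */
const char *progname_or(int argc, char **argv, const char *fallback)
{
    if (argc < 1 || argv == NULL || argv[0] == NULL || argv[0][0] == '\0')
        return fallback;
    return argv[0];
}
```

Code that loops over arguments starting at index 1 without checking `argc` first is where an empty argv tends to cause trouble.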

gwbas1c1y ago

The security issues the author points out later in the article do have merit.

Unfortunately, the author shot their credibility in the foot by perseverating on use of argv[0]; instead of glossing over it and getting to the point.

rpcope11y ago

"legacy" -> anything that has existed for more than a day that I don't understand and don't like that stops me from poorly reinventing the wheel

"modern" -> anything that I dreamt up or heard some other hipster talk about recently that I got hyped about

hiccuphippo1y ago

I would guess the modern principle of disregard for disk space or memory usage :(

astrobe_1y ago

lanstin1y ago

dividuum1y ago

Seriously: Their reason is basically "argv[0] is bad because security snake oil software is garbage":

1) Oh no, the only protection is looking at argv[0]. What kind of clown software is that? Software that notably runs on an already compromised system..

2) No need for argv[0] to fool software that concats argv values with spaces: just run 'curl -o "test.txt |grep" 1.1.1.1'

3) A long argument messes up telemetry? Let's hope that bucket doesn't have more holes.

shermantanktop1y ago

These are all very realistic examples. Should they happen? No, but reality is messy and imperfect. The crappy software you describe would not exist if there were great solutions in this space.

2 more replies

skobes1y ago

Isn't this contradicted by the docs? CreateProcess receives lpApplicationName and lpCommandLine, and they can be different.

DSMan1952761y ago

magicalhippo1y ago

Not the way I understand it. In the execv documentation[1], you pass the program name twice:

int execv(const char *path, char *const argv[]);

The argument path points to a pathname that identifies the new process image file.

Windows does not allow you to do that, AFAIK.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/e...

skobes1y ago

> Windows does not allow you to do that, AFAIK.

It does though, using the lpCommandLine parameter to CreateProcess as I said.

CreateProcess("main.exe", "foobar", ...)

argv[0] is "foobar"

1 more reply

halayli1y ago

No, it doesn't make software less predictable, nor does it go against modern design principles. argv has very handy use cases and can be used to provide a better user experience.

Unless you have evidence to back up your claims, you're just turning a subjective opinion to an objective one without any merit.

Either way, it's software developer choice and irrelevant to the user as much as it is irrelevant to the user whether the developer prefers for(;;) over while(1).

JohnFen1y ago

> Today however, disk space is no longer considered an issue

On desktop machines, perhaps, but this is certainly not true on all platforms Linux runs on.

Suppafly1y ago

Plus the whole "space is not an issue" thing along with "you can just add more ram" is the reason everything is so bloated and slow even on well provisioned machines.

Sohcahtoa821y ago

These days, Windows Calculator takes up more memory than mIRC.

Tell me why a simple calculator app needs more memory than a complete multi-server implementation of the IRC protocol (including SSL/TLS), not to mention a full scripting engine.

1 more reply

citrin_ru1y ago

But of course this is much more of an issue for embedded platforms like routers with OpenWRT.

Hizonner1y ago

No, not on desktop machines either. Executables these days can be enormous.

The author's just dumb.

blenderob1y ago

> argv[0] is a relic of the past

Busybox says hello.

Seriously though, how is this on the front page? Both the premise and conclusions contradict the reality of how argv[0] is used with symbolic links and hard links.

josefx1y ago

Microsoft defender using broken by design detection rules? One could almost think it is an anti virus program.

dotancohen1y ago

I also use argv[0] for the -h help text, to show examples how to use the command.

st_goliath1y ago

There is also a neat little BSD extension, also supported on a number of other Unix-like systems and GNU userspace (i.e. glibc, but also other libcs like Musl):

    extern char *__progname;

which holds the program name without the (optional) invocation path in front of it. Basically the last path component of argv[0].

dotancohen1y ago

Nice, thank you.

Brian_K_White1y ago

wattttt? nice thank you

anonymousiam1y ago

I've done this too, but you should remove the path elements from the argv[0] string before you include it in your error/help messages.
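One way to do that stripping, as a sketch: take everything after the last slash without touching the original string. `short_name` is a made-up name; the caller is assumed to have already checked that argv[0] is not NULL.

```c
#include <string.h>

/* Return the last path component of argv0 without modifying it.
 * POSIX basename() is allowed to modify its argument; this sketch never
 * writes to it. A trailing slash yields an empty string here.
 * short_name is a hypothetical helper name. */
const char *short_name(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');
    return slash ? slash + 1 : argv0;
}
```

Because the function returns a pointer into the original string, the full argv[0] stays available for logs where the path matters.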

Brian_K_White1y ago

Sometimes you want it, sometimes you don't, so it needs to be in there, and sometimes you remove it yourself if your context of the moment doesn't want it.

And neither the want-it nor the don't-want-it case is such an outlier that you can disregard and not serve that case.

Sometimes you're talking to the user about general usage and the full path is a distracting detail and not the important part of the message.

Sometimes the full path and truthful invoked filename are an unnecessary security disclosure like telling a web viewer details about the server.

Sometimes the full path and truthful invoked filename is a necessary fact in debugging, or in errors, or even ordinary non-error logs that aren't public.

jmholla1y ago

You don't need to. Keeping them shows the user exactly how to call the program based on how they called it.

patrickmay1y ago

Same here, as well as for showing an example invocation when the user fails to include a required argument.

kelnos1y ago

> From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.

Says who? I'm not aware of any modern design principles that say anything about this sort of thing.

> argv[0] is ignored (mostly)

Pretty much any program I've written that has a --help option uses argv[0] to print out the usage string, i.e.:

    printf("%s [--some-arg] FILENAME\n", argv[0]);

> First off, argv[0] can be used to fool security software

> Ultimately, nobody wants to be bothered by argv[0].

This article is kinda lame, and it seems like the author's objections are mostly based on ignorance.

remram1y ago

> From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.

This is not an argument at all, this is a statement that arguments exist. What are they?

It's like saying we shouldn't do something because it's "against best practices". I'm asking why are other practices preferred...

andrewmcwatters1y ago

I wish amateurs would stop propagating the false idea that disk space and memory are cheap and not a problem.

layer81y ago

If nothing else, argv[0] is useful for producing error messages that indicate the name of the executable that is outputting the message.

It's probably a good idea to not have it settable to other values by the invoking process, as is generally the case on Windows (ignoring its Posix subsystem here).

alerighi1y ago

> It's probably a good idea to not have it settable to other values by the invoking process, as is generally the case on Windows (ignoring its Posix subsystem here).

Yes, I know that there are a ton of other methods to do the same thing (perhaps an environment variable, for example), but I find the method of argv[0] quite nice and simple to be fair.

account421y ago

CamJN1y ago

On a related note, env vars do not have to be of the form key=value, they are arbitrary NUL-terminated byte strings just like the args.

account421y ago

Is this a parody?

cryptonector1y ago

sph1y ago

What a silly post. I use argv[0] in my host-spawn tool (https://github.com/1player/host-spawn) so one can symlink it to a name inside a container and when you run it, it's executed on the host.

    # Inside your container:

    $ flatpak --version
    zsh: command not found: flatpak

    # Have host-spawn handle any flatpak command
    $ ln -s /usr/local/bin/host-spawn /usr/local/bin/flatpak

    # Now flatpak will always be executed on the host
    $ flatpak --version
    Flatpak 1.12.7

4star3star1y ago

tqwhite1y ago

jujube31y ago

Problem: virus scanning software on Windows is broken.

Solution: we should not use argv[0]?

hi-v-rocknroll1y ago

kazinator1y ago

The author doesn't seem to understand that argv[0] can be different due to, for instance, one executable implementing many programs, such as BusyBox and similar projects.

While argv[0] is old, if you had to design it from scratch today, it would still be a good idea to have the program invocation name as an argument.

The idea that anything old must be a historic quirk that we can eliminate today is flawed.

Brian_K_White1y ago

This is stupid. argv0 is just some data like any other data.

It's ridiculously useful aside from the obvious busybox style usage.

F-ing dumb.

iainmerrick1y ago

Yes! I was surprised how far down I had to scroll to find somebody mentioning that one.

How else can you write a reasonably robust script that actually, you know, does something? You almost always need to grab some known files by their paths relative to the script.

account421y ago

argv[0] isn't actually a great solution for that since it isn't required to contain any path and in practice won't depending how the program was called.
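A sketch of the cases a program would have to handle if it tried to locate itself from argv[0] alone, assuming the usual shell convention of passing whatever the user typed. The enum and function names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of what argv[0] can look like. */
enum argv0_kind { ARGV0_MISSING, ARGV0_ABSOLUTE, ARGV0_RELATIVE, ARGV0_BARE };

enum argv0_kind classify_argv0(const char *argv0)
{
    if (argv0 == NULL || argv0[0] == '\0')
        return ARGV0_MISSING;
    if (argv0[0] == '/')
        return ARGV0_ABSOLUTE;
    if (strchr(argv0, '/') != NULL)
        return ARGV0_RELATIVE;   /* relative to the caller's cwd at exec time */
    return ARGV0_BARE;           /* typically found by the shell via $PATH */
}
```

Only the absolute case gives a usable path directly, and even then the caller was free to pass any string it liked.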

1 more reply

PaulHoule1y ago

It’s part of the shambolic world of Unix and C. But “worse is better!”

A good language spec is laid out in a way that reads from front to back with minimized circularity. See Common Lisp, Java, Python, etc.

As a kid in high school checking out Unix manuals and implementing many Unix tools in

https://subethasoftware.com/2022/09/27/exploring-1984-os-9-o...

zabzonk1y ago

i may have missed it, but where does the C Standard say anything about access to a symbol table? or even if such a thing exists.

and as for IBM i managed to use all sorts of OSs in VMs on IBM hardware back in the 1980s. Which did you have problems with?

PaulHoule1y ago

The parser in C has to keep track of the symbol table to handle cases like

   typedef int myint;
   myint x;

IBM's z-architecture (the other z) has a great software story today (even runs Linux) but it was not the Plan A or even the Plan B.

2 more replies

linsomniac1y ago

Seems like this security software is broken, not argv[0]

keepamovin1y ago

This is why we can't have nice things. Security footguns everywhere!

I'm fascinated by the intersection of argv[0], and the execve behavior of replacing the calling program with the called one.

But this is where process.title steps in. Setting process.title allows you to (in an OS-dependent way) change the name reported in ps and similar tools.

Please don't kill argv[0], its lease hath all too short a date

kelsey987654311y ago

Your fascination is rewarded by reading the other man sections such as section three:

https://linux.die.net/man/3/execve

Dwedit1y ago

How about the part about knowing what the directory the executable was launched from? It could be different than the working directory.

thayne1y ago

> and (especially a few decades ago) can offer cross-platform/backwards syntax compatibility using a shared code base.

Another way to do that could be to use a shell script that execs systemctl, but that requires a separate intermediate shell process, which may have its own compatibility issues.

Arch-TK1y ago

For most other things, definitely unnecessary.

johnisgood1y ago

So wait, I should not use `argv` in C's main() or what?

Is it only speaking against `argv[0]` or `argv` in general?

What is this proposed solution if any?

What about `__progname`? The only issue here is that if `argv[0]` is a path, then `__progname` is only the filename. What if I want the path?

gwbas1c1y ago

The author's extensive criticisms of using argv[0] are a distraction from the main point of the article:

IMO, operating systems should block this practice.

Unfortunately, the author's extensive criticism of programs reading argv[0] hurt the author's credibility before most people get to the real point of the article.

account421y ago

> IMO, operating systems should block this practice.

The "look like" is not a problem with the OS but a problem with displaying an array as a space-separated string without sufficient quoting or escaping, making things ambiguous.
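As a sketch of what unambiguous display could look like, the function below renders an argv array with every element single-quoted, so `curl -o "test.txt |grep" 1.1.1.1` from elsewhere in the thread can no longer be mistaken for a pipeline. The names `append` and `format_argv` are made up, and the quoting style (POSIX shell single quotes) is one assumption among several reasonable choices.

```c
#include <string.h>

/* Append n bytes of src to out (capacity cap, current length *len).
 * Returns 0 on success, -1 if the result would not fit. */
static int append(char *out, size_t cap, size_t *len, const char *src, size_t n)
{
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, src, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Render argv as one line with every element single-quoted, so element
 * boundaries stay visible. Returns 0 on success, -1 if out is too small. */
int format_argv(char *out, size_t cap, int argc, char *const argv[])
{
    size_t len = 0;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (int i = 0; i < argc; i++) {
        if (i > 0 && append(out, cap, &len, " ", 1) != 0)
            return -1;
        if (append(out, cap, &len, "'", 1) != 0)
            return -1;
        for (const char *p = argv[i]; *p != '\0'; p++) {
            /* A literal ' becomes '\'' (close quote, escaped quote, reopen). */
            int rc = (*p == '\'') ? append(out, cap, &len, "'\\''", 4)
                                  : append(out, cap, &len, p, 1);
            if (rc != 0)
                return -1;
        }
        if (append(out, cap, &len, "'", 1) != 0)
            return -1;
    }
    return 0;
}
```

With this, the curl example is shown as four quoted elements, and the `|grep` visibly belongs to the file name rather than to the command line structure.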

tantalor1y ago

The name of something is not an intrinsic property.

barelyauser1y ago

Can something possess extrinsic properties? Or are they an intrinsic property of external things?

samatman1y ago

Yes.

Not only is there a Wikipedia article on it, there's more than one.

Here's the one covering science and engineering, which is the appropriate version for this discussion.

https://en.wikipedia.org/wiki/Intrinsic_and_extrinsic_proper...

account421y ago

Are you asking if a property being intrinsic or not is an intrinsic property of that property?

guappa1y ago

Wait until he finds out about busybox!

pjc501y ago

A side effect of that is that programs do their own unescaping. Unix users who are used to quotes being stripped for them may be surprised by this.

timrobinson3331y ago

1 more reply

t435621y ago

One can then guess what the PYTHONPATH and LD_LIBRARY_PATH should be most of the time and save someone from having to set them.

Obviously this is of most use when you're running something you've installed into /opt (e.g. /opt/myprog/bin, /opt/myprog/lib etc) or are running it from the source tree.

account421y ago

> arg0 also contains the path from where the invoker invoked the binary

Not in general it doesn't. Convention for shells is to pass the string the user used to invoke the program which may be an absolute path, a relative path or just a filename resolved against $PATH.

> this enables all sorts of binaries that work out where their dependencies are relative to their original binary
