You can do it too, and if you're serious about writing Unix-style filter programs, you will someday need to. How do you know which format to write? Call "isatty(STDOUT_FILENO)" in C or C++, "sys.stdout.isatty()" in Python, etc. This returns true if stdout is a terminal, in which case you can provide pretty output for humans and machine-readable output for programs, automatically.
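For what it's worth, here is a minimal sketch of that check in Python. The `render` function, the sample rows, and the choice of JSON lines as the machine-readable format are all my own assumptions for illustration; only `sys.stdout.isatty()` comes from the comment above.

```python
import json
import sys


def render(rows, is_tty):
    """Return an aligned table for a terminal, or one JSON object per line otherwise."""
    if is_tty:
        # Pad names to a common width so the columns line up for a human reader.
        width = max((len(name) for name, _ in rows), default=0)
        return "\n".join(f"{name:<{width}}  {size:>8}" for name, size in rows)
    # Not a terminal: emit something a downstream program can parse.
    return "\n".join(json.dumps({"name": name, "size": size}) for name, size in rows)


# Hypothetical sample data; a real filter would compute this.
rows = [("a.txt", 120), ("notes.md", 4096)]
print(render(rows, sys.stdout.isatty()))
```

Taking `is_tty` as a parameter rather than calling `isatty()` inside the function keeps both formats easy to test, and makes it simple to add an explicit override flag later.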
“please don’t make the behavior of a command-line program depend on the type of output device it gets as standard output or standard input.”¹
① https://www.gnu.org/prep/standards/standards.html#User-Inter...
When used sparingly and thoughtfully, I've never personally had an issue with it.
"ps -f" truncates long lines instead of wrapping, while "ps -f | cat" lets the long lines live
How people usually discover what these commands do is by running them interactively, and if that results in some output being hidden vs being run noninteractively, then they have little reason to believe that it could yield more output than what they're used to seeing. I think a certain number of "ps" users don't know it can display full paths and commands, if they've only ever used it interactively.
It may have some merits, but as general advice this is definitely an anti-pattern.
Another example is "curl", where "curl URL >outfile" is chatty on stderr, while "curl URL" is quiet on stderr. That's very annoying for scripting, you easily forget to set "-s" in your scripts due to that behaviour.
I love that 'git log' outputs in a pager. 'svn log' by comparison is nuts.
ls is a bit more than just a command though. It's part of the furniture and prehistoric.
Dealing with programs that act differently depending on their output device is very annoying.
cmd := exec.Command("/bin/[", "-t", "1", "]")
cmd.Stdout = os.Stdout
isatty := nil == cmd.Run()
Examining the characteristics of the output stream and changing behavior is another "rule" that is not mentioned often. Another example is buffering the output to a large block if sending to a pipe, but making it line-buffered if going to a terminal.
In a magical dream world I'd start a distro where every command has its interface rewritten to conform to a command line HIG. Single-letter flags would always mean only one thing, common long flags would be consistent, and no new tools would be added to the distro until they conformed. But at this point everyone's used to (and more importantly, the entire system relies on) the weird mismatches and historical leftovers from older commands. Too bad!
Long and Short Options: https://www.gnu.org/prep/standards/html_node/Option-Table.ht...
General Interfaces: https://www.gnu.org/prep/standards/html_node/User-Interfaces...
Command Line Interfaces: https://www.gnu.org/prep/standards/html_node/Command_002dLin...
Program Argument Syntax: http://www.gnu.org/software/libc/manual/html_node/Argument-S...
http://www.robertames.com/blog.cgi/entries/the-unix-way-comm...
""" The two surprising finds in the above documents are the standard list of long options and short options from -a to -z.
Forever and a day I am trying to figure out what to name my program options and these two guides definitely help. It allows me to definitively say you should use -c … for “command” instead of -r … for “run” because -r means recurse or reverse. """
--Robert
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_...
(I'm not so convinced that long options are a good thing, as evidenced by the --extended-regexp/--regexp-extended and other little "was it spelt this way or that?" type of confusions. It's not hard to remember single letters, especially if they're mnemonic.)
curl -kLIiso example.org www.example.org
versus: curl --insecure --location --head --include --silent --output example.org www.example.org
And of course as a practical matter, with short opts you'll run out of characters eventually, and meaningful mnemonics before that.
I can't find documentation on what I mean, but try ip --help
As for dd, it came from a non-UNIX OS and kept the original syntax.
If it's JSON and I know what object I want, I just have to pipe to something like jq [1].
PowerShell takes this further and uses the concept of passing objects around - so I can do things like ls | $_.Name and extract a list of file names (or paths, or extensions etc)
Input from stdin, output to stdout: Nicely side-stepped in that most cmdlets allow binding pipeline input to a parameter (either byval or byname, if needed). Filters are trivial to write, though.
Output should be free from headers: Side-stepped as well, in that decoration comes from the Format-* cmdlets that should only ever be at the end of a pipeline that's shown to the user.
Simple to parse and to compose: Well, objects. Can't beat parsing that you don't need to do.
Output as API: Well, since output is either a collection of objects or nothing (e.g. if an exception happened) there isn't the problem that you're getting back something unexpected.
Diagnostics on stderr: Automatic with exceptions and Write-Error. As an added bonus, warnings are on stream 2, debug output on stream 3 and verbose output on stream 4. All nicely separable if needed.
Signal failures with an exit status. Automatic if needed ($?), but usually exception handling is easier.
Portable output: That's about the only advice that would still hold and be valuable. E.g. Select-String returns objects with a Filename property which is not a FileInfo, but only a string; subject to the same restrictions that are mentioned in the article.
Omit needless diagnostics: Since those would be either on the debug or verbose stream they can be silenced easily, don't interfere with other things you care about and cmdlets have a switch for either of that, which means you only get that stuff if you actually care about it.
Avoid interactivity: Can happen when using the shell interactively, e.g.
Home:> Remove-Item
cmdlet Remove-Item at command pipeline position 1
Supply values for the following parameters:
Path[0]: _
However, this only ever happens if you do not bind anything to a parameter, which shouldn't happen in scripts. If you bind $null to a parameter, e.g. because pipeline input is empty or a subexpression returned no result, then an error is thrown instead, avoiding this problem.
Nitpick: You'd need ls | % Name or ls | % { $_.Name } there. Otherwise you'd have an expression as a pipeline element, which isn't allowed.
But then you've oddities like plutil behaving like gzip by modifying the file you specify rather than printing to stdout. You have to pass -o and a dash to get it to leave the file alone and instead reformat it to stdout. That one gets me every time. And I'm not alone: https://twitter.com/mavcunha/status/417823730505895936
But other parts are nice. For instance, "system_profiler -xml > MyReport.spx" generates XML that will open in the System Profiler GUI app. The XML generated is usually a Plist, since that's as native to the platform as the Registry might be to Windows...
Let me know when PowerShell gets tabs though. Maybe there's a Terminal.app port running in Mono somewhere? Seriously, I wish somebody would build a better terminal, maybe get creative with scrollback and chaining commands, and ship it in an OS... with tabs. ;-)
I suggest -0 for symmetry with xargs. find calls it -print0, I think.
(In my view, this is poor design on xargs's part; it should be reading a newline-separated list of unescaped file names, as produced by many versions of ls (when stdout isn't a tty) and find -print, and doing the escaping itself (or making up its own argv for the child process, or whatever it does). But it's too late to fix now I suppose.)
That breaks when you have newlines in filenames, no?
That seems like an extremely pathological case.
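Pathological or not, the failure is easy to show. A rough sketch in Python, where `split_names` and the sample file names are made up for illustration, comparing newline-delimited input with NUL-delimited input of the kind find -print0 produces:

```python
def split_names(data: bytes, nul: bool = False) -> list[bytes]:
    """Split a byte stream of file names on NUL or on newlines, dropping empty entries."""
    sep = b"\0" if nul else b"\n"
    return [name for name in data.split(sep) if name]


# Hypothetical names; the second one contains an embedded newline.
names = [b"plain.txt", b"two\nlines.txt"]

# Newline-delimited: the embedded newline splits one name into two bogus ones.
print(split_names(b"\n".join(names) + b"\n"))
# NUL-delimited: both names survive intact, since NUL cannot appear in a file name.
print(split_names(b"\0".join(names) + b"\0", nul=True))
```

The names are handled as bytes on purpose: Unix file names are byte strings and need not be valid text in any encoding.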
This does what you would expect:
echo My brother\'s 12\" records.txt | parallel touch
$ printf '"foo bar"' | xargs -n1
and
$ printf '"foo" "bar"' | xargs -n1
and
$ printf "%s" '\\"foo bar\\"' | xargs -n1
That approach dates from the days when you got multi-column directory listings with
ls | mc
Putting multi-column output code in "ls" wasn't consistent with the UNIX philosophy.
There's a property of UNIX program interconnection that almost nobody thinks about. You can feed named environment variables into a program, but you can't get them back out when the program exits. This is a lack. "exit()" should have taken an optional list of name/value pairs as an argument, and the calling program (probably a shell) should have been able to use them. With that, calling programs would be more like calling subroutines.
PowerShell does something like that.
http://www.catb.org/~esr/writings/taoup/html/ch06s06.html
Or write environment variables to stdout in Bourne shell syntax so the caller can run "eval" on it. Like ssh-agent, for example.
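A sketch of the producing side in Python, assuming the values are plain strings. The function name `export_lines` and the DEMO_* variables are hypothetical; the output only loosely imitates the style ssh-agent uses.

```python
import re
import shlex

# Names the Bourne shell accepts as variable names.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def export_lines(variables):
    """Format name/value pairs as Bourne shell assignments suitable for eval."""
    lines = []
    for name, value in variables.items():
        if not _NAME.fullmatch(name):
            raise ValueError(f"not a valid shell variable name: {name!r}")
        # shlex.quote protects spaces, quotes and other shell metacharacters.
        lines.append(f"{name}={shlex.quote(value)}; export {name};")
    return "\n".join(lines)


# Hypothetical values for illustration.
print(export_lines({"DEMO_SOCK": "/tmp/demo agent.sock", "DEMO_PID": "1234"}))
```

The caller would then run something like eval "$(that-program)". Quoting the values matters here, because eval executes whatever it is handed.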
On the other hand, you made me think, and you should probably have three code paths by default:
[0] normal behaviour (exit 0)
[1] bad arguments (exit EINVAL)
[2] --usage (print to stdout but exit != 0)?
Anyway I am not sure if it makes sense to declare "usage" as normal behaviour.
The former, I think, should write to stdout and return 0, the latter should write to stderr and return something non-zero.
Giving help if the user asks for it is normal behaviour.
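That split can be sketched in a few lines of Python. The program name, the usage string, and the exit status 2 for bad arguments are assumptions of mine (2 is a common convention, not something anyone here specified).

```python
import sys

USAGE = "usage: demo [-h] FILE"  # hypothetical program and arguments


def main(argv, out=sys.stdout, err=sys.stderr):
    """Return the exit status: 0 for requested help, 2 for a usage error."""
    args = argv[1:]
    if "-h" in args or "--help" in args:
        print(USAGE, file=out)  # the user asked for help: stdout, success
        return 0
    if len(args) != 1:
        print(USAGE, file=err)  # the user got it wrong: stderr, failure
        return 2
    print(f"processing {args[0]}", file=out)
    return 0


# In a real script the last line would be: sys.exit(main(sys.argv))
```

With this shape, "demo --help | less" just works, while a mistyped invocation inside a pipeline still shows its complaint on the terminal and fails the pipeline stage.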
annoying_program 2>&1 | less
but it is very unfriendly to stymie a user's attempt to get help when they're already probably confused.
It's possible other descriptors would be useful, like stdlog for insecure local logs, stddebug for sending gobs of information to a debugger. It's certainly not in POSIX, so too bad, but honestly stdout is hard to keep readable and pipe-able. Adding just one more file descriptor separates the model from the view.
Obviously not every program will use just two file descriptors. Binary isn't handled by stdin and stdout because they're typically used for tty input/output. If you need to handle multiple files you'll take a list of file arguments. Often a program takes no input at all that isn't a command-line option.
And what 'formatting markup'? There is no 'markup' on a terminal, unless you're dealing with colors or something, which you would disable if your fd wasn't a tty. And why would you send 'headers' to a completely different file descriptor anyway?
Oh, I think I get it now. You confused the MVC architecture with Unix programs. Unix programs don't provide a user interface.
Not at all. cat wouldn't have a ncurses GUI, that doesn't make sense. My point is that 'cat --verbose' should be an option, where the stdout doesn't change but extra crap is sent elsewhere, and probably just dumped on the terminal like stderr. I sometimes want to see extra context and line numbers in my grep searches (grep -nC 3 ..) but I might want the stdout to remain clean. This makes programs more composable. Right now it's like we've got stdfmt permanently redirected towards stdout.
In practical terms, vi does its own paging. It's not a wrapper over echo | ed | less. One giant monolithic subsystem. Perhaps vi is the exception. But dd offers a progress bar, but only if you send it a SIG of some sort. wget offers a progress bar by default (silence is golden? not so much). ls yields differently columned outputs to ttys or files. I suppose this is the simplicity of Unix that I shouldn't touch.
Some unix tools work really well already, and I'm not suggesting destroying tar or xargs. I'm not sure how systemd works into this, but I'm not really a fan of that.
I guess Plan9 wasn't Unix, either.
His point is that two streams are not enough: you don't want to present the same output stream to a human, a logfile, or another utility.
> And what 'formatting markup'? There is no 'markup' on a terminal, unless you're dealing with colors or something
Right, so there is markup on a terminal.
> which you would disable if your fd wasn't a tty.
Which would be much simpler to handle if there was a stream for human consumption and one for piping
> And why would you send 'headers' to a completely different file descriptor anyway?
Because headers are useful to human users, or when capturing output in a file to read later, rather than in another utility?
If you are intercepting UNIX signals (starting with SIGINT), go back to the drawing board and think again. Don't do it. There is almost never a good reason for doing it, and you will likely get it wrong and frustrate users.
The "portable output" thing is especially subjective. I buy that it probably makes sense for compilers to print full paths. But it's nice that tools like ls(1) and find(1) use paths in the same form you gave them on the command-line (i.e., absolute pathnames in output if given absolute paths, but relative pathnames if given relative paths). For one, it means that when you provide instructions to someone (e.g., a command to run on a cloned git repo), and you want to include sample output, the output matches exactly what they'd see. Similarly, it makes it easier to write test suites that check for expected stdout contents. And if you want absolute paths in the output, you can specify the input that way.
If the implementation isn't respecting The Rule of Composition it's actually not adhering to the Unix philosophy in the first place. The tweet is referring to one of Doug McIlroy's (one of the Unix founders, inventor of the Unix pipe) famous quotes:
"This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."
Pure beauty, but it's almost too concise a definition if you haven't experienced the culture of Unix (many years of usage / reading code / writing code / communication with other followers). ESR's exhaustive list of Unix rules in plain English might be a better start for the uninitiated (among which one will find the aforementioned Rule of Composition).
For all those seeking enlightenment, go forth and read The Art of Unix Programming:
https://en.wikipedia.org/wiki/The_Art_of_Unix_Programming
17 Unix Rules:
https://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond.E...
'One thing well' is often intended to make people's lives easier on the console. Sometimes this means assuming sane defaults, and sometimes just a simpler program that does/assumes less. Take these two examples and tell me which you'd prefer to type:
user@host~$ ls *.wav | xargs processAudio -e mu-law --endian swap -c 2 -r 16000
user@host~$ find . -type f -maxdepth 1 -name '*.wav' -exec processAudio -e mu-law --endian swap -c 2 -r 16000 {} \;
Write concise technical documentation. Imagine it's your first day on a new job and you need to learn how all your new team's tools work; do you want to read every line of code they've written just to find out how it works, or do you want to read a couple pages of technical docs to understand in general how it works? (That's a rhetorical question.)
Definitely provide a verbose mode. When your program doesn't work as expected, the user should be able to figure it out without spending hours debugging it.
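One possible shape for a verbose mode, sketched in Python with the standard argparse and logging modules. The program name "demo", the repeatable -v flag, and the mapping of flag counts to log levels are assumptions for illustration, not anything prescribed above.

```python
import argparse
import logging
import sys


def configure(argv):
    """Parse -v/--verbose and send diagnostics to stderr at the chosen level."""
    parser = argparse.ArgumentParser(prog="demo")  # hypothetical program name
    parser.add_argument("-v", "--verbose", action="count", default=0)
    args = parser.parse_args(argv)
    # No flag: warnings only. -v: progress messages. -vv or more: full debug output.
    level = [logging.WARNING, logging.INFO, logging.DEBUG][min(args.verbose, 2)]
    logging.basicConfig(
        stream=sys.stderr,
        level=level,
        format="%(levelname)s: %(message)s",
        force=True,  # replace any earlier logging setup (Python 3.8+)
    )
    return level
```

Because the diagnostics go to stderr, turning on -v never contaminates whatever the program writes to stdout for the next stage of a pipeline.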
http://javier.io/blog/en/2014/10/21/hints-in-writing-unix-to...