From what I understand, your parent talks about how the commands are built iteratively, with some kind of trial-and-error loop, which is a strength that is supposedly not emphasized enough. And I agree, by the way. Nothing to do with how things are input.
comm -1 -3 <(ls -1 dataset-directory | \
grep -E '^[0-9]{4}_A\.csv$' | \
cut -c 1-4 | \
python3 parse.py | \
sort -u \
) \
<(seq 500 | sort)
The usual reaction to something like the above is "Why would I want to write a complicated mess like that? Just use ${FAVORITE_PROG_LANG:-Perl, Ruby, or whatever}." For many tasks, a short paragraph of code in a "normal" programming language is probably easier to write and is almost certainly a more robust, easier-to-maintain solution. However, this assumes that you know what the problem is and that qualities like maintainability are a goal. Bernhardt's (and my) point is that sometimes you don't know what the goal is yet. Sometimes you just need to do a small, one-off task where a half-assed solution might be appropriate... iff it's the right half of the ass. The Unix shell gets that right for a really useful set of tasks.
This works because you are free to use those powerful features incrementally, as needed. The interactive nature of the shell lets you explore the problem. The "better" version in a "proper" programming language doesn't exist yet when you don't know the exact nature of the problem. A half-assed bit of shell code that slowly evolved into something useful might be the step between "I have some data" and a larger "real" programming project.
That said, there is also wisdom in learning to recognize when your needs have outgrown "small, half-assed" solutions. If the project is growing and adding layers of complexity, it's probably time to switch to a more appropriate tool.
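Since the pipeline quoted at the top hinges on `comm`, here is a small self-contained sketch of that idiom. `comm -1 -3 A B` hides lines unique to A and lines common to both, leaving only lines unique to B: here, numbers from `seq` that no matching file accounts for. Both inputs must be sorted the same way, which is why both sides go through `sort`. The function name `missing_ids` is hypothetical, and because `parse.py` is never shown, a `sed` that strips leading zeros stands in for it as an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical demo of the comm -1 -3 "what's missing?" idiom.
# ASSUMPTION: parse.py is not shown in the original, so sed stands in
# for it and only strips leading zeros ("0007" -> "7") to match seq.
missing_ids() {
    local dir="$1" max="$2"
    # -1 hides lines only in the file list, -3 hides lines in both,
    # leaving lines only in the seq output: the missing IDs.
    # comm needs both inputs sorted the same way, hence the sorts.
    comm -1 -3 \
        <(ls -1 "$dir" |
            grep -E '^[0-9]{4}_A\.csv$' |
            cut -c 1-4 |
            sed 's/^0*//' |
            sort -u) \
        <(seq "$max" | sort)
}
```

For example, with only `0001_A.csv`, `0003_A.csv` and `0003_B.csv` in a directory, `missing_ids dir 5` prints 2, 4 and 5, one per line.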
The first half of the job was exactly the process you described: start with one log file, craft a grep for it, craft a `grep -o` for the relevant part of each relevant line, add `sort | uniq -c | sort -r`, switch to zgrep for the archived rotated files, and so on.
The other half of the ass was done in a different language, using the output from the shell, because I needed to do a thousand or so lookups against a website and parse the results.
Composable shell tools are a very under-appreciated toolbox, IMO.
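The log-digging workflow described above (grep, then `grep -o`, then counting, then zgrep for the rotated archives) might converge on something like the following sketch. Everything specific here is an assumption for illustration: the function name `top_paths`, a `*.log` plus `*.gz` file layout, and a web-server-style log format whose lines contain `"GET /path`.

```shell
#!/usr/bin/env bash
# Hypothetical example: which URL paths were requested most often,
# across the current log and its gzip-compressed rotations.
# ASSUMPTION: log lines contain '"GET /some/path HTTP/...'; adjust the
# regex to whatever your logs actually look like.
top_paths() {
    local dir="$1"
    {
        # plain, not-yet-rotated log files
        grep -h -o '"GET [^ ]*' "$dir"/*.log
        # rotated, compressed files: same search via zgrep
        zgrep -h -o '"GET [^ ]*' "$dir"/*.gz
    } 2>/dev/null |
        cut -d' ' -f2 |     # keep just the path
        sort | uniq -c |    # count identical paths
        sort -rn            # most frequent first
}
```

Each stage was presumably added one at a time while looking at the previous output, e.g. `top_paths /path/to/logs | head`.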
Here's a quick refactor of the block that I would say is simpler and easier to maintain.
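xform() {
    local dir="$1"
    # keep only NNNN_A.csv names and take their 4-digit prefix
    ls -1 "$dir" |
        grep -E '^[0-9]{4}_A\.csv$' |
        cut -c 1-4 |
        python3 parse.py |
        sort -u    # comm needs sorted, de-duplicated input
}
# print only the numbers from seq 500 that xform did not produce
comm -1 -3 <(xform dataset-directory) <(seq 500 | sort)
Some discussion on the pros and cons of those two approaches, here: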
More shell, less egg:
http://www.leancrew.com/all-this/2011/12/more-shell-less-egg...
I had written quick solutions to that problem in both Python and Unix shell (bash), here:
The Bentley-Knuth problem and solutions:
https://jugad2.blogspot.com/2012/07/the-bentley-knuth-proble...
Not that there isn't some merit to McIlroy's criticism (I know some of the frustration from trying to read Knuth's programs carefully), but at least link to the original context instead of a blog post that tells a partial story:
https://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/pearls-...
https://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/pearls-...
(One of the places where McIlroy admits his criticism was "a little unfair": https://www.princeton.edu/~hos/mike/transcripts/mcilroy.htm)
BTW, there's a wonderful book called “Exercises in Programming Style” (a review here: https://henrikwarne.com/2018/03/13/exercises-in-programming-...) that illustrates many different solutions to that problem (though as it happens it does not include Knuth's WEB program or McIlroy's Unix pipeline).
“The interactive nature of the shell” isn’t that impressive in this day and age. Certainly not shells like Bash (Fish is probably better, but then again that’s a very cutting-edge shell (“for the ’90s”)).
Irrespective of the shell, this just boils down to executing code, editing text, executing code, repeat. I suspect people started doing that once they got updating displays, if not sooner.
But none of that is the point. The end result of a specific solution isn't the point. The cleverness of the pipeline isn't the point. The point is that if you are familiar with the tools, this is often the fastest method to solve a certain class of problem, and it works by being interactive and iterative, using tools that don't have to be perfect or in and of themselves brilliant innovations. Sometimes a simple screwdriver that could have been made in 1900 really is the best tool for the job!
Bernhardt's stated goal with that talk was to get people to understand this point (and hopefully to use and benefit from the power of a programmable tool). "If [only using files & binaries] is how you use Unix, then you are using it like DOS. That's ok, you can get stuff done... but you're not using any of the power of Unix."
> Fish
Fish is cool! I keep wanting to use it, but the inertia of Bourne shell is hard to overcome.
For the past couple of decades, the only other even remotely mainstream place where you could get a comparable experience was a Lisp REPL. And maaaybe Matlab, later on. Recently, projects like R, Jupyter, and (AFAIK) Julia have been introducing people to interactive development, but those are specific to scientific computing. For general programming, this approach is pretty much unknown outside of Lisp and Unix shell worlds.
Actually no, they're not different things; both refer to the same activity of a user analyzing the information on the screen and issuing commands that refine the available information iteratively, in order to solve a problem. (I would have bought your argument had you made a distinction between "solving the problem" and "finding the right tools to solve the problem").
The thing is that the Unix shell is terribly coarse-grained in terms of what interactivity it allows, so the smaller refinement actions (what you call "input") must be described in terms of a formal programming language, instead of having interactive tools for those smaller trial-and-error steps.
There are some very limited forms of interactivity (command line history, keyboard accelerators, "man" and "-h" help), but the kind of direct manipulation that would let the user select commands and data iteratively is mostly absent from the Unix shell. Emacs is way better in that sense, except for the terrible discoverability of options (based on recall over recognition).
One of these languages is the history expansion in Bash. At first I was taken by all the `!!^1` weirdness. But (of course) it’s better—and actually interactive—to use keybindings like `up` (previous command). Thankfully Fish had the good sense to not implement history expansion.
[1] I use Emacs+Evil so I like Vi(m) myself.
Bind up/down to history-search-backward/history-search-forward in ~/.inputrc:
# your terminal might send something else
# for the up/down keys; check with ^v<key>
# UP
"\e[A": history-search-backward
# DOWN
"\e[B": history-search-forward
(note that this affects anything that uses readline, not just bash)
The defaults (previous-history/next-history) only step through history one item at a time. The history-search-* commands step through only the history entries that match the prefix you have already typed (i.e. typing "cp<UP>" gets the last "cp ..." command; continuing to press <UP> steps through all of the "cp ..." commands in ${HISTFILE}). As your history file grows, this ends up kind of like smex[1] (ido-mode for M-x that prefers recently and most frequently used commands).
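Because the inputrc bindings apply to every readline program, you might prefer to scope them to bash alone. A sketch for `~/.bashrc` using bash's `bind` builtin, which accepts the same readline syntax; it assumes your terminal sends `\e[A`/`\e[B` for the arrow keys, as in the inputrc example:

```shell
# ~/.bashrc: same bindings as the inputrc lines, but only for bash.
# The interactive check avoids "line editing not enabled" warnings
# when this file is sourced by a non-interactive shell.
if [[ $- == *i* ]]; then
    bind '"\e[A": history-search-backward'   # UP
    bind '"\e[B": history-search-forward'    # DOWN
fi
```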
For maximum effect, you might want to also significantly increase the size of the saved history:
# no file size limit
HISTFILESIZE="-1"
# runtime limit of commands in history. default is 500!
HISTSIZE="1000000"
# ignoredups to make the searching more efficient
HISTCONTROL="ignorespace:ignoredups"
# (and make sure HISTFILE is set to something sane)
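# append to HISTFILE on exit instead of overwriting it
shopt -s histappend
# write new commands to HISTFILE before each prompt, not only on exit
export PROMPT_COMMAND="history -a"
[1] https://github.com/nonsequitur/smex/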
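One caveat with `export PROMPT_COMMAND="history -a"`: assigning it outright replaces anything already there (some distributions' default bashrc files set it, e.g. to update the terminal title). A sketch of a more defensive variant, using a hypothetical helper `add_prompt_command` (not a bash builtin) and assuming `PROMPT_COMMAND` is used as a plain string rather than the array form that newer bash versions also accept:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a command to PROMPT_COMMAND without
# clobbering what is already there and without adding it twice.
add_prompt_command() {
    local cmd="$1"
    case ";${PROMPT_COMMAND};" in
        *";${cmd};"*) ;;  # already present, nothing to do
        *) PROMPT_COMMAND="${cmd}${PROMPT_COMMAND:+;$PROMPT_COMMAND}" ;;
    esac
}

shopt -s histappend
add_prompt_command "history -a"
```

Calling it twice is harmless, which matters if `~/.bashrc` ends up being sourced more than once.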