Cheating at a company group activity using Unix tools

(medium.com)

139 points · devenvdev · 4y ago · 146 comments

ccalloway4y ago· 35 in thread

Most of the justifications for using collections of command-line Unix tools are no longer valid today. Instead you should be using a proper programming language.

Note that people who still do use complex solutions built from cat, head, cut, etc, and who know what they're doing, will typically either write a shell script (which won't be structured particularly differently from the equivalent Python or whatever) or will rely heavily on awk (itself a full-featured programming language, no easier to learn than any other scripting language), or both.

One-liners which pipe text between four or five different commands are the equivalent of hand-soldered boards or bitwise arithmetic. Interesting to learn about for historical reasons but of no practical utility.

The use of things like xargs and jq in this solution, difficult-to-invoke Unix utilities for doing things that are trivial in any reasonable language, makes this even clearer.

gattilorenz4y ago

> Most of the justifications for using collections of command-line Unix tools are no longer valid today. Instead you should be using a proper programming language.

That's just, like, your opinion man...

> people who still do use complex solutions built from cat, head, cut, etc, and who know what they're doing, will typically either write a shell script or use awk

No, I still use cat, head, cut etc., because it's easier to see at each step what is happening and incrementally add to that, because they're literally everywhere, because it's quick, and because I like it. Not for major projects, granted, but why would I need to write a python file for something that takes a single line of piped commands?

unixbane4y ago

It's not hard: Imagine if your terminal executed Python repl instead of bash shell. Literally anything is better than bash. It's amazing how people still haven't figured out bash is basically PHP.

pkrumins4y ago

Amen.

21434y ago

> Most of the justifications for using collections of command-line Unix tools are no longer valid today.

Why is it not valid today?

> One-liners which pipe text between four or five different commands are the equivalent of hand-soldered boards or bitwise arithmetic.

Why is that bad? Don't deploy a supercomputer to do what a hand soldered board can. Keep it simple.

I don't see how you equate piping commands to bitwise arithmetic, but bitwise arithmetic is easy anyway.

> but of no practical utility

Says you. Just because you don't find something useful doesn't mean nobody else finds it useful.

Turning one liner piped commands to a program in what you might consider a "proper programming language" usually ends up turning a declarative program into something procedural. Not that that's necessarily a bad thing. Just saying.

Use the right tool for the job. In some cases (not all) the shell is indeed the right tool.

I get a feeling you don't understand the Unix philosophy. Read The Art of Unix Programming by Eric S. Raymond. Go learn bitwise arithmetic.

The uneducated play with pictures. Educated people read and write :)

Have a great day (or night, depending on your timezone — night here).

Chris20484y ago

> I get a feeling you don't understand the Unix philosophy. Read The Art of Unix Programming by Eric S. Raymond. Go learn bitwise arithmetic.

These kinds of comments aren't useful, and come off as smug to me.

Why doesn't he understand something, and what is the point in reading/learning something you think is relevant? Why would they invest in your suggestions if you don't provide any reasons for what is missing?

Also:

> The uneducated play with pictures. Educated people read and write :)

This sounds insulting to me (name-calling), and passive-aggressively so when combined with "Have a great day".

Chris20484y ago

> Turning one liner piped commands to a program in what you might consider a "proper programming language" usually ends up turning a declarative program into something procedural.

I don't understand - isn't bash shell procedural?

pkrumins4y ago

Sir, you are absolutely and totally wrong. No sysadmin has time or interest to write programs or use "proper programming languages" to get the job done. Sysadmins know their tools and are fast and efficient. Zero sysadmins will ever write "proper programs" when they can pipe find, sed, and awk.

ccalloway4y ago

Yes, but the number of sysadmins in the world is converging rapidly to zero. Mainly because the invisible overhead of having your boxes managed by people who use sed and awk is much higher than the alternatives (containerisation, cloud vendors, cattle not pets, infrastructure as code).

I think you actually confirmed my point when you said 'no sysadmin has time or interest'. I didn't say that no one uses shell anymore. I said that there was no practical justification for doing so. Lots of people still use it and don't have any interest in finding a better solution. These people are choosing, for their own reasons, to double down on an obsolete skillset. Good for them, I suppose, but I don't think that demand for their niche is going to be around for very long.

I also pointed out that sysadmins who know what they're doing use awk. Based on your mentioning awk it seems like you agree with this.

unixbane4y ago

> Sysadmins know their tools

This is false. Every line they write has potential security implications. If any pattern can become a string injection vulnerability, it will. Even most real programmers do not understand shell scripting.

This discussion is all moot because UN*X is an obsolete misconception made by programmers who are too dumb to understand the difference between and significance of AST manipulation and string concatenation. It's not hard to understand what I'm talking about. Look how much ad-hoc, non-reusable trivia you have to learn to pass stuff around between find, ls, and xargs. Whenever you would do `x = f(); g(x)` in Python, you will spend 10 minutes figuring out if some given shell script is doing the equivalent the correct, safe, secure way.

I just realized today that every time I see some crap like x,abc%%20%%20def,y x,abc\x20\x20def,y I am actually depressed because I already know all the played out meta of such systems - it's a bunch of half working junk full of pointless vulns that only exist because between 0 and 10 of the said "experts at their craft" in the world actually bother to program this crap correctly. And I literally have to squint to figure out where the bug is (there's always the bug in such code).

If the UN*X shell was replaced by Python, sysadmins would have no trouble adapting. Python is a terrible language, (plus this discussion is convoluted by the fact that Python's gimped from bending over to work well with UN*X) but still better than UN*X shell in every way.
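The `x = f(); g(x)` comparison in this comment can be made concrete with a small sketch. The functions `f` and `g` below are hypothetical stand-ins, not anything from the article; the sketch only shows how an unquoted expansion turns one value into several arguments under the default `IFS` in a POSIX-style shell.

```shell
# Hypothetical stand-ins for the f() and g() mentioned in the comment.
f() { printf '%s\n' 'two words; echo injected'; }
g() { printf '%s\n' "$#"; }   # prints how many arguments it received

x=$(f)

quoted_count=$(g "$x")   # the value arrives as a single argument
unquoted_count=$(g $x)   # word splitting breaks it into several arguments

printf 'quoted: %s, unquoted: %s\n' "$quoted_count" "$unquoted_count"
```

Splitting alone does not execute anything; the risk appears when such a value is later handed to `eval`, `sh -c`, or a similar re-parsing step.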

akho4y ago

> Instead you should be using a proper programming language.

Why use many word when few word do trick?

Shell scripts are very compact for what they do, and generally as clear as a similarly quick-and-dirty solution in another language. Ease of learning and use comes from having used the programs you are running in your script. E.g., my small script piping stuff from nmcli to fzf so I can choose a wifi network would be much more difficult to write in Python: I’d need to find a Python library for interacting with NetworkManager, and a library for interactively fuzzy searching in a list, read the docs, spend a while setting up a venv to run it, … I don’t have time for any of that.

xargs, in particular, is not difficult to invoke once you’ve done it once or twice, and does a lot. Apart from just being a loop, it also parallelizes execution, and can ask for confirmation from the user for each invocation. Implementing either of these features will take you more than the 2-4 chars you need to use it with xargs.
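A minimal sketch of the two xargs features mentioned here, assuming a GNU or BSD xargs (the `-P` flag is an extension, not POSIX). The file names are made up for illustration.

```shell
# Scratch directory so the example touches nothing real.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\ntwo\nthree\n' > names.txt

# -n 1: one command per input item; -P 3: run up to three commands at once.
xargs -n 1 -P 3 touch < names.txt

# -p would prompt before each invocation (interactive, so left commented out):
#   xargs -n 1 -p rm < names.txt
```

Each flag is only a few characters, which is the point being made: the loop, the parallelism, and the confirmation prompt all come from xargs itself.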

hawski4y ago

Shell is a very high level language that is undoubtedly bizarre at times. Its routines can be written in any programming language you want without much boilerplate or any special bindings, because support for command-line arguments, environment variables, and reading and writing files is the bare minimum that most languages provide. It is a tradeoff - it has its immense advantages and a couple of often hard to navigate disadvantages. But the ability to easily compose whatever you want is something you can't ignore.

I think the perfect very high level language is closer to shell than to python for example. The power of Tcl/Tk (still it has some big weaknesses) or Rebol/Red is something that I admire.

The following statement is probably controversial: shell is more akin to Lisp for human beings. I dabbled at Scheme, but it is harder for me to grasp than Shell, but in the end they are more similar than not.

I hold a candle for Oil shell for example.

flohofwoe4y ago

Fundamentally it's the same thing. The UNIX tools are the "batteries included" standard library of the integrated shell programming environment. And if you need to extend that library you quickly whip up a new minimal command line tool in C (or any other language which allows you to write small and quick and dirty command line tools, like Python).

The only downside of shell scripting is that it isn't trivially portable to Windows (or even macOS because of the differences between GNU and BSD tools), so it often makes sense to create big Python scripts that do more than "one thing right". If the whole world ran on UNIX, shell scripting would make much more sense.

actually_a_dog4y ago

> The only downside of shell scripting is....

The only downside? Let's add that no major *NIX shell that I'm aware of has any good way to modularize code while enforcing encapsulation of state.

At my previous job, we had a rule that any shell script longer than about a page of code had to be replaced with a Python script ASAP. That was a good rule, IMO, because once you've exceeded a certain size, a shell script starts getting brittle and hard to work with. I don't know if 1 page of code is the threshold size or not, but it seems like as good a cutoff as any.

herbst4y ago

This is only partially true. Working on your own machine this may be fully the case, but debugging and maintaining random servers is a completely different beast.

Shell is portable in a way nothing else is, same reason people use Excel instead of code or PHP instead of literally anything.

ccalloway4y ago

Firstly the model of sshing into your server and trying to run commands on it directly is more or less obsolete.

Secondly the presence of all these utilities on a machine is far from guaranteed - expecting Python to be present is no more or less likely.

high_54y ago

> Interesting to learn about for historical reasons but of no practical utility.

The practical utility has just been demonstrated in this particular article? Historical? I think that Unix shells are like crocodiles - outliving the dinosaurs and lurking in the murky water, waiting for the unsuspecting sysadmin to come close enough to fix the script that ain't broken.

oblio4y ago

Your analogy is quite interesting, in the sense that crocodiles are around, but they're not the dominant species. That would be an interesting continuation to your analogy, actually.

ccalloway4y ago

> The practical utility has just been demonstrated in this particular article?

How? The author is doing something which could be done much more easily and elegantly in a programming language.

smitty1e4y ago

> command-line Unix tools are no longer valid today

Strongly disagree. The understanding of the OS, the data, and how to checkmate the problem with minimal effort is timeless.

Programming languages are relatively ephemeral compared to POSIX utilities.

Invest in knowledge of the enduring.

spekcular4y ago

I don't understand why you're being so harshly downvoted. This seems ... plausibly correct to me?

Some questions:

1) Where do you draw the line and move away from shell to a "real" language? If I just want to view a directory, surely I use ls, right? What about if I want to remove all leading whitespace from a text file? This can be done with a short but fairly opaque awk one-liner. Probably takes way more time to write and run the Python equivalent, but I don't know Python so well, so maybe it's also a one-liner.

2) What's the optimal "real language" for replacing shell? Python? Perl? Raku?

2a) xargs automatically parallelizes programs. How can this be done efficiently (meaning I don't have to write much additional code) in your proposed "real language"?
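For the leading-whitespace case raised in question 1, a rough sketch of what the shell-side one-liners might look like. The input file is invented for the example, and these are only two of several possible spellings.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '  two spaces\n\ta tab\nflush left\n' > in.txt

# sed: delete any run of whitespace at the start of each line.
with_sed=$(sed 's/^[[:space:]]*//' in.txt)

# awk: same idea, using sub() on the whole line and then printing it.
with_awk=$(awk '{ sub(/^[ \t]+/, ""); print }' in.txt)

printf '%s\n' "$with_sed"
```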

dagw4y ago

> xargs automatically parallelizes programs. How can this be done efficiently (meaning I don't have to write much additional code) in your proposed "real language"?

By using xargs :)

Or more likely gnu parallel. But seriously, in so many cases using GNU parallel to parallelize a process is the quickest and easiest way to approach the problem, and I use it all the time. If I need to process 10k+ images in a folder, rather than try to parallelize the process in my python/C script I'll write the fastest possible single threaded script that takes its input args as command line arguments and then uses gnu parallel to distribute the workload. The added advantage is that I can distribute this work on a cluster of machines with only a few changes to GNU Parallel's command line arguments.

actually_a_dog4y ago

We had a rule at my old workplace that any shell script that's over a page of code was to be replaced with a Python script. We chose Python because there was guaranteed to always be an up-to-date version of Python on any cloud machine we used, but you could certainly make a case for some other language. In a vacuum, my choice might be Scheme.

Kinrany4y ago

No "proper programming language" is capable of ergonomically piping between programs.

Shell is indeed very old and it's time for a replacement, but it's not there yet.

Oilshell might get there eventually or at least spark interest in this area.

ilyash4y ago

> No "proper programming language" is capable of ergonomically piping between programs.

Solved. https://github.com/ngs-lang/ngs

I'm the author. Frustrated with exactly this situation I created Next Generation Shell. It's a "proper programming language" on one hand but domain-specific for "DevOps"y scripting on another. So sane syntax, data structures, error handling, multiple dispatch on one hand but also syntax for running external programs, pipes and redirects.

You are welcome!

unixbane4y ago

*sh is only ergonomic if you're using it insecurely. wait, even then it's not ergonomic. you're just insane. the day i figured out about python repl over 10 years ago, i realized *sh is utterly obsolete (and python isn't even good, a repl like that just gives you some perspective of how things could be).

ccalloway4y ago

> No "proper programming language" is capable of ergonomically piping between programs.

You cherry-picked the one thing that shell script is somewhat better at than other languages (not even really needed for this task). Meanwhile the article uses shell for both making HTTP requests and mangling JSON data, both of which are easy in all modern languages, and extremely painful in shell.

3np4y ago

I guess it depends on what you do. For a subset of tasks, shell scripting is a lot faster to implement than the equivalent python/js/go/ruby/rust.

Lamad1234y ago

I heard awk is extremely fast, much faster than Python.

ccalloway4y ago

Yes, but execution speed (of things like creating a lookup table from the second and third fields in each line) is rarely the operative constraint.

revscat4y ago

It is, but it's much closer to perl when it comes to maintainability.

hansel_der4y ago

i heard python is among the slowest

smcameron4y ago

This is akin to saying if you want to hang a picture in your house, you shouldn't just grab a hammer and a nail, instead, you should get a nice piece of wood, and a nice hunk of metal, go to the workshop, fire up the forge, the mill and various wood and metal working machines, and forge a special picture-hanging-hammerer-thing.

ccalloway4y ago

Umm, no, chaining together 5 or more somewhat arcane single-purpose tools is the unrealistic solution here.

If the article was about using ls to just list files, and I had said "actually you should use Python's os.listdir() and filter the results by whatever" you would be right.

For most simple problems it's correct to use a simple tool. For the overwhelming majority of complex problems you should use a well-understood, well-designed, common general-purpose tool.

aulin4y ago

wait, what's wrong with hand soldering and bitwise arithmetic? hand soldering is still of very much practical utility for building prototypes, small production batches, electronics repair... Bitwise arithmetic... seriously? tell that to an embedded developer or anyone who works with low-level stuff.

ccalloway4y ago

Nothing is wrong with them if 1) you have a specifically suited, niche problem and you understand the complexities and tradeoffs of the tool OR 2) it's not a critical requirement to solve the problem using modern, efficient tools AND you want to use something else for fun or learning or whatever.

If this isn't the case, stay away.

l0b04y ago· 21 in thread

  ls | grep '.csv$' | xargs cat | grep 'cake' | cut -d, -f2,3 > cakes.csv

That's quite a few antipatterns in one go. Unless you have a bajillion files the `xargs` is unnecessary, the `cat` and `ls` are unnecessary (and `ls` in shell scripts is a whole class of antipatterns by itself). You might want to use something like this instead:

  grep cake *.csv | cut -d, -f2,3 > cakes.csv

gbrown_4y ago

I certainly agree with your points, though the original task is a textbook example of where awk shines.

    awk 'BEGIN {FS=","} /cake/ {print $2, $3}' *.csv > cakes.csv

Unsurprisingly I disagree with the post's description of awk being an "advanced command".

jasode4y ago

>I disagree with the post's description of awk being an "advanced command"

I guess there's no categorization of "advanced" vs "beginner" that will satisfy every audience but I consider awk an advanced tool. About 20 years ago, I wrote some AWK tips and cheatsheet back on USENET and today I would have to refer to that post to write basic awk commands.

The thing about awk is that it's a compact programming language with variables and conditionals and that's a step-change in complexity for many users.

nickjj4y ago

> Unsurprisingly I disagree with the post's description of awk being an "advanced command".

I think it can be pretty advanced, for me awk is one of those tools where I still feel like I need to write a paragraph of comments to explain 1 line of code.

For example: https://github.com/nickjj/invoice/blob/75660dce5a29ceb4e47a6...

Keep in mind I don't really "know" awk. I cobbled that together from a few examples. It will convert times formatted like "2h 30m", "150m" or "2:30" into 2.50. There's a bunch of examples in the test file.

NOTE: I wrote that script 2.5 years ago and I know there are questionable patterns in other areas of the script that are not highlighted, like using a bunch of separate echo calls instead of a heredoc.

Shell scripting is really fun and efficient. I use it all the time for a variety of things.

yesenadam4y ago

  awk -F, '/cake/ {print $2, $3}' *.csv > cakes.csv

does the same thing: -F sets the field separator.
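One detail worth checking when swapping `cut` for awk: `print $2, $3` joins the fields with awk's output field separator, which defaults to a space, so the result is no longer comma-separated unless `OFS` is set. A small sketch with a made-up input file:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1,cake,chocolate\n2,pie,apple\n' > foo.csv

# Default OFS is a single space.
space_joined=$(awk -F, '/cake/ {print $2, $3}' foo.csv)

# Setting OFS keeps the output comma-separated, like cut -d, -f2,3.
comma_joined=$(awk -F, -v OFS=, '/cake/ {print $2, $3}' foo.csv)

printf '%s\n%s\n' "$space_joined" "$comma_joined"
```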

1_player4y ago

awk is one of those underrated tools I always wanted to learn.

I still remember when I started working as a sysadmin at 19, the greybeard UNIX guy taught me how to vim, and he told me awk was as important as knowing vim, pointing to some huge AWK manual he had on the shelf, one of those with the animal on the cover.

This was 15 years ago, I know vim, but awk still eludes me.

gorgoiler4y ago

My thoughts exactly, though with “-F,” to quickly set the field separator.

devnull2554y ago

Awk is advanced in the sense that it is a programming language by itself. Years ago I had to migrate an old version of an Informix database to a newer version. The old version did not have tools to export the database as DDL and DML statements. So I had to create a tool myself to do it. The system did not have perl installed so I had to use awk. It worked nicely enough.

coolgeek4y ago

I love awk. I've been using it for over 25 years.

But awk is usually not a good choice for processing CSV files.

Your program will fail if any of the fields include embedded commas (which is often the case).
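The embedded-comma failure is easy to reproduce. The record below is invented for illustration; any field-splitting tool that treats every comma as a separator behaves the same way.

```shell
# One CSV record whose second field contains a quoted comma.
line='1,"cake, chocolate",yes'

# cut and awk -F, split on every comma, quoted or not, so the third
# "field" is the tail of the quoted value instead of `yes`.
third_by_cut=$(printf '%s\n' "$line" | cut -d, -f3)
third_by_awk=$(printf '%s\n' "$line" | awk -F, '{print $3}')

printf '[%s]\n' "$third_by_cut"
```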

darrenf4y ago

Almost, but not quite :)

    $ ls | grep '.csv$' | xargs cat | grep 'cake' | cut -d, -f2,3
    cake
    cake
    cake
    $ grep cake *.csv | cut -d, -f2,3
    bar.csv:cake
    foo.csv:cake
    quux.csv:cake

`grep -h` will do the trick though

darrenf4y ago

I'm surprised, hours later, that no-one has pointed out my error here! So I'll do it myself - upon revisiting it, I realised my input files weren't CSVs with the number of fields required for `cut` to do the intended thing. Had I made them so, the `-h` isn't required after all. My bad :)
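A sketch of why `-h` turns out not to matter for this particular pipeline, using invented three-field files: grep's `name:` prefix ends up inside field 1, and `cut -f2,3` throws field 1 away. This assumes the file names themselves contain no commas.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1,cake,chocolate\n2,pie,apple\n' > foo.csv
printf '3,cake,carrot\n' > bar.csv

# With several files grep prefixes each match with "name:", but the
# prefix is part of field 1, which cut discards.
with_prefix=$(grep cake *.csv | cut -d, -f2,3)
without_prefix=$(grep -h cake *.csv | cut -d, -f2,3)

printf '%s\n' "$with_prefix"
```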

jrm44y ago

I really wish people would mostly stop doing this type of utterly and completely unnecessary "correction." It really misses the point, and exemplifies nearly the exact opposite of the point and value the original article is expressing.

This is a tool used to accomplish a thing, and Unix tools can be used to accomplish things in many different ways. This is like complaining about, I don't know, using a metric-labeled screwdriver on an imperial-measured screw that still gets the job done exactly as needed. Cut it out.

DarylZero4y ago

> using a metric-labeled screwdriver on an imperial-measured screw that still gets the job done exactly as needed

This is super ignorant. You risk stripping the screw and getting yourself into a frustrating screw extraction job.

Just because you lived, doesn't mean it was safe.

nicce4y ago

It might not miss the point if the command sample makes the task look more complicated than it actually is, almost like writing an artificially long command to demonstrate Unix magic and draw a bigger audience.

Anti-patterns are bad because they usually mean that the sample command might work in this case, but not in other environments or use cases. Someone who is seeing these commands for the first time has no idea about that. And this post is meant for beginners.

bonkabonka4y ago

The original example MUST be corrected for the same reason that folks who post naive code snippets adding SQL strings together with user input must be corrected.

It is not a matter of taste and it is not a matter of metric versus imperial screwdrivers. Someone will copy this code and it will end up being an attack vector where it will have consequences.

I imagine you're rolling your eyes and have flipped the bozo bit but please bear with me.

Think of the teachable moment this presents! The author of the original piece goes back and annotates their original answer along the lines of, "you might solve it this way but there are some gotchas with it - let me show you what could go wrong."

As an industry we absolutely need to circle back with improvements so that those who come after us can build on a more solid foundation.

cuu5084y ago

This is pointing out you can simplify the drill+pliers+plunger+screwdriver contraption to just screwdriver.

XorNot4y ago

I'd go further and say don't parse CSV with plaintext tools because it's barely a plaintext format. Use a CSV library and save yourself heartache when someone drops a quoted string in somewhere.

oblio4y ago

I don't understand why you're being downvoted.

Parsing CSV with simple text-oriented tools is as bad an idea as parsing HTML with regexps.

ryanianian4y ago

This isn't parsing CSV, it's generating it. That's not nearly as fraught.

philwelch4y ago

In my experience doing this kind of thing, I usually use TSV instead of CSV and end up using Unix tools anyway. None of the CSV tools keep up in terms of performance.

devenvdevOP4y ago

Yes! Agree. This post is more educational than practical (despite the title). You need to start somewhere and know the basics to understand these details and caveats to feel what quality code means in this context. I chose to use the long, redundant version to show that chaining a gazillion different commands is ok.

mdoms4y ago

It will also fall flat on its face for CSV files that contain values which contain escaped commas.

pkrumins4y ago· 11 in thread

The first example is super super bad here. Never pipe `ls`. When you feel like you need to pipe `ls`, then you know you want to use `find`.

devenvdevOP4y ago

Yes, as I answered in another thread - this post is much more educational than practical. It's intended to teach how to use pipes and simple commands together. Explaining why piping `ls` is bad and what is the difference between `ls` and `find` commands would miss the point of the post and would be confusing :)

MisterTea4y ago

> this post is much more educational than practical.

I understand that you don't want to be mean but this post is neither. It's like a bad gun safety video where the alleged instructor points a loaded gun at school children and then looks down the barrel while polishing the trigger...

oweiler4y ago

To be even more pedantic, you probably want to use a glob
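A minimal sketch of the glob approach, with invented file names. The loop never parses the output of `ls`, so a name containing spaces stays one item.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch plain.csv 'with space.csv' notes.txt

# The shell expands the glob into one word per file, so names with
# spaces stay intact as long as "$f" is quoted.
csv_count=0
for f in ./*.csv; do
    [ -e "$f" ] || continue   # an unmatched glob is left as the literal pattern
    csv_count=$((csv_count + 1))
done

printf '%s\n' "$csv_count"
```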

tzs4y ago

There are a lot of times one only wants non-dotfiles in the current directory.

The find would be something like

  find . -not -path '*/\.*' -type f -depth 1

What advantages does that have over 'ls' for that case?

jasode4y ago

>What advantages does that have over 'ls' for that case?

The gp was talking about issues with piping from "ls |" and your particular case of "find" being more convoluted than "ls" isn't comparing that.

Example of the topic that gp was warning about:

http://mywiki.wooledge.org/ParsingLs

https://unix.stackexchange.com/questions/128985/why-not-pars...

[Also fyi... you may have meant "-maxdepth" instead of "-depth" in your example.]
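With that correction applied, the non-dotfiles case might look like the sketch below. It assumes a find that supports `-maxdepth` (GNU, BSD, and busybox do; it is an extension rather than POSIX), and the file names are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sub
touch visible.txt .hidden sub/nested.txt

# -maxdepth 1 stays in the current directory; ! -name '.*' skips dotfiles.
found=$(find . -maxdepth 1 -type f ! -name '.*')

printf '%s\n' "$found"
```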

revscat4y ago

FYI the zsh glob for this is `^.*`, e.g.:

    echo ^.*

will show all non-dotfiles in the current directory.

nixpulvis4y ago

I would recommend `find ... -exec`, but I still haven't figured out how to make it compose properly with other UNIX tools.
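One way `find ... -exec` can compose with a pipeline, sketched with invented files: `-exec ... {} +` passes the file names to grep as arguments, and only grep's output travels down the pipe. As in the earlier cake example, this assumes no commas in the file names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sub
printf '1,cake,chocolate\n' > 'sub/with space.csv'
printf '2,pie,apple\n' > top.csv

# File names go to grep as arguments, so spaces in them are harmless.
# Any "name:" prefix grep adds lands in field 1, which cut drops.
matches=$(find . -type f -name '*.csv' -exec grep cake {} + | cut -d, -f2,3)

printf '%s\n' "$matches"
```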

revscat4y ago

You may want to use file globbing instead. This is one I just used yesterday afternoon. I needed to search for a string in every .js or .jsx file in my project, but didn't want to include specs in the search.

    rg 'MySearchString' **/*.js[x]#~*spec*

Voila. Note that this is for zsh, and you need to set the EXTENDED_GLOB option. But once you do you'll find yourself rarely needing to reach for `find`.

renewiltord4y ago

I solve this problem by just having no spaces on my filesystem when I act.

DoingIsLearning4y ago

Never?

ls |less

pkrumins4y ago

Nice! This is pretty much the only exception. ls|more or ls|less.

dsr_4y ago· 3 in thread

"After some digging, it was easy to find the HTTP request that pulled this information from the server. And it even had all the birthdates in the JSON!"

HR needs to know this, but it shouldn't be available to random employees.

devenvdevOP4y ago

Why though? We use hibob, and anyone can find anyone's full name and birthday via UI anyway. Are there any compliance issues with this?

toomuchtodo4y ago

It’s PII and should be restricted to only those who require the data for their job (HR).

dsr_4y ago

Do you live in a place where there are anti-discrimination laws concerning employee ages? If not, perhaps there is no issue for you.

tzs4y ago· 1 in thread

The 'comm' command should be in there. With no options 'comm' takes two files, F1 and F2, which should be lexically sorted, and produces 3 columns of output.

The first column consists of lines that are only in F1, the second column consists of lines that are only in F2, and the third column consists of lines that are common to both files.

The option -1 tells it to not print column 1, -2 tells it not to print column 2, and -3 does the same for column 3. These can be combined, so -12 would only print column 3 (the lines that are in both files) and -13 would only print column 2 (the lines that are in F2 but not F1).
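The column options described above can be tried on two small sorted files. The names in them are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alice\nbob\ncarol\n' > f1
printf 'bob\ncarol\ndave\n' > f2

both=$(comm -12 f1 f2)      # suppress columns 1 and 2: lines in both files
only_f2=$(comm -13 f1 f2)   # suppress columns 1 and 3: lines only in f2
only_f1=$(comm -23 f1 f2)   # suppress columns 2 and 3: lines only in f1

printf 'both:\n%s\nonly in f2: %s\nonly in f1: %s\n' "$both" "$only_f2" "$only_f1"
```

Both inputs must already be sorted; unsorted files would need a pass through `sort` first.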

rolandog4y ago

This is new to me, but really useful. Thanks for sharing!

amtamt4y ago· 1 in thread

Extension of classic problem from "Programming Pearls" by John Bentley. Nice to see such pragmatism for one time problems.

carapace4y ago

s/John/Jon/

smitty1e4y ago· 1 in thread

For doing work with JSON data, I'd add:

https://stedolan.github.io/jq/

b6z4y ago

I don't understand what you mean. Half of the article is using jq.

unixbane4y ago· 1 in thread

> Was it worth it?

> 1 minute to do this

> 1 minute to do that

and 1 minute to introduce RCE vulns into company #589179283672's pipeline due to the "you don't understand the security implications of using fragile UN*X tools" problem which applies to anyone actually learning something from this article DAY OF THE SEAL SOON,

Chris20484y ago

> into company #589179283672's pipeline

The article isn't describing such a scenario (load-bearing script).

mattrighetti4y ago

For those interested in this topic I would suggest these incredible lectures by MIT [0], especially the data wrangling one.

Lectures are hosted on YouTube, they are extremely valuable and easy to follow and they give a pretty good insight on a lot of Unix topics.

[0]: https://missing.csail.mit.edu/2020/

perryizgr84y ago

> regex is so ubiquitous and valuable that if you don’t know it yet, you should learn it)

Regex is one of those things I have to learn every single time I need to use it. I just can't seem to force myself to remember.

unixbane4y ago

jq ... | sed -E 's/([0-9][0-9]).([0-9][0-9]).[0-9]*$/\2_\1/'

this fails for me since the jq output lines are surrounded by quotes. had to remove $. did i do something different or are we running different jq versions?
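One possible explanation, sketched without jq itself: by default jq prints JSON strings with their surrounding quotes, so the `$` anchor never sees a digit at the end of the line, whereas `jq -r` (raw output) prints the bare string. The `DD.MM.YYYY` sample value is an assumption about the data, not taken from the article.

```shell
quoted='"17.05.1990"'   # how jq prints a JSON string by default
raw='17.05.1990'        # what jq -r would print for the same value

# Same expression as in the comment above.
expr='s/([0-9][0-9]).([0-9][0-9]).[0-9]*$/\2_\1/'

from_quoted=$(printf '%s\n' "$quoted" | sed -E "$expr")   # no match, unchanged
from_raw=$(printf '%s\n' "$raw" | sed -E "$expr")         # day and month swapped

printf '%s\n%s\n' "$from_quoted" "$from_raw"
```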

j / k navigate · click thread line to collapse

146 comments

85 comments · 11 top-level

ccalloway4y ago· 35 in thread

Most of the justifications for using collections of command-line Unix tools are no longer valid today. Instead you should be using a proper programming language.

The use of things like xargs and jq in this solution, difficult to invoke Unix utilities for doing things that are trivial in any reasonable language, makes this even more clear.

gattilorenz4y ago

> Most of the justifications for using collections of command-line Unix tools are no longer valid today. Instead you should be using a proper programming language.

That's just, like, your opinion man...

> people who still do use complex solutions built from cat, head, cut, etc, and who know what they're doing, will typically either write a shell script or use awk

unixbane4y ago

It's not hard: Imagine if your terminal executed Python repl instead of bash shell. Literally anything is better than bash. It's amazing how people still haven't figured out bash is basically PHP.

2 more replies

pkrumins4y ago

Amen.

21434y ago

> Most of the justifications for using collections of command-line Unix tools are no longer valid today.

Why is it not valid today?

> One-liners which pipe text between four or five different commands are the equivalent of hand-soldered boards or bitwise arithmetic.

Why is that bad? Don't deploy a supercomputer to do what a hand soldered board can. Keep it simple.

I don't see how you equate piping commands to bitwise arithmetic, but bitwise arithmetic is easy anyway.

> but of no practical utility

Says you. Just because you don't find something useful doesn't mean nobody else finds it useful.

Use the right tool for the job. In some cases (not all) the shell is indeed the right tool.

I get a feeling you don't understand the Unix philosophy. Read The Art of Unix Programming by Eric S. Raymond. Go learn bitwise arithmetic.

The uneducated play with pictures. Educated people read and write :)

Have a great day (or night, depending on your timezone — night here).

Chris20484y ago

> I get a feeling you don't understand the Unix philosophy. Read The Art of Unix Programming by Eric S. Raymond. Go learn bitwise arithmetic.

These kinds of comments aren't useful, and come off as smug to me.

Why doesn't he understand something, and what is the point in reading/learning something you think is relevant? Why would they invest in your suggestions if you don't provide any reasons for what is missing?

Also:

> The uneducated play with pictures. Educated people read and write :)

This sounds insulting to me (name-calling), and passive-aggressively so when combined with "Have a great day".

Chris20484y ago

> Turning one liner piped commands to a program in what you might consider a "proper programming language" usually ends up turning a declarative program into something procedural.

I don't understand - isn't bash shell procedural?

pkrumins4y ago

ccalloway4y ago

I also pointed out that sysadmins who know what they're doing use awk. Based on your mentioning awk it seems like you agree with this.

unixbane4y ago

> Sysadmins know their tools

akho4y ago

> Instead you should be using a proper programming language.

Why use many word when few word do trick?

hawski4y ago

I think the perfect very high level language is closer to shell than to python for example. The power of Tcl/Tk (still it has some big weaknesses) or Rebol/Red is something that I admire.

I hold a candle for Oil shell for example.

flohofwoe4y ago

actually_a_dog4y ago

> The only downside of shell scripting is....

The only downside? Let's add that no major *NIX shell that I'm aware of has any good way to modularize code while enforcing encapsulation of state.

herbst4y ago

This is only partially true. Working on your own machine this may be fully the case, but debugging and maintaining random servers is a completely different beast.

Shell is portable in a way nothing else is, same reason people use Excel instead of code or PHP instead of literally anything.

ccalloway4y ago

Firstly the model of sshing into your server and trying to run commands on it directly is more or less obsolete.

Secondly the presence of all these utilities on a machine is far from guaranteed - expecting Python to be present is no more or less likely.

high_54y ago

> Interesting to learn about for historical reasons but of no practical utility.

oblio4y ago

Your analogy is quite interesting, in the sense that crocodiles are around, but they're not the dominant species. That would be an interesting continuation to your analogy, actually.

ccalloway4y ago

> The practical utility has just been demonstrated in this particular article?

How? The author is doing something which could be done much more easily and elegantly in a programming language.

smitty1e4y ago

> command-line Unix tools are no longer valid today

Strongly disagree. The understanding of the OS, the data, and how to checkmate the problem with minimal effort is timeless.

Programming languages are relatively ephemeral compared to POSIX utilities.

Invest in knowledge of the enduring.

spekcular4y ago

I don't understand why you're being so harshly downvoted. This seems ... plausibly correct to me?

Some questions:

2) What's the optimal "real language" for replacing shell? Python? Perl? Raku?

2a) xargs automatically parallelizes programs. How can this be done efficiently (meaning I don't have to write much additional code) in your proposed "real language"?

dagw4y ago

> xargs automatically parallelizes programs. How can this be done efficiently (meaning I don't have to write much additional code) in your proposed "real language"?

By using xargs :)
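For reference, xargs runs its commands one after another unless asked otherwise; the parallelism comes from the `-P` option, which GNU and BSD xargs provide. A minimal sketch, with `echo` standing in for a real worker command and the four inputs invented for illustration:

```shell
# Run up to 4 jobs at a time, one argument per job.
# `echo processed` is a placeholder for whatever command does the real work.
# Completion order is not deterministic, so the output is sorted before use.
results=$(printf '%s\n' one two three four \
  | xargs -n 1 -P 4 echo processed \
  | LC_ALL=C sort)

printf '%s\n' "$results"
```

Dropping `-P 4` gives the same lines, produced serially.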

actually_a_dog4y ago

Kinrany4y ago

No "proper programming language" is capable of ergonomically piping between programs.

Shell is indeed very old and it's time for a replacement, but it's not there yet.

Oilshell might get there eventually or at least spark interest in this area.

ilyash4y ago

> No "proper programming language" is capable of ergonomically piping between programs.

Solved. https://github.com/ngs-lang/ngs

You are welcome!

unixbane4y ago

ccalloway4y ago

> No "proper programming language" is capable of ergonomically piping between programs.

3np4y ago

I guess it depends on what you do. For a subset of tasks, shell scripting is a lot faster to implement than the equivalent python/js/go/ruby/rust.

Lamad1234y ago

I heard awk is extremely fast, much faster than Python.

ccalloway4y ago

Yes, but execution speed (of things like creating a lookup table from the second and third fields in each line) is rarely the operative constraint.

revscat4y ago

It is, but it's much closer to perl when it comes to maintainability.

hansel_der4y ago

i heard python is among the slowest

smcameron4y ago

ccalloway4y ago

Umm, no, chaining together 5 or more somewhat arcane single-purpose tools is the unrealistic solution here.

If the article was about using ls to just list files, and I had said "actually you should use Python's os.listdir() and filter the results by whatever" you would be right.

For most simple problems it's correct to use a simple tool. For the overwhelming majority of complex problems you should use a well-understood, well-designed, common general-purpose tool.

aulin4y ago

ccalloway4y ago

If this isn't the case, stay away.

l0b04y ago· 21 in thread

  ls | grep '.csv$' | xargs cat | grep 'cake' | cut -d, -f2,3 > cakes.csv

  grep cake *.csv | cut -d, -f2,3 > cakes.csv

gbrown_4y ago

I certainly agree with your points, though the original task is a textbook example of where awk shines.

    awk 'BEGIN {FS=","} /cake/ {print $2, $3}' *.csv > cakes.csv

Unsurprisingly I disagree with the post's description of awk being an "advanced command".

jasode4y ago

>I disagree with the post's description of awk being an "advanced command"

The thing about awk is that it's a compact programming language with variables and conditionals and that's a step-change in complexity for many users.

nickjj4y ago

> Unsurprisingly I disagree with the post's description of awk being an "advanced command".

I think it can be pretty advanced, for me awk is one of those tools where I still feel like I need to write a paragraph of comments to explain 1 line of code.

For example: https://github.com/nickjj/invoice/blob/75660dce5a29ceb4e47a6...

NOTE: I wrote that script 2.5 years ago and I know there are questionable patterns in other areas of the script that aren't highlighted, like using a bunch of separate echo calls instead of a heredoc.

Shell scripting is really fun and efficient. I use it all the time for a variety of things.

yesenadam4y ago

  awk -F, '/cake/ {print $2, $3}' *.csv > cakes.csv

does the same thing: -F sets the field separator.
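One detail worth checking when swapping `cut` for awk here: `print $2, $3` joins the fields with awk's output separator (`OFS`), which defaults to a space, so the result is no longer comma-separated. A small sketch with a made-up row:

```shell
line='1,chocolate cake,12'   # hypothetical CSV row, not from the article

# `print $2, $3` joins fields with OFS, which defaults to a single space.
with_space=$(printf '%s\n' "$line" | awk -F, '/cake/ {print $2, $3}')

# Setting OFS to a comma keeps the output comma-separated,
# matching what `cut -d, -f2,3` produces.
with_comma=$(printf '%s\n' "$line" | awk -F, -v OFS=, '/cake/ {print $2, $3}')
with_cut=$(printf '%s\n' "$line" | grep cake | cut -d, -f2,3)

printf '%s\n' "$with_space" "$with_comma" "$with_cut"
```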

1_player4y ago

awk is one of those underrated tools I always wanted to learn.

This was 15 years ago, I know vim, but awk still eludes me.

gorgoiler4y ago

My thoughts exactly, though with “-F,” to quickly set the field separator.

devnull2554y ago

coolgeek4y ago

I love awk. I've been using it for over 25 years.

But awk is usually not a good choice for processing CSV files.

Your program will fail if any of the fields include embedded commas (which is often the case).
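To make the failure concrete, here is a sketch using an invented row whose second field is quoted and contains a comma. awk with `-F,` splits on every comma and knows nothing about the quoting:

```shell
row='7,"cake, chocolate",12'   # hypothetical row: three CSV fields

# awk sees four comma-separated pieces, not three CSV fields.
field_count=$(printf '%s\n' "$row" | awk -F, '{print NF}')

# "Field 2" is only the first half of the quoted value.
second_field=$(printf '%s\n' "$row" | awk -F, '{print $2}')

printf '%s\n%s\n' "$field_count" "$second_field"
```

`cut -d,` splits the same way, so the simpler pipelines in this thread share the limitation.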

darrenf4y ago

Almost, but not quite :)

    $ ls | grep '.csv$' | xargs cat | grep 'cake' | cut -d, -f2,3
    cake
    cake
    cake
    $ grep cake *.csv | cut -d, -f2,3
    bar.csv:cake
    foo.csv:cake
    quux.csv:cake

`grep -h` will do the trick though
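A self-contained sketch of the difference, using two throwaway files in a temporary directory (file names and contents are invented). `-h` is supported by GNU and BSD grep:

```shell
workdir=$(mktemp -d)
printf 'a,cake,1\n' > "$workdir/foo.csv"
printf 'b,cake,2\n' > "$workdir/bar.csv"

# Given more than one file, grep prefixes each match with its file name.
with_names=$(cd "$workdir" && grep cake *.csv)

# -h suppresses the prefix, matching what `cat *.csv | grep cake` would print.
without_names=$(cd "$workdir" && grep -h cake *.csv)

rm -r "$workdir"
printf '%s\n' "$with_names" "$without_names"
```

When the lines do contain commas, the prefix ends up inside field 1, so `cut -d, -f2,3` hides it; it shows up in the output above because those sample lines have no comma at all and cut passes them through whole.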

darrenf4y ago

jrm44y ago

DarylZero4y ago

> using a metric-labeled screwdriver on an imperial-measured screw that still gets the job done exactly as needed

This is super ignorant. You risk stripping the screw and getting yourself into a frustrating screw extraction job.

Just because you lived, doesn't mean it was safe.

nicce4y ago

bonkabonka4y ago

The original example MUST be corrected for the same reason that folks who post naive code snippets adding SQL strings together with user input must be corrected.

It is not a matter of taste and it is not a matter of metric versus imperial screwdrivers. Someone will copy this code and it will end up being an attack vector where it will have consequences.

I imagine you're rolling your eyes and have flipped the bozo bit but please bear with me.

As an industry we absolutely need to circle back with improvements so that those who come after us can build on a more solid foundation.

cuu5084y ago

This is pointing out you can simplify the drill+pliers+plunger+screwdriver contraption to just screwdriver.

XorNot4y ago

I'd go further and say don't parse CSV with plaintext tools because it's barely a plaintext format. Use a CSV library and save yourself heartache when someone drops a quoted string in somewhere.

oblio4y ago

I don't understand why you're being downvoted.

Parsing CSV with simple text-oriented tools is as bad an idea as parsing HTML with regexps.

ryanianian4y ago

This isn't parsing CSV, it's generating it. That's not nearly as fraught.

philwelch4y ago

In my experience doing this kind of thing, I usually use TSV instead of CSV and end up using Unix tools anyway. None of the CSV tools keep up in terms of performance.

devenvdevOP4y ago

mdoms4y ago

It will also fall flat on its face for CSV files that contain values which contain escaped commas.

pkrumins4y ago· 11 in thread

The first example is super super bad here. Never pipe `ls`. When you feel like you need to pipe `ls`, then you know you want to use `find`.

devenvdevOP4y ago

MisterTea4y ago

> this post is much more educational than practical.

oweiler4y ago

To be even more pedantic, you probably want to use a glob

tzs4y ago

There are a lot of times one only wants non-dotfiles in the current directory.

The find would be something like

  find . -not -path '*/\.*' -type f -depth 1

What advantages does that have over 'ls' for that case?

jasode4y ago

>What advantages does that have over 'ls' for that case?

The gp was talking about issues with piping from "ls |" and your particular case of "find" being more convoluted than "ls" isn't comparing that.

Example of the topic that gp was warning about:

http://mywiki.wooledge.org/ParsingLs

https://unix.stackexchange.com/questions/128985/why-not-pars...

[Also fyi... you may have meant "-maxdepth" instead of "-depth" in your example.]
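A sketch of the `-maxdepth` form for "regular non-dotfiles in the current directory only", run against a throwaway directory with invented file names. `-maxdepth` is a GNU/BSD extension, so treat this as an assumption about the find in use:

```shell
workdir=$(mktemp -d)
touch "$workdir/visible.txt" "$workdir/.hidden"
mkdir "$workdir/sub"
touch "$workdir/sub/nested.txt"

# -maxdepth 1 : do not descend into subdirectories
# -type f     : regular files only
# ! -name '.*': skip names that start with a dot
found=$(cd "$workdir" && find . -maxdepth 1 -type f ! -name '.*')

rm -r "$workdir"
printf '%s\n' "$found"
```

For feeding the results to another command safely, `-exec cmd {} +` or `-print0 | xargs -0` avoids the word-splitting problems the links above describe.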

revscat4y ago

FYI the zsh glob for this is `^.*`, e.g.:

    echo ^.*

will show all non-dotfiles in the current directory.

nixpulvis4y ago

I would recommend `find ... -exec`, but I still haven't figured out how to make it compose properly with other UNIX tools.

revscat4y ago

    rg 'MySearchString' **/*.js[x]#~*spec*

Voila. Note that this is for zsh, and you need to set the EXTENDED_GLOB option. But once you do you'll find yourself rarely needing to reach for `find`.

renewiltord4y ago

I solve this problem by just having no spaces on my filesystem when I act.

DoingIsLearning4y ago

Never?

ls |less

pkrumins4y ago

Nice! This is pretty much the only exception. ls|more or ls|less.

dsr_4y ago· 3 in thread

"After some digging, it was easy to find the HTTP request that pulled this information from the server. And it even had all the birthdates in the JSON!"

HR needs to know this, but it shouldn't be available to random employees.

devenvdevOP4y ago

Why though? We use hibob, and anyone can find anyone's full name and birthday via UI anyway. Are there any compliance issues with this?

toomuchtodo4y ago

It’s PII and should be restricted to only those who require the data for their job (HR).

dsr_4y ago

Do you live in a place where there are anti-discrimination laws concerning employee ages? If not, perhaps there is no issue for you.

tzs4y ago· 1 in thread

The 'comm' command should be in there. With no options 'comm' takes two files, F1 and F2, which should be lexically sorted, and produces 3 columns of output.

The first column consists of lines that are only in F1, the second column consists of lines that are only in F2, and the third column consists of lines that are common to both files.
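A short sketch of that behaviour with two small sorted files (names and contents invented). The `-1`, `-2` and `-3` options each suppress the corresponding column, which is how comm is usually used in practice:

```shell
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/f1"
printf '%s\n' banana cherry date  > "$workdir/f2"

# Full three-column output; columns 2 and 3 are indented with tabs.
all_columns=$(comm "$workdir/f1" "$workdir/f2")

# Suppress columns 2 and 3: lines only in f1.
only_f1=$(comm -23 "$workdir/f1" "$workdir/f2")

# Suppress columns 1 and 2: lines common to both files.
common=$(comm -12 "$workdir/f1" "$workdir/f2")

rm -r "$workdir"
printf '%s\n' "$all_columns"
```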

rolandog4y ago

This is new to me, but really useful. Thanks for sharing!

amtamt4y ago· 1 in thread

Extension of classic problem from "Programming Pearls" by John Bentley. Nice to see such pragmatism for one time problems.

carapace4y ago

s/John/Jon/

smitty1e4y ago· 1 in thread

For doing work with JSON data, I'd add:

https://stedolan.github.io/jq/

b6z4y ago

I don't understand what you mean. Half of the article is using jq.

unixbane4y ago· 1 in thread

> Was it worth it?

> 1 minute to do this

> 1 minute to do that

Chris20484y ago

> into company #589179283672's pipeline

The article isn't describing such a scenario (load-bearing script).

mattrighetti4y ago

For those interested in this topic I would suggest these incredible lectures by MIT [0], especially the data wrangling one.

Lectures are hosted on YouTube, they are extremely valuable and easy to follow and they give a pretty good insight on a lot of Unix topics.

[0]: https://missing.csail.mit.edu/2020/

