Show HN: Hck – a fast and flexible cut-like tool

(github.com)

151 points · totalperspectiv · 4y ago · 34 comments

32 comments · 10 top-level

lillesvin · 4y ago · 8 in thread

I wrote something similar (but never really finished it), called 'gut', in Go a few years back. Funny thing is that I literally never use it. I thought splitting on regexes and that stuff would be super useful, but it turns out that I just use Perl one-liners instead. And Perl is available on something like 99.99% of all *nix machines, which my own 'cut'-substitute isn't.

Still a good exercise for me to write it, and I assume for OP too.

totalperspectiv (OP) · 4y ago

It was indeed a great exercise! Part of the motivation for me was also performance. I should add some Perl one-liners to the benchmarks to see where they land as well. My experience is that they are usually a bit slower than awk.

c54 · 4y ago

I’ve never used Perl, but I love concise bash one-liner wizard incantations. What are some examples of things it’s handy for?

atsaloli · 4y ago

See https://catonmat.net/perl-one-liners-explained-part-one

And https://nostarch.com/perloneliners

asicsp · 4y ago

I wrote a couple of articles showing examples where Perl is more suitable than sed/awk:

* https://www.perl.com/article/perl-one-liners-part-1/

* https://www.perl.com/article/perl-one-liners-part-2/

FractalHQ · 4y ago

What tool would you recommend to someone who is starting out and wants to learn to write nifty scripts in this day and age? I’m currently studying bash, but there are so many scripting languages that I hear about and it’s hard to know what to invest time into.

fragmede · 4y ago

Invest time into what you need to get your job done. Easy when summarized like that, but let's dig in.

First consider what systems you want your skills to be applicable for.

Do you need tools that work on many random Linux machines that you have little control over? Then go with the lowest common denominator: bash and the various command line tools (sed, awk, grep) included with every system, and get good with the subset of command line options common to all of them, most likely limited by the oldest system you need to work with. (There are still Windows XP and Redhat 4 systems out in the wild, if you're unlucky enough to have to work with them.)

Do you need to work with OS X at all? I never learned to use Apple's outdated versions of programs; instead I heavily customized my laptop to have compatible versions of things, but this only works because there's only one OS X machine I ever deal with.

Then it's about the right tool for the right job. Do you want to process text? Awk will take you a long way, but ultimately, Perl is your friend. Do you want more structured programming type things (aka objects/classes)? Then Python is your friend. There's a certain mindset that thinks that if everything is in one language things are better, but that's a trap. With enough work, you can do the same thing in any language, but each language is better than others at some specific thing. (Working legacy code is one such thing that a language can be better at than others.)

These days, it's more important to learn what tools are available and how to use them. Because you can just google 'awk print second to last column', plug that into your script, and continue working, there's less of a need to truly grok awk's language (for example). (I mean, spend the time to learn it once so it will come back to you the next time you need to do something more custom with it.)
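
For reference, that particular search usually lands on something like the snippet below. It is only a sketch: `second_to_last` is a made-up helper name, and it assumes whitespace-separated input.

```shell
# Print the second-to-last whitespace-separated field of each line.
# Lines with fewer than two fields are skipped rather than printing $0.
second_to_last() {
  awk 'NF >= 2 { print $(NF - 1) }'
}

printf 'a b c d\nx y\nlonely\n' | second_to_last
```

The `NF >= 2` guard matters because `$(NF - 1)` on a one-field line is `$0`, i.e. the whole line.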

andrewzah · 4y ago

For a lot of tasks, POSIX-compliant Bash scripts are more than adequate. Use Perl, Python, or Ruby (your choice) if it becomes more complex (especially with state). It’s worth considering ones that are installed by default on most Linux distros.

There’s no reason to chase X script/lang of the month. Bash etc. are extremely well documented, and there’s a very good chance someone has already asked how to do something similar to what you’re doing on Stack Overflow, etc.

mongol · 4y ago

A book, "Minimal Perl", used to be referred to often in these discussions, but I never hear about it any more. It taught these kinds of tricks for command line magic.

rashil2000 · 4y ago · 7 in thread

Love seeing these modern alternatives to coreutils! Ripgrep, fd, hyperfine, bat, exa, bottom, gdu, wc, sd, hexyl...

Yet to find a GNU 'tr' alternative though

ComputerGuru · 4y ago

Here's `tac` in rust, with simd optimizations (in the simd branch, I haven't gotten around to releasing it although the only thing missing is dynamically detecting simd support instead of doing it at compile time):

https://github.com/neosmart/tac

and `rewrite`, which I've been told is akin to gnu sponge, "rewritten" in rust:

https://github.com/neosmart/rewrite

tyingq · 4y ago

Here's tr in Perl: https://metacpan.org/dist/PerlPowerTools/source/bin/tr

marto1 · 4y ago

Ripgrep is really the one that stood out for me. It feels substantially faster to use and does seem to do sane things more often than grep.

Would recommend anyone to try it.

sieste · 4y ago

> Ripgrep, fd, hyperfine, bat, exa, bottom, gdu, wc, sd, hexyl...

Thanks for that list! Is there any place where more of these "modern alternatives to coreutils" are collected?

basetensucks · 4y ago

https://github.com/ibraheemdev/modern-unix is a pretty decent list.

kristopolous · 4y ago

What would you like it to do?

rashil2000 · 4y ago

It's not like anyone absolutely needs it, I was just fascinated by the recent surge in faster and more cross-platform utilities.

kitd · 4y ago · 4 in thread

Nice work!

I don't know whether anyone here has used Rexx. The 'parse' instruction in Rexx was incredibly powerful, breaking up text by field/position/delimiter and assigning to variables all in one line.

I've often wondered if there was a command-line equivalent. Awk is great but you have to 'program' the parsing spec, rather than declare it.

twic · 4y ago

> Awk is great but you have to 'program' the parsing spec, rather than declare it.

You could probably turn a declarative spec into an awk program with an awk program.

tyingq · 4y ago

Not declarative, but Perl can do something like that.

Delimiters/Regex:

  $ perl -ne '($name,$pass,$uid,$gid,$therest)=split(/:/);print "$name $gid\n"' /etc/passwd
  root 0
  daemon 1
  bin 2
  ...

Fixed width:

  $ printf "1234XY\n5678AB" | perl -ne '($f1,$f2)=unpack("a4 a2");print "$f2 $f1\n"'
  XY 1234
  AB 5678

I believe Rexx's parse is fancier still, but this is reasonably close.

Ultimatt · 4y ago

You might want to look into what the following cli options do for you:

perl -F':' -anE 'say $F[0]'

kitd · 4y ago

That is indeed good. I used a bit of Perl a few years back but it has slipped out of my mind.

visarga · 4y ago · 1 in thread

<offtopic> I have implemented a `_split` command to split a line by a separator and `_stat` command that does basically `sort | uniq -c | sort -nr` counting elements and sorting by frequency. Really useful operations for me.

When my one liners become 2-3 lines long I need to switch to a regular script, but I also log all my shell commands years back and have something a bit better than `history | grep word` to search it.</>
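
A rough sketch of what such helpers might look like. The function bodies are guesses based on the description above, not the actual implementation, and this `_split` only handles a single-character separator.

```shell
# Hypothetical reconstruction: split each input line on a single-character
# separator, emitting one element per output line.
_split() {
  tr "$1" '\n'
}

# Count identical lines and list them by descending frequency.
_stat() {
  sort | uniq -c | sort -nr
}

printf 'a,b,a\nc,a\n' | _split , | _stat
```

The first `sort` is needed because `uniq -c` only collapses adjacent duplicates.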

nerdponx · 4y ago

> I also log all my shell commands years back and have something a bit better than `history | grep word` to search it.

I'd be very interested to hear more about this.

technological · 4y ago · 1 in thread

Nice one, OP. It’s mostly due to my lack of knowledge of Rust, but the code is not easy to read, unlike Golang. Does anyone feel the same? (By the way, this has nothing to do with how OP wrote it, but rather with the language itself.)

tyingq · 4y ago

I don't think even Rust fans would argue that. Rust has roughly 2x the amount of reserved words, more operators, and so on. There's a larger basic set of things to learn before you could skim some code and read it.

toastal · 4y ago · 1 in thread

Heck

valbaca · 4y ago

> hck is a shortening of hack, a rougher form of cut.

bilalhusain · 4y ago

It is interesting to note how it compares to "choose" (also in Rust) in the benchmarks.

single character

    hck           1.494 ± 0.026s
    hck (no-mmap) 1.735 ± 0.004s
    choose        4.597 ± 0.016s

multi character

    hck           2.127 ± 0.004s
    hck (no-mmap) 2.467 ± 0.012s
    choose        3.266 ± 0.011s

The single-pass optimization trick[1] seems to be helping a lot in the single-character case.

Of course, doing away with a pass is supposed to give 2x, and I am wondering whether the regex constraint led to this "side-effect".

[1] fast mode - https://github.com/sstadick/hck/blob/master/src/lib/core.rs#... https://github.com/sstadick/hck/blob/master/src/lib/core.rs#...
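
To put numbers on that, the ratios can be worked out from the mean times quoted above. This is just arithmetic on those figures (`speedup` is a made-up helper name), and it ignores the ± spread.

```shell
# Ratio of two mean wall-clock times, rounded to two decimals.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

speedup 4.597 1.494   # single character, choose vs hck: prints 3.08
speedup 3.266 2.127   # multi character, choose vs hck: prints 1.54
```

So hck comes out roughly 3.1x faster in the single-character case and roughly 1.5x faster in the multi-character case.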

asicsp · 4y ago

I saw `hck` recently on Twitter and was impressed to see support for compressed files. From the current todo list, I definitely hope complement gets implemented.

I see negative indexing is currently marked "unlikely". I'm writing a similar tool [0], but with bash+awk. I solved negative index support with a `-n` option, which changes the range syntax to use `:` instead of the `-` character.

My biggest trouble came with literal field separator [1], because FS can only be specified as a string in awk and backslash is a metacharacter for both string and regexp.

[0] https://github.com/learnbyexample/regexp-cut

[1] https://learnbyexample.github.io/escaping-madness-awk-litera...
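
One way around the escaping problem, sketched here under assumptions (this is not how regexp-cut does it, and `literal_field` is a hypothetical name): skip FS entirely, pass the separator through the environment so awk applies no escape processing to it, and split with `index()`, which matches plain strings.

```shell
# Print field N of each line, splitting on the literal string SEP.
# usage: literal_field SEP N
literal_field() {
  SEP="$1" N="$2" awk '
    BEGIN { sep = ENVIRON["SEP"]; n = ENVIRON["N"] + 0 }
    {
      rest = $0
      # Drop the first n-1 fields, one separator at a time.
      for (i = 1; i < n; i++) {
        p = index(rest, sep)
        if (p == 0) { rest = ""; break }   # fewer than n fields
        rest = substr(rest, p + length(sep))
      }
      # Whatever precedes the next separator is field n.
      p = index(rest, sep)
      print (p > 0 ? substr(rest, 1, p - 1) : rest)
    }'
}

printf '%s\n' 'a\.b\.c' | literal_field '\.' 2
```

Here the separator is the two-character string backslash-dot, which would need careful escaping as an FS value.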

rendall · 4y ago

The README and description should not assume the reader knows what `cut` is or what it's used for. Maybe reference it and then ELI5

queuebert · 4y ago

Yay, no more piping multiple cuts when you have multiple delimiters.
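
For anyone who hasn't hit this: the pattern being replaced, next to the usual awk workaround, looks roughly like the sketch below. The sample line is made up, and hck's own flags are deliberately not shown here; see its README for those.

```shell
line='alice:10,20'

# Two cuts chained: split on ":" first, then split the remainder on ",".
printf '%s\n' "$line" | cut -d: -f2 | cut -d, -f2

# One awk call: FS is a bracket expression matching either delimiter,
# so the wanted value is simply the third field.
printf '%s\n' "$line" | awk -F'[:,]' '{ print $3 }'
```

Both commands print the same value; `cut` needs the pipe because `-d` accepts only a single character.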
