Show HN: Mawkdown – (Toy) Markdown Parser in Awk

(github.com)

12 points · rethab · 5y ago · 4 comments

4 comments · 2 top-level

asicsp · 5y ago · 2 in thread

Nice. Haven't gone through it fully, but the header parsing stood out as something to improve. Use match to capture the run of '#' characters and take its length, for example:

    $ echo '# ' | awk 'match($0, /^#+ /, m){print length(m[0])-1}'
    1
    $ echo '### ' | awk 'match($0, /^#+ /, m){print length(m[0])-1}'
    3

You can also use capture groups so that you do not need -1 and remove that substr as well.

    awk 'match($0, /^(#+) (.+)/, m){l=length(m[1]); print "<h" l ">" m[2] "</h" l ">"}'

rethab (OP) · 5y ago

Thanks for the hint. I started with the intention of staying within the limits of "traditional" AWK and only resorted to using match (which is only available on GAWK, for those who don't know) to parse links and images.

As you read on, you'll certainly find more areas for improvement, because this is pretty much based on an idea I had in the shower and then typed out in an hour :)

asicsp · 5y ago

As far as I know, the 'match' function is part of the POSIX spec for awk. Only the third (array) argument is specific to gawk. So, this should work for any awk:

    awk 'match($0, /^#+ /){l=RLENGTH-1; print "<h" l ">" substr($0,RLENGTH+1) "</h" l ">"}'

I checked it on https://awk.js.org/ and it did work
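For anyone who wants to try the portable variant on more than one line, here is a minimal sketch built on the same two-argument match and RLENGTH idea. The function name md_headers, the cap at six levels, and the pass-through of non-header lines are my own assumptions for illustration, not something taken from Mawkdown itself.

```shell
# Hypothetical helper: turn ATX-style header lines into HTML using POSIX awk only.
md_headers() {
    awk '
    # Two-argument match() is POSIX; it sets RSTART and RLENGTH on success.
    match($0, /^#+ /) {
        level = RLENGTH - 1            # matched length minus the trailing space
        if (level <= 6) {              # assumption: HTML defines h1 through h6 only
            text = substr($0, RLENGTH + 1)
            print "<h" level ">" text "</h" level ">"
            next
        }
    }
    { print }                          # every other line passes through unchanged
    '
}

# Example run on two headers and one ordinary line
printf '%s\n' '# Title' '### Section' 'plain text' | md_headers
```

With that input the two header lines should come out as `<h1>Title</h1>` and `<h3>Section</h3>`, and the last line is printed as-is. Lines with seven or more '#' characters are deliberately left alone rather than emitting a nonexistent tag.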

khm · 5y ago

Would it be ok to use elements of this to improve the one we ship with Werc?

http://code.9front.org/hg/werc/file/2ace198c631b/bin/contrib...
