"templated-regular-expression" on npm, GitHub: https://github.com/tolmasky/templated-regular-expression
To be clear, programming languages should just have actual parsers and you shouldn't use regular expressions for parsers. But if you ARE going to use a regular expression, man is it nice to break it up into smaller pieces.
Raku regular expressions combined with grammars are far more powerful, and if written well, easier to understand than any "actual parser". In order to parse Raku with an "actual parser" it would have to allow you to add and remove things from it as it is parsing. Raku's "parser" does this by subclassing the current grammar adding or removing them in the subclass, and then reverting back to the previous grammar at the end of the current lexical scope.
In Raku, a regular expression is another syntax for writing code. It just has a slightly different default syntax and behavior. It can have both parameters and variables. If the regular expression syntax isn't a good fit for what you are trying to do, you can embed regular Raku syntax to do whatever you need to do and return right back to regular expression syntax.
It also has a much better syntax for doing advanced things, as it was completely redesigned from first principles.
The following is an example of how to match at least one `A` followed by exactly that number of `B`s and exactly that number of `C`s.
(Note that bare square brackets [] are for grouping, not for character classes.)
my $string = 'AAABBBCCC';
say $string ~~ /
^
# match at least one A
# store the result in a named sub-entry
$<A> = [ A+ ]
{} # update result object
# create a lexical var named $repetition
:my $repetition = $<A>.chars(); # <- embedded Raku syntax
# match B and then C exactly $repetition times
$<B> = [ B ** {$repetition} ]
$<C> = [ C ** {$repetition} ]
$
/;
Result: 「AAABBBCCC」
A => 「AAA」
B => 「BBB」
C => 「CCC」
The result is actually a very extensive object that has many ways to interrogate it. What you see above is just a built-in human readable view of it.In most regular expression syntaxes to match equal amounts of `A`s and `B`s you would need to recurse in-between `A` and `B`. That of course wouldn't allow you to also do that for `C`. That also wouldn't be anywhere as easy to follow as the above. The above should run fairly fast because it never has to backtrack, or recurse.
When you combine them into a grammar, you will get a full parse-tree. (Actually you can do that without a grammar, it is just easier with one.)
To see an actual parser I often recommend people look at JSON::TINY::Grammar https://github.com/moritz/json/blob/master/lib/JSON/Tiny/Gra...
Frankly from my perspective much of the design of "actual parsers" are a byproduct of limited RAM on early computers. The reason there is a separate tokenization stage was to reduce the amount of RAM used for the source code so that further stages had enough RAM to do any of the semantic analysis, and eventual compiling of the code. It doesn't really do that much to simplify any of the further stages in my view.
The JSON::Tiny module from above creates the native Raku data structure using an actions class, as the grammar is parsing. Meaning it is parsing and compiling as it goes.
The main problem with generalised regexes is that you can't match them in linear time worst-case. I'm wondering if this is addressed at all by Raku.
There is probably a monad you could understand this as being, a specific one, but "monad" itself is not a way to understand it.
And just as you can understand any given Iterator by simply understanding it directly, whatever "monad" you might use to understand this process can be simply understood directly without reference to the "monad" concept.
Can you clarify what do you mean?
Do expect the concept of "monad" to help explaining Raku grammars?
Literally, Larry Wall was adding things to regexes all the way back before the release of Perl 2.
Hopefully that clarifies that I was not necessarily making any statement about whether or not to use Raku regexes, which I don't pretend to know well enough to qualify to give advice around. Just for the sake of interesting discussion however, I do have a few follow up comments to what you wrote:
1. Aside from my original confusing use of the term "regexes" to actually mean "PCRE-style regexes", I recognize I also left a fair amount of ambiguity by referring to "actual parsers". Given that there is no "true" requirement to be a parser, what I was attempting to say is something along the lines of: a tool designed to transform text into some sort of structured data, as opposed to a tool designed to match patterns. Again, from this alone, seems like Raku regexes qualify just fine.
2. That being said, I do have a separate issue with using regexes for anything, which is that I do not think it is trivial to reason about the performance characteristics of regexes. IOW, the syntax "doesn't scale". This has already been discussed plenty of course, but suffice it to say that backtracking has proven undeniably popular, and so it seems an essential part of what most people consider regexes. Unfortunately this can lead to surprises when long strings are passed in later. Relatedly, I think regexes are just difficult to understand in general (for most people). No one seems to actually know them all that well. They venture very close to "write-only languages". Then people are scared to ever make a change in them. All of this arguably is a result of the original point that regexes are optimized for quick and dirty string matching, not to power gcc's C parser. This is all of course exacerbated by the truly terrible ergonomics, including not being able to compose regexes out of the box, etc. Again, I think you make a case here that Raku is attempting to "elevate" the regex to solve some if not all of these problems (clearly not only composable but also "modular", as well as being able to control backtracking, etc.) All great things!
I'd still be apprehensive about the regex "atoms" since I do think that regexes are not super intuitive for most people. But perhaps I've reversed cause and effect and the reason they're not intuitive is because of the state they currently exist in in most languages, and if you could write them with Raku's advanced features, regexes would be no more unintuitive than any other language feature, since you aren't forced to create one long unterminated 500-character regex for anything interesting. In other words, perhaps the "confusing" aspects of regexes are much more incidental to their "API" vs. an essential consequence of the way they describe and match text.
3. I'd like to just separately point out that many aspects of what you mentioned was added to regexes could be added to other kinds of parsers as well. IOW, "actual parsers" could theoretically parse Raku, if said "actual parsers" supported the discussed extensions. For example, there's no reason PEG parsers couldn't allow you to fall into dynamic sub-languages. Perhaps you did not mean to imply that this couldn't be the case, but I just wanted to make sure to point out that these extensions you mention appear to have much more generally applicable than they are perhaps given credit for by being "a part of regexes in Raku" (or maybe that's not the case at all and it was just presented this way in this comment for brevity, totally possible since I don't know Raku).
I'll certainly take a closer look at the full Raku grammar stuff since I've written lots of parser extensions that I'd be curious have analogues in Raku or might make sense to add to it, or alternatively interesting other ideas that can be taken from Raku. I will say that RakuAST is something I've always wanted languages to have, so that alone is very exciting!
use HTTP::UserAgent;
use JSON::Fast;
my $url = 'https://api.coindesk.com/v1/bpi/currentprice.json';
my $ua = HTTP::UserAgent.new;
my $total-time = 0;
loop {
my $response = $ua.get($url);
if $response.is-success {
my $data = from-json $response.content;
my $rate = $data<chartName>;
say "Current chart name: $rate";
#last if $rate eq 'Bitcoin';
last if $total-time ≥ 16;
}
else {
say "Failed to fetch data: {$response.status-line}";
}
sleep 3; # Poll every 3 seconds
$total-time += 3;
}
(Tweak / uncomment / rename the $rate variable assignments and checks.)Maybe it's because Raku's features were actually well thought-out, unlike the incidental "doesn't actually buy me anything, just makes the code hard to deal with" complexity I have to deal at work day in and day out.
Maybe if Java had a few of these features back in the day people wouldn't've felt the need to construct these monstrous annotation soup frameworks in it.
Every Design Pattern is a workaround for a missing feature. What that missing feature is, isn't always obvious.
For example the Singleton Design Pattern is a workaround for missing globals or dynamic variables. (A dynamic variable is sort of like a global where you get to have your own dynamic version of it.)
If Raku has a missing feature, you can add it by creating a module that modifies the compiler to support that feature. In many cases you don't even need to go that far.
Of course there are far fewer missing features in Raku than Java.
If you ever needed a Singleton in Raku (which you won't) you can do something like this:
role Singleton {
method new (|) {
once callsame
}
}
class Foo does Singleton {
has $.n is required
}
say Foo.new( n => 1 ).n;
say Foo.new( n => 2 ).n;
That prints `1` twice.The way it works is that the `new` method in Singleton always gets called because it is very generic as it has a signature of `:(|)`. It then calls the `new` method in the base class above `Foo` (`callsame` "calls" the next candidate using the "same" arguments). The result then gets cached by the `once` statement.
There are actually a few limitations to doing it this way. For one, you can't create a `new` method in the actual class, or any subclasses. (Not that you need to anyway.) It also may not interact properly with other roles. There are a variety of other esoteric limitations. Of course none of that really matters because you would never actually need, or want to use it anyway.
Note that `once` basically stores its value in the next outer frame. If that outer frame gets re-entered it will run again. (It won't in this example as the block associated with Foo only gets entered into once.) Some people expect `once` to run only once ever. If it did that you wouldn't be able to reuse `Singleton` in any other class.
What I find funny is that while Java needs this Design Pattern, it is easier to make in Raku, and Raku doesn't need it anyway.
It's definitely something I'd write for fun/personal projects but can't imagine working with other people in. On a side note, I believe this is where go's philosophy of having a dead simple single way of doing things is effective for working with large teams.
But trust me if you ever have to actually
work* with them you will find yourself cursing whoever decided to riddle their code with <<+>>, or the person that decided `* + ` isn't the same as `2 *` (that parser must have been fun to write!)your entire comment is syntactically valid raku, or can be made so, because raku syntax is so powerful and flexible.
raku grammars ftw!
https://docs.raku.org/language/grammars
https://docs.raku.org/language/grammar_tutorial
;)
even that last little fella above, is or can be made syntactically valid.
If you look at any of the introductory Raku books, it seems a LOT like Python with a C-like syntax. By that I mean the syntax is more curly-brace oriented, but the ease of use and built-in data structures and OO features are all very high level stuff. I think if you know any other high level scripting language that you would find Raku pretty easy to read for comparable scripts. I find it pretty unlikely that the majority of people would use the really unusual stuff in normal every day code. Raku is more flexible (more than one way to do things), but it isn't arcane looking for the normal stuff I've seen. I hope that helps.
You can see that in Raku's ability to define keyword arguments with a shorthand (e.g. :global(:$g)' as well as assuming a value of 'True', so you can just call match(/foo/, :g) to get a global regex match). Perl has tons of this stuff too, all aimed at making the language quicker and more fun to write, but less readable for beginners.
Frankly I would absolutely love to maintain a Raku codebase.
I would also like to update a Perl codebase into being more maintainable. I'm not sure how much I would like to actually maintain a Perl codebase because I have been spoiled by Raku. So I also wouldn't like to maintain one in Java, Python, C/C++, D, Rust, Go, etc.
Imagine learning how to use both of your arms as if they were your dominant arm, and doing so simultaneously. Then imagine going back to only using one arm for most tasks. That's about how I feel about using languages other than Raku.
Yes its nice to write dense fancy code, however, something very boring to write like PHP is a lot of "loop over this bucket and do something with the fish in the bucket, afterwards, take the bucket and throw it into the bucket pile" that mirrors a 'human follows these steps' type.
I'd rather maintain Perl any day than 30 classes of Java code just to build a string.
I am not totally convinced--well, not at all convinced, really-- that the OOP revolution made programming code better, simpler, easier to follow, than procedural code.
I suppose the argument ended up being a philosophical discussion about where State was held in a running program (hard to tell in procedural code, perhaps, and easier to tell in OOP), but after 20+ years of Java I wonder if baby hadnt disappeared down the plughole with the bathwater.
Then your contrarian phase ends and you regret that you didn’t learn something useful in that time.
It is very strong for text processing, particularly regexes.
Finally learning any language helps you learn new paradigms which you can apply anywhere. Same as Haskell or Lisp or something.
Things are allowed to exist and be enjoyed on the sole basis that they are enjoyable
Im guessing its going to be a generational thing now. A whole older generation of programmers will just find themselves out of place in what is like a normal work set up for the current generation.
Should it be `returns (32, 54)` ? i.e. 4+50 for the 2nd term.
Maybe this is a consequence (head translation) of some countries saying e.g. vierenvijftig (four and fifty) instead of the English fifty-four.
# use the reduce metaoperator [ ] with infix + to do "sum all"
[+] 1, 2, 3 say [+] 1..10000000000000000000000000000000000000000000
Which will result in you getting this back in a fraction of a second50000000000000000000000000000000000000000005000000000000000000000000000000000000000000
(It actually cheats because that particular operator gets substituted for `sum` which knows how to calculate the sum of a Range object.)
Raku
for 'logs1.txt'.IO.lines -> $_ { .say if $_ ~~ /<<\w ** 15>>/; }
Python
from re import search
with open('logs1.txt', 'r') as fh:
for line in fh:
if search(r'\b\w{15}\b', line): print(line, end='') > (2,4,8...*)[17]
262144
This one genuinely surprised meMy personal bar for "amount of clever involved" is fairly high when the clever either does exactly what you'd expect or fails, and even higher when it does so at compile time.
(personal bars, and personal definitions of "exactly what you'd expect" will of course vary, but I think your brain may have miscalibrated the level of risk before it got as far as applying your preferences in this particular case)
2, 4, 8 ... *
The `...` operator only deduces arithmetic, or geometric changes for up-to the previous 3 values.Basically the above becomes
2, 4, 8, * × 2 ... *
Since each value is just double the previous one, it can figure that out.If ... can't deduce the sequence, it will error out.
2, 4, 9 ... *
Unable to deduce arithmetic or geometric sequence from: 2, 4, 9
Did you really mean '..'?
in block at ./example.raku line 1
So I really don't understand how you would be horrified.Needing to do geometric sequences as syntax like that is clearly a parlor trick, a marginal use case. What goes through a good programmer's mind, with experience of getting burned by such things over the years, is "If Raku implements this parlor trick, what other ones does it implement? What other numbers will do something I didn't expect when I put them in? What other patterns will it implement?"
Yes, you can read the docs, and learn, but we also know this interacts with all sorts of things. I'm not telling you why you should be horrified, I'm explaining why on first glance this is something that looks actually quite unappealing and scary to a certain set of programmers.
It actually isn't my opinion either. My opinion isn't so much that this is scary on its own terms, but just demonstrates it is not a language inline with any of my philosophies.
I guess (apart from the Whatever), the laziness is new since Perl6/Raku.
But this is detecting that the increment is multiplicative, rather than additive. It might seem like a natural next step, but, for example, I somewhat suspect (and definitely hope) that Raku wouldn't know that `(1, 1, 2...*)` is a (shifted) list of Fibonacci numbers.
You're safe WRT Fibonacci.
> Unable to deduce arithmetic or geometric sequence from: 1,1,2
[7] > (1,3,9...*)[4,5]
(81 243)
[8] > (1,3,9...*)[(1..3)]
(3 9 27)
and nestable [0] > (1,2,4...*)[(1,2,4...*)[1,2,3]] #| Dörrie's bounds
assess -> \x=(1,10,100...10⁶) {
½ × sum (1 + x⁻¹)**(x),
(1 + x⁻¹)**(x+1)
}It's a long and effortful read, but the payoff is worth the effort.
[0] https://mitp-content-server.mit.edu/books/content/sectbyfn/b...
EDIT: I think the Perl article posted in a sibling comment uses essentially the same example (but for calculating e instead of pi), although I only skimmed it.
Basically creating a Domain Specific Language.
Raku isn't necessarily that language. What it is, is a language which you can modify into being a DSL for solving your hard problem.
Raku is designed so that easy things are easy and hard things are possible. Of course it goes even farther, as some "hard" things are actually easy. (Hard from the perspective of trying to do it in some other language.)
Lets say you want a sequence of Primes. At first you think sieve of Eratosthenes.
Since I am fluent in Raku, I just write this instead:
( 2..∞ ).grep( *.is-prime )
This has the benefit that it doesn't generate any values until you ask for them. Also If you don't do anything to cache the values, they will be garbage collected as you go.The most important Raku features are Command Line Interface (CLI) and grammars.
CLI support is a _usual_ feature -- see "docopt" implementations (and adoption), for example. But CLI is built-in in Raku and nice to use.
As for the grammars -- it is _unusual_ a programming language to have grammars as "first class citizens" and to give the ability to create (compose) grammars using Object-Oriented Programming.
https://github.com/Raku/problem-solving/pull/89#pullrequestr...
A quote from the Christian Bible Luke 5:36,37
Sure, we can do the same thing with the goto... but why would we want to use the more difficult/annoying alternative when the convenient one exists?
And yes, this feature is annoying and arguably is a mis-feature: containers shall not explode when you touch them.
Also note that in contrast to Bash, Junction is a type.
Regarding their utility, at their most useful level (in my experience), junctions provide for things like:
$string ~~ “this”|”that”|”other”
This is the same as writing $string eq “this” || $string eq “that” || $string eq “other”
They have many other uses but that’s the most common one that I tend to see in practice.Back when I had to write PowerShell scripts, I constantly found that piping an array to some command would almost always make that command to be invoked once for every item in array, instead of being invoked once and given the whole array as a single input. Sometimes it's the latter that you need, so the workaround is to make a new, single-element array with the original array as its only element, and pipe this into the command.
There is some sense to it by means of comparison, but the constant conflation of the two becomes tiresome.
But in that spirit, let's compare:
The =()= "operator" is really a combination of Perl syntax[^1] that achieves the goal of converting list context to scalar context. This isn't necessary to determine the length of an array (`my $elems = @array` or, in favor of being more explicity, `my $elems = 0+@array`). It is, however, useful in Perl for the counting of more complex list contexts on the RHS.
Let's use some examples from it's documentation to compare to Raku.
Perl:
my $n =()= "abababab" =~ /a/g;
# $n == 4
Raku: my $n = +("abababab" ~~ m:g/'a'/);
# $n == 4
# Alternatively...
my $n = ("abababab" ~~ m:g/'a'/).elems;
That's it. `+` / `.elems` are literally all you ever need to know for gathering a count of elements. The quotes around 'a' in the regex are optional but I always use them because I appreciate denoting which characters are literal in regexes (Note also that the regex uses the pair syntax mentioned in OP via `m:g`. Additional flags are provided as pairs, eg `m:g:i`).Another example.
Perl:
my $count =()= split /:/, "ab:ab:ab";
# $count == 3
Raku: my $count = +"ab:ab:ab".split(':');
# $count == 3
While precedence can at times be a conceptual hindrance, it's also nice to save some parentheses where it is possible and legible to do so. Opinions differ on these points, of course. Note also that `Str.split` can take string literals as well as regexes.[1]: See https://github.com/book/perlsecret/blob/master/lib/perlsecre...
Changing the name to Raku doesn't obliterate Perl 6's history as Larry Wall's successor to Perl 5.
I don't know why you would expect people to pretend that they're unrelated. They aren't.
There are connections to both but that doesn’t necessarily make them topical.
I disagree that discussing a previous language by a designer (often in terms that seem to conflate equivalence) is usefully relevant to discussion of a different language by that designer.
Note that I have never said that there is zero utility. I just find it tiresome to encounter comments about Perl syntax as if it is automatically useful or interesting to discussions about Raku.
Which is why I took the time to provide an example of what (I would consider) an actually relevant mention of Perl syntax.
Further than that, the name change only happened after it already had at least one official release under the "Perl 6" name.
Why "of all things?" The Perl philosophy of TIMTOWTDI, and Wall's interest in human-language constructs and the ways that they could influence programming-language constructs, seem to make its successor an obvious home for experiments like this.
Edit: Other than a configuration framework and a test harness.
What's more is that every change from Python 2 to Python 3 that I've heard of, resembles a change that Perl5 has had to do over the years. Only Perl did it without breaking everything. (And thus didn't need a major version bump.)
# Does weird things with nested lists too
> [1, [2, 3], 4, 5] <<+>> [10, 20]
[11 [22 23] 14 25]
This article makes me feel like I'm watching a Nao Geo/Animal Planet documentary. Beautiful and interesting to see these creatures in the wild? Absolutely. Do I want to keep my distance? As far away as possible.There is a full book about it:
https://en.m.wikipedia.org/wiki/Higher-Order_Perl
Whats nee in Raku is putting them together with other experimental features as first class citizens
That is completely the wrong way to think about it. Before the Great List Refactor those were dealt with the same by most operations (as you apparently want it). And that was the absolute biggest problem that needed to be changed at the time. There were things that just weren't possible to do no matter how I tried. The way you want it to work DID NOT WORK! It was terrible. Making scalars work as single elements was absolutely necessary to make the language usable. At the time of the GLR that was the single biggest obstacle preventing me from using Raku for anything.
It also isn't some arbitrary default logic. It is not arbitrary, and calling it "default" is wrong because that insinuates that it is possible to turn it off. To get it to do something else with a scalar, you have to specifically unscalarify that item in some manner. (While you did not specifically say 'arbitrary', it certainly seems like that is your position.)
Let's say you have a list of families, and you want to treat each family as a group, not as individuals. You have to scalarize each family so that they are treated as a single item. If you didn't do that most operations will interact with individuals inside of families, which you didn't want.
In Raku a single item is also treated as a list with only one item in it depending on how you use it. (Calling `.head` on a list returns the first item, calling it on an item returns the item as if it had been the first item in a list.) Being able to do the reverse and have a list act as single item is just as important in Raku.
While you may not understand why it works the way it works, you are wrong if you think that it should treat lists-of-scalars the same as lists-of-lists.
>> This attempt to fuse higher-order functional programming with magic special behaviors from Perl comes off to me as quixotic.
It is wrong to call it an attempt, as it is quite successful at it. There is a saying in Raku circles that Raku is strangely consistent. Rather than having a special feature for this and another special feature for that, there is one generic feature that works for both.
In Python there is a special syntax inside of a array indexing operation. Which (as far as I am aware) is the only place that syntax works, and it is not really like anything else in the language. There is also a special syntax in Raku designed for array indexing operations, but it is just another slightly more concise way to create a lambda/closure. You can use that syntax anywhere you want a lambda/closure. Conversely if you wanted to use one of the other lambda/closure syntaxes in an array indexing operation you could.
The reason that we say that Raku is strangely consistent, is that basically no other high level language is anywhere near as consistent. There is almost no 'magic special behaviors'. There is only the behavior, and that behavior is consistent regardless of what you give it. There are features in Perl that are magic special behaviors. Those special behaviors were specifically not copied into Raku unless there was a really good reason. (In fact I can't really think of any at the moment that were copied.)
Any sufficiently advanced technology is indistinguishable from magic. So you saying that it is magic is only really saying that you don't understand it. It could be magic, or it could be advanced technology, either way it would appear to be magic.
In my early days of playing with Raku I would regularly try to break it by using one random feature with another. I often expected it to break. Only it almost never did. The features just worked. It also generally worked the way I thought it should.
The reason you see it as quixotic is that you see a someone tilting at a windmill and assuming they are insane. The problem is that it maybe it isn't actually a windmill, and maybe you are just looking at it from the wrong perspective.
Indeed, the defining technique of array programming!
Built-in operators should be reserved for tasks that are extremely common and have obvious behaviours.
This would be more suitable as a function, like
applyBroadcast([1, [2, 3], 4, 5], [10, 20], +, recursive=true, loop=true)
There are so many subtle behaviours in `<<+>>` that you really want to spell out. Did you notice that it was [22, 23] not [22, 13] for example?I know I'm tired of seeing it, I'm sure others are as well.